Parallel and Distributed Computing (MEI)


Parallel & Distributed Computing
MSc Informatics Eng
2011/2012
A. Proença


Lecture contents at Computing Systems & Performance


Last modified: 03 Dec 2011

Department of Informatics (Departamento de Informática)

Announcements:

Work assignment: the details of the work to be performed are now available; see the week 8 homework below. (25-Oct-11)


Week 1

04-Oct-11 (09h-13h) 4h

09h-11h
Lecture: Open presentation and discussion of key issues related to the course unit as a whole: motivation, goals, module organization, student expectations, and the experiences of students from previous years.

11h-13h
Lecture: Overall presentation of the CSP module: goals, relationship with technological advances, contents, working methods.


Week 2

11-Oct-11 (9h-13h) 8h

09h-11h
Lecture: An overview of the basic concepts students are expected to have acquired at the BSc level, with reading suggestions from two books, mentioned here: an e-version of Computer Organization and Design (COD) is available (as mentioned during the lecture), and the beta version of the CSAPP book is available on the BSc course webpage, also here. These concepts included numerical data representation, the ISA level, how a compiler converts C code into assembly/machine code, and how performance has been improved at the CPU level.

11h-13h
Lecture: An overview of the architectural evolution and the management software of the SeARCH cluster, including a visit to the physical machine (kindly presented by PhD student Vitor Oliveira; slides here).

Homework / Reading assignment: Review of the basic concepts covered in this lecture.


Week 3

18-Oct-11 (9h-11h) 10h

Lecture (with debate): A deeper overview of basic computer architecture concepts: performance of a CPU, improving CPU latency through ILP (namely pipeline issues and superscalarity), key issues on the organization of cache memory.
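The CPU performance discussed in this lecture is usually summarized by the classic CPU time equation (as presented in the COD book), where IC is the instruction count, CPI the average number of clock cycles per instruction, and f the clock frequency. A short reminder with illustrative numbers, not measurements from any real machine:

```latex
T_{\text{CPU}} = IC \times CPI \times T_{\text{clock}} = \frac{IC \times CPI}{f}
\qquad\text{e.g.}\qquad
\frac{10^{9} \times 0.8}{2 \times 10^{9}\ \text{Hz}} = 0.4\ \text{s}
```

Pipelining and superscalar issue act on the CPI term: a 4-wide superscalar core can, in the ideal case, reach CPI = 0.25.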

Reading suggestions: Lecture slides, COD book chapters 4 and 5 (5.1, 5.2, 5.3, 5.8, 5.10).

Homework: (i) identify every AMD and Intel processor microarchitecture from Hammer and Core up to the latest releases, and build a table with: year, max clock frequency, # pipeline stages, degree of superscalarity, # simultaneous threads, vector support, # cores, type/bandwidth of external interfaces; (ii) identify the CPU generations in the SeARCH cluster. See the suggestions on the last lecture slide.


Week 4

25-Oct-11 (9h-13h) 14h

Lecture (with debate): A detailed example of the ILP implementation in the Intel P6 microarchitecture, as a predecessor of current Intel x86 microarchitectures: visualization of loop iterations in vector-processing code, with a comparative analysis of estimated and measured performance (in CPE, cycles per element).
Discussion of the results of the previous homework, with suggestions to better integrate the students' separate homework contributions.
Memory hierarchy: a global view of the performance impact of memory devices that are slower than the computing units, and a review of basic concepts of cache organization and structure.
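One basic cache-organization concept reviewed here is how a byte address is split into tag, set index and block offset. The sketch below assumes a hypothetical 32 KiB, 8-way set-associative cache with 64-byte blocks (typical L1 data-cache figures, chosen only for illustration; the function and type names are also hypothetical):

```c
#include <stdint.h>

/* Hypothetical cache geometry, for illustration only. */
enum {
    BLOCK_SIZE = 64,                               /* bytes per block */
    ASSOC      = 8,                                /* ways per set */
    CACHE_SIZE = 32 * 1024,                        /* total bytes */
    NUM_SETS   = CACHE_SIZE / (BLOCK_SIZE * ASSOC) /* = 64 sets */
};

/* Number of bits needed to index n items (n must be a power of two). */
static unsigned log2u(unsigned n) {
    unsigned bits = 0;
    while (n > 1) { n >>= 1; bits++; }
    return bits;
}

typedef struct {
    uint64_t tag;     /* compared against the tags stored in the set */
    uint64_t set;     /* selects one set (ASSOC candidate blocks) */
    uint64_t offset;  /* selects the byte inside the block */
} cache_addr;

/* Split a byte address into tag | set index | block offset. */
cache_addr split_address(uint64_t addr) {
    unsigned offset_bits = log2u(BLOCK_SIZE);  /* 6 bits for 64-byte blocks */
    unsigned index_bits  = log2u(NUM_SETS);    /* 6 bits for 64 sets */
    cache_addr a;
    a.offset = addr & (BLOCK_SIZE - 1);
    a.set    = (addr >> offset_bits) & (NUM_SETS - 1);
    a.tag    = addr >> (offset_bits + index_bits);
    return a;
}
```

With this geometry, address 0x12345 maps to offset 0x05, set 13 and tag 0x12; doubling the associativity would halve the number of sets and move one bit from the index to the tag.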

Reading suggestions: Lecture slides on ILP and cache. For full details on the optimization techniques and the associated visualization of code execution, we recommend Ch. 5 of the Bryant book (Optimizing Program Performance), namely on ILP. For cache memories we continue to recommend the COD chapter 5 sections mentioned previously, and we also include these slides on cache memory from another lecturer.
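In the spirit of the Ch. 5 CSAPP examples (not copied from them; function names are hypothetical), the sketch below contrasts a loop whose additions form one serial dependence chain with a 2x2-unrolled version using two independent accumulators, which lets a pipelined FP adder keep two additions in flight and typically lowers the measured CPE:

```c
#include <stddef.h>

/* Baseline: a single accumulator, so each addition must wait for the
   previous one; the CPE is bounded by the FP adder latency. */
double sum_baseline(const double *v, size_t n) {
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc += v[i];
    return acc;
}

/* 2x2 unrolling: two elements per iteration, two independent accumulators,
   exposing instruction-level parallelism to the out-of-order core. */
double sum_unroll2x2(const double *v, size_t n) {
    double acc0 = 0.0, acc1 = 0.0;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        acc0 += v[i];
        acc1 += v[i + 1];
    }
    for (; i < n; i++)   /* leftover element when n is odd */
        acc0 += v[i];
    return acc0 + acc1;
}
```

Note that splitting the sum reassociates floating-point additions, so for general data the two versions may differ in the last bits; this is why compilers do not apply this transformation to FP code by default.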

Homework: (i) For the same AMD and Intel processor microarchitectures mentioned before, complete the table with data on the on-chip memory hierarchy, namely: # of cache levels and, for each level, size, structure (block size/placement, replacement policy, write policy) and bandwidth to access lower levels; (ii) complete the list of CPU generations in the SeARCH cluster. See the suggestions on the supplied slides.


Week 5

01-Nov-11 (9h-13h) 14h

Public Holiday


Week 6

08-Nov-11 (9h-11h) 16h

Lecture: Memory hierarchy: key issues in the design of cache memories; estimating the impact on multi-level cache performance (hit time, miss rate and miss penalty) of the key factors: cache/block size, block placement/replacement, write techniques, latency/bandwidth; basic and advanced optimization techniques.
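The combined effect of hit time, miss rate and miss penalty in a two-level hierarchy is commonly estimated with the average memory access time (AMAT), as in Appendix B of Hennessy & Patterson. A minimal sketch, where the L2 miss rate is the local one (L2 misses divided by accesses that reach L2); the function name and the example figures are illustrative assumptions:

```c
/* Average memory access time for a two-level cache hierarchy.
   All times in clock cycles; miss rates as fractions in [0, 1].
   AMAT = HitTime_L1 + MissRate_L1 * (HitTime_L2 + LocalMissRate_L2 * MissPenalty_L2) */
double amat_two_level(double hit_l1, double miss_rate_l1,
                      double hit_l2, double local_miss_rate_l2,
                      double miss_penalty_l2) {
    /* Whatever misses in L1 pays the full cost of going to L2 (and beyond). */
    double l1_miss_penalty = hit_l2 + local_miss_rate_l2 * miss_penalty_l2;
    return hit_l1 + miss_rate_l1 * l1_miss_penalty;
}
```

For example, a 1-cycle L1 with a 5% miss rate, a 10-cycle L2 with a 50% local miss rate and a 200-cycle memory penalty gives 1 + 0.05 x (10 + 0.5 x 200) = 6.5 cycles per access.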

Reading suggestions: Lecture slides on memory hierarchy. The lecture material was based on the newly arrived 5th edition of the textbook Computer Architecture: A Quantitative Approach, which is already available at the main university library; we strongly recommend reading Appendix B (a review of memory hierarchy) and Ch. 2 on Memory Hierarchy Design.

Reading assignment: The list of papers referenced here (please start by reading the ReadMeFirst file) is mandatory reading, under terms that will be presented soon (it is part of the module assessment). As the outcome, each student will prepare a short essay (4 pages) on the topic assigned to him/her, plus a brief presentation (8-10 min) to be given on 06-Dec-11.


Week 7

15-Nov-11 (9h-11h) 18h

Lecture: Beyond Instruction-Level Parallelism (ILP): review of the key challenges in improving CPI with pipelines; exploiting ILP with multiple-issue techniques; exploiting Thread-Level Parallelism (TLP) to improve uniprocessor performance; consequences of TLP for cache organization, namely cache-coherence protocols (snooping and directory-based) and memory consistency.
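Snooping coherence is usually introduced through the basic three-state MSI invalidation protocol. The sketch below models the state transitions of one cache line in one cache; it is a simplification (no bus arbitration, no Exclusive/Owned states as in MESI/MOESI), and all names are hypothetical:

```c
/* MSI snooping protocol for ONE cache line held by ONE cache.
   Events come either from the local processor or from transactions
   this cache observes (snoops) on the shared bus. */
typedef enum { INVALID, SHARED, MODIFIED } msi_state;

typedef enum {
    PR_READ,   /* local processor reads the line */
    PR_WRITE,  /* local processor writes the line */
    BUS_READ,  /* another cache issued a read miss for this line */
    BUS_READX  /* another cache issued a read-exclusive (write miss or upgrade) */
} msi_event;

/* Returns the next state; sets *writeback when this cache must flush
   its dirty copy so the other cache (or memory) gets the latest value. */
msi_state msi_next(msi_state s, msi_event e, int *writeback) {
    *writeback = 0;
    switch (s) {
    case INVALID:
        if (e == PR_READ)  return SHARED;    /* miss: place BusRd on the bus */
        if (e == PR_WRITE) return MODIFIED;  /* miss: place BusRdX on the bus */
        return INVALID;                      /* snooped traffic: nothing to do */
    case SHARED:
        if (e == PR_WRITE)  return MODIFIED; /* upgrade: invalidate other copies */
        if (e == BUS_READX) return INVALID;  /* another writer: drop our copy */
        return SHARED;                       /* read hit or another reader */
    case MODIFIED:
        if (e == BUS_READ)  { *writeback = 1; return SHARED; }
        if (e == BUS_READX) { *writeback = 1; return INVALID; }
        return MODIFIED;                     /* local read/write hits */
    }
    return s;
}
```

A directory-based protocol keeps essentially the same per-line states but replaces bus snooping with point-to-point messages to a directory that tracks the sharers of each line.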

Reading suggestions: Lecture slides on ILP-MThread. The lecture material was based on the newly arrived 5th edition of the textbook Computer Architecture: A Quantitative Approach; the relevant sections are listed on the last lecture slide. We also recommend this tutorial on memory consistency.


Week 8

22-Nov-11 (9h-11h) 20h

Lecture: Data parallelism: vector architectures, SIMD extensions (MMX, SSE and similar) and an introduction to graphics processing units (GPUs).
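As a small illustration of the SIMD extensions discussed, the sketch below adds two float arrays with SSE intrinsics, four single-precision elements per instruction, and finishes any remainder with scalar code. It assumes an x86/x86-64 compiler providing `<xmmintrin.h>`; the function name is hypothetical:

```c
#include <stddef.h>
#include <xmmintrin.h>  /* SSE intrinsics (x86/x86-64 only) */

/* c[i] = a[i] + b[i], four floats per 128-bit SSE register. */
void vec_add_sse(const float *a, const float *b, float *c, size_t n) {
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);           /* unaligned 128-bit load */
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(c + i, _mm_add_ps(va, vb));  /* 4 additions in one instruction */
    }
    for (; i < n; i++)                             /* scalar remainder (n % 4 elements) */
        c[i] = a[i] + b[i];
}
```

Unaligned loads/stores are used so the function accepts any pointer; with 16-byte-aligned data, `_mm_load_ps`/`_mm_store_ps` could be used instead.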

Reading suggestions: Lecture slides on DataParallelism1. The lecture material was based on Ch. 4 of the main textbook (see last week).

Homework: The details of the work to be performed and individually defended on 6 December are here.


Week 9

22-Nov-11 (9h-11h & 15h-19h) 26h

Lecture: More on data parallelism beyond vector and SIMD-extended architectures, with a quick tour of the evolution of these architectures: the Keystone DSP chip from TI, the Cell BE, Project Denver from NVIDIA, the future Intel/AMD hybrid cores, and FPGAs. The GPU as a computing device and CUDA as a programming model: terminology, thread and memory models, the NVIDIA family of CUDA-enabled devices. Innovations in the Fermi devices.
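To make the CUDA thread model concrete without a GPU, the sketch below emulates sequentially how a 1-D grid of thread blocks maps onto array elements for a SAXPY (y = a*x + y). In real CUDA the per-thread body would be a `__global__` kernel computing its index as `blockIdx.x * blockDim.x + threadIdx.x`, with threads executing concurrently in warps; here plain C loops stand in for the hardware, and all function names are hypothetical:

```c
#include <stddef.h>

/* Body executed by ONE emulated thread: the role of a CUDA kernel. */
static void saxpy_thread(int block_idx, int block_dim, int thread_idx,
                         size_t n, float a, const float *x, float *y) {
    size_t i = (size_t)block_idx * block_dim + thread_idx; /* global thread index */
    if (i < n)                 /* guard: the last block may be only partially used */
        y[i] = a * x[i] + y[i];
}

/* Emulated launch: enough blocks of block_dim threads to cover n elements. */
void saxpy_launch(size_t n, float a, const float *x, float *y, int block_dim) {
    int grid_dim = (int)((n + block_dim - 1) / block_dim);  /* ceil(n / block_dim) */
    for (int b = 0; b < grid_dim; b++)          /* blocks: scheduled onto SMs on a GPU */
        for (int t = 0; t < block_dim; t++)     /* threads within a block */
            saxpy_thread(b, block_dim, t, n, a, x, y);
}
```

The bounds check in the per-thread body mirrors the guard that CUDA kernels need when n is not a multiple of the block size.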

Webcast (from TACC, at the University of Texas at Austin): CUDA programming, part 1. Instructor: João Barbosa.

Reading suggestions: Lecture slides on DataParallelism2. The lecture material was based on Ch. 4 of the main textbook (as last week) and on the NVIDIA book recommended as a textbook. We also recommend Wen-mei Hwu's CUDA course at the University of Illinois at Urbana-Champaign, both the slides and chapters 1 to 5 (drafts of the recommended textbook), available at http://courses.engr.illinois.edu/ece498/al/Syllabus.html.


Week 10

30-Nov-11 (9h-13h) 36h

Lecture: Interconnection topologies of computing systems: the evolution of SMP and MPP systems (interconnection of computing nodes) towards the current requirements for interconnecting cores in multi-core components. Analysis of the evolution from Intel's 2006 project prototype for many-core components (Larrabee) to the new silicon prototype created in Nov-09 (SCC, Single-chip Cloud Computer).

Lecture: The connectivity problem between the CPU and memory, and among the various computing cores. Analysis of current off-chip links and of the interconnection topologies between components (chips and computing nodes) and on-chip; introduction to the performance limitations of core interconnects and to NoCs (Networks-on-Chip).
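A simple way to compare the topologies discussed is the minimal number of hops between two nodes, which drives both latency and contention. The sketch below computes it for a bidirectional ring, a 2-D mesh (such as the 6x4 tile mesh of the SCC) and a hypercube, assuming minimal routing; the function names are hypothetical:

```c
#include <stdlib.h>

/* Nodes are numbered 0..N-1; all functions assume minimal routing. */

/* Bidirectional ring of n nodes: take the shorter way round. */
int ring_hops(int a, int b, int n) {
    int d = abs(a - b);
    return d < n - d ? d : n - d;
}

/* 2-D mesh of width w with row-major numbering: Manhattan distance,
   which is also the path length of XY dimension-order routing. */
int mesh2d_hops(int a, int b, int w) {
    return abs(a % w - b % w) + abs(a / w - b / w);
}

/* Hypercube: one hop per differing address bit (Hamming distance). */
int hypercube_hops(unsigned a, unsigned b) {
    unsigned x = a ^ b;
    int hops = 0;
    while (x) { hops += x & 1u; x >>= 1; }
    return hops;
}
```

For 24 nodes, the worst case is 12 hops in a ring and 8 hops corner to corner in a 6x4 mesh (nodes 0 and 23); a 32-node hypercube never needs more than 5 hops, at the cost of more links per node.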

Reading suggestions: The lecture slides, which essentially follow section 7.8 of Ch. 7 of the recommended book (Hennessy & Patterson). We also recommend reading the paper on NoCs referenced in the slides.

Lab session: Analysis of GPU code developed for execution on the Fermi devices in the SeARCH cluster.


Week 11

07-Dec-11 (9h-13h) 40h

Presentation and discussion of assignments: Presentation, with collective discussion/debate, of several alternative benchmarks to evaluate the performance of a computing node from the perspective of the various acceleration mechanisms implemented in the architectures. Some tips on the assessment test to be held on 14-Dec-11 (template).


Week 12

14-Dec-11 (9h-13h) 44h

Assessment: Written test.

Homework (within the scope of the Integrated Project): Presentation of the work on the performance evaluation of matrix multiplication on several platforms and under different environmental conditions (guide).
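The assignment guide itself is not reproduced here; as a hypothetical starting point for such measurements, the sketch below shows a cache-friendlier matrix multiplication loop order and the usual conversion of a run time into GFLOP/s, based on the 2n³ floating-point operations of an n x n product. All function names are illustrative assumptions:

```c
#include <stddef.h>
#include <time.h>

/* C = A * B for n x n row-major matrices; C must be zero-initialized.
   The i-k-j order walks B and C row by row (stride 1), which is usually
   much friendlier to the caches than the textbook i-j-k order. */
void matmul_ikj(size_t n, const double *A, const double *B, double *C) {
    for (size_t i = 0; i < n; i++)
        for (size_t k = 0; k < n; k++) {
            double aik = A[i * n + k];          /* reused across the whole j loop */
            for (size_t j = 0; j < n; j++)
                C[i * n + j] += aik * B[k * n + j];
        }
}

/* An n x n product performs n^3 multiplications and n^3 additions. */
double gflops(size_t n, double seconds) {
    return 2.0 * (double)n * n * n / seconds / 1e9;
}

/* Processor time in seconds using ISO C clock(); adequate for a
   single-threaded run, not for multithreaded or GPU versions. */
double cpu_seconds(void) {
    return (double)clock() / CLOCKS_PER_SEC;
}
```

A measurement would call `cpu_seconds()` before and after `matmul_ikj` and pass the difference to `gflops`; comparing against the i-j-k order for sizes that do and do not fit in each cache level makes the memory-hierarchy effects visible.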


Week 13

26-Apr-11 (14h-15h) 45h

Lecture: Introduction to and characterization of instruction-set-free computing units, the FPGAs. Analysis of programming models.

Reading suggestions: The lecture slides, as well as slides on FCUDA and on OpenCL for FPGAs. We also recommend the papers on Heterogeneous Computing and on Algorithmic Skeletons to Program FPGAs.


Intellectual Property & Copyright

This publication (including its printing facility) and its associated contents, which may include partial reproductions of adequately cited external works, are aimed exclusively at the students registered in the Multi-core and Many-core Computing course of the MAP-i Doctoral Program at Universidade do Minho in 2010/2011, for personal use and e-learning purposes, and have no profit-making or commercial goal. Any other reproduction, total or partial, by any support, medium or process, namely electronic, mechanical or photographic, including photocopying, modification of the contents, public communication, or distribution through renting, without proper consent from the authors, is illegal and may result in legal proceedings.


Page maintained by aproenca<at>di.uminho.pt