ECE 669, VIP, NTU720 -- Parallel Computer Architecture
University of Massachusetts Amherst
Department of Electrical and Computer Engineering
Instructor: Csaba Andras Moritz, Associate Professor
Email: andras@ecs.umass.edu
Phone: 413-545-2442
Office: Knowles, room KEB-309H
Secretary: June Daehler, phone: 413-545-3621
Office hours: by appointment; send email to arrange a time
Teaching Assistant: contact instructor
Abstract:
The objective of this course is to train students to design and evaluate parallel architectures and systems for scientific, engineering, and enterprise application domains. We will cover the main architectural components of a parallel computer, including midrange SMPs, high-performance scalable distributed-memory machines, and parallel single-chip designs; the programming models deployed on parallel machines (e.g., message passing, data-parallel, and shared-memory models); and automatically parallelizing compilers (e.g., compiler techniques for memory disambiguation, mapping, partitioning, communication and computation scheduling, and various parallelism extraction techniques). Emerging trends and ideas in single-chip parallel machine design, proposed for billion-transistor multiprocessor-on-a-chip architectures, are also discussed, including coarse-grained chip multiprocessors, fine-grained reconfigurable tiled designs, and speculative parallel architectures. State-of-the-art parallel machine models are presented that can be used to explore the design space of current and future parallel systems and parallel algorithms, and to make sound design choices for the grain size and balance of parallel computer systems.
Textbook:
David E. Culler, Jaswinder Pal Singh, Anoop Gupta; Parallel Computer Architecture: A Hardware/Software Approach.
Additional reading will be from:
1. Research papers, which will be posted online.
2. F. Thomson Leighton; Introduction to Parallel Algorithms and Architectures.
3. Vipin Kumar, Ananth Grama, Anshul Gupta, George Karypis; Introduction to Parallel Computing.
4. Daniel Lenoski, Wolf-Dietrich Weber; Scalable Shared-Memory Multiprocessing.
Course Notes: provided online on the course website.
Requirements: two homeworks (based on cache and network multiprocessor simulators) and one exam. The research project component is optional. The course will contain 25 lectures.
Grading (preliminary): 70% exam(s) and project, 30% homeworks.
Prerequisites: Introductory-level Computer Architecture, basic Algorithms, some understanding of how compilers work.
Equipment:
On-Campus Computer Account: An ECS account or access to a SUN/Solaris computer is required. Off-campus students: please contact VIP at UMASS for ECS accounts.
Course URL will be: /ece/andras/courses/ECE669VIP/index.html
Schedule:
Each entry lists the event and its topics, followed by the notes (PS or PDF formats, plus additional notes from class), textbook reading, problems to solve at home, additional reading, and homework. Homework solutions should be sent to the TA.

Lecture 1: Introduction & Course Information
    Notes: Lecture 1; notes on blue pad. Reading: Ch1. Problems: see notes.
Lecture 2: Fundamental Design Issues
    Notes: Lecture 2. Reading: Ch1.
Lecture 3: Parallel Applications, Implementations under Various Programming Models
    Notes: Lecture 3. Reading: Ch2. Additional reading: for algorithms, see books 2 and 3. Project out.
Lecture 4: Implementations under Various Programming Models
    Notes: slides from previous lecture. Reading: Ch2.
Lecture 5: Parallel Programming Models, Commercial Applications
    Notes: Lecture 5; notes on blue pad. Reading: Ch3. Problems: implement a version of MxM in all 3 programming models. Additional reading: PrgLang, Active Messages, OpenMP [1] [2]. Project plans due for those who selected to do the project (read note at bottom of page).
Lecture 6: Commercial Applications, MPI, Performance Aspects
    Notes: slides from previous lecture; notes on blue pad; Sample1-PtToPtComm, Sample2-Scatter, Sample3-Gather. Reading: Ch3. Problems: implement the equation solver from the textbook in MPI. Additional reading: MPI [1] [2].
Lecture 7: Architecture of Midrange Bus-Based SMPs, Snoop-Based Multiprocessing
    Notes: Lecture 7; Article. Reading: Ch5.
Lecture 8: Architecture of Midrange Bus-Based SMPs, Snoop-Based Multiprocessing
    Notes: slides from previous lecture; notes on blue pad. Reading: Ch5. Homework 1 out.
Lecture 9: Snoop-Based Multiprocessing
    Notes: Lecture 9; notes on blue pad. Reading: Ch6.
Lecture 10: Scalable Multiprocessors: Cache Coherence
    Notes: Lecture 10; notes on blue pad. Reading: Ch8.
Lecture 11: Scalable Multiprocessors: Cache Coherence
    Notes: slides from previous lecture. Reading: Ch8.
Lecture 12: Case Studies: Scalable Multiprocessors: Cache Coherence, limited pointer schemes
    Notes: slides from previous lecture. Reading: Ch8.
Lecture 13: Project-Related Discussion; Last Part of Cache Coherence, memory consistency models
    Notes: slides from previous lecture; notes on blue pad. Reading: Ch8 & papers. Additional reading: Stenstrom Survey, False sharing [2], Dash [3].
Lecture 14: Interconnection Networks
    Notes: Lecture 14. Reading: Ch10.
Lecture 15: Performance Modeling of Parallel Machines
    Notes: slides from previous lecture. Additional reading: Culler LogP [1] [2], LogGP papers, MPI model. Homework 2 out. Send to TA (email or paper copy). Late submissions will not be accepted.
Lecture 16: Exam Review. Canceled (because of sickness).
Lecture 16: Exam Review and Network Topologies
    Notes: Review. If you need the solution for the homework, send me email; I can't put it up on the web. Additional reading: LoGPC, Frank LoPC, Agarwal KnC.
Exam: Midterm
Lecture 17: Unloaded Performance in K-ary N-cubes
    Notes: same slides; notes on blue pad. Off-campus exam date: the VIP office will send info. Projects are due for those who selected this option.
Lecture 18: Routing
    Notes: same slides.
Lecture 19: Alewife
    Notes: research paper. Additional reading: Alewife paper [2] [3].
Lecture 20: Alewife
    Notes: research paper. Additional reading: Alewife paper [2] [3].
Lecture 21: Network & Resource Contention
    Notes: slides. Additional reading: LoGPC, Frank LoPC, Agarwal KnC.
Lecture 22: Estimating network contention in applications. The Dash multiprocessor
    Notes: same slides; notes on blue pad and Dash overview. Reading: Daniel Lenoski, Wolf-Dietrich Weber; Scalable Shared-Memory Multiprocessing (book).
Lecture 22: The Stanford Dash multiprocessor
Lecture 23: Microarchitectural trends. Discussing Micro-34. Overview of Raw and Hydra.
Lecture 24: MLP vs ILP & Raw Compiler
    Notes: slides; material will be distributed in class. Additional reading: RawCC papers [1] [2] [3], DeepC paper, SUDS, HotPages. Homework 2 is due May 7. Send to instructor (email or paper copy). Late submissions will not be accepted.
Lecture 25: Raw Architecture and Design Exploration. Billion transistor designs.
    Notes: slides. Additional reading: Raw Design MS Thesis [2], SimpleFit paper. Last lecture! Final exam review.
Final Exam

Additional readings: Alewife synchronization [1] [2] [3]; IEEE Computer 1998, Special Issue on Billion Transistor Architectures.