ECE 669, VIP, NTU720 -- Parallel Computer Architecture
University of Massachusetts Amherst
Department of Electrical and Computer Engineering


Instructor: Csaba Andras Moritz, Associate Professor
Phone: 413-545-2442
Office: Knowles, room KEB-309H
Secretary: June Daehler, phone: 413-545-3621
Office hours: by appointment, send email
Teaching Assistant: contact instructor

The objective of this course is to train students to design and evaluate parallel architectures and systems for scientific, engineering, and enterprise application domains. The course covers the main architectural components of a parallel computer, including midrange SMPs, high-performance scalable distributed-memory machines, and parallel single-chip designs; the programming models deployed on parallel machines (e.g., message passing, data-parallel, and shared-memory models); and automatically parallelizing compilers (e.g., compiler techniques for memory disambiguation, mapping, partitioning, communication and computation scheduling, and various parallelism extraction techniques). Emerging trends and ideas in single-chip parallel machine design, proposed for billion-transistor multiprocessor-on-a-chip architectures, are also discussed, including coarse-grained chip multiprocessors, fine-grained reconfigurable tiled designs, and speculative parallel architectures. State-of-the-art parallel machine models are presented that can be used to explore the design space of current and future parallel systems and parallel algorithms, and to make sound design choices for the grain size and balance of parallel computer systems.

Course Info:


Textbook: David E. Culler, Jaswinder Pal Singh, Anoop Gupta; Parallel Computer Architecture: A Hardware/Software Approach.

Additional reading will be from:
1. Research papers, which will be posted online.
2. F. Thomson Leighton; Introduction to Parallel Algorithms and Architectures.
3. Vipin Kumar, Ananth Grama, Anshul Gupta, George Karypis; Introduction to Parallel Computing.
4. Daniel Lenoski, Wolf-Dietrich Weber; Scalable Shared-Memory Multiprocessing.

Course Notes: provided online on the course website.
Requirements: two homeworks (based on cache and network multiprocessor simulators) and one exam; the research project component is optional. The course consists of 25 lectures.
Grading (preliminary): 70% exam(s) and project, 30% homeworks.

Prerequisites: introductory-level computer architecture, basic algorithms, and some understanding of how compilers work.


On-Campus Computer Account: An ECS account or access to a Sun/Solaris computer is required. Off-campus students should contact VIP at UMass for ECS accounts.
Course URL will be: /ece/andras/courses/ECE669VIP/index.html

Schedule (preliminary!):

Event | Topics | Notes (PS or PDF formats) | Additional notes from class | Textbook Reading | Problems to Solve @ Home | Additional Reading | Homework (solutions should be sent to the TA)
Lecture 1 | Introduction & Course Information | Lecture 1 | Notes on blue pad | Ch1 | See Notes | |
Lecture 2 | Fundamental Design Issues | Lecture 2 | | Ch1 | | |
Lecture 3 | Parallel Applications, Implementations under Various Programming Models | Lecture 3 | | Ch2 | For algorithms see books 2, 3 | | Project out
Lecture 4 | Implementations under Various Programming Models | Slides from previous lecture | | Ch2 | | |
Lecture 5 | Parallel Programming Models, Commercial Applications | Lecture 5 | Notes on blue pad | Ch3 | Implement a version of MxM in all 3 programming models | PrgLang, Active Messages, OpenMP [1] [2] | Project plans due for those who chose to do the project **(read note at bottom of page)
Lecture 6 | Commercial Applications, MPI, Performance Aspects | Slides from previous lecture | Notes on blue pad | Ch3 | Implement the equation solver from the textbook in MPI | MPI [1][2] |
Lecture 7 | Architecture of Midrange Bus-Based SMPs, Snoop-Based Multiprocessing | Lecture 7 | Article | Ch5 | | |
Lecture 8 | Architecture of Midrange Bus-Based SMPs, Snoop-Based Multiprocessing | Slides from previous lecture | Notes on blue pad | Ch5 | | | Homework 1 out
Lecture 9 | Snoop-Based Multiprocessing | Lecture 9 | Notes on blue pad | Ch6 | | |
Lecture 10 | Scalable Multiprocessors: Cache Coherence | Lecture 10 | Notes on blue pad | Ch8 | | |
Lecture 11 | Scalable Multiprocessors: Cache Coherence | Slides from previous lecture | | Ch8 | | |
Lecture 12 | Case Studies - Scalable Multiprocessors: Cache Coherence, limited pointer schemes | Slides from previous lecture | | Ch8 | | |
Lecture 13 | Project Related Discussion - Last Part of Cache Coherence, memory consistency models | Slides from previous lecture | Notes on blue pad | Ch8 & papers | | Stenstrom Survey False sharing [2], Dash [3] |
Lecture 14 | Interconnection Networks | Lecture 14 | | Ch10 | | |
Lecture 15 | Performance Modeling of Parallel Machines | Slides from previous lecture | | | | Culler LogP [1][2], LogGP papers, MPI model | Homework 2 out. Send to TA (email or paper copy). Late submissions will not be accepted.
Lecture 16 | Exam Review | Canceled (because of sickness) | | | | |
Lecture 16 | Exam Review and Network Topologies | Review | If you need the homework solution, send me email; I can't put it up on the web! | | | LoGPC, Frank LoPC, Agarwal KnC |
Exam | Midterm | | | | | |
Lecture 17 | Unloaded Performance in K-ary N-cubes | Same slides | Notes on blue pad | | | | Off-campus exam date! VIP office will send info!
Lecture 18 | Routing | Same slides | | | | |
Lecture 19 | Alewife | Research paper | | | | Alewife paper [2] [3] |
Lecture 20 | Alewife | Research paper | | | | Alewife paper [2] [3] |
Lecture 21 | Network & Resource Contention | Slides | | | | LoGPC, Frank LoPC, Agarwal KnC |
 | Estimating network contention in applications. The Dash multiprocessor | Same slides | Notes on blue pad and Dash overview | | | Daniel Lenoski, Wolf-Dietrich Weber; Scalable Shared-Memory Multiprocessing. (book) |
Lecture 22 | The Stanford Dash multiprocessor | | | | | |
Lecture 23 | Microarchitectural trends. Discussing Micro-34 Overview Raw and Hydra. | | | | | |
Lecture 24 | MLP vs ILP & Raw Compiler | Slides | Material will be distributed in class. {download} | | | RawCC papers [1][2][3], DeepC paper, SUDS, HotPages | Homework 2 is due May 7. Send to instructor (email or paper copy). Late submissions will not be accepted.
Lecture 25 | Raw Architecture and Design Exploration. Billion transistor designs. | Slides | | | | Raw Design MS Thesis [2], SimpleFit paper | Last lecture!
 | Final exam review | | | | | |
 | FINAL EXAM | | | | | |
Additional readings:

Alewife synchronization [1][2][3]

IEEE Computer 1998, Special Issue on Billion Transistor Architectures