Syllabus
This course is designed for graduate and senior undergraduate students to study hardware acceleration techniques for machine learning (ML). It covers the design and optimization of hardware systems, from mobile devices to cloud systems, with a focus on ML training and inference.
- Objective: To understand hardware acceleration for ML, including architectural techniques, performance optimization, and trade-offs.
- Course Structure: The course is divided into three parts:
  - ML Models: Convolutional and deep neural networks.
  - Parallelization: Techniques to improve ML algorithm performance.
  - Hardware Design: Focus on acceleration, efficiency in ML kernel computation, and design-space trade-offs (locality, precision, compression, etc.).
- ML Principles: Supervised and unsupervised learning, neural networks (CNN, DNN), loss functions, linear classification.
- Parallel Computing Architectures: SIMD, MIMD, GPU, TPU, instruction decoding, memory subsystem optimization.
- Optimization Techniques: Quantization, sparsity, pruning, energy-efficient dataflow, systolic arrays.
- Hardware Design: Dataflow mapping, matrix operations, performance evaluation metrics (accuracy, throughput, energy), mapping algorithms to hardware.
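Among the optimization techniques listed above, quantization is the most compact to illustrate. The sketch below shows symmetric uniform quantization of a weight array to 8-bit integers; it is a minimal NumPy illustration of the idea, not the specific scheme or tooling used in the course.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric uniform quantization of a float array to int8.

    Returns the int8 values and the scale needed to dequantize.
    """
    scale = np.max(np.abs(w)) / 127.0  # map the largest magnitude to 127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from int8 values."""
    return q.astype(np.float32) * scale

w = np.array([0.5, -1.27, 0.01, 1.0], dtype=np.float32)
q, s = quantize_int8(w)
w_hat = dequantize(q, s)  # close to w, within one quantization step
```

Storing `q` instead of `w` cuts weight memory by 4x versus float32 at the cost of a bounded rounding error, which is the core accuracy/efficiency trade-off the course examines.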
- Lab 1: Training a neural network using Google Colab, implementing batch normalization, dropout, and pruning in PyTorch/TensorFlow.
- Lab 2: Parallelizing convolution using MPI in Python, analyzing performance gains.
- Lab 3: Implementing a systolic array in Verilog/OpenCL, synthesizing code, and simulating the design.
- Lab 4 (Graduate): SIMD programming with vector intrinsics on an Intel Xeon processor for matrix multiplication.
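To give a flavor of the pruning step in Lab 1, the sketch below implements global magnitude pruning in plain NumPy: the smallest-magnitude fraction of weights is zeroed out. The function name and structure are illustrative assumptions; the lab itself uses PyTorch/TensorFlow.

```python
import numpy as np

def magnitude_prune(w, sparsity):
    """Zero out the smallest-magnitude fraction `sparsity` of weights.

    Returns the pruned weights and the boolean mask that was applied.
    """
    k = int(sparsity * w.size)           # number of weights to remove
    if k == 0:
        return w.copy(), np.ones_like(w, dtype=bool)
    # Threshold = k-th smallest magnitude across the whole array.
    thresh = np.partition(np.abs(w).ravel(), k - 1)[k - 1]
    mask = np.abs(w) > thresh            # keep only weights above it
    return w * mask, mask

w = np.array([[0.9, -0.05], [0.002, -1.2]])
pruned, mask = magnitude_prune(w, 0.5)   # removes the 2 smallest weights
```

The resulting sparsity is what later lectures exploit in hardware: zero weights can be skipped in both storage and computation.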
- Emphasis on hands-on experience, with coding in Python and Verilog/OpenCL.
- The course uses recent publications and offers practical labs for direct application of techniques learned.
- The exam constitutes 40% of the grade.
- Homework assignments are for practice only and are not graded.
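Before writing Verilog for Lab 3, it can help to prototype the systolic array in software. The sketch below is a cycle-by-cycle NumPy simulation of an output-stationary N×N array computing a matrix product; the structure and names are illustrative assumptions, not the lab specification.

```python
import numpy as np

def systolic_matmul(A, B):
    """Cycle-accurate simulation of an output-stationary systolic array.

    A values stream in from the left (one row per array row, skewed by
    one cycle per row); B values stream in from the top (one column per
    array column, skewed likewise). Each PE multiplies the operands
    passing through it, accumulates into its local C[i, j], and forwards
    the operands to its right and bottom neighbors.
    """
    n = A.shape[0]
    C = np.zeros((n, n))
    a_reg = np.zeros((n, n))  # registers between PEs, left-to-right
    b_reg = np.zeros((n, n))  # registers between PEs, top-to-bottom
    for t in range(3 * n - 2):             # enough cycles to drain the array
        # Reversed order so each PE reads its neighbors' previous-cycle values.
        for i in reversed(range(n)):
            for j in reversed(range(n)):
                a_in = a_reg[i, j - 1] if j > 0 else (
                    A[i, t - i] if 0 <= t - i < n else 0.0)
                b_in = b_reg[i - 1, j] if i > 0 else (
                    B[t - j, j] if 0 <= t - j < n else 0.0)
                C[i, j] += a_in * b_in     # multiply-accumulate in place
                a_reg[i, j] = a_in         # forward operands to neighbors
                b_reg[i, j] = b_in
    return C

A = np.arange(16.0).reshape(4, 4)
B = np.arange(16.0)[::-1].reshape(4, 4)
C = systolic_matmul(A, B)                  # matches A @ B
```

The input skew guarantees that at cycle t, PE(i, j) sees A[i, k] and B[k, j] with k = t - i - j, so each PE accumulates exactly one dot product; the same timing argument carries over to the hardware design.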
The course aims to provide both theoretical knowledge and practical experience with hardware design for efficient ML computation.
The class is open to computer engineering seniors and graduate students. Prior experience with Python is required.
- There is no required textbook for this class; the course uses a variety of publicly accessible materials.
Project demos and lab reports are due as posted on the course web page. Late submissions are generally not accepted and, when accepted, are graded at the instructor's discretion. If you know that your project is running late, contact the instructor in advance to make individual arrangements.