Module I: Deep Learning Basics |
ML Basics
|
Mon, Aug. 26 |
Lecture 1: Course Introduction
Course overview,
Course logistics
|
|
|
Wed, Aug. 28 |
Lecture 2: Machine Learning Basics
Machine learning overview
ML: pipeline, tasks
Linear regression, Polynomial regression
|
|
Assignment 1 out
[Lab1a: Python Basic]
[Lab1b: Linear Regression]
|
Fri, Aug. 30 |
Lecture 3: Linear regression
Optimization: gradient-based solution, closed-form solution
Underfit, Overfit, Regularization, Generalization
|
|
|
No Class (Labor Day) |
Wed, Sept. 4 |
Lecture 4: Neural Network
Binary Classification / Multi-Class Classification
Sigmoid / Softmax
Cross-Entropy Loss
|
|
Assignment 1 due
Assignment 2 out
[Lab2a: Gradient Descent]
[Lab2b: PyTorch Basics]
[Lab2c: Linear Classifier]
|
Fri, Sept. 6 |
Lecture 5: Multi-Layer Perceptron (MLP)
Linear Problems / Non-Linear Problems
Feature transforms
Model: Fully-connected networks
Computational Graph
Optimization: Backpropagation
|
|
|
Mon, Sept. 9 |
Lecture 6: Activation Functions and Optimization
Activation Functions: ReLU, Sigmoid, tanh, Leaky ReLU, ELU
Regularization
Weight decay
|
|
|
Deep Learning Architectures
|
Wed, Sept. 11 |
Lecture 7: Convolutional Neural Networks (CNNs)
Weight initialization, dropout, hyperparameters
Universal approximation theorem
Intro to CNNs -- Convolution
|
|
Assignment 2 due
|
Fri, Sept. 13 |
Lecture 8: Convolutional Neural Networks (CNNs)
Convolution: kernel, receptive field, stride
Padding
Learning convolutional filters
One layer (breadth): multiple kernels
K layers (depth): nonlinearity in between
|
|
|
Mon, Sept. 16 |
Lecture 8: Convolutional Neural Networks (CNNs) -- Continued
|
|
|
Wed, Sept. 18 |
Lecture 9: CNNs
Pooling
AlexNet
Batch Normalization
ResNet + Residual Blocks
|
|
Assignment 3 out
[Lab 3: Autograd and NN]
|
Fri, Sept. 20 |
Lecture 10: CNN Architectures
AlexNet, VGGNet, GoogLeNet, BatchNorm, ResNet
Deep Learning Framework
|
|
|
Mon, Sept. 23 |
Lecture 11: Training Neural Networks
Activation functions
Data preprocessing
Weight initialization
Data augmentation
Regularization (Dropout, etc.)
Learning rate schedules
Hyperparameter optimization
Transfer learning
|
|
|
Wed, Sept. 25 |
Lecture 12: Deep Learning Framework
PyTorch
Dynamic vs Static graphs
|
|
Assignment 3 Due
|
Fri, Sept. 27 |
Lecture 12: PyTorch Review Session (continued)
PyTorch
Dynamic vs Static graphs
|
|
|
Mon, Sept. 30 |
Lecture 13: Final Project Overview
Final Project Overview
Life cycle of a Machine Learning System
Sequential models use cases
|
|
|
Wed, Oct. 2 |
Lecture 14: Recurrent Neural Networks (RNNs)
Sequential models use cases
CNNs for sequences
RNNs
|
|
|
Fri, Oct. 4 |
Lecture 15: Recurrent Networks: Stability analysis and LSTMs
Gradient Explosion
LSTM, GRU
Language modeling
|
|
|
Mon, Oct. 7 |
Lecture 15: Recurrent Networks: Stability analysis and LSTMs (2)
|
|
|
Module II: Advanced Topics on Deep Learning |
Vision Applications
|
Wed, Oct. 9 |
Lecture 16: Attention and Transformers
Self-Attention
Transformers
|
|
Assignment 4 Part 1 Out
|
Fri, Oct. 11 |
Lecture 16: Attention and Transformers (2)
Multi-head Self-Attention
Masked Self-Attention
|
|
|
No Class (Fall Break) |
Wed, Oct. 16 |
Lecture 17: BERT and GPTs
Encoder-Decoder Attention
Word Embedding
Pre-training
|
|
|
Fri, Oct. 18 |
Lecture 17: BERT and GPTs (2)
|
|
Assignment 4 Part 1 Due
|
Mon, Oct. 21 |
Lecture 18: Training Large Language Models
Self-Supervised Learning
Data Scaling
|
|
Assignment 4 Part 2 out (Transformers)
|
Generative and Interactive Visual Intelligence
|
Wed, Oct. 23 |
Lecture 19: Computer Vision: Detection and Segmentation
Semantic segmentation
Object detection
Instance segmentation
|
|
|
Fri, Oct. 25 |
Lecture 19: Computer Vision: Detection and Segmentation (2)
|
|
|
Mon, Oct. 28 |
Lecture 20: Generative Models (1)
Unsupervised Learning
Clustering / PCA
Autoregressive Models
|
|
|
Wed, Oct. 30 |
Lecture 21: Generative Models (2) -- VAEs
Convolutional AEs, Transpose Convolution
Variational Autoencoders (VAE)
|
|
|
Fri, Nov. 1 |
Lecture 21: Generative Models (2) -- VAEs (continued)
VAE Loss - KL Divergence
Reparameterization trick
Conditional VAE
|
|
Assignment 4 Part 2 Due (11/2)
|
Mon, Nov. 4 |
Lecture 22: Generative Models (3) -- GANs
Generative Adversarial Networks (GANs)
Training GANs and challenges
Applications
|
|
|
Wed, Nov. 6 |
Lecture 23: Generative Models (4) -- Diffusion Models
Denoising Diffusion Probabilistic Models (DDPMs)
Conditional Diffusion Models
|
|
|
Fri, Nov. 8 |
Lecture 23: Generative Models (4) -- Diffusion Models (continued)
Denoising Diffusion Probabilistic Models (DDPMs)
Conditional Diffusion Models
|
|
|
Mon, Nov. 11 |
Lecture 24: Self-supervised Learning
Pretext tasks
Contrastive representation learning
Instance contrastive learning: SimCLR and MoCo
Sequence contrastive learning: CPC
|
|
|
Wed, Nov. 13 |
Lecture 24: Self-supervised learning (continued)
|
|
|
Fri, Nov. 15 |
Lecture 25: Transfer Learning
Finetuning
Knowledge distillation
Foundation Models: Text Prompting, Visual Prompting, Prompting for other modalities, Combining Foundation Models
|
|
|
Cutting-Edge Research
|
Mon, Nov. 18 |
Multimodal AI
Guest Speaker: Paul Liang (MIT Media Lab & MIT EECS)
Bio: Paul Liang is an Assistant Professor at the MIT Media Lab and MIT EECS. His research advances the foundations of multisensory artificial intelligence to enhance the human experience. He is a recipient of the Siebel Scholars Award, Waibel Presidential Fellowship, Facebook PhD Fellowship, Center for ML and Health Fellowship, Rising Stars in Data Science, and 3 best paper awards. Outside of research, he received the Alan J. Perlis Graduate Student Teaching Award for developing new courses on multimodal machine learning.
|
Abstract: Multimodal AI is a vibrant multi-disciplinary research field that aims to design AI with intelligent capabilities through integrating multiple communicative modalities, including linguistic, acoustic, visual, tactile, and physiological messages. Multimodality brings unique computational and theoretical challenges given the heterogeneity of data sources and the interconnections often found between modalities. By synthesizing a broad range of application domains and theoretical frameworks from both historical and recent perspectives, this lecture is designed to provide an overview of multimodal AI. Building upon a new survey paper (https://arxiv.org/abs/2209.03430), we will cover three topics: (1) what is multimodal: the principles in learning from heterogeneous, connected, and interacting data, (2) why is it hard: a taxonomy of six core technical challenges faced in multimodal ML but understudied in unimodal ML, and (3) what is next: major directions for future research as identified by our taxonomy.
|
|
Wed, Nov. 20 |
Towards Test-time Self-supervised Learning (Slides)
Guest Speaker: Yifei Wang (MIT CSAIL)
Bio: Yifei Wang is a postdoc at MIT CSAIL, advised by Prof. Stefanie Jegelka. He earned his bachelor’s and Ph.D. degrees from Peking University. His research is focused on bridging the theory and practice of self-supervised learning to advance the scalability and safety of foundation models. His first-author works have been recognized by 3 best paper awards, including the sole Best ML Paper Award at ECML-PKDD 2021, the Silver Best Paper Award at the ICML 2021 AdvML Workshop, and the Best Paper Award at the ICML 2024 ICL Workshop. Academic page: https://yifeiwang77.com.
Abstract: Self-supervised learning (SSL) has been instrumental in unlocking the potential of massive unlabeled datasets, driving the development of foundation models across various domains. However, the benefits from pretraining are diminishing, signaling a plateau in performance gains. To introduce a new dimension for scaling SSL beyond the pretraining stage, we propose the paradigm of test-time self-supervised learning (TT-SSL), which leverages test-time computation to enhance pretrained models without requiring labeled data. We investigate two examples of TT-SSL: (1) unsupervised in-context adaptation, where models adjust to downstream tasks during test time based solely on input context, and (2) self-correction through self-reflection and iterative improvement, allowing models to refine their predictions in real time without external feedback. This paradigm unlocks the potential of test-time computation for self-exploration and autonomous improvement of model behaviors, offering a promising new direction for advancing the scalability and capabilities of foundation models.
|
|
|
Fri, Nov. 22 |
Respiratory Intelligence: What Can AI Learn About Your Health from Your Breathing
Guest Speaker: Hao He (MIT CSAIL)
Bio: Hao is a final-year PhD student at MIT, where he is supervised by Prof. Dina Katabi. His research focuses on leveraging machine learning for healthcare applications, with a particular emphasis on sleep science. His contributions have been recognized through publications in top AI conferences and high-impact medical journals. Hao is the recipient of the Takeda Fellowship, awarded to outstanding researchers in AI and health, and the Barbara J. Weedon Fellowship, given to researchers making advancements in neurodegenerative diseases.
|
Abstract: Respiration is one of the most fundamental functions of the human body, closely tied to a person’s overall health. In this talk, I will explore how advancements in AI technology allow us to extract valuable health insights from nocturnal breathing patterns. I will address various health aspects, including sleep quality, physiological conditions such as oxygen desaturation and inflammation, and even neurodegenerative diseases like Alzheimer’s.
|
|
Mon, Nov. 25 |
Generalizable Algorithms for Long-Horizon Manipulation in Complex Environments by Integrating Deep Learning and Planning-Based Approaches
Guest Speaker: Zhutian Yang (MIT CSAIL)
Bio: Zhutian Yang is a PhD candidate at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL), advised by Leslie Kaelbling and Tomás Lozano-Pérez. Her research focuses on developing algorithms for long-horizon manipulation by combining deep learning with model-based planning techniques. Her work has been published in top robotics and learning conferences such as RSS, CoRL, and ICLR. She has gained valuable experience through internships at NVIDIA’s Seattle Robotics Lab and Toyota Research Institute’s Large Behaviors Team. Academic page: https://zt-yang.com.
|
Abstract: To enable robots to perform long-horizon manipulation tasks in diverse, complex environments, such as organizing shelf spaces or making chicken soup in various office or home settings, it is beneficial to leverage the strengths of both deep learning and model-based methods. Learning-based methods offer rapid inference, local reactivity, and large-scale knowledge from the internet, but they struggle to generate long-horizon trajectories in visually diverse and geometrically complex environments. On the other hand, task and motion planning ensures geometric feasibility, but its computational demands become impractical as the state space and task horizon grow. Additionally, encoding domain-specific knowledge and object dynamics is often cumbersome. Neither approach alone can fully address the complexity of real-world robotic tasks in a generalizable way. To overcome these challenges, we can strategically determine which components to learn from data and which to delegate to domain-agnostic planners. This talk will explore three recent projects performed in this fashion. They address tasks with intricate temporal and geometric dependencies, such as making a chicken soup, packing a box full of objects, and rearranging office chairs in a cluttered conference room.
Reference Papers:
[Sequence-Based Plan Feasibility Prediction for Efficient Task and Motion Planning],
[Compositional Diffusion-Based Continuous Constraint Solvers],
[Guiding Long-Horizon Task and Motion Planning with Vision Language Models],
[Combining Planning and Diffusion for Mobility with Unknown Dynamics].
| | |
No Class (Thanksgiving Break) |
No Class (Thanksgiving Break) |
Final Projects
|
Mon, Dec. 2 |
Final Project Presentation (1)
|
| |
Wed, Dec. 4 |
Final Project Presentation (2)
|
| |
Fri, Dec. 6 |
Final Project Presentation (3)
|
| |
TBA |
|
| Final project report/code due |