Overview

Deep Learning is rapidly emerging as one of the most successful and widely applicable sets of techniques across a range of domains (vision, language, speech, reasoning, robotics, medicine, science, and AI in general), leading to significant commercial success, transforming people's lives, and opening up exciting new directions that may previously have seemed out of reach.

This course will introduce students to the basics of Neural Networks (NNs) and expose them to cutting-edge research. It is structured into modules (Background, Convolutional NNs, NN Training, Sequence Modeling, Self-Supervised Learning, Generative Modeling, and Frontiers). These modules will be delivered through instructor-led lectures and TA-led tutorials, reinforced with assignments that cover both theoretical and practical aspects. The course will also include a project that allows students to explore an area of Deep Learning that interests them in more depth.

At the end of the course, guest speakers will be invited to share the latest research developments in academia and industry, offering valuable insights and broadening students' horizons in this dynamic field.

  • Prerequisites:
    • Programming: You should be familiar with algorithms and data structures. Familiarity with Python or a similar language for numeric programming will be helpful but is not strictly required. Python (Basics).
    • Probability: You should have been exposed to probability distributions, random variables, expectations, etc. Linear Algebra (Essence, Chap 1-4), Multivariate Calculus (Essence, Chap 1, 3-4, 8-9).
    • Machine Learning: Some familiarity with machine learning will be helpful but not required; we will review important concepts that are needed for this course.
  • Lecture:
    Lectures will be Monday, Wednesday, and Friday at 245 Beacon St. Room 102, from 9:00am to 9:50am.
  • Textbooks and Materials:
    There is no required textbook for the course. However, the following books (available for free online) can be useful as references on relevant topics: You may also find this tutorial Deep Learning with PyTorch: A 60 Minute Blitz helpful.
  • Grading Policy:
    • No quizzes/exams.
    • 10%: Attendance (Including Asking Questions)
    • 60%: Homework Assignments (6 × 10%)
    • 30%: Final Project (Proposal, Presentation, and Report)
    You will complete six programming assignments over the course of the semester. All homework assignments will be in Python, and will use PyTorch on Google Colab.
    Instead of a final exam, at the end of the semester you will complete a project working in groups of at most 3 students.
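
The weighting above can be sketched as a short calculation. This is only an illustration: the function name `final_grade` and the 0-100 score scale are assumptions, not part of the official policy, and the instructor's actual gradebook may round or scale differently.

```python
# Weights taken from the grading policy above.
WEIGHTS = {"attendance": 0.10, "homework": 0.60, "project": 0.30}
NUM_ASSIGNMENTS = 6  # six assignments, each worth 10% of the grade


def final_grade(attendance, homework_scores, project):
    """Return the weighted course grade on a 0-100 scale.

    attendance, project: scores in [0, 100] (assumed scale).
    homework_scores: one score in [0, 100] per assignment.
    """
    if len(homework_scores) != NUM_ASSIGNMENTS:
        raise ValueError(f"expected {NUM_ASSIGNMENTS} homework scores")
    # Equal weights per assignment, so the 60% applies to the plain average.
    homework_avg = sum(homework_scores) / NUM_ASSIGNMENTS
    return (WEIGHTS["attendance"] * attendance
            + WEIGHTS["homework"] * homework_avg
            + WEIGHTS["project"] * project)


if __name__ == "__main__":
    grade = final_grade(90, [80, 85, 90, 95, 100, 90], 88)
    print(round(grade, 1))
```

Because every assignment carries the same weight, one weak assignment moves the final grade by at most 10 points.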

Staff

Instructor
Teaching Assistant
Teaching Assistant

Guest Speakers

MIT Media Lab & EECS
MIT CSAIL
MIT CSAIL
MIT CSAIL

Tentative Schedule (subject to changes)

Theme Date Topic Materials Assignments

Module I: Deep Learning Basics

ML Basics
Mon, Aug. 26 Lecture 1: Course Introduction
Course overview,
Course logistics
Wed, Aug. 28 Lecture 2: Machine Learning Basics
Machine learning overview
ML: pipeline, tasks
Linear regression, Polynomial regression
Assignment 1 out
  • [Lab1a: Python Basic]
  • [Lab1b: Linear Regression]
  • Fri, Aug. 30 Lecture 3: Linear regression
    Optimization: gradient-based solution, closed-form solution
    Underfit, Overfit, Regularization, Generalization
    No Class (Labor Day)
    Wed, Sept. 4 Lecture 4: Neural Network
    Binary Classification / Multi-Class Classification
    Sigmoid / Softmax
    Cross-Entropy Loss
    Assignment 1 due; Assignment 2 out
  • [Lab2a: Gradient Descent]
  • [Lab2b: PyTorch Basics]
  • [Lab2c: Linear Classifier]
  • Fri, Sept. 6 Lecture 5: Multi-Layer Perceptron (MLP)
    Linear Problems / Non-Linear Problems
    Feature transforms
    Model: Fully-connected networks
    Computational Graph
    Optimization: Backpropagation
    Mon, Sept. 9 Lecture 6: Activation Functions and Optimization
    Activation Functions: ReLU, Sigmoid, tanh, Leaky ReLU, ELU
    Regularization
    Weight decay
    Deep Learning Architectures
    Wed, Sept. 11 Lecture 7: Convolutional Neural Networks (CNNs)
    Weight initialization, dropout, hyperparameters
    Universal approximation theorem
    Intro to CNNs -- Convolution
    Assignment 2 due
    Fri, Sept. 13 Lecture 8: Convolutional Neural Networks (CNNs)
    Convolution: kernel, receptive field, stride
    Padding
    Learning convolutional filters
    One layer (breadth): multiple kernels
    K layers (depth): nonlinearity in between
    Mon, Sept. 16 Lecture 8: Convolutional Neural Networks (CNNs) -- Continued
    Wed, Sept. 18 Lecture 9: CNNs
    Pooling
    AlexNet
    Batch Normalization
    ResNet + Residual Blocks
    Assignment 3 out
  • [Lab 3: Autograd and NN]
  • Fri, Sept. 20 Lecture 10: CNN Architectures
    AlexNet, VGGNet, GoogLeNet, BatchNorm, ResNet
    Deep Learning Framework
    Mon, Sept. 23 Lecture 11: Training Neural Networks
    Activation functions
    Data preprocessing
    Weight initialization
    Data augmentation
    Regularization (Dropout, etc)
    Learning rate schedules
    Hyperparameter optimization
    Transfer learning
    Wed, Sept. 25 Lecture 12: Deep Learning Framework
    PyTorch
    Dynamic vs Static graphs
    Assignment 3 Due
    Fri, Sept. 27 Lecture 12: PyTorch Review Session (continued)
    PyTorch
    Dynamic vs Static graphs
    Mon, Sept. 30 Lecture 13: Final Project Overview
    Final Project Overview
    Life cycle of a Machine Learning System
    Sequential models use cases
    Wed, Oct. 2 Lecture 14: Recurrent Neural Networks (RNNs)
    Sequential models use cases
    CNNs for sequences
    RNNs
    Fri, Oct. 4 Lecture 15: Recurrent Networks: Stability analysis and LSTMs
    Gradient Explosion
    LSTM, GRU
    Language modeling
    Mon, Oct. 7 Lecture 15: Recurrent Networks: Stability analysis and LSTMs (2)

    Module II: Advanced Topics on Deep Learning

    Vision Applications
    Wed, Oct. 9 Lecture 16: Attention and Transformers
    Self-Attention
    Transformers
    Assignment 4 Part 1 Out
    Fri, Oct. 11 Lecture 16: Attention and Transformers (2)
    Multi-head Self-Attention
    Masked Self-Attention
    No Class (Fall Break)
    Wed, Oct. 16 Lecture 17: BERT and GPTs
    Encoder-Decoder Attention
    Word Embedding
    Pre-training
    Fri, Oct. 18 Lecture 17: BERT and GPTs (2)
    Assignment 4 Part 1 Due
    Mon, Oct. 21 Lecture 18: Training Large Language Models
    Self-Supervised Learning
    Data Scaling
    Assignment 4 Part 2 out (Transformers)
    Generative and Interactive Visual Intelligence
    Wed, Oct. 23 Lecture 19: Computer Vision: Detection and Segmentation
    Semantic segmentation
    Object detection
    Instance segmentation
    Fri, Oct. 25 Lecture 19: Computer Vision: Detection and Segmentation (2)
    Mon, Oct. 28 Lecture 20: Generative Models (1)
    Unsupervised Learning
    Clustering / PCA
    Autoregressive Models
    Wed, Oct. 30 Lecture 21: Generative Models (2) -- VAEs
    Convolutional AEs, Transpose Convolution
    Variational Autoencoders (VAE)
    Fri, Nov. 1 Lecture 21: Generative Models (2) -- VAEs (continued)
    VAE Loss - KL Divergence
    Reparameterization trick
    Conditional VAE
    Assignment 4 Part 2 Due (11/2)
    Mon, Nov. 4 Lecture 22: Generative Models (3) -- GANs
    Generative Adversarial Networks (GANs)
    Training GANs and challenges
    Applications
    Wed, Nov. 6 Lecture 23: Generative Models (4) -- Diffusion Models
    Denoising Diffusion Probabilistic Models (DDPMs)
    Conditional Diffusion Models
    Fri, Nov. 8 Lecture 23: Generative Models (4) -- Diffusion Models (continued)
    Denoising Diffusion Probabilistic Models (DDPMs)
    Conditional Diffusion Models
    Mon, Nov. 11 Lecture 24: Self-supervised Learning
    Pretext tasks
    Contrastive representation learning
    Instance contrastive learning: SimCLR and MoCo
    Sequence contrastive learning: CPC

    Wed, Nov. 13 Lecture 24: Self-supervised learning (continued)
    Fri, Nov. 15 Lecture 25: Transfer Learning
    Finetuning
    Knowledge distillation
    Foundation Models: Text Prompting, Visual Prompting, Prompting for other modalities, Combining Foundation Models

    Cutting-Edge Research
    Mon, Nov. 18 Multimodal AI

    Guest Speaker: Paul Liang (MIT Media Lab & MIT EECS)

    Bio: Paul Liang is an Assistant Professor at the MIT Media Lab and MIT EECS. His research advances the foundations of multisensory artificial intelligence to enhance the human experience. He is a recipient of the Siebel Scholars Award, Waibel Presidential Fellowship, Facebook PhD Fellowship, Center for ML and Health Fellowship, Rising Stars in Data Science, and 3 best paper awards. Outside of research, he received the Alan J. Perlis Graduate Student Teaching Award for developing new courses on multimodal machine learning.

    Abstract: Multimodal AI is a vibrant multi-disciplinary research field that aims to design AI with intelligent capabilities through integrating multiple communicative modalities, including linguistic, acoustic, visual, tactile, and physiological messages. Multimodality brings unique computational and theoretical challenges given the heterogeneity of data sources and the interconnections often found between modalities. By synthesizing a broad range of application domains and theoretical frameworks from both historical and recent perspectives, this lecture is designed to provide an overview of multimodal AI. Building upon a new survey paper (https://arxiv.org/abs/2209.03430), we will cover three topics: (1) what is multimodal: the principles in learning from heterogeneous, connected, and interacting data, (2) why is it hard: a taxonomy of six core technical challenges faced in multimodal ML but understudied in unimodal ML, and (3) what is next: major directions for future research as identified by our taxonomy.

    Wed, Nov. 20 Towards Test-time Self-supervised Learning (Slides)

    Guest Speaker: Yifei Wang (MIT CSAIL)

    Bio: Yifei Wang is a postdoc at MIT CSAIL, advised by Prof. Stefanie Jegelka. He earned his bachelor’s and Ph.D. degrees from Peking University. His research is focused on bridging the theory and practice of self-supervised learning to advance the scalability and safety of foundation models. His first-author works have been recognized by 3 best paper awards, including the sole Best ML Paper Award at ECML-PKDD 2021, the Silver Best Paper Award at the ICML 2021 AdvML Workshop, and the Best Paper Award at the ICML 2024 ICL Workshop. Academic page: https://yifeiwang77.com.

    Abstract: Self-supervised learning (SSL) has been instrumental in unlocking the potential of massive unlabeled datasets, driving the development of foundation models across various domains. However, the benefits from pretraining are diminishing, signaling a plateau in performance gains. To introduce a new dimension for scaling SSL beyond the pretraining stage, we propose the paradigm of test-time self-supervised learning (TT-SSL), which leverages test-time computation to enhance pretrained models without requiring labeled data. We investigate two examples of TT-SSL: (1) unsupervised in-context adaptation, where models adjust to downstream tasks during test time based solely on input context, and (2) self-correction through self-reflection and iterative improvement, allowing models to refine their predictions in real-time without external feedback. This paradigm unlocks the potential of test-time computation for self-exploration and autonomous improvement of model behaviors, offering a promising new direction for advancing the scalability and capabilities of foundation models.


    Fri, Nov. 22 Respiratory Intelligence: What Can AI Learn About Your Health from Your Breathing

    Guest Speaker: Hao He (MIT CSAIL)

    Bio: Hao is a final-year PhD student at MIT, where he is supervised by Prof. Dina Katabi. His research focuses on leveraging machine learning for healthcare applications, with a particular emphasis on sleep science. His contributions have been recognized through publications in top AI conferences and high-impact medical journals. Hao is the recipient of the Takeda Fellowship, awarded to outstanding researchers in AI and health, and the Barbara J. Weedon Fellowship, given to researchers making advancements in neurodegenerative diseases.

    Abstract: Respiration is one of the most fundamental functions of the human body, closely tied to a person’s overall health. In this talk, I will explore how advancements in AI technology allow us to extract valuable health insights from nocturnal breathing patterns. I will address various health aspects, including sleep quality, physiological conditions such as oxygen desaturation and inflammation, and even neurodegenerative diseases like Alzheimer’s.


    Mon, Nov. 25 Generalizable Algorithms for Long-Horizon Manipulation in Complex Environments by Integrating Deep Learning and Planning-Based Approaches

    Guest Speaker: Zhutian Yang (MIT CSAIL)

    Bio: Zhutian Yang is a PhD candidate at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL), advised by Leslie Kaelbling and Tomás Lozano-Pérez. Her research focuses on developing algorithms for long-horizon manipulation by combining deep learning with model-based planning techniques. Her work has been published in top robotics and learning conferences such as RSS, CoRL, and ICLR. She has gained valuable experience through internships at NVIDIA’s Seattle Robotics Lab and Toyota Research Institute’s Large Behaviors Team. Academic page: https://zt-yang.com.

    Abstract: To enable robots to perform long-horizon manipulation tasks in diverse, complex environments, such as organizing shelf spaces or making chicken soup in various office or home settings, it is beneficial to leverage the strengths of both deep learning and model-based methods. Learning-based methods offer rapid inference, local reactivity, and large-scale knowledge from the internet, but they struggle to generate long-horizon trajectories in visually diverse and geometrically complex environments. On the other hand, task and motion planning ensures geometric feasibility, but its computational demands become impractical as the state space and task horizon grow. Additionally, encoding domain-specific knowledge and object dynamics is often cumbersome. Neither approach alone can fully address the complexity of real-world robotic tasks in a generalizable way. To overcome these challenges, we can strategically determine which components to learn from data and which to delegate to domain-agnostic planners. This talk will explore three recent projects performed in this fashion. They address tasks with intricate temporal and geometric dependencies, such as making chicken soup, packing a box full of objects, and rearranging office chairs in a cluttered conference room.


    Reference Papers:
    [Sequence-Based Plan Feasibility Prediction for Efficient Task and Motion Planning],
    [Compositional Diffusion-Based Continuous Constraint Solvers],
    [Guiding Long-Horizon Task and Motion Planning with Vision Language Models],
    [Combining Planning and Diffusion for Mobility with Unknown Dynamics].

    No Class (Thanksgiving Break)
    No Class (Thanksgiving Break)
    Final Projects
    Mon, Dec. 2 Final Project Presentation (1)
    Wed, Dec. 4 Final Project Presentation (2)
    Fri, Dec. 6 Final Project Presentation (3)
    TBA Final project report/code due


    Office Hours

    Name Office hours
    Yuan Mon/Tue 3-4PM @ 245 Beacon Rm. 528E
    Lejun M 5-6, 5-7PM if needed, W 5-6PM @ CS Lab
    Yunhan TW TH 6-7PM @ CS Lab
    • Office hours will take place in person (or Zoom if needed).


    Course Information

    This is a challenging course and we are here to help you become a more-AI version of yourself. Please feel free to reach out if you need help in any form.

    1. Get help (besides office hours)

    • Dropbox: The lecture pdfs will be uploaded to Dropbox (follow the link) and you can ask questions there by making comments on the slides directly.
    • Discord: For labs/psets/final projects, we will create dedicated channels for you to ask public questions. If you cannot make your post public (e.g., due to revealing problem set solutions), please directly DM TAs or the instructor separately, or come to office hours. Please note, however, that the course staff cannot provide help debugging code, and there is no guarantee that they'll be able to answer last-minute homework questions before the deadline. We also appreciate it when you respond to questions from other students! If you have an important question that you would prefer to discuss over email, you may email the course staff, or you can contact the instructor by email directly.
    • Support: The university counseling services center provides a variety of programs and activities.
    • Accommodations for students with disabilities: If you are a student with a documented disability seeking reasonable accommodations in this course, please contact Kathy Duggan, (617) 552-8093, dugganka@bc.edu, at the Connors Family Learning Center regarding learning disabilities and ADHD, or Rory Stein, (617) 552-3470, steinr@bc.edu, in the Disability Services Office regarding all other types of disabilities, including temporary disabilities. Advance notice and appropriate documentation are required for accommodations.

    2. Homework submission

    All programming assignments are in Python on Colab and are due at 11:59 pm on the due date.
    • Set up Colab in your browser: Sign in to your Google account, follow the "Link" (to be updated) to the folder of assignments, click on lab0.ipynb, click on "Open with" and then "Connect more apps", and install "Colaboratory".
    • Submission: You need to save a copy of the file in your own Google Drive so that you can save your edits. Afterwards, you can download the ipynb file and submit it to Canvas.
    • Final project: In lieu of a final exam, we'll have a final project. This project will be completed in small groups during the last weeks of the class. The direction for this project is open-ended: you can either choose from a list of project ideas that we distribute, or you can propose a topic of your own. A short project proposal will be due approximately halfway through the course. During the final exam period, you'll turn in a final report and give a short presentation. You may use ongoing research work for your final project, as long as it meets the requirements.

    3. Academic policy

    • Late days: You'll have 1 late day for every lab and pset respectively over the course of the semester. Each time you use one, you may submit a homework assignment one day late without penalty. You do not need to notify us when you use a late day; we'll deduct it automatically. If you run out of late days and still submit late, your assignment will be penalized at a rate of 2% per day. If you edit your assignment after the deadline, this will count as a late submission, and we'll use the revision time as the date of submission (after a short grace period of a few minutes). We will not provide additional late time, except under exceptional circumstances, and for these we'll require documentation (e.g., a doctor's note). Please note that the late days are provided to help you deal with minor setbacks, such as routine illness or injury, paper deadlines, interviews, and computer problems; these do not generally qualify for an additional extension.
    • Academic integrity: While you are encouraged to discuss homework assignments with other students, your programming work must be completed individually. You may not search for solutions online or use existing implementations of the algorithms in the assignments. Thus it is acceptable to learn from another student the general idea for writing program code to perform a particular task, or the technique for solving a mathematical problem, but unacceptable for two students to prepare their assignments together and submit what are essentially two copies of identical work. If you have any uncertainty about the application of this policy, please check with me. Failure to comply with these guidelines will be considered a violation of the University policies on academic integrity. Please make sure that you are familiar with these policies. We will use the moss.pl tool to check each lab and pset for plagiarism.
    • AI assistants policy:
      • Our policy for using ChatGPT and other AI assistants is identical to our policy for using human assistants.
      • This is a deep learning class and you should try out all the latest AI assistants (they are pretty much all using deep learning). It's very important to play with them to learn what they can do and what they can't do. That's a part of the content of this course.
      • Just like you can come to office hours and ask a human questions (about the lecture material, clarifications about pset questions, tips for getting started, etc), you are very welcome to do the same with AI assistants.
      • But: just like you are not allowed to ask an expert friend to do your homework for you, you also should not ask an expert AI.
      • If it is ever unclear, just imagine the AI as a human and apply the same norm as you would with a human.
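
As a rough illustration of the late-day rule in the academic policy above, the sketch below estimates the deduction for a single assignment. It rests on several assumptions that the policy does not spell out: one free late day per assignment, lateness rounded up to whole days, and the 2% per day applied as a fraction of that assignment's score. The short grace period is not modeled, and the name `late_penalty` is hypothetical. When in doubt, the course staff's interpretation applies.

```python
import math
from datetime import datetime, timedelta

FREE_LATE_DAYS = 1      # assumption: one free late day per assignment
PENALTY_PER_DAY = 0.02  # 2% per late day once the free day is used up


def late_penalty(deadline, submitted):
    """Return the fraction of the assignment score deducted for lateness."""
    if submitted <= deadline:
        return 0.0
    # Assumption: any part of a day counts as a full late day.
    days_late = math.ceil((submitted - deadline) / timedelta(days=1))
    penalized_days = max(0, days_late - FREE_LATE_DAYS)
    # Cap at 1.0 so the deduction never exceeds the whole score.
    return min(1.0, penalized_days * PENALTY_PER_DAY)


if __name__ == "__main__":
    due = datetime(2024, 9, 4, 23, 59)
    print(late_penalty(due, due + timedelta(hours=3)))           # within the free late day
    print(late_penalty(due, due + timedelta(days=2, hours=1)))   # three days late, two penalized
```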

    4. Related Classes / Online Resources
    Acknowledgements: This course draws heavily from MIT's 6.869: Advances in Computer Vision by Antonio Torralba, William Freeman, and Phillip Isola, and from Stanford's CS231n: Deep Learning for Computer Vision by Fei-Fei Li. It also includes lecture slides from other researchers, including Andrew Owens, Svetlana Lazebnik, Alexei Efros, Fei-Fei Li, Carl Vondrick, David Fouhey, Justin Johnson, Noah Snavely, and Ava Amini. Special thanks to Hao Wang for the insightful and generous advice.