CSCI 3370: Fall 2024

Overview

Deep Learning is rapidly emerging as one of the most successful and widely applicable sets of techniques across a range of domains (vision, language, speech, reasoning, robotics, medicine, science, and AI in general), leading to significant commercial success, transforming people's lives, and opening up exciting new directions that may previously have seemed out of reach.

This course will introduce students to the basics of Neural Networks (NNs) and expose them to cutting-edge research. It is structured into modules (Background, Convolutional NNs, NN Training, Sequence Modeling, Self-Supervised Learning, Generative Modeling, and Frontiers). These modules will be delivered through instructor-led lectures and TA-led tutorials, reinforced with assignments that cover both theoretical and practical aspects. The course will also include a project that allows students to explore an area of Deep Learning that interests them in more depth.

At the end of the course, guest speakers will be invited to share the latest research developments in academia and industry, offering valuable insights and broadening students' horizons in this dynamic field.

Prerequisites:
- Programming: You should be familiar with algorithms and data structures. Familiarity with Python or a similar language for numeric programming will be helpful but is not strictly required. Review material: Python (Basics).
- Probability: You should have been exposed to probability distributions, random variables, expectations, etc. Review material for related math: Linear Algebra (Essence, Chap. 1-4), Multivariate Calculus (Essence, Chap. 1, 3-4, 8-9).
- Machine Learning: Some familiarity with machine learning will be helpful but not required; we will review important concepts that are needed for this course.
Lecture:
Lectures are held Monday, Wednesday, and Friday from 9:00am to 9:50am in 245 Beacon St., Room 102.
Textbooks and Materials:
There is no required textbook for the course. However, the following books (available for free online) can be useful as references on relevant topics:
- Deep Learning (DL), Goodfellow, Ian and Bengio, Yoshua and Courville, Aaron, MIT Press, 2016, ISBN: 9780262035613
- Dive into Deep Learning (D2L), Zhang et al.
- Computer Vision: Algorithms and Applications 2nd Edition (CV), Richard Szeliski.
- Pattern Recognition and Machine Learning (PRML), Christopher C. Bishop, Springer, 2006, ISBN: 9780387310732
You may also find the tutorial "Deep Learning with PyTorch: A 60 Minute Blitz" helpful.
Grading Policy:
- No quizzes/exams.
- 10%: Attendance (Including Asking Questions)
- 60%: Homework Assignments (10%*6)
- 30%: Final Project (Proposal, Presentation, and Report)
You will complete six programming assignments over the course of the semester. All homework assignments will be in Python, and will use PyTorch on Google Colab.
Instead of a final exam, at the end of the semester you will complete a project working in groups of at most 3 students.
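As a rough illustration of how the weights above combine, the sketch below computes an overall course score. The function name `course_score`, the 0-100 scale, and the simple averaging of the six assignments are assumptions made for illustration from the breakdown above; this is not an official grading script.

```python
# Hypothetical helper: combines the syllabus weights into one score.
# Assumes every component is reported as a percentage in [0, 100].
WEIGHTS = {"attendance": 0.10, "homework": 0.60, "project": 0.30}


def course_score(attendance, homeworks, project):
    """Return the weighted course score on a 0-100 scale."""
    if len(homeworks) != 6:
        raise ValueError("expected six homework scores")
    # Six assignments at 10% each is the same as 60% of their average.
    homework_avg = sum(homeworks) / len(homeworks)
    return (
        WEIGHTS["attendance"] * attendance
        + WEIGHTS["homework"] * homework_avg
        + WEIGHTS["project"] * project
    )


if __name__ == "__main__":
    # Example inputs are made up for illustration.
    print(course_score(100, [90, 85, 95, 80, 100, 90], 88))
```

Late penalties and late days are not modeled here; they would be applied to individual homework scores before averaging.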

Staff

Yuan Yuan

Instructor

Lejun Liao

Teaching Assistant

Yunhan Liu

Teaching Assistant

Guest Speakers

Paul Liang

MIT Media Lab & EECS

Yifei Wang

MIT CSAIL

Hao He

MIT CSAIL

Zhutian Yang

MIT CSAIL

Tentative Schedule (subject to change)

Theme	Date	Topic	Materials	Assignments
Module I: Deep Learning Basics
ML Basics	Mon, Aug. 26	Lecture 1: Course Introduction Course overview, Course logistics	[Slides] [Python Tutorial], [Colab] [DL Sec 1.2], [DL Sec 6.6]
	Wed, Aug. 28	Lecture 2: Machine Learning Basics Machine learning overview ML: pipeline, tasks Linear regression, Polynomial regression	[Slides] [Linear Regression Python Tutorial] [DL Sec 5.1 to 5.3]	Assignment 1 out [Lab1a: Python Basic] [Lab1b: Linear Regression]
	Fri, Aug. 30	Lecture 3: Linear regression Optimization: gradient-based solution, closed-form solution Underfit, Overfit, Regularization, Generalization	[Slides]
	Mon, Sept. 2	No Class (Labor Day)
	Wed, Sept. 4	Lecture 4: Neural Network Binary Classification / Multi-Class Classification Sigmoid / Softmax Cross-Entropy Loss	[Slides] [D2L Sec 4.1] [A short intro to Entropy, Cross-Entropy and KL Divergence] [231n Image Classification] [231n Linear Classification]	Assignment 1 due Assignment 2 out [Lab2a: Gradient Descent] [Lab2b: PyTorch Basics] [Lab2c: Linear Classifier]
	Fri, Sept. 6	Lecture 5: Multi-Layer Perceptron (MLP) Linear Problems / Non-Linear Problems Feature transforms Model: Fully-connected networks Computational Graph Optimization: Backpropagation	[Slides] [MLP web training] [231n Image Classification]
	Mon, Sept. 9	Lecture 6: Activation Functions and Optimization Activation Functions: ReLU, Sigmoid, tanh, Leaky ReLU, ELU Regularization Weight decay	[Slides] [MLP web training] [231n Image Classification]
Deep Learning Architectures	Wed, Sept. 11	Lecture 7: Convolutional Neural Networks (CNNs) Weight initialization, dropout, hyperparameters Universal approximation theorem Intro to CNNs -- Convolution	[Slides] [DL Sec. 7.1], [D2L Sec. 6.3] [DL Sec. 9.1, 9.2], [D2L Sec. 7.1], [D2L Sec. 7.2]	Assignment 2 due
	Fri, Sept. 13	Lecture 8: Convolutional Neural Networks (CNNs) Convolution: kernel, receptive field, stride Padding Learning convolutional filters One layer (breadth): multiple kernels K layers (depth): nonlinearity in between	[Slides] [Image Kernels] [DL Sec. 9.3, 9.4], [D2L Sec. 7.2], [D2L Sec. 7.3], [D2L Sec. 7.4], [D2L Sec. 7.5]
	Mon, Sept. 16	Lecture 8: Convolutional Neural Networks (CNNs) -- Continued	[Slides] [Image Kernels] [DL Sec. 9.3, 9.4], [D2L Sec. 7.2], [D2L Sec. 7.3], [D2L Sec. 7.4], [D2L Sec. 7.5]
	Wed, Sept. 18	Lecture 9: CNNs Pooling AlexNet Batch Normalization ResNet + Residual Blocks	[Slides] [D2L Sec. 7.5], [D2L Sec. 7.6] [D2L Sec. 8.1], [D2L Sec. 8.2], [D2L Sec. 8.3], [D2L Sec. 8.4], [D2L Sec. 8.5], [D2L Sec. 8.6]	Assignment 3 out [Lab 3: Autograd and NN]
	Fri, Sept. 20	Lecture 10: CNN Architectures AlexNet, VGGNet, GoogLeNet, BatchNorm, ResNet Deep Learning Framework	[Slides] [CS231n CNN Architectures] [D2L Sec. 8.1], [D2L Sec. 8.2], [D2L Sec. 8.3], [D2L Sec. 8.4], [D2L Sec. 8.5], [D2L Sec. 8.6]
	Mon, Sept. 23	Lecture 11: Training Neural Networks Activation functions Data preprocessing Weight initialization Data augmentation Regularization (Dropout, etc) Learning rate schedules Hyperparameter optimization Transfer learning	[Slides] [CS231n Training I] [Karpathy "Recipe for Training"]
	Wed, Sept. 25	Lecture 12: Deep Learning Framework PyTorch Dynamic vs Static graphs	[Slides] [Hacker’s guide to DL]	Assignment 3 due
	Fri, Sept. 27	Lecture 12: PyTorch Review Session (continued) PyTorch Dynamic vs Static graphs	[Slides] [Hacker’s guide to DL]
	Mon, Sept. 30	Lecture 13: Final Project Overview Final Project Overview Life cycle of a Machine Learning System Sequential models use cases	[Slides]
	Wed, Oct. 2	Lecture 14: Recurrent Neural Networks (RNNs) Sequential models use cases CNNs for sequences RNNs	[Slides] [RNNs]
	Fri, Oct. 4	Lecture 15: Recurrent Networks: Stability analysis and LSTMs Gradient Explosion LSTM, GRU Language modeling	[Slides] [RNNs] [RNN Stability analysis and LSTMs]
	Mon, Oct. 7	Lecture 15: Recurrent Networks: Stability analysis and LSTMs (2)	[Slides]
Module II: Advanced Topics on Deep Learning
Vision Applications	Wed, Oct. 9	Lecture 16: Attention and Transformers Self-Attention Transformers	[Slides] [Attention is all you need] [BERT Paper] [The Illustrated Transformer] [Formal Algorithms for Transformers]	Assignment 4 Part 1 out [Lab 4: Neural Machine Translation]
	Fri, Oct. 11	Lecture 16: Attention and Transformers (2) Multi-head Self-Attention Mask Self-Attention	[Slides]
	Mon, Oct. 14	No Class (Fall Break)
	Wed, Oct. 16	Lecture 17: BERT and GPTs Encoder-Decoder Attention Word Embedding Pre-training	[Slides]
	Fri, Oct. 18	Lecture 17: BERT and GPTs (2)	[Slides]	Assignment 4 Part 1 due
	Mon, Oct. 21	Lecture 18: Training Large Language Models Self-Supervised Learning Data Scaling	[Slides] [The Practical Guides for Large Language Models]	Assignment 4 Part 2 out (Transformers)
Generative and Interactive Visual Intelligence	Wed, Oct. 23	Lecture 19: Computer Vision: Detection and Segmentation Semantic segmentation Object detection Instance segmentation	[Slides]
	Fri, Oct. 25	Lecture 19: Computer Vision: Detection and Segmentation (2)	[Slides]
	Mon, Oct. 28	Lecture 20: Generative Models (1) Unsupervised Learning Clustering / PCA Autoregressive Models	[Slides]
	Wed, Oct. 30	Lecture 21: Generative Models (2) -- VAEs Convolutional AEs, Transpose Convolution Variational Autoencoders (VAE)	[Slides] [Reading: Convolutional AEs]
	Fri, Nov. 1	Lecture 21: Generative Models (2) -- VAEs (continued) VAE Loss - KL Divergence Reparameterization trick Conditional VAE	[Slides] [KL Divergence]	Assignment 4 Part 2 due (11/2)
	Mon, Nov. 4	Lecture 22: Generative Models (3) -- GANs Generative Adversarial Networks (GANs) Training GANs and challenges Applications	[Slides]
	Wed, Nov. 6	Lecture 23: Generative Models (4) -- Diffusion Models Denoising Diffusion Probabilistic Models (DDPMs) Conditional Diffusion Models	[Slides]
	Fri, Nov. 8	Lecture 23: Generative Models (4) -- Diffusion Models (continued) Denoising Diffusion Probabilistic Models (DDPMs) Conditional Diffusion Models	[Slides]
	Mon, Nov. 11	Lecture 24: Self-supervised Learning Pretext tasks Contrastive representation learning Instance contrastive learning: SimCLR and MoCo Sequence contrastive learning: CPC	[Slides]
	Wed, Nov. 13	Lecture 24: Self-supervised learning (continued)	[Slides] [SimCLR] [MoCo] [MoCo v2] [CPC]
	Fri, Nov. 15	Lecture 25: Transfer Learning Finetuning Knowledge distillation Foundation Models: Text Prompting, Visual Prompting, Prompting for other modalities, Combining Foundation Models	[Slides] [Chain of Thought] [Prompting] [Visual Prompting] [LoRA]
Cutting-Edge Research	Mon, Nov. 18	Multimodal AI Guest Speaker: Paul Liang (MIT Media Lab & MIT EECS) Bio: Paul Liang is an Assistant Professor at the MIT Media Lab and MIT EECS. His research advances the foundations of multisensory artificial intelligence to enhance the human experience. He is a recipient of the Siebel Scholars Award, Waibel Presidential Fellowship, Facebook PhD Fellowship, Center for ML and Health Fellowship, Rising Stars in Data Science, and 3 best paper awards. Outside of research, he received the Alan J. Perlis Graduate Student Teaching Award for developing new courses on multimodal machine learning.	Abstract: Multimodal AI is a vibrant multi-disciplinary research field that aims to design AI with intelligent capabilities through integrating multiple communicative modalities, including linguistic, acoustic, visual, tactile, and physiological messages. Multimodality brings unique computational and theoretical challenges given the heterogeneity of data sources and the interconnections often found between modalities. By synthesizing a broad range of application domains and theoretical frameworks from both historical and recent perspectives, this lecture is designed to provide an overview of multimodal AI. Building upon a new survey paper (https://arxiv.org/abs/2209.03430), we will cover three topics: (1) what is multimodal: the principles in learning from heterogeneous, connected, and interacting data, (2) why is it hard: a taxonomy of six core technical challenges faced in multimodal ML but understudied in unimodal ML, and (3) what is next: major directions for future research as identified by our taxonomy.
	Wed, Nov. 20	Towards Test-time Self-supervised Learning (Slides) Guest Speaker: Yifei Wang (MIT CSAIL) Bio: Yifei Wang is a postdoc at MIT CSAIL, advised by Prof. Stefanie Jegelka. He earned his bachelor’s and Ph.D. degrees from Peking University. His research is focused on bridging the theory and practice of self-supervised learning to advance the scalability and safety of foundation models. His first-author works have been recognized by 3 best paper awards, including the sole Best ML Paper Award at ECML-PKDD 2021, the Silver Best Paper Award at the ICML 2021 AdvML Workshop, and the Best Paper Award at the ICML 2024 ICL Workshop. Academic page: https://yifeiwang77.com.	Abstract: Self-supervised learning (SSL) has been instrumental in unlocking the potential of massive unlabeled datasets, driving the development of foundation models across various domains. However, the benefits from pretraining are diminishing, signaling a plateau in performance gains. To introduce a new dimension for scaling SSL beyond the pretraining stage, we propose the paradigm of test-time self-supervised learning (TT-SSL), which leverages test-time computation to enhance pretrained models without requiring labeled data. We investigate two examples of TT-SSL: (1) unsupervised in-context adaptation, where models adjust to downstream tasks during test time based solely on input context, and (2) self-correction through self-reflection and iterative improvement, allowing models to refine their predictions in real-time without external feedback. This paradigm unlocks the potential of test-time computation for self-exploration and autonomous improvement of model behaviors, offering a promising new direction for advancing the scalability and capabilities of foundation models.
	Fri, Nov. 22	Respiratory Intelligence: What Can AI Learn About Your Health from Your Breathing Guest Speaker: Hao He (MIT CSAIL) Bio: Hao is a final-year PhD student at MIT, where he is supervised by Prof. Dina Katabi. His research focuses on leveraging machine learning for healthcare applications, with a particular emphasis on sleep science. His contributions have been recognized through publications in top AI conferences and high-impact medical journals. Hao is the recipient of the Takeda Fellowship, awarded to outstanding researchers in AI and health, and the Barbara J. Weedon Fellowship, given to researchers making advancements in neurodegenerative diseases.	Abstract: Respiration is one of the most fundamental functions of the human body, closely tied to a person’s overall health. In this talk, I will explore how advancements in AI technology allow us to extract valuable health insights from nocturnal breathing patterns. I will address various health aspects, including sleep quality, physiological conditions such as oxygen desaturation and inflammation, and even neurodegenerative diseases like Alzheimer’s.
	Mon, Nov. 25	Generalizable Algorithms for Long-Horizon Manipulation in Complex Environments by Integrating Deep Learning and Planning-Based Approaches Guest Speaker: Zhutian Yang (MIT CSAIL) Bio: Zhutian Yang is a PhD candidate at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL), advised by Leslie Kaelbling and Tomás Lozano-Pérez. Her research focuses on developing algorithms for long-horizon manipulation by combining deep learning with model-based planning techniques. Her work has been published in top robotics and learning conferences such as RSS, CoRL, and ICLR. She has gained valuable experience through internships at NVIDIA’s Seattle Robotics Lab and Toyota Research Institute’s Large Behaviors Team. Academic page: https://zt-yang.com.	Abstract: To enable robots to perform long-horizon manipulation tasks in diverse, complex environments, such as organizing shelf spaces or making chicken soup in various office or home settings, it is beneficial to leverage the strengths of both deep learning and model-based methods. Learning-based methods offer rapid inference, local reactivity, and large-scale knowledge from the internet, but they struggle to generate long-horizon trajectories in visually diverse and geometrically complex environments. On the other hand, task and motion planning ensures geometric feasibility, but its computational demands become impractical as the state space and task horizon grow. Additionally, encoding domain-specific knowledge and object dynamics is often cumbersome. Neither approach alone can fully address the complexity of real-world robotic tasks in a generalizable way. To overcome these challenges, we can strategically determine which components to learn from data and which to delegate to domain-agnostic planners. This talk will explore three recent projects performed in this fashion. They address tasks with intricate temporal and geometric dependencies, such as making a chicken soup, packing a box full of objects, and rearranging office chairs in a cluttered conference room. Reference Papers: [Sequence-Based Plan Feasibility Prediction for Efficient Task and Motion Planning], [Compositional Diffusion-Based Continuous Constraint Solvers], [Guiding Long-Horizon Task and Motion Planning with Vision Language Models], [Combining Planning and Diffusion for Mobility with Unknown Dynamics].
	Wed, Nov. 27	No Class (Thanksgiving Break)
	Fri, Nov. 29	No Class (Thanksgiving Break)
Final Projects	Mon, Dec. 2	Final Project Presentation (1)
	Wed, Dec. 4	Final Project Presentation (2)
	Fri, Dec. 6	Final Project Presentation (3)
	TBA			Final project report/code due

Office Hours

Name	Office hours
Yuan	Mon/Tue 3-4PM @ 245 Beacon Rm. 528E
Lejun	Mon 5-6PM (until 7PM if needed), Wed 5-6PM @ CS Lab
Yunhan	Tue/Wed/Thu 6-7PM @ CS Lab

Office hours will take place in person (or on Zoom if needed).

Course Information

This is a challenging course and we are here to help you become a more-AI version of yourself. Please feel free to reach out if you need help in any form.

1. Get help (besides office hours)

Dropbox: The lecture PDFs will be uploaded to Dropbox (follow the link), and you can ask questions there by commenting directly on the slides.
Discord: For labs/psets/final projects, we will create dedicated channels for you to ask public questions. If you cannot make your post public (e.g., because it would reveal problem set solutions), please DM the TAs or the instructor directly, or come to office hours. Please note, however, that the course staff cannot provide help debugging code, and there is no guarantee that they'll be able to answer last-minute homework questions before the deadline. We also appreciate it when you respond to questions from other students! If you have an important question that you would prefer to discuss over email, you may email the course staff, or you can contact the instructor by email directly.
Support: The university counseling services center provides a variety of programs and activities.
Accommodations for students with disabilities: If you are a student with a documented disability seeking reasonable accommodations in this course, please contact Kathy Duggan, (617) 552-8093, dugganka@bc.edu, at the Connors Family Learning Center regarding learning disabilities and ADHD, or Rory Stein, (617) 552-3470, steinr@bc.edu, in the Disability Services Office regarding all other types of disabilities, including temporary disabilities. Advance notice and appropriate documentation are required for accommodations.

2. Homework submission

All programming assignments are in Python on Colab, always due at midnight (11:59 pm) on the due date.

Install Colab in the browser: Sign in to your Google account, follow the "Link" (to be updated) to the folder of assignments, click on lab0.ipynb, click on "Open with" and "Connect more apps", and install "Colaboratory".
Submission: You need to save a copy of the file in your own Google Drive so that you can save your edits. Afterwards, you can download the ipynb file and submit it to Canvas.
Final project: In lieu of a final exam, we'll have a final project. This project will be completed in small groups during the last weeks of the class. The direction for this project is open-ended: you can either choose from a list of project ideas that we distribute, or you can propose a topic of your own. A short project proposal will be due approximately halfway through the course. During the final exam period, you'll turn in a final report and give a short presentation. You may use ongoing research work for your final project, as long as it meets the requirements.

3. Academic policy

Late days: You'll have one late day for each lab and each pset over the course of the semester. Each time you use one, you may submit a homework assignment one day late without penalty. You do not need to notify us when you use a late day; we'll deduct it automatically. If you run out of late days and still submit late, your assignment will be penalized at a rate of 2% per day. If you edit your assignment after the deadline, this will count as a late submission, and we'll use the revision time as the date of submission (after a short grace period of a few minutes). We will not provide additional late time, except under exceptional circumstances, and for these we'll require documentation (e.g., a doctor's note). Please note that the late days are provided to help you deal with minor setbacks, such as routine illness or injury, paper deadlines, interviews, and computer problems; these do not generally qualify for an additional extension.
Academic integrity: While you are encouraged to discuss homework assignments with other students, your programming work must be completed individually. You may not search for solutions online or use existing implementations of the algorithms in the assignments. Thus it is acceptable to learn from another student the general idea for writing program code to perform a particular task, or the technique for solving a mathematical problem, but unacceptable for two students to prepare their assignments together and submit what are essentially two copies of identical work. If you have any uncertainty about the application of this policy, please check with me. Failure to comply with these guidelines will be considered a violation of the University policies on academic integrity. Please make sure that you are familiar with these policies. We will use the moss.pl tool to check each lab and pset for plagiarism.
AI assistants policy:
- Our policy for using ChatGPT and other AI assistants is identical to our policy for using human assistants.
- This is a deep learning class and you should try out all the latest AI assistants (they are pretty much all using deep learning). It's very important to play with them to learn what they can do and what they can't do. That's a part of the content of this course.
- Just like you can come to office hours and ask a human questions (about the lecture material, clarifications about pset questions, tips for getting started, etc), you are very welcome to do the same with AI assistants.
- But: just like you are not allowed to ask an expert friend to do your homework for you, you also should not ask an expert AI.
- If it is ever unclear, just imagine the AI as a human and apply the same norm as you would with a human.

4. Related Classes / Online Resources

Acknowledgements: This course draws heavily from MIT's 6.869: Advances in Computer Vision by Antonio Torralba, William Freeman, and Phillip Isola, and from Stanford's CS231n: Deep Learning for Computer Vision by Fei-Fei Li. It also includes lecture slides from other researchers, including Andrew Owens, Svetlana Lazebnik, Alexei Efros, Fei-Fei Li, Carl Vondrick, David Fouhey, Justin Johnson, Noah Snavely, and Ava Amini. Special thanks to Hao Wang for the insightful and generous advice.

CSCI 3370: Deep Learning

Instructor: Yuan Yuan Fall 2024 (MWF 9:00-9:50 AM) 245 Beacon Street Room 102
