Overview

Over the past few years, Deep Learning has become ubiquitous in our society, with applications spanning search, image understanding, apps, mapping, medicine, drones, self-driving cars, robotics, and art. At the core of many of these applications are visual recognition tasks, such as image classification and object detection. Recent developments in neural network approaches have significantly improved the performance of state-of-the-art visual recognition systems. Beyond supervised learning, self-supervised learning has gained widespread use in recent years, particularly in vision and language modeling. This approach extracts labels for free from unlabeled data, allowing models to be trained on unlabeled datasets in a supervised manner. During this course, students will gain foundational knowledge of deep learning algorithms and neural network architectures, as well as practical experience in building, training, and fine-tuning neural networks. They will also gain an understanding of cutting-edge research topics in areas such as vision, language, medicine, generative AI, robotics, and more.

  • Prerequisites:
    • Programming: You should be familiar with algorithms and data structures. Familiarity with Python or similar languages for numerical programming will be helpful but is not strictly required. Reference: Python (Basics).
    • Probability: You should have been exposed to probability distributions, random variables, expectations, etc. References: Linear Algebra (Essence, Chap 1-4), Multivariate Calculus (Essence, Chap 1, 3-4, 8-9).
    • Machine Learning: Some familiarity with machine learning will be helpful but not required; we will review important concepts that are needed for this course.
  • Lecture:
    Lectures will be Monday, Wednesday, and Friday at Fulton Hall 415, from 9:00am to 9:50am.
  • Textbooks and Materials:
    There is no required textbook for the course. However, the following books (available for free online) can be useful as references on relevant topics. You may also find the tutorial Deep Learning with PyTorch: A 60 Minute Blitz helpful.
  • Grading Policy:
    • No quizzes/exams.
    • 10%: Attendance (Including Asking Questions)
    • 60%: Homework Assignments (10% × 6)
    • 30%: Final Project (Proposal, Presentation, and Report)
    You will complete six programming assignments over the course of the semester. All homework assignments will be in Python, and will use PyTorch on Google Colab.
    Instead of a final exam, at the end of the semester you will complete a project working in groups of at most 3 students.

Staff

Instructor
Teaching Assistant

Guest Speakers

Chunyuan Li (Microsoft Research)
Tianhong Li (MIT CSAIL)
Ge Yang (MIT CSAIL)
Zifan Shi (Stanford / HKUST)
Guohao Li (University of Oxford)
Hanzi Mao (Nvidia Research)
Hao He (MIT CSAIL)
Lijie Fan (MIT CSAIL)

Tentative Schedule (subject to change)

Theme Date Topic Materials Assignments

Module I: Deep Learning Basics

ML Basics
Wed, Jan. 17 Lecture 1: Course Introduction
Course overview,
Course logistics
Fri, Jan. 19 Lecture 2: Machine Learning Basics
Machine learning overview
ML: pipeline, tasks
Linear regression, Polynomial regression
Mon, Jan. 22 Lecture 3: Linear regression
Optimization: gradient-based solution, closed-form solution
Underfit, Overfit, Regularization, Generalization
Assignment 1 out
  • [Lab1a: Python Basic]
  • [Lab1b: Linear Regression]
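For a concrete picture of the two solution strategies listed for Lecture 3, here is a minimal, illustrative sketch (not part of the assignments; the synthetic data, coefficients, and learning rate are made up for the demo) that fits the same least-squares problem with the closed-form solution and with gradient descent:

```python
import torch

# Synthetic data: y = 3x + 2 + noise (coefficients chosen arbitrarily for the demo)
torch.manual_seed(0)
x = torch.rand(100, 1)
y = 3 * x + 2 + 0.1 * torch.randn(100, 1)
X = torch.cat([x, torch.ones_like(x)], dim=1)   # append a bias column

# Closed-form least-squares solution (normal equations, solved with a stable solver)
w_closed = torch.linalg.lstsq(X, y).solution

# Gradient-descent solution of the same mean-squared-error objective
w = torch.zeros(2, 1, requires_grad=True)
for _ in range(2000):
    loss = ((X @ w - y) ** 2).mean()
    loss.backward()
    with torch.no_grad():
        w -= 0.5 * w.grad     # fixed learning rate of 0.5
        w.grad.zero_()

print(w_closed.flatten(), w.detach().flatten())  # both should be close to [3, 2]
```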
  • Wed, Jan. 24 Lecture 4: Neural Network
    Binary Classification / Multi-Class Classification
    Sigmoid / Softmax
    Cross-Entropy Loss
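To connect the softmax output with the cross-entropy loss covered in Lecture 4, a small sketch (illustrative only; the class scores are made up) showing that a hand-computed cross-entropy matches PyTorch's built-in loss, which fuses log-softmax and negative log-likelihood:

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([[2.0, 0.5, -1.0]])        # made-up scores for 3 classes
target = torch.tensor([0])                       # index of the correct class

probs = F.softmax(logits, dim=1)                 # scores -> probabilities
loss_manual = -torch.log(probs[0, target[0]])    # cross-entropy of the true class
loss_builtin = F.cross_entropy(logits, target)   # fused log-softmax + NLL

print(probs, loss_manual.item(), loss_builtin.item())   # the two losses agree
```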
    Fri, Jan. 26 Lecture 5: Multi-Layer Perceptron (MLP)
    Linear Problems / Non-Linear Problems
    Feature transforms Model: Fully-connected networks
    Computational Graph
    Optimization: Backpropagation
    Mon, Jan. 29 Lecture 6: Activation Functions and Optimization
    Activation Functions: ReLU, Sigmoid, tanh, Leaky ReLU, ELU
    Regularization
    Weight decay
    Assignment 1 due (Jan. 30)
    Deep Learning Architectures
    Wed, Jan. 31 Lecture 7: Convolutional Neural Networks (CNNs)
    Weight initialization, dropout, hyperparameters
    Universal approximation theorem
    Intro to CNNs -- Convolution
    Fri, Feb. 2 Lecture 8: Convolutional Neural Networks (CNNs)
    Convolution: kernel, receptive field, stride
    Padding
    Learning convolutional filters
    One layer (breadth): multiple kernels
    K layers (depth): nonlinearity in between
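For reference, the spatial output size of a convolution with input width W, kernel size K, padding P, and stride S is floor((W - K + 2P) / S) + 1. The snippet below (shapes chosen only for illustration, not course-provided code) checks this formula against PyTorch's nn.Conv2d:

```python
import torch
import torch.nn as nn

# Output size of a convolution: out = (W - K + 2*P) // S + 1
W, K, P, S = 32, 3, 1, 1
print((W - K + 2 * P) // S + 1)    # 32: "same" padding for a 3x3 kernel

conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, stride=1, padding=1)
x = torch.randn(1, 3, 32, 32)      # a single 32x32 RGB image
print(conv(x).shape)               # torch.Size([1, 16, 32, 32])
```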
    Mon, Feb. 5 Lecture 9: Convolutional Neural Networks (CNNs)
    Pooling
    AlexNet
    Batch Normalization
    ResNet + Residual Blocks
    Wed, Feb. 7 Lecture 10: CNN Architectures
    AlexNet, VGGNet, GoogLeNet, BatchNorm, ResNet
    Deep Learning Framework
    Fri, Feb. 9 Lecture 11: Training Neural Networks
    Activation functions
    Data preprocessing
    Weight initialization
    Data augmentation
    Regularization (Dropout, etc)
    Learning rate schedules
    Hyperparameter optimization
    Transfer learning
    Assignment 2 out
  • [Lab2a: Gradient Descent]
  • [Lab2b: PyTorch]
  • [Lab2c: Linear Classifier]
  • Mon, Feb. 12 Lecture 12: Deep Learning Framework
    Hyperparameter optimization
    Transfer learning
    PyTorch
    Dynamic vs Static graphs
    Wed, Feb. 14 Lecture 13: PyTorch Review Session
    PyTorch
    Final project overview
    Life cycle of a Machine Learning System
    Fri, Feb. 16 Lecture 14: Recurrent Neural Networks (RNNs)
    Life cycle of a Machine Learning System
    Sequential models use cases
    CNNs for sequences
    RNNs
    Assignment 2 due (Feb. 18)
    Mon, Feb. 19 Lecture 15: Recurrent Networks: Stability analysis and LSTMs
    Gradient Explosion
    LSTM, GRU
    Language modeling
    Assignment 3 out:
  • [Lab3: Autograd and NN]
  • Wed, Feb. 21 Lecture 16: Recurrent Networks: Stability analysis and LSTMs (2)
    Fri, Feb. 23 Lecture 17: Attention and Transformers
    Self-Attention
    Transformers
    Mon, Feb. 26 Lecture 18: Attention and Transformers (2)
    Multi-head Self-Attention
    Masked Self-Attention
    Assignment 3 due (Feb. 27)
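To make the attention material from Lectures 17-18 concrete, here is a minimal single-head scaled dot-product self-attention sketch, including the causal mask used for language modeling (the dimensions and weight matrices are illustrative, not course-provided code):

```python
import math
import torch

def self_attention(x, Wq, Wk, Wv, mask=None):
    """Single-head scaled dot-product self-attention over a sequence x of shape (T, d)."""
    Q, K, V = x @ Wq, x @ Wk, x @ Wv
    scores = Q @ K.T / math.sqrt(K.shape[-1])        # (T, T) pairwise similarities
    if mask is not None:                             # e.g. a causal mask for language modeling
        scores = scores.masked_fill(mask == 0, float("-inf"))
    return torch.softmax(scores, dim=-1) @ V         # attention-weighted sum of values

T, d = 5, 8
x = torch.randn(T, d)
Wq, Wk, Wv = (torch.randn(d, d) for _ in range(3))
causal = torch.tril(torch.ones(T, T))                # lower-triangular (masked) attention
print(self_attention(x, Wq, Wk, Wv, causal).shape)   # torch.Size([5, 8])
```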

    Module II: Advanced Topics on Deep Learning

    Vision Applications
    Wed, Feb. 28 Lecture 19: BERT and GPTs
    Encoder-Decoder Attention
    Word Embedding
    Pre-training
  • [Slides]
  • Fri, Mar. 1 Lecture 20: Training Large Language Models
    Self-Supervised Learning
    Data Scaling
  • [Slides]
  • Mon, Mar. 11 Lecture 20: Training Large Language Models (2)
    Self-Supervised Learning
    Data Scaling
  • [Slides]
  • [The Practical Guides for Large Language Models]
  • Assignment 4 out:
  • [Lab4: Neural Machine Translation]

  • Project proposal due (Mar. 12)
    Wed, Mar. 13 Lecture 21: Computer Vision: Detection and Segmentation
    Semantic segmentation
    Object detection
    Instance segmentation
  • [Slides]
  • Fri, Mar. 15 Lecture 22: Generative Models (1)
    Unsupervised Learning
    Clustering / PCA
    Autoregressive Models
  • [Slides]
  • Generative and Interactive Visual Intelligence
    Mon, Mar. 18 Lecture 23: Generative Models (2) -- VAEs
    Convolutional AEs, Transpose Convolution
    Variational Autoencoders (VAE)
  • [Slides]
  • [Reading: Convolutional AEs]
  • Assignment 4 Part 1 (LSTM and Attention) due (Mar. 19)
    Wed, Mar. 20 Lecture 24: Generative Models (2) -- VAEs (continued)
    VAE Loss - KL Divergence
    Reparameterization trick
    Conditional VAE
  • [Slides]
  • [KL Divergence]
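A minimal sketch of the reparameterization trick and the closed-form KL term discussed in this lecture (shapes are illustrative; not course-provided code):

```python
import torch

def reparameterize(mu, log_var):
    """Sample z ~ N(mu, sigma^2) while keeping gradients flowing to mu and log_var."""
    std = torch.exp(0.5 * log_var)
    eps = torch.randn_like(std)        # the noise is sampled outside the computation graph
    return mu + eps * std

def kl_divergence(mu, log_var):
    """Closed-form KL( N(mu, sigma^2) || N(0, I) ) for a diagonal Gaussian."""
    return -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp(), dim=1).mean()

mu = torch.zeros(4, 2, requires_grad=True)
log_var = torch.zeros(4, 2, requires_grad=True)
z = reparameterize(mu, log_var)
print(z.shape, kl_divergence(mu, log_var))  # torch.Size([4, 2]); KL is 0 for N(0, I)
```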
  • Fri, Mar. 22 Lecture 25: Generative Models (3) -- GANs
    Generative Adversarial Networks (GANs)
    Training GANs and challenges
    Applications
  • [Slides]
  • Mon, Mar. 25 Lecture 26: Generative Models (4) -- Diffusion Models
    Denoising Diffusion Probabilistic Models (DDPMs)
    Conditional Diffusion Models
  • [Slides]
  • Assignment 4 Part 2 (Transformers) due (Mar. 28)
    Wed, Mar. 27 Lecture 26: Generative Models (4) -- Diffusion Models (continued)
    Denoising Diffusion Probabilistic Models (DDPMs)
    Conditional Diffusion Models
  • [Slides]
  • Project milestone due (Mar. 31)
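The forward (noising) process of a DDPM can be written in closed form as x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps. A minimal sketch of this step (the linear beta schedule and tensor shapes are illustrative only):

```python
import torch

# Forward (noising) process of a DDPM
T = 1000
betas = torch.linspace(1e-4, 0.02, T)            # linear noise schedule
alpha_bars = torch.cumprod(1.0 - betas, dim=0)   # cumulative products alpha_bar_t

def q_sample(x0, t, eps):
    a_bar = alpha_bars[t].view(-1, 1, 1, 1)
    return a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * eps

x0 = torch.randn(8, 3, 32, 32)                   # a batch of "images"
t = torch.randint(0, T, (8,))                    # a random timestep per image
eps = torch.randn_like(x0)
x_t = q_sample(x0, t, eps)                       # the denoiser is trained to predict eps from (x_t, t)
print(x_t.shape)
```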
    Fri, Mar. 29 No Class (Good Friday)
    Mon, Apr. 1 No Class (Easter Monday)
    Wed, Apr. 3 Lecture 27: Self-supervised Learning
    Pretext tasks
    Contrastive representation learning
    Instance contrastive learning: SimCLR and MoCo
    Sequence contrastive learning: CPC

  • [Slides]
  • Fri, Apr. 5 Lecture 27: Self-supervised learning (continued)
  • [Slides]
  • [SimCLR]
  • [MoCo]
  • [MoCo v2]
  • [CPC]
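A simplified contrastive (InfoNCE-style) loss in the spirit of SimCLR, where row i of two augmented views forms a positive pair. Note this is a simplification of the full NT-Xent loss (which also uses within-view negatives), and the embedding sizes are illustrative:

```python
import torch
import torch.nn.functional as F

def info_nce(z1, z2, temperature=0.1):
    """Simplified contrastive loss: row i of z1 and row i of z2 are a positive pair."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.T / temperature           # cosine similarities between all pairs
    labels = torch.arange(z1.shape[0])         # the matching index is the positive
    return F.cross_entropy(logits, labels)

z1, z2 = torch.randn(16, 128), torch.randn(16, 128)  # embeddings of two augmented views
print(info_nce(z1, z2))
```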
  • Mon, Apr. 8
    LLaVA: A Vision-and-Language Approach to Computer Vision in the Wild
    Guest Speaker: Chunyuan Li (Microsoft Research)

    Abstract: The future of AI is in creating systems like foundation models that are pre-trained once and then handle countless downstream tasks directly (zero-shot) or adapt to new tasks quickly (few-shot). In this talk, I will discuss our vision-language approach to achieving “Computer Vision in the Wild (CVinW)”: building a transferable system in computer vision (CV) that can effortlessly generalize to a wide range of visual recognition tasks in the wild. I will first describe the definition and current status of CVinW, and briefly summarize our efforts on benchmarks and modeling. I will then dive into the Large Language-and-Vision Assistant (LLaVA) and its series, including LLaVA-Med, LLaVA-1.5, LLaVA-NeXT, LLaVA-Interactive, and LLaVA-Plus. The LLaVA family represents the first open-source project to exhibit GPT-4V-level capabilities in image understanding and reasoning, and it demonstrates a promising path to building customizable large multimodal models that follow human intent at an affordable cost.

    Reference Papers: [LLaVA], [LLaVA-Med], [LLaVA-1.5], [LLaVA-NeXT], [LLaVA-Interactive], [LLaVA-Plus].

    Wed, Apr. 10
    Learning to and from Predict in Computer Vision
    Guest Speaker: Tianhong Li (MIT CSAIL)

    Abstract: Predictive learning has been a long-standing topic in computer vision and has gained increased attention recently due to the success of large language models. This lecture will introduce several pivotal studies within this domain. We begin with image inpainting, a technique vital for understanding context and filling in missing information. We will then see how predictive learning can facilitate unsupervised representation learning. Finally, we will introduce how predictive learning can be used to generate novel images.


    Reference Papers: [VQGAN], [MAE], [MAGE].

    Fri, Apr. 12 Foundation Priors for Robot Perception: From Neural Radiance Fields to OpenAI Sora
    Guest Speaker: Ge Yang (MIT CSAIL)

    Abstract: Recent developments in Artificial Intelligence have produced a trifecta of new techniques in generative modeling, computer graphics, and representation learning that, once combined, will lead to radical changes in robotics. In this talk, we will study robot perception as an ill-defined inverse problem whose goal is to infer knowledge of the environment from noise and partial observability. We will start with Neural Radiance Fields (NeRFs) and study ways to combine them with prior knowledge from Foundation Models that are trained over internet-scale datasets to give robots the ability to know what is where in their surrounding environment. We will then look at the AI debate over priors vs. data, and discuss how it is affected by recent results from OpenAI Sora, the state-of-the-art AI system for generating videos from text.


    Reference Papers: [CLIP], [NeRF], [Distilled Feature Fields Enable Few-Shot Language-Guided Manipulation (CoRL 2023 Best Paper)].

    Mon, Apr. 15 No Class (Patriots' Day)
    AI for Science
    Tue, Apr. 16 Towards Efficient and High-Quality 3D Generation
    Guest Speaker: Zifan Shi (Stanford / HKUST)

    Abstract: 3D generation has received growing attention due to its potential in modeling the 3D visual world. Despite remarkable advancements, there remains a significant journey ahead. In this talk, we will explore three key aspects of 3D generation. Firstly, we will focus on geometry quality, delving into the design of the discriminator. This crucial component has often been overlooked in many existing 3D generative approaches. Secondly, we will examine the realm of animatable human generation, probing into techniques and challenges associated with this dynamic aspect of 3D modeling. Lastly, we will discuss strategies for constructing a foundational model tailored for 3D generation, aiming to provide a robust framework for further advancements in the field.


    Reference Papers:
    [Improving 3D-aware Image Synthesis with A Geometry-aware Discriminator],
    [Learning 3D-aware Image Synthesis with Unknown Pose Distribution],
    [Gaussian Shell Maps for Efficient 3D Human Generation],
    [GRM: Large Gaussian Reconstruction Model for Efficient 3D Reconstruction and Generation],
    [3D Gaussian Splatting for Real-Time Radiance Field Rendering (Siggraph 2023 Best Paper)].

    Wed, Apr. 17 CAMEL: Communicative Agents for “Mind” Exploration of Large Language Model Society
    Guest Speaker: Guohao Li (University of Oxford)

    Abstract: The rapid advancement of chat-based language models has led to remarkable progress in complex task-solving. However, their success heavily relies on human input to guide the conversation, which can be challenging and time-consuming. This paper explores the potential of building scalable techniques to facilitate autonomous cooperation among communicative agents, and provides insight into their "cognitive" processes. To address the challenges of achieving autonomous cooperation, we propose a novel communicative agent framework named role-playing. Our approach involves using inception prompting to guide chat agents toward task completion while maintaining consistency with human intentions. We showcase how role-playing can be used to generate conversational data for studying the behaviors and capabilities of a society of agents, providing a valuable resource for investigating conversational language models. In particular, we conduct comprehensive studies on instruction-following cooperation in multi-agent settings.

    Reference Papers:
    [CAMEL: Communicative Agents for “Mind” Exploration of Large Language Model Society],
    [https://www.camel-ai.org/]

    Fri, Apr. 19 Segment Anything
    Guest Speaker: Hanzi Mao (Nvidia Deep Imagination Research)
    Time: 1:00 PM - 2:00 PM (Eastern Time), 10:00 AM - 11:00 AM (Pacific Time)

    Abstract: We introduce the Segment Anything (SA) project: a new task, model, and dataset for image segmentation. Using our efficient model in a data collection loop, we built the largest segmentation dataset to date (by far), with over 1 billion masks on 11M licensed and privacy respecting images. The model is designed and trained to be promptable, so it can transfer zero-shot to new image distributions and tasks. We evaluate its capabilities on numerous tasks and find that its zero-shot performance is impressive – often competitive with or even superior to prior fully supervised results. We are releasing the Segment Anything Model (SAM) and corresponding dataset (SA-1B) of 1B masks and 11M images at https://segment-anything.com to foster research into foundation models for computer vision.


    Reference Papers: [Segment Anything (ICCV 2023 Best Honorable Mention Paper)]

    Mon, Apr. 22 Respiration Intelligence: Know Your Health from Your Breathing with an AI Assistant
    Guest Speaker: Hao He (MIT CSAIL)

    Abstract: Respiration is a fundamental life-sustaining function intricately connected to various aspects of human health. With the aid of AI, we can uncover associations between respiration and numerous health conditions. In this lecture, we'll introduce three case studies demonstrating the use of breathing signals to predict blood oxygen saturation, sleep stages, and inflammation. We'll discuss the accuracy and practical applications of these predictive systems, as well as the core AI technologies that power them.


    Wed, Apr. 24 Learning from Synthetic Data from LLMs and Diffusion Models
    Guest Speaker: Lijie Fan (MIT CSAIL)
    Final Projects
    [Week 15-18]
    Fri, Apr. 26 Final Project Presentation (1)
    Mon, Apr. 29 Final Project Presentation (2)
    Wed, May. 1 Final Project Presentation (3)
    Mon, May. 6 Final project report/code due


    Office Hours

    Name Office hours
    Yuan Mon/Tue 3-4pm @ 245 Beacon Rm. 528E
    Gavin M W F 10-11am @ CS Lab
    • Office hours will take place in person (or Zoom if needed).
    • Yuan will hold additional one-on-one AMA office hours Tue 4-5pm/Wed 3-4pm (15-min by appointment)


    Course Information

    This is a challenging course and we are here to help you become a more-AI version of yourself. Please feel free to reach out if you need help in any form.

    1. Get help (besides office hours)

    • Dropbox: The lecture pdfs will be uploaded to Dropbox (follow the link) and you can ask questions there by making comments on the slides directly.
    • Discord: For labs/psets/final projects, we will create dedicated channels for you to ask public questions. If you cannot make your post public (e.g., due to revealing problem set solutions), please DM the TAs or the instructor separately, or come to office hours. Please note, however, that the course staff cannot provide help debugging code, and there is no guarantee that they'll be able to answer last-minute homework questions before the deadline. We also appreciate it when you respond to questions from other students! If you have an important question that you would prefer to discuss over email, you may email the course staff or contact the instructor directly.
    • Support: The university counseling services center provides a variety of programs and activities.
    • Accommodations for students with disabilities: If you are a student with a documented disability seeking reasonable accommodations in this course, please contact Kathy Duggan, (617) 552-8093, dugganka@bc.edu, at the Connors Family Learning Center regarding learning disabilities and ADHD, or Rory Stein, (617) 552-3470, steinr@bc.edu, in the Disability Services Office regarding all other types of disabilities, including temporary disabilities. Advance notice and appropriate documentation are required for accommodations.

    2. Homework submission

    All programming assignments are in Python on Colab, always due at midnight (11:59 pm) on the due date.
    • Install Colab in the browser: Sign in to your Google account, follow the "Link" (to be updated) to the assignments folder, click on lab0.ipynb, click "Open with" and "Connect more apps", and install "Colaboratory".
    • Submission: You need to save a copy of the file in your own Google Drive so that you can save your edits. Afterwards, download the .ipynb file and submit it to Canvas.
    • Final project: In lieu of a final exam, we'll have a final project. This project will be completed in small groups during the last weeks of the class. The direction for this project is open-ended: you can either choose from a list of project ideas that we distribute, or you can propose a topic of your own. A short project proposal will be due approximately halfway through the course. During the final exam period, you'll turn in a final report and give a short presentation. You may use an ongoing research work for your final project, as long as it meets the requirements.

    3. Academic policy

    • Late days: You'll have 4 late days for labs and 4 for psets over the course of the semester. Each time you use one, you may submit a homework assignment one day late without penalty. You are allowed to use multiple late days on a single assignment. For example, you can use all of your days at once to turn in one assignment four days late. You do not need to notify us when you use a late day; we'll deduct it automatically. If you run out of late days and still submit late, your assignment will be penalized at a rate of 10% per day. If you edit your assignment after the deadline, this will count as a late submission, and we'll use the revision time as the date of submission (after a short grace period of a few minutes). We will not provide additional late time, except under exceptional circumstances, and for these we'll require documentation (e.g., a doctor's note). Please note that the late days are provided to help you deal with minor setbacks, such as routine illness or injury, paper deadlines, interviews, and computer problems; these do not generally qualify for an additional extension.
    • Academic integrity: While you are encouraged to discuss homework assignments with other students, your programming work must be completed individually. You may not search for solutions online or use existing implementations of the algorithms in the assignments. Thus it is acceptable to learn from another student the general idea for writing program code to perform a particular task, or the technique for solving a mathematical problem, but unacceptable for two students to prepare their assignments together and submit what are essentially two copies of identical work. If you have any uncertainty about the application of this policy, please check with me. Failure to comply with these guidelines will be considered a violation of the University policies on academic integrity. Please make sure that you are familiar with these policies. We will use the MOSS (moss.pl) tool to check each lab and pset for plagiarism.
    • AI assistants policy:
      • Our policy for using ChatGPT and other AI assistants is identical to our policy for using human assistants.
      • This is a deep learning class and you should try out all the latest AI assistants (they are pretty much all using deep learning). It's very important to play with them to learn what they can do and what they can't do. That's a part of the content of this course.
      • Just like you can come to office hours and ask a human questions (about the lecture material, clarifications about pset questions, tips for getting started, etc), you are very welcome to do the same with AI assistants.
      • But: just like you are not allowed to ask an expert friend to do your homework for you, you also should not ask an expert AI.
      • If it is ever unclear, just imagine the AI as a human and apply the same norm as you would with a human.

    4. Related Classes / Online Resources
    Acknowledgements: This course draws heavily from MIT's 6.869: Advances in Computer Vision by Antonio Torralba, William Freeman, and Phillip Isola, and from Stanford's CS231n: Deep Learning for Computer Vision by Fei-Fei Li. It also includes lecture slides from other researchers, including Andrew Owens, Svetlana Lazebnik, Alexei Efros, Fei-Fei Li, Carl Vondrick, David Fouhey, Justin Johnson, Noah Snavely, and Ava Amini. Special thanks to Hao Wang for insightful and generous advice.