Module I: Deep Learning Basics |
ML Basics
|
Wed, Jan. 17 |
Lecture 1: Course Introduction
Course overview,
Course logistics
|
|
|
Fri, Jan. 19 |
Lecture 2: Machine Learning Basics
Machine learning overview
ML: pipeline, tasks
Linear regression, Polynomial regression
|
|
|
Mon, Jan. 22 |
Lecture 3: Linear regression
Optimization: gradient-based solution, closed-form solution
Underfit, Overfit, Regularization, Generalization
|
|
Assignment 1 out
[Lab1a: Python Basic]
[Lab1b: Linear Regression]
|
Wed, Jan. 24 |
Lecture 4: Neural Network
Binary Classification / Multi-Class Classification
Sigmoid / Softmax
Cross-Entropy Loss
|
|
|
Fri, Jan. 26 |
Lecture 5: Multi-Layer Perceptron (MLP)
Linear Problems / Non-Linear Problems
Feature transforms
Model: Fully-connected networks
Computational Graph
Optimization: Backpropagation
|
|
|
Mon, Jan. 29 |
Lecture 6: Activation Functions and Optimization
Activation Functions: ReLU, Sigmoid, tanh, Leaky ReLU, ELU
Regularization
Weight decay
|
|
Assignment 1 due (Jan. 30)
|
Deep Learning Architectures
|
Wed, Jan. 31 |
Lecture 7: Convolutional Neural Networks (CNNs)
Weight initialization, dropout, hyperparameters
Universal approximation theorem
Intro to CNNs -- Convolution
|
|
|
Fri, Feb. 2 |
Lecture 8: Convolutional Neural Networks (CNNs)
Convolution: kernel, receptive field, stride
Padding
Learning convolutional filters
One layer (breadth): multiple kernels
K layers (depth): nonlinearity in between
|
|
|
Mon, Feb. 5 |
Lecture 9: Convolutional Neural Networks (CNNs)
Pooling
AlexNet
Batch Normalization
ResNet + Residual Blocks
|
|
|
Wed, Feb. 7 |
Lecture 10: CNN Architectures
AlexNet, VGGNet, GoogLeNet, BatchNorm, ResNet
Deep Learning Framework
|
|
|
Fri, Feb. 9 |
Lecture 11: Training Neural Networks
Activation functions
Data preprocessing
Weight initialization
Data augmentation
Regularization (Dropout, etc)
Learning rate schedules
Hyperparameter optimization
Transfer learning
|
|
Assignment 2 out
[Lab2a: Gradient Descent]
[Lab2b: PyTorch]
[Lab2c: Linear Classifier]
|
Mon, Feb. 12 |
Lecture 12: Deep Learning Framework
Hyperparameter optimization
Transfer learning
PyTorch
Dynamic vs Static graphs
|
|
|
Wed, Feb. 14 |
Lecture 13: PyTorch Review Session
PyTorch
Final project overview
Life cycle of a Machine Learning System
|
|
|
Fri, Feb. 16 |
Lecture 14: Recurrent Neural Networks (RNNs)
Life cycle of a Machine Learning System
Sequential models use cases
CNNs for sequences
RNNs
|
|
Assignment 2 due (Feb. 18)
|
Mon, Feb. 19 |
Lecture 15: Recurrent Networks: Stability analysis and LSTMs
Gradient Explosion
LSTM, GRU
Language modeling
|
|
Assignment 3 out:
[Lab3: Autograd and NN]
|
Wed, Feb. 21 |
Lecture 16: Recurrent Networks: Stability analysis and LSTMs (2)
|
|
|
Fri, Feb. 23 |
Lecture 17: Attention and Transformers
Self-Attention
Transformers
|
|
|
Mon, Feb. 26 |
Lecture 18: Attention and Transformers (2)
Multi-head Self-Attention
Masked Self-Attention
|
|
Assignment 3 due (Feb. 27)
|
Module II: Advanced Topics on Deep Learning |
Vision Applications
|
Wed, Feb. 28 |
Lecture 19: BERT and GPTs
Encoder-Decoder Attention
Word Embedding
Pre-training
|
[Slides]
|
|
Fri, Mar. 1 |
Lecture 20: Training Large Language Models
Self-Supervised Learning
Data Scaling
|
[Slides]
|
|
Mon, Mar. 11 |
Lecture 20: Training Large Language Models (2)
Self-Supervised Learning
Data Scaling
|
[Slides]
[The Practical Guides for Large Language Models]
|
Assignment 4 out:
[Lab4: Neural Machine Translation]
Project proposal due (Mar. 12)
|
Wed, Mar. 13 |
Lecture 21: Computer Vision: Detection and Segmentation
Semantic segmentation
Object detection
Instance segmentation
|
[Slides]
|
|
Fri, Mar. 15 |
Lecture 22: Generative Models (1)
Unsupervised Learning
Clustering / PCA
Autoregressive Models
|
[Slides]
|
|
Generative and Interactive Visual Intelligence
|
Mon, Mar. 18 |
Lecture 23: Generative Models (2) -- VAEs
Convolutional AEs, Transpose Convolution
Variational Autoencoders (VAE)
|
[Slides]
[Reading: Convolutional AEs]
|
Assignment 4 Part 1 (LSTM and Attention) due (Mar. 19)
|
Wed, Mar. 20 |
Lecture 24: Generative Models (2) -- VAEs (continued)
VAE Loss - KL Divergence
Reparameterization trick
Conditional VAE
|
[Slides]
[KL Divergence]
|
|
Fri, Mar. 22 |
Lecture 25: Generative Models (3) -- GANs
Generative Adversarial Networks (GANs)
Training GANs and challenges
Applications
|
[Slides]
|
|
Mon, Mar. 25 |
Lecture 26: Generative Models (4) -- Diffusion Models
Denoising Diffusion Probabilistic Models (DDPMs)
Conditional Diffusion Models
|
[Slides]
|
Assignment 4 Part 2 (Transformers) due (Mar. 28)
|
Wed, Mar. 27 |
Lecture 26: Generative Models (4) -- Diffusion Models (continued)
Denoising Diffusion Probabilistic Models (DDPMs)
Conditional Diffusion Models
|
[Slides]
|
Project milestone due (Mar. 31)
|
No Class (Good Friday) |
No Class (Easter Monday) |
Wed, Apr. 3 |
Lecture 27: Self-supervised Learning
Pretext tasks
Contrastive representation learning
Instance contrastive learning: SimCLR and MoCo
Sequence contrastive learning: CPC
|
[Slides]
|
|
Fri, Apr. 5 |
Lecture 27: Self-supervised learning (continued)
|
[Slides]
[SimCLR]
[MoCo]
[MoCo v2]
[CPC]
|
|
Mon, Apr. 8
|
LLaVA: A Vision-and-Language Approach to Computer Vision in the Wild
|
Abstract: The future of AI is in creating systems like foundation models that are pre-trained once and then handle countless downstream tasks directly (zero-shot) or adapt to new tasks quickly (few-shot). In this talk, I will discuss our vision-language approach to achieving “Computer Vision in the Wild (CVinW)”: building a transferable system in computer vision (CV) that can effortlessly generalize to a wide range of visual recognition tasks in the wild. I will first describe the definition and current status of CVinW, and briefly summarize our efforts on benchmarking and modeling. I will then dive into the Large Language-and-Vision Assistant (LLaVA) and its series, including LLaVA-Med, LLaVA-1.5, LLaVA-NeXT, LLaVA-Interactive, and LLaVA-Plus. The LLaVA family represents the first open-source project to exhibit GPT-4V-level capabilities in image understanding and reasoning, and demonstrates a promising path to building customizable large multimodal models that follow humans' intent at an affordable cost.
Reference Papers: [LLaVA], [LLaVA-Med], [LLaVA-1.5], [LLaVA-NeXT], [LLaVA-Interactive], [LLaVA-Plus].
|
|
Wed, Apr. 10
|
Learning to and from Predict in Computer Vision
|
Abstract: Predictive learning has been a long-standing topic in computer vision and has gained increased attention recently due to the success of large language models. This lecture will introduce several pivotal studies within this domain. We begin with image inpainting, a technique vital for understanding context and filling in missing information. We will then see how predictive learning can facilitate unsupervised representation learning. Finally, we will introduce how we can use predictive learning to generate novel images.
Reference Papers: [VQGAN], [MAE], [MAGE].
|
|
Fri, Apr. 12 |
Foundation Priors for Robot Perception: From Neural Radiance Fields to OpenAI Sora
|
Abstract: Recent developments in Artificial Intelligence have produced a trifecta of new techniques in generative modeling, computer graphics, and representation learning that, once combined, will lead to radical changes in robotics. In this talk, we will study robot perception as an ill-defined inverse problem whose goal is to infer knowledge of the environment from noise and partial observability. We will start with Neural Radiance Fields (NeRFs) and study ways to combine them with prior knowledge from Foundation Models that are trained over internet-scale datasets to give robots the ability to know what is where in their surrounding environment. We will then look at the AI debate over priors vs. data, and discuss how it is affected by recent results from OpenAI Sora, the state-of-the-art AI system for generating videos from text.
Reference Papers: [CLIP], [NeRF], [Distilled Feature Fields Enable Few-Shot Language-Guided Manipulation (CoRL 2023 Best Paper)].
|
|
No Class (Patriots' Day) |
AI for Science
|
Tue, Apr. 16 |
Towards Efficient and High-Quality 3D Generation
|
Abstract: 3D generation has received growing attention due to its potential in modeling the 3D visual world. Despite remarkable advancements, there remains a significant journey ahead. In this talk, we will explore three key aspects of 3D generation. Firstly, we will focus on geometry quality, delving into the design of the discriminator. This crucial component has often been overlooked in many existing 3D generative approaches. Secondly, we will examine the realm of animatable human generation, probing into techniques and challenges associated with this dynamic aspect of 3D modeling. Lastly, we will discuss strategies for constructing a foundational model tailored for 3D generation, aiming to provide a robust framework for further advancements in the field.
Reference Papers:
[Improving 3D-aware Image Synthesis with A Geometry-aware Discriminator],
[Learning 3D-aware Image Synthesis with Unknown Pose Distribution],
[Gaussian Shell Maps for Efficient 3D Human Generation],
[GRM: Large Gaussian Reconstruction Model for Efficient 3D Reconstruction and Generation],
[3D Gaussian Splatting for Real-Time Radiance Field Rendering (SIGGRAPH 2023 Best Paper)].
|
|
Wed, Apr. 17 |
CAMEL: Communicative Agents for “Mind” Exploration of Large Language Model Society
Guest Speaker: Guohao Li (University of Oxford)
|
Abstract: The rapid advancement of chat-based language models has led to remarkable progress in complex task-solving. However, their success heavily relies on human input to guide the conversation, which can be challenging and time-consuming. This paper explores the potential of building scalable techniques to facilitate autonomous cooperation among communicative agents, and provides insight into their "cognitive" processes. To address the challenges of achieving autonomous cooperation, we propose a novel communicative agent framework named role-playing. Our approach involves using inception prompting to guide chat agents toward task completion while maintaining consistency with human intentions. We showcase how role-playing can be used to generate conversational data for studying the behaviors and capabilities of a society of agents, providing a valuable resource for investigating conversational language models. In particular, we conduct comprehensive studies on instruction-following cooperation in multi-agent settings.
Reference Papers:
[CAMEL: Communicative Agents for “Mind” Exploration of Large Language Model Society],
[https://www.camel-ai.org/]
|
|
Fri, Apr. 19 |
Segment Anything
Guest Speaker: Hanzi Mao (Nvidia Deep Imagination Research)
Time: 1:00 PM - 2:00 PM (Eastern Time), 10:00 AM - 11:00 AM (Pacific Time)
|
Abstract: We introduce the Segment Anything (SA) project: a new task, model, and dataset for image segmentation. Using our efficient model in a data collection loop, we built the largest segmentation dataset to date (by far), with over 1 billion masks on 11M licensed and privacy respecting images. The model is designed and trained to be promptable, so it can transfer zero-shot to new image distributions and tasks. We evaluate its capabilities on numerous tasks and find that its zero-shot performance is impressive – often competitive with or even superior to prior fully supervised results. We are releasing the Segment Anything Model (SAM) and corresponding dataset (SA-1B) of 1B masks and 11M images at https://segment-anything.com to foster research into foundation models for computer vision.
Reference Papers: [Segment Anything (ICCV 2023 Best Paper Honorable Mention)]
|
|
Mon, Apr. 22 |
Respiration Intelligence: Know Your Health from Your Breathing with an AI Assistant
Guest Speaker: Hao He (MIT CSAIL)
|
Abstract: Respiration is a fundamental life-sustaining function intricately connected to various aspects of human health. With the aid of AI, we can uncover associations between respiration and numerous health conditions. In this lecture, we'll introduce three case studies demonstrating the use of breathing signals to predict blood oxygen saturation, sleep stages, and inflammation. We'll discuss the accuracy and practical applications of these predictive systems, as well as the core AI technologies that power them.
|
Wed, Apr. 24 |
Learning from Synthetic Data from LLMs and Diffusion Models
|
| |
Final Projects
|
Fri, Apr. 26 |
Final Project Presentation (1)
|
| |
Mon, Apr. 29 |
Final Project Presentation (2)
|
| |
Wed, May 1 |
Final Project Presentation (3)
|
| |
Mon, May 12
|
| Final project report/code due |