Deep Learning Seminar

meets TTH 12:30 p.m. - 2:00 p.m. in GDC 4.304 (zoom mirror; see canvas)

instructor Philipp Krähenbühl
email philkr (at) utexas.edu
office hours Th 2pm-2:30pm (zoom, see canvas)

TA Yue Zhao
email yzhao (at) cs.utexas.edu
TA hours T 2pm-2:30pm (zoom, see canvas)

Please use github for all assignments. Zoom links and final grades are available in canvas.

Unless disaster strikes, the course will be taught in-person with an optional zoom mirror. The zoom mirror will be live, there will not be any recordings.

Prerequisites

Intro Machine learning

391L or equivalent

Discrete math for computer science

311, 311H or equivalent

Proficiency in Python

All projects use Python with PyTorch
It is recommended to familiarize yourself with additional libraries: numpy, scikit-learn, matplotlib

Basic deep learning background

Familiarity with at least one deep learning package (PyTorch, Caffe, Tensorflow, Torch, Matconvnet, ...)
You should have trained at least one deep network

We wont enforce strict prerequisites (help of these topics is limited though).

Class overview

We discuss up to 5 recent research papers per class

We will try to get through 100+ papers over the semester (in other words 1/10 CVPR or NeurIPS)
No individual student will need to read 100 papers, but you'll need to read 100+ 5-minute paper summaries.

Before class: Groups of 4 students ([S1], [S2], [C], [R]) read each paper

[S1][S2] write a summary and review
[C] codes up the main idea (using starter code we provide)
[R] performs a round of peer review on both summaries and code

In class

[S1][S2] present the summaries and reviews for each paper (5min per paper at most), [C] briefly presents an overview of the implementation (1-2min)
We look at a comparison between papers
Discussion

Auditing allowed if there is space (no coding or presentation, but in class participation required)

Schedule

Date

Topic

Papers

Aug 26

Course Introduction

Aug 31

Convolutional Neural Networks
[no code (yet)]

[S1]	Gradient-based learning applied to document recognition, LeCun, Bottun, Bengio, Haffner; 1998
[S1]	ImageNet Classification with Deep Convolutional Neural Networks, Krizhevsky, Sutskever, Hinton; 2012
[S1]	Network In Network, Lin, Chen, Yan; 2013
[S1]	Going Deeper with Convolutions, Szegedy, Liu, Jia, Sermanet, Reed, Anguelov, Erhan, Vanhoucke, Rabinovich; 2014
[S1]	Very Deep Convolutional Networks for Large-Scale Image Recognition, Simonyan, Zisserman; 2014

Sep 02

Convolutional Neural Networks
[no code (yet)]

[S1]	Deep Residual Learning for Image Recognition, He, Zhang, Ren, Sun; 2015
[S1]	Densely Connected Convolutional Networks, Huang, Liu, Maaten, Weinberger; 2016
[S1]	MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications, Howard, Zhu, Chen, Kalenichenko, Wang, Weyand, Andreetto, Adam; 2017
[S1]	MobileNetV2: Inverted Residuals and Linear Bottlenecks, Sandler, Howard, Zhu, Zhmoginov, Chen; 2018
[S1]	EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks, Tan, Le; 2019

Sep 07

Non-linearities (and initialization)
[C]

[S1] [S2]	Understanding the difficulty of training deep feedforward neural networks, Glorot, Bengio; 2010	S1: Serdjan Rolovic S2: Elias Lampietti C: Matthew Kelleher R: Samantha Hay
[S1] [S2]	Deep Sparse Rectifier Neural Networks, Glorot, Bordes, Bengio; 2011	S1: Ishank Arora S2: Yeming Wen C: Ayush Chauhan R: Christopher Hahn
[S1] [S2]	Maxout Networks, Goodfellow, Warde-Farley, Mirza, Courville, Bengio; 2013	S1: Zayne Sprague S2: Zhou Fang C: Reid Ling Tong Li R: Jay Whang
[S1]	Exact solutions to the nonlinear dynamics of learning in deep linear neural networks, Saxe, McClelland, Ganguli; 2013	S1: Liyan Chen S2: C: Marlan McInnes-Taylor R: Nilesh Gupta
[S1] [S2]	Rectifier Nonlinearities Improve Neural Network Acoustic Models, Maas, Hannun, Ng; 2013	S1: Kelsey Ball S2: Srinath Tankasala C: Hung-Ting Chen R: Sai Kiran Maddela

Sep 09

Non-linearities (and initialization)
[C]

[S1] [S2]	Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification, He, Zhang, Ren, Sun; 2015	S1: Tongrui Li S2: Jordi Ramos Chen C: R:
[S1] [S2]	Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs), Clevert, Unterthiner, Hochreiter; 2015	S1: Joshua Papermaster S2: ABAYOMI ADEKANMBI C: Ian Trowbridge R: Tarannum Khan
[S2]	Searching for Activation Functions, Ramachandran, Zoph, Le; 2017	S1: S2: Jay Liao C: Ishan Shah R: Kiran Raja
[S1] [S2]	Mish: A Self Regularized Non-Monotonic Activation Function, Misra; 2019	S1: Shivi Agarwal S2: Atreya Dey C: Ojas Patel R: Daniel Almeraz
[S1] [S2]	Gaussian Error Linear Units (GELUs), Hendrycks, Gimpel; 2016	S1: Marco Bueso S2: Jose Chavez C: Cheng-Chun Hsu R: Shivang Singh

Sep 14

Optimizers
[C]

[S1] [S2]	Large-Scale Machine Learning with Stochastic Gradient Descent, Bottou; 2010	S1: Elias Lampietti S2: Zayne Sprague C: Samantha Hay R: Ojas Patel
[S1] [S2]	On the importance of initialization and momentum in deep learning, Sutskever, Martens, Dahl, Hinton; 2013	S1: Zhou Fang S2: Kelsey Ball C: Kiran Raja R: Reid Ling Tong Li
[S1] [S2]	Cyclical Learning Rates for Training Neural Networks, Smith; 2015	S1: Jay Liao S2: Liyan Chen C: Shivang Singh R: Ian Trowbridge
[S1] [S2]	SGDR: Stochastic Gradient Descent with Warm Restarts, Loshchilov, Hutter; 2016	S1: Atreya Dey S2: Tongrui Li C: Tarannum Khan R:
[S1] [S2]	Super-Convergence: Very Fast Training of Neural Networks Using Large Learning Rates, Smith, Topin; 2017	S1: Jose Chavez S2: Marco Bueso C: R: Marlan McInnes-Taylor

Sep 16

Optimizers
[C]

[S1]	Adaptive Subgradient Methods for Online Learning and Stochastic Optimization, Duchi, Hazan, Singer; 2011	S1: Yeming Wen S2: C: Daniel Almeraz R: Ayush Chauhan
[S1] [S2]	ADADELTA: An Adaptive Learning Rate Method, Zeiler; 2012	S1: Srinath Tankasala S2: Shivi Agarwal C: Nilesh Gupta R: Matthew Kelleher
[S1] [S2]	Adam: A Method for Stochastic Optimization, Kingma, Ba; 2014	S1: Jordi Ramos Chen S2: Ishank Arora C: Jay Whang R: Cheng-Chun Hsu
[S1] [S2]	On the Convergence of Adam and Beyond, Reddi, Kale, Kumar; 2019	S1: ABAYOMI ADEKANMBI S2: Joshua Papermaster C: Sai Kiran Maddela R: Hung-Ting Chen
[S2]	Decoupled Weight Decay Regularization, Loshchilov, Hutter; 2017	S1: S2: Serdjan Rolovic C: Christopher Hahn R: Ishan Shah

Sep 21

Normalizations
[C]

[S1] [S2]	Dropout: A Simple Way to Prevent Neural Networks from Overfitting, Srivastava, Hinton, Krizhevsky, Sutskever, Salakhutdinov; 2014	S1: Ojas Patel S2: Tarannum Khan C: Marco Bueso R:
[S1]	Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, Ioffe, Szegedy; 2015	S1: Ishan Shah S2: C: Kelsey Ball R: Elias Lampietti
[S1] [S2]	Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks, Salimans, Kingma; 2016	S1: Marlan McInnes-Taylor S2: Samantha Hay C: Zayne Sprague R: Jay Liao
[S1] [S2]	Layer Normalization, Ba, Kiros, Hinton; 2016	S1: Ayush Chauhan S2: Jay Whang C: Ishank Arora R: Atreya Dey
[S1] [S2]	Instance Normalization: The Missing Ingredient for Fast Stylization, Ulyanov, Vedaldi, Lempitsky; 2016	S1: Reid Ling Tong Li S2: Kiran Raja C: Liyan Chen R: ABAYOMI ADEKANMBI

Sep 23

Normalizations
[C]

[S1] [S2]	Group Normalization, Wu, He; 2018	S1: Ian Trowbridge S2: Shivang Singh C: Shivi Agarwal R: Srinath Tankasala
[S1] [S2]	High-Performance Large-Scale Image Recognition Without Normalization, Brock, De, Smith, Simonyan; 2021	S1: Matthew Kelleher S2: Christopher Hahn C: Serdjan Rolovic R: Jose Chavez
[S2]	Micro-Batch Training with Batch-Channel Normalization and Weight Standardization, Qiao, Wang, Liu, Shen, Yuille; 2019	S1: S2: Nilesh Gupta C: Joshua Papermaster R: Jordi Ramos Chen
[S1] [S2]	Understanding Batch Normalization, Bjorck, Gomes, Selman, Weinberger; 2018	S1: Hung-Ting Chen S2: Daniel Almeraz C: Tongrui Li R: Zhou Fang
[S1] [S2]	Rethinking "Batch" in BatchNorm, Wu, Johnson; 2021	S1: Cheng-Chun Hsu S2: Sai Kiran Maddela C: R: Yeming Wen

Sep 28

Sequence models
[C]

[S1] [S2]	Sequence to Sequence Learning with Neural Networks, Sutskever, Vinyals, Le; 2014	S1: Sai Kiran Maddela S2: Ian Trowbridge C: Srinath Tankasala R: Liyan Chen
[S1] [S2]	Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling, Chung, Gulcehre, Cho, Bengio; 2014	S1: Christopher Hahn S2: Reid Ling Tong Li C: Jay Liao R: Tongrui Li
[S1] [S2]	Neural Machine Translation by Jointly Learning to Align and Translate, Bahdanau, Cho, Bengio; 2014	S1: Shivang Singh S2: Ayush Chauhan C: Elias Lampietti R: Serdjan Rolovic
[S1] [S2]	Attention Is All You Need, Vaswani, Shazeer, Parmar, Uszkoreit, Jones, Gomez, Kaiser, Polosukhin; 2017	S1: Samantha Hay S2: Ojas Patel C: Zhou Fang R: Ishank Arora
[S1] [S2]	End-To-End Memory Networks, Sukhbaatar, Szlam, Weston, Fergus; 2015	S1: Tarannum Khan S2: Marlan McInnes-Taylor C: R: Zayne Sprague

Sep 30

Sequence models
[C]

[S1] [S2]	Attention is Not All You Need: Pure Attention Loses Rank Doubly Exponentially with Depth, Dong, Cordonnier, Loukas; 2021	S1: Nilesh Gupta S2: Ishan Shah C: Jordi Ramos Chen R: Kelsey Ball
[S1]	BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, Devlin, Chang, Lee, Toutanova; 2018	S1: Daniel Almeraz S2: C: Jose Chavez R: Joshua Papermaster
[S2]	A Primer in BERTology: What we know about how BERT works, Rogers, Kovaleva, Rumshisky; 2020	S1: S2: Hung-Ting Chen C: Atreya Dey R: Shivi Agarwal
[S1] [S2]	Improving Language Understanding by Generative Pre-Training, Radford, Narasimhan, Salimans, Sutskever; 2018	S1: Kiran Raja S2: Cheng-Chun Hsu C: Yeming Wen R: Marco Bueso
[S1] [S2]	Language Models are Few-Shot Learners, Brown, Mann, Ryder, Subbiah, Kaplan, Dhariwal, Neelakantan, Shyam, Sastry, Askell, Agarwal, Herbert-Voss, Krueger, Henighan, Child, Ramesh, Ziegler, Wu, Winter, Hesse, Chen, Sigler, Litwin, Gray, Chess, Clark, Berner, McCandlish, Radford, Sutskever, Amodei; 2020	S1: Jay Whang S2: Matthew Kelleher C: ABAYOMI ADEKANMBI R:

Oct 05

Efficient Transformers
[C]

[S1] [S2]	Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context, Dai, Yang, Yang, Carbonell, Le, Salakhutdinov; 2019	S1: Serdjan Rolovic S2: Elias Lampietti C: Matthew Kelleher R: Samantha Hay
[S1] [S2]	Generating Long Sequences with Sparse Transformers, Child, Gray, Radford, Sutskever; 2019	S1: Ishank Arora S2: Yeming Wen C: Ayush Chauhan R: Christopher Hahn
[S1] [S2]	Compressive Transformers for Long-Range Sequence Modelling, Rae, Potapenko, Jayakumar, Lillicrap; 2019	S1: Zayne Sprague S2: Zhou Fang C: Reid Ling Tong Li R: Jay Whang
[S1]	Reformer: The Efficient Transformer, Kitaev, Kaiser, Levskaya; 2020	S1: Liyan Chen S2: C: Marlan McInnes-Taylor R: Nilesh Gupta
[S1] [S2]	Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention, Katharopoulos, Vyas, Pappas, Fleuret; 2020	S1: Kelsey Ball S2: Srinath Tankasala C: Hung-Ting Chen R: Sai Kiran Maddela

Oct 07

Efficient Transformers
[C]

[S1] [S2]	Linformer: Self-Attention with Linear Complexity, Wang, Li, Khabsa, Fang, Ma; 2020	S1: Tongrui Li S2: Jordi Ramos Chen C: R:
[S1] [S2]	Rethinking Attention with Performers, Choromanski, Likhosherstov, Dohan, Song, Gane, Sarlos, Hawkins, Davis, Mohiuddin, Kaiser, Belanger, Colwell, Weller; 2020	S1: Joshua Papermaster S2: ABAYOMI ADEKANMBI C: Ian Trowbridge R: Tarannum Khan
[S2]	Longformer: The Long-Document Transformer, Beltagy, Peters, Cohan; 2020	S1: S2: Jay Liao C: Ishan Shah R: Kiran Raja
[S1] [S2]	Big Bird: Transformers for Longer Sequences, Zaheer, Guruganesh, Dubey, Ainslie, Alberti, Ontanon, Pham, Ravula, Wang, Yang, Ahmed; 2020	S1: Shivi Agarwal S2: Atreya Dey C: Ojas Patel R: Daniel Almeraz
[S1] [S2]	LambdaNetworks: Modeling Long-Range Interactions Without Attention, Bello; 2021	S1: Marco Bueso S2: Jose Chavez C: Cheng-Chun Hsu R: Shivang Singh

Oct 12

Vision Transformers
[C]

[S1] [S2]	An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale, Dosovitskiy, Beyer, Kolesnikov, Weissenborn, Zhai, Unterthiner, Dehghani, Minderer, Heigold, Gelly, Uszkoreit, Houlsby; 2020	S1: Elias Lampietti S2: Zayne Sprague C: Samantha Hay R: Ojas Patel
[S1] [S2]	Training data-efficient image transformers & distillation through attention, Touvron, Cord, Douze, Massa, Sablayrolles, Jégou; 2020	S1: Zhou Fang S2: Kelsey Ball C: Kiran Raja R: Reid Ling Tong Li
[S1] [S2]	BEiT: BERT Pre-Training of Image Transformers, Bao, Dong, Wei; 2021	S1: Jay Liao S2: Liyan Chen C: Shivang Singh R: Ian Trowbridge
[S1] [S2]	LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference, Graham, El-Nouby, Touvron, Stock, Joulin, Jégou, Douze; 2021	S1: Atreya Dey S2: Tongrui Li C: Tarannum Khan R:
[S1] [S2]	Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions, Wang, Xie, Li, Fan, Song, Liang, Lu, Luo, Shao; 2021	S1: Jose Chavez S2: Marco Bueso C: R: Marlan McInnes-Taylor

Oct 14

Vision Transformers
[C]

[S2]	Swin Transformer: Hierarchical Vision Transformer using Shifted Windows, Liu, Lin, Cao, Hu, Wei, Zhang, Lin, Guo; 2021	S1: Yeming Wen S2: C: Daniel Almeraz R: Ayush Chauhan
[S1] [S2]	Transformer in Transformer, Han, Xiao, Wu, Guo, Xu, Wang; 2021	S1: Srinath Tankasala S2: Shivi Agarwal C: Nilesh Gupta R: Matthew Kelleher
[S1] [S2]	Perceiver: General Perception with Iterative Attention, Jaegle, Gimeno, Brock, Zisserman, Vinyals, Carreira; 2021	S1: Jordi Ramos Chen S2: Ishank Arora C: Jay Whang R: Cheng-Chun Hsu
[S1] [S2]	Perceiver IO: A General Architecture for Structured Inputs & Outputs, Jaegle, Borgeaud, Alayrac, Doersch, Ionescu, Ding, Koppula, Zoran, Brock, Shelhamer, Hénaff, Botvinick, Zisserman, Vinyals, Carreira; 2021	S1: ABAYOMI ADEKANMBI S2: Joshua Papermaster C: Sai Kiran Maddela R: Hung-Ting Chen
[S2]	MLP-Mixer: An all-MLP Architecture for Vision, Tolstikhin, Houlsby, Kolesnikov, Beyer, Zhai, Unterthiner, Yung, Steiner, Keysers, Uszkoreit, Lucic, Dosovitskiy; 2021	S1: S2: Serdjan Rolovic C: Christopher Hahn R: Ishan Shah

Oct 19

Implicit functions
[C]

[S1] [S2]	DeepSDF: Learning Continuous Signed Distance Functions for Shape Representation, Park, Florence, Straub, Newcombe, Lovegrove; 2019	S1: Ojas Patel S2: Tarannum Khan C: Marco Bueso R:
[S1]	Occupancy Networks: Learning 3D Reconstruction in Function Space, Mescheder, Oechsle, Niemeyer, Nowozin, Geiger; 2018	S1: Ishan Shah S2: C: Kelsey Ball R: Elias Lampietti
[S1] [S2]	Implicit Geometric Regularization for Learning Shapes, Gropp, Yariv, Haim, Atzmon, Lipman; 2020	S1: Marlan McInnes-Taylor S2: Samantha Hay C: Zayne Sprague R: Jay Liao
[S1] [S2]	Fourier Features Let Networks Learn High Frequency Functions in Low Dimensional Domains, Tancik, Srinivasan, Mildenhall, Fridovich-Keil, Raghavan, Singhal, Ramamoorthi, Barron, Ng; 2020	S1: Ayush Chauhan S2: Jay Whang C: Ishank Arora R: Atreya Dey
[S1] [S2]	Implicit Neural Representations with Periodic Activation Functions, Sitzmann, Martel, Bergman, Lindell, Wetzstein; 2020	S1: Reid Ling Tong Li S2: Kiran Raja C: Liyan Chen R: ABAYOMI ADEKANMBI

Oct 21

Implicit functions
[C]

[S1] [S2]	Learning Continuous Image Representation with Local Implicit Image Function, Chen, Liu, Wang; 2020	S1: Ian Trowbridge S2: Shivang Singh C: Shivi Agarwal R: Srinath Tankasala
[S1] [S2]	NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis, Mildenhall, Srinivasan, Tancik, Barron, Ramamoorthi, Ng; 2020	S1: Matthew Kelleher S2: Christopher Hahn C: Serdjan Rolovic R: Jose Chavez
[S2]	NeRF in the Wild: Neural Radiance Fields for Unconstrained Photo Collections, Martin-Brualla, Radwan, Sajjadi, Barron, Dosovitskiy, Duckworth; 2020	S1: S2: Nilesh Gupta C: Joshua Papermaster R: Jordi Ramos Chen
[S1] [S2]	Baking Neural Radiance Fields for Real-Time View Synthesis, Hedman, Srinivasan, Mildenhall, Barron, Debevec; 2021	S1: Hung-Ting Chen S2: Daniel Almeraz C: Tongrui Li R: Zhou Fang
[S1] [S2]	GIRAFFE: Representing Scenes as Compositional Generative Neural Feature Fields, Niemeyer, Geiger; 2020	S1: Cheng-Chun Hsu S2: Sai Kiran Maddela C: R: Yeming Wen

Oct 26

2D recognition
[C]

[S1] [S2]	Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks, Ren, He, Girshick, Sun; 2015	S1: Sai Kiran Maddela S2: Ian Trowbridge C: Srinath Tankasala R: Liyan Chen
[S1] [S2]	You Only Look Once: Unified, Real-Time Object Detection, Redmon, Divvala, Girshick, Farhadi; 2015	S1: Christopher Hahn S2: Reid Ling Tong Li C: Jay Liao R: Tongrui Li
[S1] [S2]	Focal Loss for Dense Object Detection, Lin, Goyal, Girshick, He, Dollár; 2017	S1: Shivang Singh S2: Ayush Chauhan C: Elias Lampietti R: Serdjan Rolovic
[S1] [S2]	Mask R-CNN, He, Gkioxari, Dollár, Girshick; 2017	S1: Samantha Hay S2: Ojas Patel C: Zhou Fang R: Ishank Arora
[S1] [S2]	Cascade R-CNN: Delving into High Quality Object Detection, Cai, Vasconcelos; 2017	S1: Tarannum Khan S2: Marlan McInnes-Taylor C: R: Zayne Sprague

Oct 28

2D recognition
[C]

[S1] [S2]	Deformable Convolutional Networks, Dai, Qi, Xiong, Li, Zhang, Hu, Wei; 2017	S1: Nilesh Gupta S2: Ishan Shah C: Jordi Ramos Chen R: Kelsey Ball
[S1]	CornerNet: Detecting Objects as Paired Keypoints, Law, Deng; 2018	S1: Daniel Almeraz S2: C: Jose Chavez R: Joshua Papermaster
[S2]	Objects as Points, Zhou, Wang, Krähenbühl; 2019	S1: S2: Hung-Ting Chen C: Atreya Dey R: Shivi Agarwal
[S1] [S2]	End-to-End Object Detection with Transformers, Carion, Massa, Synnaeve, Usunier, Kirillov, Zagoruyko; 2020	S1: Kiran Raja S2: Cheng-Chun Hsu C: Yeming Wen R: Marco Bueso
[S1] [S2]	Deformable DETR: Deformable Transformers for End-to-End Object Detection, Zhu, Su, Lu, Li, Wang, Dai; 2020	S1: Jay Whang S2: Matthew Kelleher C: ABAYOMI ADEKANMBI R:

Nov 02

3D recognition
[C]

[S1] [S2]	PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation, Qi, Su, Mo, Guibas; 2016	S1: Serdjan Rolovic S2: Elias Lampietti C: Matthew Kelleher R: Samantha Hay
[S1] [S2]	PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space, Qi, Yi, Su, Guibas; 2017	S1: Ishank Arora S2: Yeming Wen C: Ayush Chauhan R: Christopher Hahn
[S1] [S2]	Dynamic Graph CNN for Learning on Point Clouds, Wang, Sun, Liu, Sarma, Bronstein, Solomon; 2018	S1: Zayne Sprague S2: Zhou Fang C: Reid Ling Tong Li R: Jay Whang
[S1]	PointCNN: Convolution On $\mathcal{X}$-Transformed Points, Li, Bu, Sun, Wu, Di, Chen; 2018	S1: Liyan Chen S2: C: Marlan McInnes-Taylor R: Nilesh Gupta
[S1] [S2]	Point Transformer, Zhao, Jiang, Jia, Torr, Koltun; 2020	S1: Kelsey Ball S2: Srinath Tankasala C: Hung-Ting Chen R: Sai Kiran Maddela

Nov 04

3D recognition
[C]

[S1] [S2]	VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection, Zhou, Tuzel; 2017	S1: Tongrui Li S2: Jordi Ramos Chen C: R:
[S1] [S2]	PointPillars: Fast Encoders for Object Detection from Point Clouds, Lang, Vora, Caesar, Zhou, Yang, Beijbom; 2018	S1: Joshua Papermaster S2: ABAYOMI ADEKANMBI C: Ian Trowbridge R: Tarannum Khan
[S2]	PointRCNN: 3D Object Proposal Generation and Detection from Point Cloud, Shi, Wang, Li; 2018	S1: S2: Jay Liao C: Ishan Shah R: Kiran Raja
[S1] [S2]	Pseudo-LiDAR from Visual Depth Estimation: Bridging the Gap in 3D Object Detection for Autonomous Driving, Wang, Chao, Garg, Hariharan, Campbell, Weinberger; 2018	S1: Shivi Agarwal S2: Atreya Dey C: Ojas Patel R: Daniel Almeraz
[S1] [S2]	Center-based 3D Object Detection and Tracking, Yin, Zhou, Krähenbühl; 2020	S1: Marco Bueso S2: Jose Chavez C: Cheng-Chun Hsu R: Shivang Singh

Nov 09

Open world perception
[C]

[S1] [S2]	Momentum Contrast for Unsupervised Visual Representation Learning, He, Fan, Wu, Xie, Girshick; 2019	S1: Elias Lampietti S2: Zayne Sprague C: Samantha Hay R: Ojas Patel
[S1] [S2]	A Simple Framework for Contrastive Learning of Visual Representations, Chen, Kornblith, Norouzi, Hinton; 2020	S1: Zhou Fang S2: Kiran Raja C: Kelsey Ball R: Reid Ling Tong Li
[S1]	VirTex: Learning Visual Representations from Textual Annotations, Desai, Johnson; 2020	S1: Jay Liao S2: Liyan Chen C: Shivang Singh R: Ian Trowbridge
[S1] [S2]	Contrastive Learning of Medical Visual Representations from Paired Images and Text, Zhang, Jiang, Miura, Manning, Langlotz; 2020	S1: Atreya Dey S2: Tongrui Li C: Marco Bueso R:
[S1] [S2]	Learning Transferable Visual Models From Natural Language Supervision, Radford, Kim, Hallacy, Ramesh, Goh, Agarwal, Sastry, Askell, Mishkin, Clark, Krueger, Sutskever; 2021	S1: Jose Chavez S2: Tarannum Khan C: R: Marlan McInnes-Taylor

Nov 11

Open world perception
[C]

[S1]	Towards Open Set Deep Networks, Bendale, Boult; 2015	S1: Yeming Wen S2: C: Daniel Almeraz R: Ayush Chauhan
[S1] [S2]	Large-Scale Long-Tailed Recognition in an Open World, Liu, Miao, Zhan, Wang, Gong, Yu; 2019	S1: Srinath Tankasala S2: Shivi Agarwal C: Nilesh Gupta R: Matthew Kelleher
[S1] [S2]	Class-Balanced Loss Based on Effective Number of Samples, Cui, Jia, Lin, Song, Belongie; 2019	S1: Jordi Ramos Chen S2: Ishank Arora C: Jay Whang R: Cheng-Chun Hsu
[S1] [S2]	Decoupling Representation and Classifier for Long-Tailed Recognition, Kang, Xie, Rohrbach, Yan, Gordo, Feng, Kalantidis; 2019	S1: ABAYOMI ADEKANMBI S2: Joshua Papermaster C: Sai Kiran Maddela R: Hung-Ting Chen
[S2]	Overcoming Classifier Imbalance for Long-tail Object Detection with Balanced Group Softmax, Li, Wang, Kang, Tang, Wang, Li, Feng; 2020	S1: S2: Serdjan Rolovic C: Christopher Hahn R: Ishan Shah

Nov 16

Temporal reasoning and Video
[no code (yet)]

[S1]	Two-Stream Convolutional Networks for Action Recognition in Videos, Simonyan, Zisserman; 2014
[S1]	Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset, Carreira, Zisserman; 2017
[S1]	SlowFast Networks for Video Recognition, Feichtenhofer, Fan, Malik, He; 2018
[S1]	Is Space-Time Attention All You Need for Video Understanding?, Bertasius, Wang, Torresani; 2021
[S1]	Multiscale Vision Transformers, Fan, Xiong, Mangalam, Li, Yan, Malik, Feichtenhofer; 2021

Nov 18

Temporal reasoning and Video
[no code (yet)]

[S1]	Online Model Distillation for Efficient Video Inference, Mullapudi, Chen, Zhang, Ramanan, Fatahalian; 2018
[S1]	Long-Term Feature Banks for Detailed Video Understanding, Wu, Feichtenhofer, Fan, He, Krähenbühl, Girshick; 2018
[S1]	Long Short-Term Transformer for Online Action Detection, Xu, Xiong, Chen, Li, Xia, Tu, Soatto; 2021
[S1]	Less is More: ClipBERT for Video-and-Language Learning via Sparse Sampling, Lei, Li, Zhou, Gan, Berg, Bansal, Liu; 2021
[S1]	CLEVRER: CoLlision Events for Video REpresentation and Reasoning, Yi, Gan, Li, Kohli, Wu, Torralba, Tenenbaum; 2019

Nov 23

Final Project Q/A

Nov 25

No class - Thanksgiving

Nov 30

Final Project Presentations

S1: Kelsey Ball S2: Zayne Sprague C: Marco Bueso R:

S1: Hung-Ting Chen S2: Jordi Ramos Chen C: Cheng-Chun Hsu R: Marlan McInnes-Taylor

S1: Atreya Dey S2: C: R:

S1: Christopher Hahn S2: C: R:

S1: Jay Liao S2: Elias Lampietti C: Tongrui Li R: Serdjan Rolovic

S1: ABAYOMI ADEKANMBI S2: Ishank Arora C: Ojas Patel R: Kiran Raja

S1: Srinath Tankasala S2: C: R:

S1: Jay Whang S2: C: R:

Dec 02

Final Project Presentations

Your role before of class

Weeks 2-12:

Every week you read one paper.
You alternate between
- [S1] summary and review (details here)
- [S2] summary and review (details here)
- [C] coding (details here)
- [R] peer review (details here)

Final week:

[F] Work in a team on the final project (details here)

Your role in class

Weeks 2-12:

[S1][S2] Present your paper (<5min)
[C] Show and discuss your implementations (as team 15-30 min)
[all] Participate in discussion

Final week:

[F] Present your final project

Goals of the class

After this class you should be able to

Read and understand deep learning papers
Implement and execute a research project in deep learning

Grading

5% Participation
- May miss 2 classes unexcused
- We provide remote options upon request
25% Coding [C]
20% Paper summary and review [S1][S2]
10% Peer review [R]
40% Final project [F]

Expected workload

Estimates of required effort to pass the class are:

1-2 h / week paper reading
3 h / week class participation
2-10 h / week coding, summary, or review
20-40 h final project

General tips

Start coding and final project early
- most deep neural networks take 1 day to train on a GPU
- let us know early if you don’t have GPU access (first or second week), colab or google cloud might be options
read the assigned papers early, write down questions and discussion topics

What should I do if I get sick?

DO NOT COME TO CLASS
We will send out a poll before every class to determine potential remote options
You may miss two classes without any excuse

Notes

Syllabus subject to change.