Announcements
Homework 3 is posted. Due 11/16 in class.
Homework 2 is posted. Due 10/19 in class.
Homework 1 is posted. Due 9/21 in class.
Room change! Lectures have been moved to Physics 299.
Instructor
Rong Ge
D226 LSRC
Email: rongge@cs.duke.edu
Lectures: Monday and Wednesday 10:05 - 11:20 AM, Physics 299.
Textbook: Ankur Moitra's monograph, Algorithmic Aspects of Machine Learning. Lecture notes for new topics will be provided.
Prerequisites: Familiarity with algorithms, probability and linear algebra.
Evaluation:
The course grades will be determined based on:
Date | Contents | References | Future Reading |
8/29 | Intro: Machine Learning Problems | Slides | |
Nonnegative Matrix Factorization and Topic Models | |||
8/31 | New algorithms for separable NMF | Notes, Section 2.3 in textbook | Section 2.1-2.2 in textbook [3] |
9/5 | Topic Models | Notes, Section 2.4 in textbook | [4] |
9/7 | Implementing the Topic Modeling Algorithm | Notes [5] Slides | See links in the slides |
[1] D. Lee and S. Seung. Learning the Parts of Objects by Nonnegative Matrix Factorization, Nature 1999.
[2] S. Vavasis. On the Complexity of Nonnegative Matrix Factorization, SIOPT 2009.
[3] S. Arora, R. Ge, R. Kannan and A. Moitra. Computing a Nonnegative Matrix Factorization -- Provably, STOC 2012.
[4] S. Arora, R. Ge and A. Moitra. Learning Topic Models -- Going Beyond SVD, FOCS 2012.
[5] S. Arora et al. A Practical Algorithm for Topic Modeling with Provable Guarantees, ICML 2013.
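For the separable NMF lectures (8/31), here is a minimal numpy sketch of one standard approach, the successive projection idea: under separability, some columns of the data matrix are scaled copies of the true factor columns, and greedily picking the largest-norm column and projecting it out recovers candidate anchors. This is an illustrative sketch, not necessarily the exact algorithm presented in class; the function name and interface are hypothetical.

```python
import numpy as np

def successive_projection(M, r):
    """Sketch: find r candidate anchor columns of M, assuming
    M = W H with the columns of W appearing (up to scaling) among
    the columns of M (separability). Greedy: take the column with
    the largest norm, project it out of every column, repeat."""
    R = M.astype(float).copy()
    anchors = []
    for _ in range(r):
        j = int(np.argmax(np.linalg.norm(R, axis=0)))
        anchors.append(j)
        u = R[:, j] / np.linalg.norm(R[:, j])
        R -= np.outer(u, u @ R)  # remove the chosen direction from all columns
    return anchors
```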
Matrix and Spectral Techniques | |||
9/12 | Singular Value Decompositions, Power Method | Notes | |
9/14 | Matrix Perturbations, Matrix Concentration | Notes | [1] |
9/19 | Stochastic block model, McSherry's algorithm | [2] Notes | |
[1] G.W. Stewart and J.G. Sun. Matrix Perturbation Theory.
[2] F. McSherry. Spectral Partitioning of Random Graphs, FOCS 2001.
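A minimal sketch of the power method from the 9/12 lecture, assuming a symmetric input matrix whose top eigenvalue is strictly larger in magnitude than the others (names and iteration count here are illustrative choices):

```python
import numpy as np

def power_method(A, num_iters=200, seed=0):
    """Estimate the top eigenvalue/eigenvector of a symmetric matrix A
    by repeatedly multiplying a random vector by A and normalizing."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(A.shape[0])
    x /= np.linalg.norm(x)
    for _ in range(num_iters):
        x = A @ x
        x /= np.linalg.norm(x)
    return x @ (A @ x), x  # Rayleigh quotient, eigenvector estimate

# Small sanity check on a random symmetric matrix.
B = np.random.randn(50, 50)
A = (B + B.T) / 2
lam, v = power_method(A)
```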
Tensor Decompositions | |||
9/21 | Tensor basics | Notes, Section 3.1 in textbook | |
9/26 | Tensor Decompositions | Notes, Section 3.1 in textbook | Section 3.2 in textbook |
9/28 | Power Method | [4] Notes | [4] |
10/3 | Manipulating Moments: topic models, mixture of Gaussians | Notes, Section 3.5 in textbook | [3] |
10/5 | Handling Asymmetric Tensors: Hidden Markov Models | Notes, Sections 3.3, 3.4 in textbook | [2] |
[1] C. Hillar and L. Lim. Most Tensor Problems are NP-hard, JACM 2013.
[2] E. Mossel and S. Roch. Learning Nonsingular Phylogenies and Hidden Markov Models, STOC 2005.
[3] A. Anandkumar, D. Foster, D. Hsu, S. Kakade and Y. Liu. A Spectral Algorithm for Latent Dirichlet Allocation, NIPS 2012.
[4] A. Anandkumar, R. Ge, D. Hsu, S. Kakade and M. Telgarsky. Tensor Decompositions for Learning Latent Variable Models, JMLR 2014.
[5] N. Goyal, S. Vempala and Y. Xiao. Fourier PCA, STOC 2014.
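For the tensor power method (9/28), a small numpy sketch in the spirit of [4], assuming a symmetric third-order tensor with an orthogonal decomposition: repeatedly applying the map u -> T(I, u, u) and normalizing converges to one of the components. The example construction at the bottom is an illustrative assumption, not from the lecture notes.

```python
import numpy as np

def tensor_power_iteration(T, num_iters=100, seed=0):
    """Sketch of one run of the tensor power method: for a symmetric,
    orthogonally decomposable T = sum_i lambda_i v_i^{(x)3}, iterating
    u -> T(I, u, u) (then normalizing) converges to some component v_i."""
    rng = np.random.default_rng(seed)
    u = rng.standard_normal(T.shape[0])
    u /= np.linalg.norm(u)
    for _ in range(num_iters):
        u = np.einsum('ijk,j,k->i', T, u, u)  # contract T(I, u, u)
        u /= np.linalg.norm(u)
    lam = np.einsum('ijk,i,j,k->', T, u, u, u)  # eigenvalue estimate T(u, u, u)
    return lam, u

# Build a small orthogonally decomposable tensor and recover a component.
d, weights = 5, [3.0, 2.0, 1.0]
V = np.linalg.qr(np.random.randn(d, d))[0][:, :3]  # orthonormal components
T = sum(w * np.einsum('i,j,k->ijk', v, v, v) for w, v in zip(weights, V.T))
lam, u = tensor_power_iteration(T)
```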
10/10 | Fall break | | |
10/12 | Project: Groups and Topics | Slides | |
(non-convex) Optimization | |||
10/17 | Basic optimization techniques | Notes | [1] [2] |
10/19 | Stochastic Gradient and Variance Reduction | Notes | [2] [3] |
10/24 | Approximate Gradient Descent | [4] Sections 2 and 4, Notes | [4] |
10/26 | Saddle Point Problem | Guest Lecture by Chi | Blog Post |
10/31 | Sparse Coding | Notes | |
11/2 | Matrix Completion | Notes | [6] |
[1] Y. Nesterov. Introductory Lectures on Convex Optimization.
[2] A. Rakhlin, O. Shamir and K. Sridharan. Making Gradient Descent Optimal for Strongly Convex Stochastic Optimization, ICML 2012.
[3] R. Johnson and T. Zhang. Accelerating Stochastic Gradient Descent Using Predictive Variance Reduction, NIPS 2013.
[4] S. Arora, R. Ge, T. Ma and A. Moitra. Simple, Efficient and Neural Algorithms for Sparse Coding, COLT 2015.
[5] R. Ge, F. Huang, C. Jin and Y. Yuan. Escaping from Saddle Points -- Online Stochastic Gradient for Tensor Decomposition, COLT 2015.
[6] R. Ge, J. Lee and T. Ma. Matrix Completion Has No Spurious Local Minimum, NIPS 2016.
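To accompany the variance-reduction lecture (10/19), a minimal numpy sketch in the style of SVRG [3]: each epoch computes a full gradient at a snapshot point and uses it as a control variate for the stochastic steps. The interface (a grad_i callback returning the gradient of the i-th loss term), step size, and inner-loop length are illustrative assumptions.

```python
import numpy as np

def svrg(grad_i, x0, n, step=0.02, num_epochs=20, seed=0):
    """Sketch of SVRG: per epoch, compute the full gradient at a
    snapshot, then take stochastic steps whose variance is reduced by
    the correction grad_i(snapshot) - full_grad."""
    rng = np.random.default_rng(seed)
    x = x0.copy()
    for _ in range(num_epochs):
        snap = x.copy()
        full_grad = np.mean([grad_i(snap, i) for i in range(n)], axis=0)
        for _ in range(2 * n):  # inner-loop length is a tunable choice
            i = rng.integers(n)
            g = grad_i(x, i) - grad_i(snap, i) + full_grad  # variance-reduced estimate
            x -= step * g
    return x

# Example: least squares, where grad_i is the gradient of (1/2)(a_i . x - b_i)^2.
n, d = 200, 10
A, x_star = np.random.randn(n, d), np.random.randn(d)
b = A @ x_star
x_hat = svrg(lambda x, i: (A[i] @ x - b[i]) * A[i], np.zeros(d), n)
```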
Neural Networks and Deep Learning | |||
11/7 | Neural Networks and Backpropagation | Notes | |
11/9 | Representation Power I: Representer Theorems | [1] Notes | |
11/14 | Representation Power II: Lower Bounds | [2] Notes | [3] [4] |
11/16 | Random Neural Networks | [5] Notes | |
[1] A. Barron. Universal Approximation Bounds for Superpositions of a Sigmoidal Function, 1993.
[2] M. Telgarsky. Benefits of Depth in Neural Networks, COLT 2016.
[3] R. Eldan and O. Shamir. The Power of Depth for Feedforward Neural Networks, COLT 2016.
[4] N. Cohen, O. Sharir and A. Shashua. On the Expressive Power of Deep Learning: A Tensor Analysis, COLT 2016.
[5] S. Arora, A. Bhaskara, R. Ge and T. Ma. Provable Bounds for Learning Some Deep Representations, ICML 2014.
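A self-contained numpy sketch of backpropagation for the 11/7 lecture: a one-hidden-layer ReLU network trained on squared loss with plain gradient descent. The architecture, data, and hyperparameters are illustrative choices, not taken from the lecture notes.

```python
import numpy as np

def forward_backward(X, y, W1, W2):
    """One forward/backward pass for a one-hidden-layer ReLU network
    with squared loss; returns the loss and gradients for both layers."""
    H = np.maximum(X @ W1, 0.0)   # hidden activations (ReLU)
    pred = H @ W2                 # linear output layer
    err = pred - y
    loss = 0.5 * np.mean(err ** 2)
    # Backward pass: apply the chain rule layer by layer.
    gpred = err / len(y)          # d loss / d pred
    gW2 = H.T @ gpred
    gH = gpred @ W2.T
    gH[H <= 0] = 0.0              # gradient of ReLU is 0 on inactive units
    gW1 = X.T @ gH
    return loss, gW1, gW2

# Tiny training loop on random data.
rng = np.random.default_rng(0)
X, y = rng.standard_normal((100, 5)), rng.standard_normal((100, 1))
W1, W2 = 0.1 * rng.standard_normal((5, 16)), 0.1 * rng.standard_normal((16, 1))
for _ in range(200):
    loss, gW1, gW2 = forward_backward(X, y, W1, W2)
    W1 -= 0.1 * gW1
    W2 -= 0.1 * gW2
```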
11/21 | Summary: what we learned and what we didn't cover | Slides | |
11/23 | Thanksgiving | | |
11/28 | Project Presentations | | |
11/30 | Project Presentations | | |