Fall 2016 - COMPSCI 590.2 - Algorithmic Aspects of Machine Learning

Many machine learning problems (such as sparse coding and topic modeling) are hard in the worst case, yet are routinely solved in practice by algorithms whose convergence properties are not well understood. In this course we will see many settings where we can design algorithms for machine learning problems with rigorous guarantees. The course will cover topics such as nonnegative matrix factorization, topic models, matrix completion, and tensor decomposition. Although all of these problems are NP-hard in the worst case, we will see what properties real-life instances may satisfy and why these properties allow us to design efficient algorithms with provable guarantees.

Course Information

Announcements

Homework 3 is posted. Due 11/16 in class.

Homework 2 is posted. Due 10/19 in class.

Homework 1 is posted. Due 9/21 in class.

Room change! Lectures have been moved to Physics 299.

Instructor
Rong Ge
D226 LSRC
Email: rongge@cs.duke.edu

Lectures: Monday and Wednesday 10:05 - 11:20 AM, Physics 299. 

Textbook: Ankur Moitra's monograph. Lecture notes for new topics will be provided.

Prerequisites: Familiarity with algorithms, probability and linear algebra.

Evaluation:
The course grades will be determined based on homework assignments, scribing, and a final project.

Students are also expected to scribe one lecture (scribing will be needed more frequently once we get to non-convex optimization and neural networks).

Discussions: Questions about the course material or problem sets should be discussed on Piazza.

Course Outline

Date Contents References Future Reading
8/29 Intro: Machine Learning Problems Slides
Nonnegative Matrix Factorization and Topic Models
8/31 New algorithms for separable NMF Notes, Section 2.3 in textbook Sections 2.1-2.2 in textbook [3]
9/5 Topic Models Notes, Section 2.4 in textbook [4]
9/7 Implementing the Topic Modeling Algorithm Notes [5] Slides See links in the slides
[1] D. Lee and S. Seung. Learning the Parts of Objects by Nonnegative Matrix Factorization, Nature 1999.
[2] S. Vavasis. On the Complexity of Nonnegative Matrix Factorization, SIOPT 2009.
[3] S. Arora, R. Ge, R. Kannan and A. Moitra. Computing a Nonnegative Matrix Factorization -- Provably, STOC 2012.
[4] S. Arora, R. Ge and A. Moitra. Learning Topic Models -- Going Beyond SVD, FOCS 2012.
[5] S. Arora et al. A Practical Algorithm for Topic Modeling with Provable Guarantees, ICML 2013.
Matrix and Spectral Techniques
9/12 Singular Value Decompositions, Power Method Notes
9/14 Matrix Perturbations, Matrix Concentration Notes [1]
9/19 Stochastic block model, McSherry's algorithm [2] Notes
[1] G.W. Stewart and J.G. Sun, Matrix Perturbation Theory
[2] F. McSherry. Spectral Partitioning of Random Graphs, FOCS 2001.
Tensor Decompositions
9/21 Tensor basics Notes Section 3.1 in textbook
9/26 Tensor Decompositions Notes Section 3.1 in textbook Section 3.2 in textbook
9/28 Power Method [4] Notes [4]
10/3 Manipulating Moments: topic models, mixture of Gaussians Notes Section 3.5 in textbook [3]
10/5 Handling Asymmetric Tensors: Hidden Markov Models Notes Sections 3.3, 3.4 in textbook [2]
[1] C. Hillar and L. Lim. Most Tensor Problems are NP-hard, JACM 2013.
[2] E. Mossel and S. Roch. Learning Nonsingular Phylogenies and Hidden Markov Models, STOC 2005.
[3] A. Anandkumar, D. Foster, D. Hsu, S. Kakade and Y. Liu. A Spectral Algorithm for Latent Dirichlet Allocation, NIPS 2012.
[4] A. Anandkumar, R. Ge, D. Hsu, S. Kakade, M. Telgarsky. Tensor decompositions for learning latent variable models, JMLR 2014.
[5] N. Goyal, S. Vempala and Y. Xiao. Fourier PCA, STOC 2014.
10/10 Fall break
10/12 Project: Groups and Topics Slides
(Non-convex) Optimization
10/17 Basic optimization techniques Notes [1] [2]
10/19 Stochastic Gradient and Variance Reduction Notes [2] [3]
10/24 Approximate Gradient Descent [4] Sections 2, 4 Notes [4]
10/26 Saddle Point Problem (Guest Lecture by Chi) Blog Post
10/31 Sparse Coding Notes
11/2 Matrix Completion Notes [6]
[1] Y. Nesterov. Introductory Lectures on Convex Optimization.
[2] A. Rakhlin, O. Shamir, K. Sridharan. Making gradient descent optimal for strongly convex stochastic optimization, ICML 2012.
[3] R. Johnson, T. Zhang. Accelerating Stochastic Gradient Descent using Predictive Variance Reduction, NIPS 2013.
[4] S. Arora, R. Ge, T. Ma and A. Moitra. Simple, Efficient and Neural Algorithms for Sparse Coding, COLT 2015.
[5] R. Ge, F. Huang, C. Jin, Y. Yuan. Escaping from Saddle-Points --- Online Stochastic Gradient for Tensor Decomposition, COLT 2015.
[6] R. Ge, J. Lee, T. Ma. Matrix Completion has No Spurious Local Minimum, NIPS 2016.
Neural Networks and Deep Learning
11/7 Neural Networks and Backpropagation. Notes
11/9 Representation Power I: Representer Theorems [1] Notes
11/14 Representation Power II: Lower Bounds [2] Notes [3] [4]
11/16 Random Neural Networks [5] Notes
[1] A. Barron. Universal Approximation Bounds for Superpositions of a Sigmoidal Function, 1993.
[2] M. Telgarsky. Benefits of Depth in Neural Networks, COLT 2016.
[3] R. Eldan and O. Shamir. The Power of Depth for Feedforward Neural Networks, COLT 2016.
[4] N. Cohen, O. Sharir and A. Shashua. On the Expressive Power of Deep Learning: A Tensor Analysis, COLT 2016.
[5] S. Arora, A. Bhaskara, R. Ge, T. Ma. Provable Bounds for Learning Some Deep Representations, ICML 2014.
11/21 Summary: what we learned and what we didn't cover Slides
11/23 Thanksgiving
11/28 Project Presentations
11/30 Project Presentations