Fall 2016 - COMPSCI 590.2 - Algorithmic Aspects of Machine Learning

Many machine learning problems (such as sparse coding and topic modeling) are hard in the worst case, yet are routinely solved in practice by algorithms whose convergence properties are not well understood. In this course we will see many settings where we can design algorithms for machine learning problems with rigorous guarantees. The course will cover topics such as nonnegative matrix factorization, topic models, matrix completion, and tensor decomposition. Although all of these problems are NP-hard in the worst case, we will see what properties real-life instances may satisfy and why these properties allow us to design efficient algorithms with provable guarantees.

Course Information

Announcements

Homework 3 is posted. Due 11/16 in class.

Homework 2 is posted. Due 10/19 in class.

Homework 1 is posted. Due 9/21 in class.

Room change! Lectures have been moved to Physics 299.

Instructor
Rong Ge
D226 LSRC
Email: rongge@cs.duke.edu

Lectures: Monday and Wednesday 10:05 - 11:20 AM, Physics 299. 

Textbook: Ankur Moitra's monograph. Lecture notes for new topics will be provided.

Prerequisites: Familiarity with algorithms, probability and linear algebra.

Evaluation:
The course grades will be determined based on homework assignments, scribing, and a final project.

Students are also expected to scribe one lecture (scribing will be needed more frequently once we get to non-convex optimization and neural networks).

Discussions: Questions about the course material or problem sets should be discussed on Piazza.

Course Outline

Date Contents References Future Reading
8/29 Intro: Machine Learning Problems Slides
Nonnegative Matrix Factorization and Topic Models
8/31 New algorithms for separable NMF Notes, Section 2.3 in textbook Sections 2.1-2.2 in textbook [3]
9/5 Topic Models Notes, Section 2.4 in textbook [4]
9/7 Implementing the Topic Modeling Algorithm Notes [5] Slides See links in the slides
[1] D. Lee and S. Seung. Learning the Parts of Objects by Nonnegative Matrix Factorization, Nature 1999.
[2] S. Vavasis. On the Complexity of Nonnegative Matrix Factorization, SIOPT 2009.
[3] S. Arora, R. Ge, R. Kannan and A. Moitra. Computing a Nonnegative Matrix Factorization -- Provably, STOC 2012.
[4] S. Arora, R. Ge and A. Moitra. Learning Topic Models -- Going Beyond SVD, FOCS 2012.
[5] S. Arora et al. A Practical Algorithm for Topic Modeling with Provable Guarantees, ICML 2013.
Matrix and Spectral Techniques
9/12 Singular Value Decompositions, Power Method Notes
9/14 Matrix Perturbations, Matrix Concentration Notes [1]
9/19 Stochastic block model, McSherry's algorithm [2] Notes
[1] G.W. Stewart and J.G. Sun, Matrix Perturbation Theory
[2] F. McSherry. Spectral Partitioning of Random Graphs, FOCS 2001.
Tensor Decompositions
9/21 Tensor basics Notes Section 3.1 in textbook
9/26 Tensor Decompositions Notes Section 3.1 in textbook Section 3.2 in textbook
9/28 Power Method [4] Notes [4]
10/3 Manipulating Moments: topic models, mixture of Gaussians Notes Section 3.5 in textbook [3]
10/5 Handling Asymmetric Tensors: Hidden Markov Models Notes Sections 3.3, 3.4 in textbook [2]
[1] C. Hillar and L. Lim. Most Tensor Problems are NP-hard, JACM 2013.
[2] E. Mossel and S. Roch. Learning Nonsingular Phylogenies and Hidden Markov Models, STOC 2005.
[3] A. Anandkumar, D. Foster, D. Hsu, S. Kakade and Y. Liu. A Spectral Algorithm for Latent Dirichlet Allocation, NIPS 2012.
[4] A. Anandkumar, R. Ge, D. Hsu, S. Kakade, M. Telgarsky. Tensor decompositions for learning latent variable models, JMLR 2014.
[5] N. Goyal, S. Vempala and Y. Xiao. Fourier PCA, STOC 2014.
10/10 Fall break
10/12 Project: Groups and Topics Slides
(Non-convex) Optimization
10/17 Basic optimization techniques Notes [1] [2]
10/19 Stochastic Gradient and Variance Reduction Notes [2] [3]
10/24 Approximate Gradient Descent [4] Sections 2, 4 Notes [4]
10/26 Saddle Point Problem (Guest Lecture by Chi) Blog Post
10/31 Sparse Coding Notes
11/2 Matrix Completion Notes [6]
[1] Y. Nesterov. Introductory Lectures on Convex Optimization.
[2] A. Rakhlin, O. Shamir, K. Sridharan. Making gradient descent optimal for strongly convex stochastic optimization, ICML 2012.
[3] R. Johnson, T. Zhang. Accelerating Stochastic Gradient Descent using Predictive Variance Reduction, NIPS 2013.
[4] S. Arora, R. Ge, T. Ma and A. Moitra. Simple, Efficient and Neural Algorithms for Sparse Coding, COLT 2015.
[5] R. Ge, F. Huang, C. Jin, Y. Yuan. Escaping from Saddle-Points --- Online Stochastic Gradient for Tensor Decomposition, COLT 2015.
[6] R. Ge, J. Lee, T. Ma. Matrix Completion has No Spurious Local Minimum, NIPS 2016.
Neural Networks and Deep Learning
11/7 Neural Networks and Backpropagation. Notes
11/9 Representation Power I: Representer Theorems [1] Notes
11/14 Representation Power II: Lower Bounds [2] Notes [3] [4]
11/16 Random Neural Networks [5] Notes
[1] A. Barron. Universal Approximation Bounds for Superpositions of a Sigmoidal Function, 1993.
[2] M. Telgarsky. Benefits of Depth in Neural Networks, COLT 2016.
[3] R. Eldan and O. Shamir. The Power of Depth for Feedforward Neural Networks, COLT 2016.
[4] N. Cohen, O. Sharir and A. Shashua. On the Expressive Power of Deep Learning: A Tensor Analysis, COLT 2016.
[5] S. Arora, A. Bhaskara, R. Ge, T. Ma. Provable Bounds for Learning Some Deep Representations, ICML 2014.
11/21 Summary: what we learned and what we didn't cover Slides
11/23 Thanksgiving
11/28 Project Presentations
11/30 Project Presentations