In this course we will see many instances where we can design algorithms with rigorous guarantees for machine learning problems. The course covers topics such as nonnegative matrix factorization, topic models, matrix completion, and tensor decomposition. Although all of these problems are NP-hard in the worst case, we will see what properties real-life instances tend to satisfy, and why these properties allow us to design efficient algorithms with provable guarantees.

Instructor

Rong Ge

D226 LSRC

Email: rongge@cs.duke.edu

Lectures: Tuesdays and Thursdays 4:40-5:55 pm in LSRC D106

Textbook: Ankur Moitra's monograph. Lecture notes for new topics will be provided.

Prerequisites: Familiarity with algorithms, probability and linear algebra.

Evaluation:

The course grades will be determined based on:

- Homework assignments (50%): students are expected to solve 3 problem sets.

- Course project (50%): students are expected to complete a final course project in groups. The project could be either a survey or original research.

Schedule

| Date | Contents | References | Future Reading |
|---|---|---|---|
| 8/27 | Intro: Machine Learning Problems | Slides | |

Nonnegative Matrix Factorization and Topic Models

| Date | Contents | References | Future Reading |
|---|---|---|---|
| 9/1 | Matrix Factorizations, Nonnegative rank | Section 2.1 in textbook; short slides / illustrative example | Section 2.2 in textbook |
| 9/3 | New algorithms for separable NMF | Section 2.3 in textbook | [3] |
| 9/8 | Topic Models | Section 2.4 in textbook | [4] |
| 9/10 | Implementing the Topic Modeling Algorithm | [5]; slides | See links in the slides |

[1] D. Lee and S. Seung. Learning the Parts of Objects by Nonnegative Matrix Factorization. Nature, 1999.
[2] S. Vavasis. On the Complexity of Nonnegative Matrix Factorization. SIOPT, 2009.
[3] S. Arora, R. Ge, R. Kannan and A. Moitra. Computing a Nonnegative Matrix Factorization -- Provably. STOC, 2012.
[4] S. Arora, R. Ge and A. Moitra. Learning Topic Models -- Going Beyond SVD. FOCS, 2012.
[5] S. Arora et al. A Practical Algorithm for Topic Modeling with Provable Guarantees. ICML, 2013.
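
For a first feel of this unit, here is a minimal NumPy sketch of the classical multiplicative-update heuristic from [1]. It is not part of the official course materials; the dimensions, iteration count, and initialization are illustrative choices.

```python
import numpy as np

def nmf_multiplicative(V, r, iters=200, eps=1e-9):
    """Approximate a nonnegative m x n matrix V as W @ H (W: m x r, H: r x n)
    using the multiplicative updates of Lee & Seung [1]."""
    m, n = V.shape
    rng = np.random.default_rng(0)
    W = rng.random((m, r))
    H = rng.random((r, n))
    for _ in range(iters):
        # Each update keeps the factors nonnegative and (up to eps) does not
        # increase the Frobenius error ||V - WH||_F.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Tiny usage example on a random low-rank nonnegative matrix.
rng = np.random.default_rng(1)
V = rng.random((20, 4)) @ rng.random((4, 30))
W, H = nmf_multiplicative(V, r=4)
print(np.linalg.norm(V - W @ H) / np.linalg.norm(V))  # small relative error
```

Note that these updates only find a local optimum; the course will discuss when stronger assumptions (such as separability) make the problem provably tractable.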

Spectral Clustering

| Date | Contents | References | Future Reading |
|---|---|---|---|
| 9/15 | Stochastic block model, McSherry's algorithm | [1] | |
| 9/17 | Provable guarantee for Lloyd's algorithm | [2] | [2], especially Section 6; [3] (semi-random models) |

[1] F. McSherry. Spectral Partitioning of Random Graphs. FOCS, 2001.
[2] A. Kumar and R. Kannan. Clustering with Spectral Norm and the k-Means Algorithm. FOCS, 2010.
[3] U. Feige and J. Kilian. Heuristics for Finding Large Independent Sets, with Applications to Coloring Semi-Random Graphs. FOCS, 1998.
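
For intuition, here is a minimal sketch of the spectral approach on a two-community stochastic block model. The graph parameters and the sign-based rounding are illustrative simplifications of the algorithm analyzed in [1], not the algorithm itself.

```python
import numpy as np

def two_block_sbm(n, p, q, seed=0):
    """Sample a symmetric adjacency matrix: within-community edge prob p,
    across-community edge prob q; the first n//2 nodes form community 0."""
    rng = np.random.default_rng(seed)
    labels = np.repeat([0, 1], n // 2)
    P = np.where(labels[:, None] == labels[None, :], p, q)
    A = (rng.random((n, n)) < P).astype(float)
    A = np.triu(A, 1)
    return A + A.T, labels

A, truth = two_block_sbm(n=200, p=0.7, q=0.3)
# The top eigenvector of A is close to all-ones; the second eigenvector
# roughly aligns with the community indicator, so rounding its signs
# recovers the partition when the gap p - q is large enough.
eigvals, eigvecs = np.linalg.eigh(A)  # eigenvalues in ascending order
guess = (eigvecs[:, -2] > 0).astype(int)
agreement = max(np.mean(guess == truth), np.mean(guess != truth))
print(f"fraction of nodes correctly classified: {agreement:.2f}")
```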

Tensor Decompositions

| Date | Contents | References | Future Reading |
|---|---|---|---|
| 9/22 | Tensor basics | Section 3.1 in textbook | |
| 9/24 | Tensor Decompositions | Section 3.1 in textbook | Section 3.2 in textbook |
| 9/29 | Power Method | [4] | [4] |
| 10/1 | Applications: topic models, mixtures of Gaussians | Section 3.5 in textbook | [3] |
| 10/6 | HMMs, Phylogenetic Tree Reconstruction, Stochastic Block Model | Sections 3.3, 3.4 in textbook | [2] |
| 10/8 | Independent Component Analysis, sample projects | Section 3.6 in textbook | [5] |

[1] C. Hillar and L. Lim. Most Tensor Problems are NP-hard. JACM, 2013.
[2] E. Mossel and S. Roch. Learning Nonsingular Phylogenies and Hidden Markov Models. STOC, 2005.
[3] A. Anandkumar, D. Foster, D. Hsu, S. Kakade and Y. Liu. A Spectral Algorithm for Latent Dirichlet Allocation. NIPS, 2012.
[4] A. Anandkumar, R. Ge, D. Hsu, S. Kakade and M. Telgarsky. Tensor Decompositions for Learning Latent Variable Models. JMLR, 2014.
[5] N. Goyal, S. Vempala and Y. Xiao. Fourier PCA. STOC, 2014.
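
Here is a minimal sketch of the tensor power method from [4] on a symmetric, orthogonally decomposable 3-tensor. Recovering a single component without deflation, and the particular tensor construction, are illustrative simplifications.

```python
import numpy as np

def tensor_power_iteration(T, iters=50, seed=0):
    """Recover one component of a symmetric 3-tensor T via the power update
    v <- T(I, v, v) / ||T(I, v, v)|| (see [4])."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(T.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(iters):
        v = np.einsum('ijk,j,k->i', T, v, v)  # contract T with v twice
        v /= np.linalg.norm(v)
    lam = np.einsum('ijk,i,j,k->', T, v, v, v)  # the recovered weight
    return lam, v

# Build T = sum_i w_i * a_i^{(x)3} with orthonormal components a_i.
w = np.array([3.0, 2.0, 1.0])
A = np.linalg.qr(np.random.default_rng(1).standard_normal((5, 3)))[0]
T = np.einsum('i,ai,bi,ci->abc', w, A, A, A)
lam, v = tensor_power_iteration(T)
# lam recovers one of the weights w_i, and one entry of |A^T v| is near 1,
# showing v aligns with the matching column of A.
print(lam, np.abs(A.T @ v))
```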

(Non-convex) Optimization

| Date | Contents | References | Future Reading |
|---|---|---|---|
| 10/13 | Fall break | | |
| 10/15 | Basic optimization techniques | [1], Chapter 2 | |
| 10/20 | Strong convexity, approximate gradient descent | [3], Section 2 | [2] |
| 10/22 | Approximate gradient descent for sparse coding | [3], Section 4 | [3] |
| 10/27 | Second-order optimization, cubic regularization | [4] | [4], Section 5 |
| 10/29 | Saddle-point problem, stochastic gradient | [5] | |

[1] Y. Nesterov. Introductory Lectures on Convex Optimization. Springer, 2004.
[2] A. Rakhlin, O. Shamir and K. Sridharan. Making Gradient Descent Optimal for Strongly Convex Stochastic Optimization. ICML, 2012.
[3] S. Arora, R. Ge, T. Ma and A. Moitra. Simple, Efficient and Neural Algorithms for Sparse Coding. COLT, 2015.
[4] Y. Nesterov and B. T. Polyak. Cubic Regularization of Newton Method and its Global Performance. Mathematical Programming, 2006.
[5] R. Ge, F. Huang, C. Jin and Y. Yuan. Escaping from Saddle Points --- Online Stochastic Gradient for Tensor Decomposition. COLT, 2015.
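
To illustrate the saddle-point phenomenon studied in [5], here is a minimal sketch comparing plain and noise-injected gradient descent on f(x, y) = x^2 - y^2, whose origin is a strict saddle point. The test function, step size, and noise scale are illustrative choices, not from [5].

```python
import numpy as np

def grad_f(p):
    """Gradient of f(x, y) = x^2 - y^2; the origin is a strict saddle."""
    x, y = p
    return np.array([2 * x, -2 * y])

def descend(p0, steps=60, lr=0.1, noise=0.0, seed=0):
    """Gradient descent with optional isotropic noise added to each step,
    in the spirit of the noisy stochastic gradient analyzed in [5]."""
    rng = np.random.default_rng(seed)
    p = np.array(p0, dtype=float)
    for _ in range(steps):
        p -= lr * (grad_f(p) + noise * rng.standard_normal(2))
    return p

# Started exactly on the stable manifold (y = 0), plain gradient descent
# converges to the saddle; a little noise kicks the iterate into the
# escape direction, after which |y| grows geometrically.
print(descend([1.0, 0.0], noise=0.0))   # stays near the saddle (0, 0)
print(descend([1.0, 0.0], noise=0.01))  # |y| grows: escapes the saddle
```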

Matrix Completion

| Date | Contents | References | Future Reading |
|---|---|---|---|
| 11/3 | Matrix completion, nuclear norm, Rademacher complexity | [1] | |
| 11/5 | Rademacher complexity (continued), matrix concentration | [1] | [6] |
| 11/10 | Matrix Completion via Convex Optimization | [2]; Section 7 in textbook | |
| 11/12 | Matrix Completion via Convex Optimization (continued) | [2]; Section 7 in textbook | [3] |
| 11/17 | Matrix Completion via Alternating Minimization | [5] | |
| 11/19 | Matrix Completion via Alternating Minimization (continued) | [5] | [4] |

[1] N. Srebro and A. Shraibman. Rank, Trace-Norm and Max-Norm. COLT, 2005.
[2] E. Candes and B. Recht. Exact Matrix Completion via Convex Optimization. FOCM, 2009.
[3] V. Chandrasekaran, P. Parrilo, B. Recht and A. Willsky. The Convex Geometry of Linear Inverse Problems. FOCM, 2012.
[4] P. Jain, P. Netrapalli and S. Sanghavi. Low-rank Matrix Completion using Alternating Minimization. STOC, 2013.
[5] M. Hardt. Understanding Alternating Minimization for Matrix Completion. FOCS, 2014.
[6] J. Tropp. User-Friendly Tail Bounds for Sums of Random Matrices. FOCM, 2012.
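
Here is a minimal sketch of alternating minimization for matrix completion in the spirit of [4, 5]. It uses random initialization rather than the SVD-based initialization those papers analyze, and the dimensions and sampling rate are illustrative.

```python
import numpy as np

def alt_min_complete(M, mask, r, iters=30, seed=0):
    """Fit the observed entries M[mask] with a rank-r factorization U @ V.T
    by alternately solving least-squares for the rows of U, then of V."""
    m, n = M.shape
    rng = np.random.default_rng(seed)
    U = rng.standard_normal((m, r))
    V = rng.standard_normal((n, r))
    for _ in range(iters):
        for i in range(m):  # fix V, solve for U[i] on row i's observed columns
            cols = mask[i]
            U[i] = np.linalg.lstsq(V[cols], M[i, cols], rcond=None)[0]
        for j in range(n):  # fix U, solve for V[j] on column j's observed rows
            rows = mask[:, j]
            V[j] = np.linalg.lstsq(U[rows], M[rows, j], rcond=None)[0]
    return U @ V.T

# Random rank-3 ground truth with roughly half of the entries observed.
rng = np.random.default_rng(1)
M = rng.standard_normal((30, 3)) @ rng.standard_normal((3, 40))
mask = rng.random(M.shape) < 0.5
Mhat = alt_min_complete(M, mask, r=3)
print(np.linalg.norm(Mhat - M) / np.linalg.norm(M))  # small relative error
```

Each inner step is an ordinary least-squares problem, which is why the method is fast in practice; the subtle part, covered in lecture, is proving that the alternation converges to the true low-rank matrix.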

| Date | Contents | References | Future Reading |
|---|---|---|---|
| 11/22 | Summary: what we learned and what we didn't cover | Slides | |