Gregor Reisch: Madame Arithmatica, 1508

Administration

Algorithmic Foundations of Data Science
COMPSCI 390.01 • Spring 2025

Instructor: Pankaj K. Agarwal

TA: Qilin Ye

Time: Tue, Thur 10:05-11:20am

Location: LSRC A247

Office Hours:
Agarwal: Tues, Thurs 1:30-2:30pm, D214A LSRC
Ye: Mon 2:00-3:00pm, Fri 11:30-12:30pm, on Zoom

Course Synopsis

This course provides foundations of designing scalable algorithms that can solve fundamental tasks relevant for data science. The emphasis will be on understanding high-level theoretical intuitions and principles underlying the algorithms covered in the class as well as on implementing and applying them. The course is divided into six units:


  • Hashing: Universal and consistent hashing, Bloom filter, sketches
  • Data Compression: Huffman coding, entropy, move-to-front and sliding window methods
  • Similarity Analysis: Distance measures, clustering, NN searching, dimension reduction
  • Linear Algebraic Methods: Principle component analysis, singular value decomposition, tensor methods, spectral graph theory
  • Sampling and Estimation:Learnability, sampling methods, random walks, MCMC methods
  • Privacy and Fairness: Differential privacy, algorithmic fairness

Prerequisites

COMPSCI 230 and (preferably) 330. This course requires undergraduate background in discrete mathematics and algorithms as well as experience with writing medium-size programs

References

[BHK]A. Blum, J. Hopcroft, R. Kannan, Foundations of Data Science, Cambridge, 2020.
[HP]S. Har-Peled, Geometric Approximation Algorithms, AMS, 2013.
[MU] M. Mitzenmacher, E. Upfal, Probability and Computing, Cambridge, 2017.
[Ph] J. Phillips, Mathematical Foundations of Data Analysis, Springer, 2021.
[Va] G. Valiant, CS 168: The Modern Algorithmic Toolbox, Stanford Univ, Spring 2024.

Grading

  • Assignments (groups of at most two; four out of five): 35%
  • Two midterm exams (in-class, closed book): 45%
  • Course project (groups of at most two): 20%

page top