
CPS 196.03: Information Management and Mining
(Spring 2009, Shivnath Babu)

Course schedule and notes

* The reference textbook by Jiawei Han and Micheline Kamber will be referred to as JHMK.

| Week | Date | Topic | Slides and reference* |
|------|-------|-------|-----------------------|
| 1 | 01-08 | Introduction and overview | Notes 1 |
| 2 | 01-13 | Introduction to Frequent itemsets | Notes 2 |
|  | 01-15 | Apriori algorithm | Notes 2 |
| 3 | 01-20 | Extensions to Apriori | Notes 3 |
|  | 01-22 | Maximal and Closed Itemsets | Notes 4 |
| 4 | 01-27 | Toivonen's algorithm, FP-trees | Notes 4 |
|  | 01-29 | FP-growth, rule generation | Notes 4 |
| 5 | 02-03 | Constraint-based mining; Programming Project 1 announced | Notes 5 |
|  | 02-05 | Introduction to Data Warehousing | Notes 6 |
| 6 | 02-10 | Data cubes | Notes 6 |
|  | 02-12 | Indexes in data warehouses | Notes 7 |
| 7 | 02-17 | MOLAP, Multi-dimensional arrays, compression | Notes 8; Cube computation paper |
|  | 02-19 | Midterm |  |
| 8 | 02-24 | Clustering large datasets (overview) | Notes 9; Lecture notes |
|  | 02-26 | Materialized views in data cubes | HRU Paper |
| 9 | 03-03 | Multi-way MOLAP algorithm for Cube Computation | Cube computation paper |
|  | 03-05 | Cube computation (contd.) | Notes 8 |
| 10 | 03-10 | Spring break |  |
|  | 03-12 | Spring break |  |
| 11 | 03-17 | Clustering large datasets (distance metrics, k-means) | Notes 9; Lecture notes |
|  | 03-19 | Clustering large datasets (BFR algorithm) | Notes 9; Lecture notes |
| 12 | 03-24 | Wrap up of clustering large datasets | Notes 9; Lecture notes |
|  | 03-26 | Google's initial system architecture (overview) | Google paper |
| 13 | 03-31 | No class |  |
|  | 04-02 | Google's initial system architecture (PageRank) | Google paper |
| 14 | 04-07 | Google's initial system architecture (System and search) | Notes 10 |
|  | 04-09 | The PageRank Citation Ranking (The theory behind PageRank) | PageRank paper |
| 15 | 04-14 | The PageRank Citation Ranking (Computation, crawling) | PageRank paper |
|  | 04-16 | Internet Advertising and Click-Fraud Detection | Tuzhilin paper; Notes 11 |
| 14 | 04-21 | Wrap up |  |
| 16 | 04-27 | Finals -- 2.00-5.00 PM |  |