Privacy and Fairness in Data Science

CompSci 590.01, Fall 2018

Instructor: Ashwin Machanavajjhala
When: 3:05 PM - 4:20 PM Mondays and Wednesdays
Where: LSRC D243
Office Hours: By appointment (send email to firstname@cs.duke.edu)

Synopsis: This course focuses on characterizing the properties of privacy and fairness, and on designing data science algorithms that satisfy them.

Our daily lives are actively monitored through web browsers, social networks, wearable devices and even robots. These data are routinely analyzed by statistical and machine learning tools to infer aggregate patterns of our behavior (with applications to science, medicine, advertising, IoT, etc.). However, these data also contain private information about us that we would not like revealed to others (e.g., medical history, sexual orientation, locations visited). Disclosure of such data leads to breaches of our privacy. Moreover, decisions based on such sensitive information could result in physical or financial harm and in discrimination, raising concerns about fairness.

A grand challenge that pervades all fields of computer science (and, more generally, of scientific research) is how to learn from data collected from individuals while provably ensuring that (a) private properties of individuals are not revealed by the results of the learning process, and (b) the decisions made on the basis of the data analysis are fair.

In this course, we will study recent work in computer science that mathematically formulates these societal constraints. For privacy, we will study differential privacy, a breakthrough privacy notion: an algorithm is differentially private if its output is insensitive to small changes in the input, such as adding or removing any single individual's record. Differentially private algorithms have been developed to learn from data with provable privacy guarantees across varied domains (e.g., social science, medicine, communications) and varied modalities (e.g., tables, graphs, streams), and differential privacy is used by government organizations and internet corporations like Google and Apple to collect and analyze data. Differential privacy has also been shown to help design better learning algorithms. We will also investigate how to formulate the notion of fairness in data analysis mathematically. We will study both the theory and practice of designing private and fair data analysis algorithms, and their applications to data arising from real-world systems.
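To make the idea of output insensitivity concrete, here is a minimal Python sketch of the Laplace mechanism, one standard differentially private algorithm (this page does not say which specific algorithms the course covers). It assumes a counting query: adding or removing one person's record changes the true count by at most 1, so adding Laplace noise with scale 1/epsilon yields epsilon-differential privacy for a single release. The function names `laplace_noise` and `private_count` and the toy dataset are hypothetical illustrations, not course material.

```python
import random
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one sample from Laplace(0, scale).

    The difference of two i.i.d. exponential variables with mean `scale`
    is Laplace-distributed with location 0 and scale `scale`.
    """
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def private_count(
    records: Iterable[T],
    predicate: Callable[[T], bool],
    epsilon: float,
    rng: random.Random,
) -> float:
    """Release a noisy count of records satisfying `predicate` (Laplace mechanism).

    Adding or removing one individual's record changes the true count by at
    most 1 (sensitivity 1), so Laplace noise with scale 1/epsilon gives
    epsilon-differential privacy for this single release.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)


if __name__ == "__main__":
    # Hypothetical toy dataset: ages of patients in a study.
    ages = [34, 61, 72, 45, 66, 58, 80, 29]
    rng = random.Random(0)
    # Smaller epsilon means more noise and a stronger privacy guarantee.
    for eps in (0.1, 1.0, 10.0):
        noisy = private_count(ages, lambda a: a > 60, eps, rng)
        print(f"epsilon={eps}: noisy count of ages > 60 = {noisy:.2f}")
```

Each release consumes privacy budget: under basic sequential composition, answering k such queries on the same data is (k times epsilon)-differentially private.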

Prerequisites: The course is open to interested graduate and undergraduate students with sufficient mathematical maturity. Basic knowledge in algorithms, proof techniques, and probability will be assumed. Familiarity with databases and machine learning would help but is not necessary.

Format:

There are no exams.

Each lecture will involve reading one or more papers. Students will also be required to present at least one reading during the semester.

Students will be evaluated based on class performance and a research project that they complete individually or in groups of size 2. Projects can focus on developing new theory/algorithms for privacy/fairness, or on implementing/adapting known algorithms to a real application setting.

Tentative Syllabus:

Intro to privacy and fairness in data science (4 lectures)
Formulating Privacy: Differential Privacy (4 lectures)
Formulating Fairness (4 lectures)

Algorithms for releasing statistics with differential privacy (4 lectures)
Machine learning with privacy & fairness (4 lectures)

References:

Ashwin's course on privacy: Fall 2016, Fall 2013
Ashwin's tutorial on privacy: Part 1 (slides, video), Part 2 (slides, video)
Moritz Hardt's tutorial on fairness: (slides, video)