Privacy and Fairness in Data Science
CompSci 590.01, Fall 2018
Instructor: Ashwin Machanavajjhala
When: 3:05 PM - 4:20 PM Mondays and Wednesdays
Where: LSRC D243
Office Hours: By appointment (send email to firstname@cs.duke.edu)
Synopsis:
This course focuses on designing data science algorithms and on characterizing their privacy and fairness properties.
Our daily lives are actively being monitored on web browsers, social
networks, wearable devices and even robots. These data are routinely
analyzed by statistical and machine learning tools to infer aggregate
patterns of our behavior (with applications to science, medicine,
advertising, IoT, etc). However, these data also contain private information
about us that we would not like revealed to others (e.g., medical
history, sexual orientation, locations visited, etc.). Disclosure of such data leads to our privacy
being breached. Moreover, acting on data with such sensitive
information could result in physical or financial harm, and
discrimination, leading to issues in fairness.
A grand challenge that pervades all fields of computer science (and, more
generally, scientific) research is how to learn from data collected
from individuals while provably ensuring that (a) private properties of
individuals are not revealed by the results of the learning process,
and (b) the decisions taken as a result of data analysis ensure fairness.
In this course, we will study recent work in computer science that
mathematically formulates these societal constraints. For privacy, we
will study differential privacy, a breakthrough privacy notion: an
algorithm is differentially private if its
output is insensitive to (small) changes in the input. Differentially
private algorithms have been used to learn, with provable privacy
guarantees, from data from varied
domains (e.g., social science, medicine, communications) and in varied
modalities (e.g., tables, graphs, streams), and are used by government
organizations and internet corporations like Google and Apple to
collect and analyze data. Differential privacy has also been shown to
help design better learning algorithms. We will also investigate how to
formulate the notion of fairness in data analysis mathematically. We will
study both the theory and practice of designing
private and fair data analysis algorithms, and their applications to
data arising from real-world systems.
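As a rough illustration of what "insensitive to (small) changes in the input" can mean in practice, here is a minimal sketch of the Laplace mechanism applied to a counting query. It is a standard textbook construction used here only as an example, not a description of the course's assignments or materials. The names `laplace_noise` and `private_count`, the `epsilon` parameter, and the toy records are all hypothetical and chosen for illustration.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from a Laplace(0, scale) distribution.

    The difference of two independent exponential variables with mean
    `scale` follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng=None):
    """Return a noisy count of the records that satisfy `predicate`.

    Adding or removing one record changes a count by at most 1
    (sensitivity 1), so Laplace noise with scale 1/epsilon is the usual
    calibration for epsilon-differential privacy. Illustrative sketch
    only; not hardened for production use.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if rng is None:
        rng = random.Random()
    true_count = sum(1 for record in records if predicate(record))
    return true_count + laplace_noise(1.0 / epsilon, rng)


if __name__ == "__main__":
    # Hypothetical toy data: ages of five individuals.
    people = [{"age": 34}, {"age": 71}, {"age": 68}, {"age": 25}, {"age": 80}]
    noisy = private_count(people, lambda p: p["age"] >= 65, epsilon=0.5)
    print("noisy count of people aged 65 or older:", noisy)
```

Smaller values of `epsilon` add more noise, so the output depends less on any single individual's record at the cost of accuracy; larger values do the opposite.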
Prerequisites: The course is
open to interested graduate and undergraduate students with sufficient
mathematical maturity. Basic knowledge in algorithms, proof techniques,
and probability will be assumed. Familiarity with databases and machine
learning would help but is not necessary.
Format:
- There are no exams.
- Each lecture will involve reading one or more
papers. Students will also be required to present at least one reading
during the semester.
- Students will be evaluated based on class
performance and a research project that they complete individually or
in groups of two.
Projects can focus on developing new theory/algorithms for privacy/fairness, or on
implementing/adapting known algorithms to a real application setting.
Tentative Syllabus:
- Intro to privacy and fairness in data science (4 lectures)
- Formulating Privacy: Differential Privacy (4 lectures)
- Formulating Fairness (4 lectures)
- Algorithms for releasing statistics under differential privacy (4 lectures)
- Machine learning with privacy & fairness (4 lectures)
References: