
CompSci 590.01

Causal Inference in Data Analysis
with Applications to Fairness and Explanations

Spring 2023


Announcements for Spring'23:


In this class, we will learn techniques for formal and rigorous causal analysis based on observational (collected) data, and see its applications in inferring fairness and explainability in data analysis. As the saying "Correlation is not Causation" suggests, the problem of causal inference goes far beyond simple correlation, association, or model-based predictive analysis, and is practically indispensable in health, medicine, the social sciences, and other domains. For example, a medical researcher may want to find out whether a new drug is effective in curing a certain type of cancer, or an economist may want to understand whether a job-training program helps improve employment prospects. Causal inference lays the foundation of sound and robust policy making by providing a means to estimate the impact of a certain "intervention" on the world. While the gold standard of causal inference is the randomized controlled experiment, such experiments are often not possible for ethical or cost reasons. Hence, for practical applications, "observational studies" - causal inference based on observational data - are used. A dataset can tell very different stories depending on how we look at it (e.g., Simpson's Paradox), so it is important to understand the right way to look at a given dataset - in particular, which variables to condition on before drawing any conclusions, especially causal conclusions. In this class we will discuss two models for observational causal inference: the Probabilistic Graphical Causal Model (Pearl), more prevalent in Artificial Intelligence (AI) research, and the Potential Outcome Framework (Rubin), more prevalent in Statistics research, along with related concepts and techniques. We will also discuss recent applications of causal analysis to (1) Fairness and (2) eXplainable Artificial Intelligence (XAI).
The growing concern about the complexity and opacity of data-driven decision-making systems, deployed to make consequential decisions in healthcare, criminal justice, and finance, has led to a surge of research interest in these topics.
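As a small taste of why "which variables to condition on" matters, the short Python sketch below illustrates Simpson's Paradox. The counts are hypothetical (styled after the classic kidney-stone study), not data from this course: one treatment looks better within every subgroup, yet worse in the aggregate.

```python
# Hypothetical (cured, total) counts illustrating Simpson's Paradox:
# treatment A wins within each subgroup, but B wins in the aggregate.
data = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}

def rate(cured, total):
    return cured / total

# Conditioning on subgroup (stone size): A is better in both.
for group in ("small", "large"):
    ra, rb = rate(*data["A"][group]), rate(*data["B"][group])
    print(f"{group}: A={ra:.0%} vs B={rb:.0%} -> A better: {ra > rb}")

# Aggregating over subgroups: the comparison flips.
agg = {t: tuple(map(sum, zip(*data[t].values()))) for t in data}
ra, rb = rate(*agg["A"]), rate(*agg["B"])
print(f"overall: A={ra:.0%} vs B={rb:.0%} -> A better: {ra > rb}")
```

Whether the subgroup-level or aggregate-level comparison is the causally correct one depends on the causal structure of the data, which is exactly what the graphical-model machinery in this course formalizes.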


There are no hard prerequisites, but this is an advanced graduate-level seminar, and basic CS knowledge, e.g., graphs, probability theory, algorithms, machine learning, and databases (equivalent of CompSci 201, 230, 330, 371, 316), will be assumed; otherwise, students taking this class should be willing to learn the preliminary concepts on their own as needed. Students should also be willing to read a number of research papers, book chapters, and other materials. We will review some (though not all) of the concepts used in this class as needed.


Tuesdays and Thursdays, 1:45 pm - 3:00 pm, LSRC D106


Sudeepa Roy

Office hour: Tuesdays 3-4 pm, LSRC D316


This course will have a mix of lectures and presentations of research papers, and will require reading and critiquing research papers. Grades will be based on class participation, presenting and leading discussions on 1-2 research papers, a few assignments, and a semester-long class project in small groups.

There are no exams. This also means that your grade will depend significantly on your participation and discussions in class, along with your presentation, assignments, and the class project. We assume that you are taking this class because you are interested in the topic and in gaining experience reading papers and doing research projects. This is a relatively small class, and the instructor expects to know and frequently interact with each of you. If you think you might miss several lectures and others' presentations, this class might not be a good fit for you.

Grading criteria:



Textbooks:

  • [Primer] Judea Pearl, Madelyn Glymour, and Nicholas P. Jewell, Causal Inference in Statistics: A Primer [link] (available from Duke online library on loan)

  • [Causality] Judea Pearl, Causality: Models, Reasoning, and Inference, 2nd Edition, 2009 [link] (available from Duke online library on loan)

  • [Why] Judea Pearl and Dana Mackenzie, The Book of Why: The New Science of Cause and Effect [link] (available from Duke online library on loan)

  • (more to be added)


Please see the Google Doc - link shared on Ed


(This course is a mix of lectures and paper presentations; the first class on Jan 17 will be an introduction to causal inference, an overview, and logistics information.)

Slides will be uploaded after the lectures/presentations, to keep the class interactive and to encourage brainstorming together.

Schedule (each entry: Date: Topic [Slides: Presenters]. Reading / optional reading):

 1. 1/12 (Th): No class
 2. 1/17 (T): Overview and intro to Causal Inference [Lecture-1]
 3. 1/19 (Th): Simpson's Paradox, d-Separation, Structural/Graphical Causal Models [Lecture-2]. Reading: [Primer] Ch 1, Ch 2.1-2.4
 4. 1/24 (T): Intervention - Adjustment formula, backdoor criterion [Lecture-3]. Reading: [Primer] Ch 3
 5. 1/26 (Th): Counterfactuals [Lecture-4]. Reading: [Primer] Ch 4
 6. 1/31 (T): Rubin's potential outcome framework & statistical causal inference methods [Lecture-5]
 7. 2/2 (Th): Intro to fairness & causality for fairness [Lecture-6]
 8. 2/7 (T): Contd.
 9. 2/9 (Th): Intro to explanations & causality for explanations [Lecture-7]
10. 2/14 (T): Causality for time-series data [Presentation-1: Nathan and Gaurav]
11. 2/16 (Th): Almost Matching Exactly for causal inference [Presentation-2: Yiyang and Zhehan]
12. 2/21 (T): Instrumental variables in causal inference [Presentation-3: Sakina and Shota]. Reading: Angrist & Imbens (1995); Syrgkanis et al. (NeurIPS 2019)
13. 2/23 (Th): GNN and GNN Explainer [Presentation-4: Hanze and Jinze]. Reading: Ying et al. (NeurIPS 2019)
14. 2/28 (T): Do-calculus, data fusion, and transportability for visual recognition [Presentation-5: Frankie and Zach]. Reading: Bareinboim and Pearl (PNAS 2016); Mao et al. (CVPR 2022)
15. 3/2 (Th): Interpretability vs. Explainability [Presentation-6: Hayoung and Ryan]. Reading: Rudin (Nature Machine Intelligence 2019); Zhao and Hastie (2018)
16. 3/7 (T): Counterfactual explanations [Presentation-7: Ghazal and Srikar]. Reading: Slack et al. (NeurIPS 2021); Mothilal et al. (FAT* 2020)
17. 3/9 (Th): Causal inference for relational data & review of topics so far [Lecture-8]. Reading: Salimi et al. (SIGMOD 2020); Galhotra et al. (SIGMOD 2022)
18. 3/14 (T): No class - Spring break
19. 3/16 (Th): No class - Spring break
20. 3/21 (T): Study of bias in applications & Counterfactual Fairness [Presentation-8: Jinyi and Yu]. Reading: Obermeyer et al. (Science 2019); Kusner et al. (NeurIPS 2017)
21. 3/23 (Th): Fairness in database research - Ranking and Selection [Presentation-9: Fangzhu and Yuxi]. Reading: Shetiya et al. (ICDE 2022); Asudeh et al. (SIGMOD 2019)
22. 3/28 (T): Explainable ML classifiers (LIME and ANCHOR) [Presentation-10: Jason and Keyu]. Reading: Ribeiro et al. (KDD 2016) and (AAAI 2018)
23. 3/30 (Th): Explainable ML classifiers (SHAP) and adversarial attacks [Presentation-11: Theo and Sudeepa]. Reading: Lundberg and Lee (NeurIPS 2017); Slack et al. (AIES 2020)
24. 4/4 (T): Matching and Scalable Matching for Causal Inference with Continuous Covariates [Lecture-10: guest lecture by Harsh Parikh; whiteboard lecture]. Reading: Parikh et al. (JMLR 2023)
25. 4/6 (Th): Auditing and Validating Causal Inference Methods [Lecture-11: guest lecture by Harsh Parikh]. Reading: Parikh et al. (ICML 2023)
26. 4/11 (T): Fairness and Proxy variables [Presentation-12: Kiki and Tamara]. Reading: Chen et al. (FAT* 2019); Galhotra et al. (Entropy 2021)
27. 4/13 (Th): Project presentations
28. 4/18 (T): Project presentations