CompSci 590.01 - Causal Inference in Data Analysis with Applications to Fairness and Explanations - Spring 2023

# Causal Inference in Data Analysis with Applications to Fairness and Explanations

## Announcements for Spring'23:

• Welcome to CompSci 590.01 - Spring 2023!
• The first class will be on Tuesday January 17. There won't be any class on Thursday January 12 - we will make up for it later.
• Please check Ed/Sakai frequently for all logistics information and announcements, reading material, references, pointers to datasets, presentation information etc. If you have any questions, please reach out to the instructor.

## Overview

In this class, we will learn techniques for formal and rigorous causal analysis based on observational (collected) data, and see its applications in inferring fairness and explainability in data analysis. As the common saying "correlation is not causation" suggests, the problem of causal inference goes far beyond simple correlation, association, or model-based prediction analysis, and is practically indispensable in health, medicine, social sciences, and other domains. For example, a medical researcher may want to find out whether a new drug is effective in curing a certain type of cancer, or an economist may want to understand whether a job-training program helps improve employment prospects. Causal inference lays the foundation of sound and robust policy making by providing a means to estimate the impact of a certain "intervention" on the world. While the gold standard of causal inference is the randomized controlled experiment, such experiments are often not possible for ethical or cost reasons. Hence, practical applications rely on "observational studies", that is, causal inference based on observational data. A dataset can tell very different stories depending on how we look at it (e.g., Simpson's paradox), so it is important to understand the right way to look at a given dataset, in particular which variables to condition on, before drawing any conclusions, especially causal conclusions, in data analysis. In this class we will discuss two models for observational causal inference, along with related concepts and techniques: the Probabilistic Graphical Causal Model (Pearl), more prevalent in Artificial Intelligence (AI) research, and the Potential Outcome Framework (Rubin), more prevalent in statistical research. We will also discuss recent applications of causal analysis to (1) Fairness and (2) eXplainable Artificial Intelligence (XAI).
Growing concerns about the complexity and opacity of data-driven decision-making systems, deployed for consequential decisions in healthcare, criminal justice, and finance, have led to a surge of research interest in these topics.
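As a rough illustration of why the choice of conditioning variables matters, the sketch below reproduces a Simpson's-paradox pattern in plain Python. The counts, the treatment labels `A`/`B`, and the stratum names `mild`/`severe` are hypothetical and chosen only for illustration; they are not from the course materials. The adjusted estimate is a causal effect only under the assumption that the stratifying variable is the sole confounder.

```python
# Hypothetical counts for illustration: (recovered, total) per treatment
# and per stratum of an assumed confounder Z (here: condition severity).
counts = {
    "A": {"mild": (81, 87), "severe": (192, 263)},
    "B": {"mild": (234, 270), "severe": (55, 80)},
}


def naive_rate(treatment):
    """Recovery rate ignoring Z: P(Y=1 | T=t)."""
    recovered = sum(r for r, _ in counts[treatment].values())
    total = sum(n for _, n in counts[treatment].values())
    return recovered / total


def stratum_rate(treatment, z):
    """Recovery rate within one stratum: P(Y=1 | T=t, Z=z)."""
    recovered, total = counts[treatment][z]
    return recovered / total


def adjusted_rate(treatment):
    """Adjustment formula: sum over z of P(Y=1 | T=t, Z=z) * P(Z=z).

    Interpretable as P(Y=1 | do(T=t)) only if Z blocks every backdoor
    path between T and Y, which is an assumption of this sketch.
    """
    population = sum(n for arms in counts.values() for _, n in arms.values())
    rate = 0.0
    for z in counts[treatment]:
        p_z = sum(counts[t][z][1] for t in counts) / population
        rate += stratum_rate(treatment, z) * p_z
    return rate


if __name__ == "__main__":
    for t in counts:
        print(t, "naive:", round(naive_rate(t), 3),
              "adjusted:", round(adjusted_rate(t), 3))
```

With these counts, treatment `B` looks better in the pooled data, while `A` is better within each stratum and after adjustment. The reversal arises because `A` is given mostly to severe cases, so the pooled comparison mixes the effect of the treatment with the effect of severity.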

## Prerequisites

There are no hard prerequisites, but this is an advanced graduate-level seminar, and basic knowledge of CS, e.g., graphs, probability theory, algorithms, machine learning, and databases (equivalent to CompSci 201, 230, 330, 371, 316), will be assumed. Students without this background should be willing to learn the preliminary concepts on their own as needed. Students should also be willing to read a number of research papers, book chapters, and other materials. We will review some (not all) of the concepts used in this class as needed.

## Time/Day

Tuesdays and Thursdays, 1:45 pm - 3:00 pm, LSRC D106

## Instructor

Sudeepa Roy

Office hour: Tuesdays 3-4 pm, LSRC D316

This course will have a mix of lectures and presentation of research papers, and will require reading and critiquing research papers. The grade will be given based on class participation, presenting and leading discussions on 1-2 research papers, a few assignments, and a semester-long class project in small groups.

There are no exams. This also means that your grade will depend significantly on your participation and discussions in class, along with your presentation, assignments, and the class project. We assume that you are taking this class because you are interested in this topic and have experience reading papers and doing research or projects. This is a relatively small class, and the instructor expects to know and frequently interact with each of you. If you think you might miss several lectures and others' presentations, this class might not be a good fit for you.

• Class participation (15%): This includes both attending lectures and frequent participation in classes including presentations led by other students. If you think you might miss more than 3 classes during the semester, talk to the instructor early.

• Assignments (15%): There will be a small number of assignments (2-3) during the semester, and depending on the assignment, we may have peer grading supervised by the instructor.
Two-thirds of the assignment grade will come from short paper reviews (one per presentation, two excused); details are on Ed.

• Presentation and leading discussion of a research topic (25%): We will post a list of potential research papers and topics. You can select a topic and 1-2 important papers on that topic to present, and lead the discussion in class. Depending on the number of students enrolled, their interests, and the number of important papers on the topic, this may be done in small groups of 1-2 students. Some topics may require more than one presentation. Feel free to choose a topic related to your class project. Students are expected to cover the basics before presenting the research paper: for example, if you choose the topic "explainability of GNN", you should first give an overview of GNNs. Note that all students are expected to read the papers and participate in the discussions, not only the students who are presenting or leading the discussions.

Presenters will send their slides to Sudeepa two days before the presentation, and will meet with Sudeepa for feedback the day before the presentation.

• Class project (45%): There will be a semester-long class project, done in small groups of 2-3 students, on a topic of your interest that is relevant to the class. We will post some possible topics. It can be an open-ended research project that can potentially become a paper (you are encouraged to aim for this, especially if you are a PhD student or an MS/undergraduate student considering doing a PhD later; your effort decides the grade, not the end results), an implementation and analysis of algorithms, a tool with a GUI for an application related to causal inference, or an analysis of real and synthetic datasets for a problem. Projects that focus only on reading papers or writing surveys are discouraged. There will be three checkpoints (an initial proposal, a midterm update, and a final report), and you are also encouraged to meet the instructor briefly every few weeks. There will be a short in-class presentation at the end. Project grades will take into account your efforts and results, and the quality of the related-work survey, presentation, and final report.

### Books

• [Primer] Judea Pearl, Madelyn Glymour, Nicholas P. Jewell, Causal Inference in Statistics: A Primer (available from Duke online library on loan)

• [Causality] Judea Pearl, Causality: Models, Reasoning, and Inference, 2nd Edition, 2009 (available from Duke online library on loan)

• [Why] Judea Pearl and Dana Mackenzie, The Book of Why: The New Science of Cause and Effect (available from Duke online library on loan)

## Schedule

(mix of lectures and paper presentations, the first class on Jan 17 will be an introduction to causal inference, overview, and logistics information)

Slides will be uploaded after the lectures/presentations to keep the class interactive and brainstorm together

| # | Date | Topic | Session | Presenter(s) | Readings / notes |
|---|------|-------|---------|--------------|------------------|
| 1 | 1/12 (Th) | No class | | | |
| 2 | 1/17 (T) | Overview and intro to Causal Inference | Lecture-1 | | |
| 3 | 1/19 (Th) | Simpson's Paradox, d-Separation, Structural/Graphical Causal Models | Lecture-2 | | [Primer] Ch 1, Ch 2.1-2.4 |
| 4 | 1/24 (T) | Intervention - Adjustment formula, backdoor criterion | Lecture-3 | | [Primer] Ch 3 |
| 5 | 1/26 (Th) | Counterfactuals | Lecture-4 | | [Primer] Ch 4 |
| 6 | 1/31 (T) | Rubin's potential outcome framework & statistical causal inference methods | Lecture-5 | | |
| 7 | 2/2 (Th) | Intro to fairness & causality for fairness | Lecture-6 | | |
| 8 | 2/7 (T) | Contd. | | | |
| 9 | 2/9 (Th) | Intro to explanations & causality for explanations | Lecture-7 | | |
| 10 | 2/14 (T) | Causality for time-series data | Presentation-1 | Nathan and Gaurav | |
| 11 | 2/16 (Th) | Almost Matching Exactly for causal inference | Presentation-2 | Yiyang and Zhehan | |
| 12 | 2/21 (T) | Instrumental variables in causal inference | Presentation-3 | Sakina and Shota | Angrist & Imbens (1995) and Syrgkanis et al. (NeurIPS 2019) |
| 13 | 2/23 (Th) | GNN and GNN Explainer | Presentation-4 | Hanze and Jinze | Ying et al. (NeurIPS 2019) |
| 14 | 2/28 (T) | Do-calculus, data fusion, and Transportability for visual recognition | Presentation-5 | Frankie and Zach | Bareinboim and Pearl (PNAS 2016) and Mao et al. (CVPR 2022) |
| 15 | 3/2 (Th) | Interpretability vs. Explainability | Presentation-6 | Hayoung and Ryan | Rudin (Nature Machine Intelligence 2019) and Zhao and Hastie (2018) |
| 16 | 3/7 (T) | Counterfactual explanations | Presentation-7 | Ghazal and Srikar | Slack et al. (NeurIPS 2021) and Mothilal et al. (FAT* 2020) |
| 17 | 3/9 (Th) | Causal inference for relational data & review of topics so far | Lecture-8 | Sudeepa | Salimi et al. (SIGMOD 2020) and Galhotra et al. (SIGMOD 2022) |
| 18 | 3/14 (T) | No class - Spring break | | | |
| 19 | 3/16 (Th) | No class - Spring break | | | |
| 20 | 3/21 (T) | Study of bias in applications & Counterfactual Fairness | Presentation-8 | Jinyi and Yu | Obermeyer et al. (Science 2019) and Kusner et al. (NeurIPS 2017) |
| 21 | 3/23 (Th) | Fairness in database research - Ranking and Selection | Presentation-9 | Fangzhu and Yuxi | Shetiya et al. (ICDE 2022) and Asudeh et al. (SIGMOD 2019) |
| 22 | 3/28 (T) | Explainable ML classifiers (LIME and ANCHOR) | Presentation-10 | Jason and Keyu | Ribeiro et al. (KDD 2016) and (AAAI 2018) |
| 23 | 3/30 (Th) | Explainable ML classifiers (SHAP) and Adversarial Attack | Presentation-11, Lecture-9 | Theo and Sudeepa | Lundberg and Lee (NeurIPS 2017) and Slack et al. (AIES 2020) |
| 24 | 4/4 (T) | Matching and Scalable Matching for Causal Inference with Continuous Covariates | Lecture-10 | Guest lecture by Harsh Parikh | Parikh et al. (JMLR 2023); whiteboard lecture |
| 25 | 4/6 (Th) | Auditing and Validating Causal Inference Methods | Lecture-11 | Guest lecture by Harsh Parikh | Parikh et al. (ICML 2023) |
| 26 | 4/11 (T) | Fairness and Proxy variables | Presentation-12 | Kiki and Tamara | Chen et al. (FAT* 2019) and Galhotra et al. (Entropy 2021) |
| 27 | 4/13 (Th) | Project presentations | | | |
| 28 | 4/18 (T) | Project presentations | | | |