Announcements for Spring'23:
 Welcome to CompSci 590.01, Spring 2023!
 The first class will be on Tuesday, January 17. There won't be any class on Thursday, January 12; we will make up for it later.

Please check Ed/Sakai frequently for all logistics information and announcements, reading material, references, pointers to datasets, presentation information etc. If you have any questions, please reach out to the instructor.
Overview
In this class, we will learn techniques for formal and rigorous causal analysis based on observational (collected) data, and see its applications to fairness and explainability in data analysis. As the adage "Correlation is not Causation" suggests, the problem of causal inference goes far beyond simple correlation, association, or model-based prediction analysis, and is practically indispensable in health, medicine, social sciences, and other domains. For example, a medical researcher may want to find out whether a new drug is effective in curing cancer of a certain type, or an economist may want to understand whether a job-training program helps improve employment prospects. Causal inference lays the foundation of sound and robust policy making by providing a means to estimate the impact of a certain "intervention" on the world. While randomized controlled experiments are the gold standard of causal inference, they are often not possible for ethical or cost reasons. Hence, practical applications rely on "observational studies", i.e., causal inference based on observational data. A dataset can tell us very different stories depending on how we look at it (e.g., Simpson's Paradox), so it is important to understand the right way to look at a given dataset, in particular which variables to condition on, before drawing any conclusions, especially causal conclusions, in data analysis. In this class we will discuss two models for observational causal inference: the Probabilistic Graphical Causal Model (Pearl), more prevalent in Artificial Intelligence (AI) research, and the Potential Outcome Framework (Rubin), more prevalent in Statistics research, along with related concepts and techniques. We will also discuss recent applications of causal analysis to (1) Fairness and (2) eXplainable Artificial Intelligence (XAI).
The growing concerns about the complexity and opacity of data-driven decision-making systems, deployed for making consequential decisions in healthcare, criminal justice, and finance, have led to a surge of research interest in these topics.
Prerequisites
There are no hard prerequisites, but this is an advanced graduate-level seminar, and basic knowledge of CS, e.g., graphs, probability theory, algorithms, machine learning, and databases (equivalent to CompSci 201, 230, 330, 371, 316), will be assumed; otherwise, students taking this class should be willing to learn the preliminary concepts on their own as needed. Students should also be willing to read a number of research papers, book chapters, and other materials. We will review some (not all) of the concepts used in this class as needed.
Time/Day
Tuesdays and Thursdays, 1:45-3:00 pm, LSRC D106
Instructor
Sudeepa Roy
Office hour: Tuesdays 3-4 pm, LSRC D316
Grading
This course will have a mix of lectures and presentations of research papers, and will require reading and critiquing research papers. The grade will be based on class participation, presenting and leading discussions on 1-2 research papers, a few assignments, and a semester-long class project in small groups. There are no exams. This also means that your grade will depend significantly on your participation and discussions in class, along with your presentation, assignments, and the class project. We assume that you are taking this class because you are interested in this topic and have experience in reading papers and doing research or projects. This is a relatively small class, and the instructor expects to know and frequently interact with each of you. If you think you might miss several lectures and others' presentations, this class might not be a good fit for you.
Grading criteria:
 Class participation (15%):
This includes both attending lectures and frequent participation in classes including presentations led by other students. If you think you might miss more than 3 classes during the semester, talk to the instructor early.
 Assignments (15%):
There will be a small number (2-3) of assignments during the semester, and depending on the assignment, we may have peer grading supervised by the instructor.
Two-thirds of the assignment grade will be based on short paper reviews: one per presentation, with two excused. Details are on Ed.
 Presentation and leading discussion of a research topic (25%):
We will post a list of potential research papers and topics. You can select a topic and 1-2 important papers on that topic to present and lead the discussion of in a class. Depending on the number of students enrolled, their interests, and the number of important papers on that topic, it may be done in small groups of 1-2 students. Some topics may require more than one presentation. Feel free to choose a topic related to your class project. Students are expected to cover the basics before presenting the research paper; e.g., if you choose the topic "explainability of GNN", you should first give an overview of GNN. Note that all students are expected to read the papers and participate in the discussions, not only the students who are presenting/leading the discussions.
Presenters will send their slides to Sudeepa two days before the presentation, and will meet with Sudeepa for feedback the day before the presentation.
 Class project (45%):
There will be a semester-long class project, done in small groups of 2-3 students, on a topic of your interest that is relevant to the class. We will post some possible topics. It can be an open-ended research project that can potentially become a paper (you are encouraged to aim for this, especially if you are a PhD student or an MS/undergraduate student considering doing a PhD later; your effort decides the grade, not the end results), implementation and analysis of algorithms, building a tool with a GUI for an application related to causal inference, or analyzing real and synthetic datasets for a problem. Projects focusing only on reading papers/writing surveys are discouraged. There will be three checkpoints: an initial proposal, a midterm update, and a final report. You are also encouraged to meet the instructor briefly every few weeks. There will be a short in-class presentation at the end. Project grades will take into account your efforts/results and the quality of the related work survey, presentation, and final report.
Reading
Books
 [Primer] Judea Pearl, Madelyn Glymour, Nicholas P. Jewell, Causal Inference in Statistics: A Primer [link] (available from Duke online library on loan)
 [Causality] Judea Pearl, Causality: Models, Reasoning, and Inference, 2nd Edition, 2009 [link] (available from Duke online library on loan)
 [Why] Judea Pearl and Dana Mackenzie, The Book of Why: The New Science of Cause and Effect [link] (available from Duke online library on loan)
 (more to be added)
Papers
Please see the Google Doc link shared on Ed.
Schedule
(A mix of lectures and paper presentations; the first class on Jan 17 will cover an introduction to causal inference, an overview, and logistics information.)
Slides will be uploaded after the lectures/presentations to keep the class interactive and to brainstorm together.
# | Day | Topic | Slides | Reading | Comments/Optional Reading
1 | 1/12 (Th) | No class
2 | 1/17 (T) | Overview and intro to Causal Inference | Lecture1
3 | 1/19 (Th) | Simpson's Paradox, d-Separation, Structural/Graphical Causal Models | Lecture2 | [Primer] Ch 1, Ch 2.1-2.4
4 | 1/24 (T) | Intervention: adjustment formula, backdoor criterion | Lecture3 | [Primer] Ch 3
5 | 1/26 (Th) | Counterfactuals | Lecture4 | [Primer] Ch 4
6 | 1/31 (T) | Rubin's potential outcome framework & statistical causal inference methods | Lecture5
7 | 2/2 (Th) | Intro to fairness & causality for fairness | Lecture6
8 | 2/7 (T) | Contd.
9 | 2/9 (Th) | Intro to explanations & causality for explanations | Lecture7
10 | 2/14 (T) | Causality for time-series data | Presentation1 (Nathan and Gaurav)
11 | 2/16 (Th) | Almost Matching Exactly for causal inference | Presentation2 (Yiyang and Zhehan)
12 | 2/21 (T) | Instrumental variables in causal inference | Presentation3 (Sakina and Shota) | Angrist & Imbens (1995) and Syrgkanis et al. (NeurIPS 2019)
13 | 2/23 (Th) | GNN and GNN Explainer | Presentation4 (Hanze and Jinze) | Ying et al. (NeurIPS 2019)
14 | 2/28 (T) | Do-calculus, data fusion, and transportability for visual recognition | Presentation5 (Frankie and Zach) | Bareinboim and Pearl (PNAS 2016) and Mao et al. (CVPR 2022)
15 | 3/2 (Th) | Interpretability vs. Explainability | Presentation6 (Hayoung and Ryan) | Rudin (Nature Machine Intelligence 2019) and Zhao and Hastie (2018)
16 | 3/7 (T) | Counterfactual explanations | Presentation7 (Ghazal and Srikar) | Slack et al. (NeurIPS 2021) and Mothilal et al. (FAT* 2020)
17 | 3/9 (Th) | Causal inference for relational data & review of topics so far | Lecture8 (Sudeepa) | Salimi et al. (SIGMOD 2020) and Galhotra et al. (SIGMOD 2022)
18 | 3/14 (T) | No class (Spring break)
19 | 3/16 (Th) | No class (Spring break)
20 | 3/21 (T) | Study of bias in applications & Counterfactual Fairness | Presentation8 (Jinyi and Yu) | Obermeyer et al. (Science 2019) and Kusner et al. (NeurIPS 2017)
21 | 3/23 (Th) | Fairness in database research: Ranking and Selection | Presentation9 (Fangzhu and Yuxi) | Shetiya et al. (ICDE 2022) and Asudeh et al. (SIGMOD 2019)
22 | 3/28 (T) | Explainable ML classifiers (LIME and ANCHOR) | Presentation10 (Jason and Keyu) | Ribeiro et al. (KDD 2016) and (AAAI 2018)
23 | 3/30 (Th) | Explainable ML classifiers (SHAP) and Adversarial Attack | Presentation11, Lecture9 (Theo and Sudeepa) | Lundberg and Lee (NeurIPS 2017) and Slack et al. (AIES 2020)
24 | 4/4 (T) | Matching and Scalable Matching for Causal Inference with Continuous Covariates | Lecture10 (guest lecture by Harsh Parikh) | Parikh et al. (JMLR 2023). Whiteboard lecture
25 | 4/6 (Th) | Auditing and Validating Causal Inference Methods | Lecture11 (guest lecture by Harsh Parikh) | Parikh et al. (ICML 2023)
26 | 4/11 (T) | Fairness and Proxy variables | Presentation12 (Kiki and Tamara) | Chen et al. (FAT* 2019) and Galhotra et al. (Entropy 2021)
27 | 4/13 (Th) | Project presentations
28 | 4/18 (T) | Project presentations