These resources are listed here in case you want to explore the topics further. The section on programming is important if you are not familiar with Python, or with installing Python libraries.


  • S. Shalev-Shwartz and S. Ben-David, Understanding Machine Learning, Cambridge University Press, 2014.
  • M. P. Deisenroth, A. A. Faisal, and C. S. Ong, Mathematics for Machine Learning, Cambridge University Press, 2020. This book is also a good resource for the prerequisites.
  • I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, MIT Press, 2016.
  • C. M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006.
  • T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer, 2001.
  • K. P. Murphy, Machine Learning: A Probabilistic Perspective, MIT Press, 2012.


  • Anaconda is a stable and coherent distribution of Python for data science. It is strongly recommended that you uninstall any version of Python 3 you may have on your computer and install the Anaconda distribution for Python 3.9 (or later, if available). This distribution includes Python, several basic libraries for data science (numpy, scipy, and more), visualization libraries (including matplotlib), and machine learning libraries, including scikit-learn and PyTorch. This is essentially all you need for this course, except for a good IDE (see next bullet). The distribution places all relevant files in the appropriate locations, so you won't have to struggle with linking libraries and similar configuration. Once you have installed Anaconda, run the Anaconda Navigator and familiarize yourself with its tools. Pay particular attention to the Jupyter Notebook launcher, as you will submit homework as Jupyter notebooks.
  • Any program longer than a few lines of code requires debugging, and debugging is a nightmare in a Python notebook. You are urged to download the free PyCharm Integrated Development Environment (IDE). If you do a lot of programming outside this course, you may want to download the professional version, which is available for free here if you access that page from a Duke computer. The professional version has tools that are very useful for professional development but that you won't need in this course.
  • Google's Python class is a leisurely but clear Python tutorial.
  • The official Python 3 documentation also includes a tutorial. Use the library reference and the language reference as your official sources of information about Python 3. You can also find information by googling, but if you do so, make sure that what you find refers to Python 3.
  • Several tutorials on Jupyter notebooks can be found online. Here is one from Dataquest.
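After installing Anaconda, one quick way to confirm that the interpreter is Python 3 and that the libraries mentioned above can be imported is a short script like the one below. It is a minimal sketch that uses only the standard library (`sys`, `importlib.util`, `importlib.metadata`); the package list and the 3.9 minimum are assumptions drawn from the Anaconda bullet, so adjust them to whatever the course actually requires. Note that scikit-learn is imported as `sklearn` and PyTorch as `torch`.

```python
import sys
import importlib.util
from importlib import metadata

# Minimum interpreter version, assumed from the Anaconda recommendation above.
MIN_PYTHON = (3, 9)

# Import name -> distribution (package) name. This list is an assumption;
# edit it to match the libraries the course actually uses.
LIBRARIES = {
    "numpy": "numpy",
    "scipy": "scipy",
    "matplotlib": "matplotlib",
    "sklearn": "scikit-learn",
    "torch": "torch",
}


def check_environment(libraries=LIBRARIES, min_python=MIN_PYTHON):
    """Return (python_ok, report), where report maps each import name
    to its installed version string, or to None if it cannot be found."""
    python_ok = sys.version_info[:2] >= tuple(min_python)
    report = {}
    for import_name, dist_name in libraries.items():
        # find_spec returns None for a missing top-level module,
        # without actually importing the module.
        if importlib.util.find_spec(import_name) is None:
            report[import_name] = None
            continue
        try:
            report[import_name] = metadata.version(dist_name)
        except metadata.PackageNotFoundError:
            # The module is importable, but no distribution metadata was found.
            report[import_name] = "unknown"
    return python_ok, report


if __name__ == "__main__":
    ok, report = check_environment()
    print(f"Python {sys.version.split()[0]}: {'OK' if ok else 'too old'}")
    for name, version in report.items():
        print(f"  {name:12s} {version if version else 'MISSING'}")
```

You can run it from a terminal (or the Anaconda Prompt on Windows) after saving it under any name, for example a hypothetical `check_env.py`, or paste it into a notebook cell. Any library reported as MISSING can then be installed through the Anaconda Navigator or with conda.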

Most of these data sets can be downloaded easily from within the learning packages and frameworks mentioned in the section on software above. However, it is still useful to peruse the original sites above for details on each data set.
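As an illustration of loading a data set from within a package, scikit-learn (included in Anaconda) bundles a few small classic data sets that load without any download. The sketch below uses `sklearn.datasets.load_iris` purely as an example; the data sets in this list are not necessarily among those bundled with scikit-learn, and larger ones are typically fetched over the network instead. The returned object also carries a short description in its `DESCR` attribute, although the original site remains the better source of details.

```python
import numpy as np
from sklearn.datasets import load_iris

# Iris ships with scikit-learn, so no network access is needed.
iris = load_iris()

# data: one row per sample, one column per feature.
# target: the integer class label of each sample.
X, y = iris.data, iris.target

print("samples, features:", X.shape)
print("feature names:", iris.feature_names)
print("class labels:", np.unique(y), "->", list(iris.target_names))

# DESCR is a plain-text summary of the data set and where it comes from.
print(iris.DESCR[:500])
```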

COMPSCI 371, Duke University