
Overview
The course will focus on the principles and practice of decision making for autonomous agents, and robots in particular. We will cover rationality, decision theory, probabilistic reasoning, multi-armed bandits, Markov decision processes, partially observable MDPs, belief-space learning and planning, inverse reinforcement learning, and learning from demonstration.
Instructor
George Konidaris
Office: LSRC D224
Email: gdk at cs dot duke dot edu
Prerequisites
There are no formal prerequisites.
However, this is a graduate class, so it assumes that you are familiar with the necessary mathematics (probability, linear algebra, and multivariable calculus) and have enough background in AI to make a good attempt at reading research papers on these topics.
Schedule
The first class is on August 25th. The class meets on Tuesdays and Thursdays from 3:05pm to 4:20pm, in Allen 318.
Date | Topic | Slides and Readings
August 25th | Introduction: Agents, Robots, Models, and Rationality | Slides
August 27th | Probabilistic Reasoning | Slides
September 1st | Utility Theory | Slides
September 3rd | Multi-armed Bandits | Slides
September 8th | Contextual Bandits | Slides; Li et al.: Contextual Bandits for News Article Recommendations
September 10th | Markov Decision Processes | Slides
September 15th | Reinforcement Learning | Slides
September 17th | No class | Sutton and Barto, Chapters 3-8
September 22nd | Reinforcement Learning II | Slides
September 24th | Reinforcement Learning III | Slides
September 29th | Reinforcement Learning III (Policy Search) | Slides
October 1st | Hierarchical RL | Slides; Sutton, Precup, and Singh, 1999
October 6th | No class |
October 8th | No class (Fall break) |
October 13th | No class (Fall break) |
October 15th | Review Session |
October 20th | Midterm (ROOM 311, NORTH BUILDING) |
October 22nd | Hierarchical RL (resumed) | Assignment 1 due (before class); Slides
October 27th | Learning from Demonstration | Slides
October 29th | No class |
November 3rd | Inverse Reinforcement Learning | Slides; Abbeel and Ng, Inverse RL, ICML 2004.
November 5th | Partially observable MDPs | Slides
November 10th | Kalman Filters | Slides; An Introduction to the Kalman Filter, Welch and Bishop
November 12th | Belief-Space Planning | Slides
November 17th | Solution Methods for POMDPs | Slides
November 19th | Revision |
November 24th | Final day of graduate classes. No class (Thanksgiving) |
Assignments
Academic Honesty
 We take academic honesty very seriously. This matrix should leave no ambiguity about what is permitted and what is not permitted.
If you are unsure about what is permitted, ask before you proceed.
Lateness policy
 You may request an extension before the due date of the assignment. Valid reasons for extensions include (but are not necessarily limited to) interviews, travel for research or academic purposes, and illness.
 Late assignments (without a previously granted extension) will be penalized 10% per day. Assignments will not be accepted more than 5 days after the due date.
Grading
Course evaluation will be as follows:
 Assignment 1 (25%)
 Midterm exam (25%)
 Assignment 2 (25%)
 Final exam (25%)
I expect all Duke students to conduct themselves with the highest integrity, according to the Duke Community Standard. If you are unsure what this means, please refer to this link.
For a more concrete description, this matrix outlines what forms of collaboration with others are and are not allowed during this course.
Resources
A very good introduction to the fundamentals of probability theory:
 Introduction to Probability, Bertsekas and Tsitsiklis.
A useful guide to utility theory and uncertainty in MDPs:
 Decision Making Under Uncertainty: Theory and Application, Kochenderfer.
A great introduction to reinforcement learning (chapter 2 is on bandits):
 Reinforcement Learning: An Introduction, Sutton and Barto.
