A key challenge for AI is how to learn, plan, and represent knowledge at multiple levels of temporal abstraction. In this talk I develop an approach based on the mathematical framework of reinforcement learning and Markov decision processes (MDPs). The usual framework is extended to include closed-loop multi-step options: whole courses of behavior that may be temporally extended, stochastic, and contingent on events. Examples of options include picking up an object, going to lunch, and traveling to a distant city, as well as primitive actions such as muscle twitches and joint torques. Options can be used interchangeably with primitive actions in reinforcement learning and planning methods, and can be analyzed in terms of a generalized kind of MDP known as a semi-Markov decision process (SMDP) (e.g., Puterman, 1994; Bradtke and Duff, 1995; Parr, 1998; Precup and Sutton, 1997). In this talk I focus on the interplay between the MDP and SMDP levels of analysis. I show how changing the termination conditions of a set of options yields improvements over SMDP planning methods at no additional cost. I also present novel intra-option temporal-difference methods that substantially improve over SMDP methods. Finally, I discuss how options themselves can be learned, introducing a new notion of subgoal and subtask into reinforcement learning. Overall, I argue that options and models of options provide hitherto missing aspects of a powerful, clear, and expressive framework for representing and organizing knowledge.
[This is joint work with Doina Precup and Satinder Singh.]
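As a rough illustration of the ideas in the abstract, the sketch below represents an option as an initiation set, a closed-loop policy, and a termination condition, and applies one SMDP-style Q-learning update that treats a whole option execution as a single decision. It is a minimal sketch under stated assumptions, not the method presented in the talk: the six-state chain environment, the names `Option`, `step`, `run_option`, `primitive`, `go_right`, and `smdp_q_update`, and the reward scheme are all hypothetical and chosen only for illustration.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Tuple

State = int
Action = int  # assumption: -1 moves left, +1 moves right
N_STATES = 6  # assumption: a toy chain with states 0..5


@dataclass(frozen=True)
class Option:
    name: str
    initiation_set: FrozenSet[State]        # states where the option may start
    policy: Callable[[State], Action]       # closed-loop: action depends on state
    termination: Callable[[State], float]   # probability of stopping in a state


def step(state: State, action: Action) -> Tuple[State, float]:
    """Hypothetical chain dynamics: reward 1.0 on first entering the last state."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    entered_goal = next_state == N_STATES - 1 and state != N_STATES - 1
    return next_state, (1.0 if entered_goal else 0.0)


def run_option(option: Option, state: State, gamma: float,
               rng: random.Random) -> Tuple[State, float, int]:
    """Follow the option until it terminates.

    Returns the final state, the discounted reward accumulated along the
    way, and the number of primitive steps taken.
    """
    assert state in option.initiation_set
    total, discount, k = 0.0, 1.0, 0
    while True:
        state, reward = step(state, option.policy(state))
        total += discount * reward
        discount *= gamma
        k += 1
        if rng.random() < option.termination(state):
            return state, total, k


def primitive(action: Action) -> Option:
    """A primitive action viewed as an option that always stops after one step."""
    return Option(f"step({action:+d})", frozenset(range(N_STATES)),
                  lambda s: action, lambda s: 1.0)


# A temporally extended option: keep moving right until the last state.
go_right = Option("go_right", frozenset(range(N_STATES - 1)),
                  lambda s: +1,
                  lambda s: 1.0 if s == N_STATES - 1 else 0.0)

OPTIONS: List[Option] = [primitive(-1), primitive(+1), go_right]


def smdp_q_update(q: Dict[Tuple[State, str], float], options: List[Option],
                  state: State, option: Option, gamma: float, alpha: float,
                  rng: random.Random) -> State:
    """One SMDP Q-learning update, applied only after the option has finished."""
    next_state, ret, k = run_option(option, state, gamma, rng)
    available = [o for o in options if next_state in o.initiation_set]
    best_next = max((q.get((next_state, o.name), 0.0) for o in available),
                    default=0.0)
    # The bootstrap term is discounted by gamma**k, where k is the option's duration.
    target = ret + (gamma ** k) * best_next
    old = q.get((state, option.name), 0.0)
    q[(state, option.name)] = old + alpha * (target - old)
    return next_state
```

Because primitive actions are wrapped as one-step options, the same update handles both kinds of choice, which mirrors the claim that options can be used interchangeably with primitive actions. Note that this update learns nothing until the option terminates; the intra-option methods mentioned in the abstract are not shown here.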
Date: Thurs., March 5; Time: 4:15-5:30PM; Place: Gates 100