Seminar on Computational Learning and Adaptation




Average Reward Reinforcement Learning


Prasad Tadepalli
Department of Computer Science
Oregon State University
Corvallis, OR
tadepall@cs.berkeley.edu



Reinforcement learning is the study of programs that improve their performance by taking actions and receiving rewards and punishments from the environment. Most reinforcement learning methods optimize the total discounted reward an agent receives, yet in many domains the natural criterion is the average reward per time step. In this talk, I describe a model-based method for average-reward reinforcement learning called H-learning and show that, in the domain of scheduling simulated automatic guided vehicles, it converges more quickly and robustly than its discounted counterpart. I also describe several extensions of H-learning: automatic exploration using optimism under uncertainty, constraining action models with dynamic Bayesian networks, and approximating the value function with local linear regression. I show that all of these extensions significantly reduce the space requirements of H-learning and make it converge faster in some vehicle-scheduling tasks.
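The abstract does not spell out how H-learning works. As a rough illustration of the average-reward idea, here is a minimal Python sketch of a tabular, model-based update in the spirit of H-learning, based on our understanding of its published description. The agent learns transition probabilities and mean immediate rewards from counts. It keeps a relative value h(i) for each state and tracks an estimate rho of the average reward per step, adjusting rho only after greedy actions. The class and method names (HLearner, update, lookahead) and the toy two-state task are hypothetical. Exploration, dynamic Bayesian network action models, and local linear regression from the talk are all omitted.

```python
from collections import defaultdict


class HLearner:
    """Tabular sketch of an H-learning-style update (assumed form, not the talk's code)."""

    def __init__(self, actions):
        self.actions = list(actions)
        self.rho = 0.0                   # estimate of average reward per step
        self.alpha = 1.0                 # step size for rho, decayed as alpha/(alpha+1)
        self.h = defaultdict(float)      # relative value h(i) of each state
        self.n = defaultdict(int)        # N(i, a): times action a was taken in state i
        self.n_next = defaultdict(int)   # N(i, a, k): times that led to state k
        self.r_hat = defaultdict(float)  # running mean immediate reward r_i(a)
        self.succ = defaultdict(set)     # successor states observed for (i, a)

    def lookahead(self, i, a):
        """r_i(a) + sum_k p_ik(a) h(k) under the learned model (0.0 if (i, a) is unseen)."""
        n = self.n[(i, a)]
        if n == 0:
            return 0.0
        expected_h = sum(self.n_next[(i, a, k)] / n * self.h[k] for k in self.succ[(i, a)])
        return self.r_hat[(i, a)] + expected_h

    def greedy_actions(self, i):
        values = {a: self.lookahead(i, a) for a in self.actions}
        best = max(values.values())
        return [a for a, v in values.items() if v == best]

    def update(self, i, a, reward, k):
        """Process one observed transition i --a--> k with immediate reward `reward`."""
        # Update the action model: transition counts and mean immediate reward.
        self.n[(i, a)] += 1
        self.n_next[(i, a, k)] += 1
        self.succ[(i, a)].add(k)
        self.r_hat[(i, a)] += (reward - self.r_hat[(i, a)]) / self.n[(i, a)]
        # Adjust rho only after a greedy action, so exploratory actions
        # do not distort the average-reward estimate.
        if a in self.greedy_actions(i):
            self.rho = (1 - self.alpha) * self.rho + self.alpha * (
                self.r_hat[(i, a)] - self.h[i] + self.h[k])
            self.alpha = self.alpha / (self.alpha + 1)
        # Average-reward Bellman backup for the visited state.
        self.h[i] = max(self.lookahead(i, b) for b in self.actions) - self.rho


# Hypothetical two-state cycle: A -> B pays 2, B -> A pays 0,
# so the true average reward is 1 per step.
agent = HLearner(actions=["go"])
state = "A"
for _ in range(100):
    nxt, reward = ("B", 2.0) if state == "A" else ("A", 0.0)
    agent.update(state, "go", reward, nxt)
    state = nxt
print(round(agent.rho, 6))  # 1.0
```

In this sketch there is no discount factor. The values h(i) are meaningful only relative to one another, and their differences (here h(A) - h(B) = 1) capture how much better it is to start in one state than another. The overall level of reward is carried separately by rho.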


Date: Thurs., April 9; Time: 4:15-5:30PM; Place: Gates 100


The goal of this seminar is to increase communication among local researchers with interests in computational approaches to learning and adaptation. If you would like to be added to (or removed from) the mailing list, or if you are interested in giving a talk in the seminar, please send email to iba@isle.org.

