Reinforcement learning is the study of programs that improve
their performance by taking actions and receiving rewards and
punishments from the environment. Most methods for reinforcement
learning optimize the discounted total reward received by an agent,
while, in many domains, the natural criterion is to optimize the
average reward per time step. In this talk, I describe a model-based
method for average-reward reinforcement learning called H-learning and
show that it converges more quickly and robustly than its discounted
counterpart in the domain of scheduling simulated automatic guided
vehicles. I describe several extensions of H-learning, including
automatic exploration using optimism under uncertainty, constraining
action models with dynamic Bayesian networks, and approximating the
value function using local linear regression. I show that all of these
extensions significantly reduce the space requirements of H-learning
and make it converge faster in some vehicle scheduling tasks.
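
For readers unfamiliar with the two optimization criteria contrasted above, they can be written out as follows. This is only a sketch in standard textbook notation; the symbols (policy \pi, start state s, reward r_t at step t, discount factor \gamma) are assumptions introduced here for illustration and are not taken from the talk itself.

```latex
% Discounted total reward: future rewards are weighted by \gamma^t, 0 \le \gamma < 1.
V_\gamma^{\pi}(s) = \mathbb{E}\left[ \sum_{t=0}^{\infty} \gamma^{t} r_t \,\middle|\, s_0 = s, \pi \right]

% Average reward per time step: every step counts equally in the long run.
\rho^{\pi}(s) = \lim_{N \to \infty} \frac{1}{N}\, \mathbb{E}\left[ \sum_{t=0}^{N-1} r_t \,\middle|\, s_0 = s, \pi \right]
```

Under the first criterion, how much a distant reward matters depends on the choice of \gamma; under the second, no discount factor appears, which is one reason it can be the more natural fit for ongoing tasks such as scheduling.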
Date: Thurs., April 9; Time: 4:15-5:30PM; Place: Gates 100