Teach Time Encyclopedia - Learn About Our World

Saturday, July 26, 2008

Reinforcement learning

Reinforcement learning is a class of problems in machine learning that postulates an agent exploring an environment, in which the agent perceives its current state and takes actions. The environment, in return, provides a reward (which can be positive or negative). Reinforcement learning algorithms attempt to find a policy that maximizes the agent's cumulative reward over the course of the problem.

The environment is typically formulated as a finite-state Markov decision process (MDP), and reinforcement learning algorithms for this context are closely related to dynamic programming techniques. State transitions and rewards in the MDP are typically stochastic, but their probabilities are stationary over the course of the problem.
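To make the link to dynamic programming concrete, the following is a minimal sketch of value iteration, one standard dynamic programming method, on a made-up two-state MDP. The states, action names, probabilities, rewards, and the discount factor 0.9 are all assumptions chosen for illustration; none of them come from the text above.

```python
# Hypothetical two-state MDP; every number and name here is illustrative.
# P[s][a] is a list of (probability, next_state, reward) triples.
P = {
    0: {"stay": [(1.0, 0, 0.0)], "go": [(0.8, 1, 1.0), (0.2, 0, 0.0)]},
    1: {"stay": [(1.0, 1, 2.0)], "go": [(1.0, 0, 0.0)]},
}
GAMMA = 0.9  # assumed discount factor


def action_value(P, V, s, a, gamma):
    """Expected reward plus discounted value of the next state."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])


def value_iteration(P, gamma, tol=1e-8):
    """Repeat the Bellman optimality backup until values stop changing."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            best = max(action_value(P, V, s, a, gamma) for a in P[s])
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            return V


def greedy_policy(P, V, gamma):
    """Pick, in each state, the action with the highest backed-up value."""
    return {
        s: max(P[s], key=lambda a: action_value(P, V, s, a, gamma))
        for s in P
    }


V = value_iteration(P, GAMMA)
policy = greedy_policy(P, V, GAMMA)
print(V, policy)
```

Note that value iteration needs the transition and reward probabilities up front. A reinforcement learning agent usually does not have them and must estimate values from sampled experience instead.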

Reinforcement learning differs from the supervised learning problem in that correct input/output pairs are never presented, nor are sub-optimal actions explicitly corrected. Further, there is a focus on on-line performance, which involves finding a balance between exploration (of uncharted territory) and exploitation (of current knowledge).
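One common way to trade off exploration against exploitation is the epsilon-greedy rule: with a small probability the agent tries a random action, and otherwise it takes the action that currently looks best. The sketch below applies it to a hypothetical three-armed bandit. The reward means, the Gaussian noise, and the value epsilon = 0.1 are assumptions made for the example only.

```python
import random


def epsilon_greedy(estimates, epsilon, rng):
    """Explore with probability epsilon, otherwise exploit the best estimate."""
    if rng.random() < epsilon:
        return rng.randrange(len(estimates))  # explore: uniform random action
    return max(range(len(estimates)), key=estimates.__getitem__)  # exploit


def run_bandit(true_means, steps=5000, epsilon=0.1, seed=0):
    """Estimate each action's mean reward from noisy samples only."""
    rng = random.Random(seed)
    estimates = [0.0] * len(true_means)
    counts = [0] * len(true_means)
    for _ in range(steps):
        a = epsilon_greedy(estimates, epsilon, rng)
        reward = rng.gauss(true_means[a], 1.0)  # the agent sees only this sample
        counts[a] += 1
        # Incremental sample mean: no correct answer is ever shown to the agent.
        estimates[a] += (reward - estimates[a]) / counts[a]
    return estimates, counts


estimates, counts = run_bandit([0.2, 0.5, 1.0])
print(estimates, counts)
```

Setting epsilon to 0 gives a purely exploiting agent, which can lock onto a poor action early. Setting it to 1 gives a purely exploring one, which never uses what it has learned.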

Formally, the basic reinforcement learning model consists of:

  1. a set of environment states S;
  2. a set of actions A; and
  3. a set of scalar "rewards" in ℜ.

At each time $t$, the agent perceives its state $s_t \in S$ and the set of possible actions $A(s_t)$. It chooses an action $a \in A(s_t)$ and receives from the environment the new state $s_{t+1}$ and a reward $r_{t+1}$. Based on these interactions, the reinforcement learning agent must develop a policy $\pi: S \to A$ which maximizes the quantity $r_0 + r_1 + \dots + r_n$ for MDPs which have a terminal state, or the quantity $\sum_t \gamma^t r_t$ for MDPs without terminal states (where $\gamma$ is some "future reward" discounting factor between 0.0 and 1.0).
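The interaction described above can be sketched as a simple loop. The `Corridor` environment, its reward values, and the `run_episode` helper below are hypothetical and exist only to illustrate the model. `discounted_return` computes the discounted sum of rewards, and with a discount factor of 1.0 it reduces to the plain sum used for episodic problems.

```python
def discounted_return(rewards, gamma=1.0):
    """Sum of gamma**t * r_t over a reward sequence (gamma=1.0: plain sum)."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


class Corridor:
    """Hypothetical episodic environment: states 0..4, state 4 is terminal."""

    def __init__(self):
        self.state = 0

    def actions(self, state):
        # The set of possible actions depends on the current state.
        return ["left", "right"] if state > 0 else ["right"]

    def step(self, action):
        self.state += 1 if action == "right" else -1
        done = self.state == 4
        reward = 1.0 if done else -0.1  # small cost per step, payoff at the end
        return self.state, reward, done


def run_episode(env, policy, max_steps=100):
    """Perceive state, choose an action, receive next state and reward."""
    state, rewards = env.state, []
    for _ in range(max_steps):
        action = policy(state, env.actions(state))
        state, reward, done = env.step(action)
        rewards.append(reward)
        if done:
            break
    return rewards


# A fixed policy that always moves right, used here only as a placeholder.
rewards = run_episode(Corridor(), lambda state, actions: "right")
print(rewards, discounted_return(rewards), discounted_return(rewards, 0.9))
```

A learning agent would replace the fixed policy with one that is adjusted from the observed rewards.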

Reinforcement learning applies particularly well to problems where long-term reward can be had at the expense of short-term reward. This class of problems is normally handled using a reinforcement learning technique known as temporal difference (TD) learning. Reinforcement learning has been applied successfully to various problems, including robot control, elevator scheduling, and backgammon.
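As a rough illustration of the temporal difference idea, here is a minimal sketch of the TD(0) value update on a hypothetical three-state chain, where the only reward arrives on the final transition. The chain, the step size 0.1, and the discount factor 0.9 are assumptions for the example. TD(0) is the simplest member of the temporal difference family, not the only one.

```python
def td0_update(V, s, r, s_next, alpha, gamma, terminal):
    """Move V[s] toward the one-step target r + gamma * V[s_next]."""
    target = r + (0.0 if terminal else gamma * V[s_next])
    V[s] += alpha * (target - V[s])


def evaluate_chain(episodes=500, alpha=0.1, gamma=0.9):
    """Hypothetical chain 0 -> 1 -> 2; state 2 is terminal.

    The only reward (1.0) is paid on the transition into state 2, so state 0
    earns nothing immediately and must learn its value from state 1.
    """
    V = [0.0, 0.0, 0.0]
    for _ in range(episodes):
        s = 0
        while s != 2:
            s_next = s + 1
            r = 1.0 if s_next == 2 else 0.0
            td0_update(V, s, r, s_next, alpha, gamma, terminal=(s_next == 2))
            s = s_next
    return V


V = evaluate_chain()
print(V)
```

In this sketch the estimate for state 0 approaches 0.9 even though its immediate reward is zero, because each update bootstraps from the current estimate of the next state. This is how delayed reward propagates backward through the states.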

References

Leslie Kaelbling, Michael Littman, and Andrew Moore. Reinforcement Learning: A Survey. Journal of Artificial Intelligence Research 4 (1996), pp. 237–285.

Richard Sutton and Andrew Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.




Brought to you by NoChildLeftBehind.com and the Beaches and Towns Network, LLC.