LSPI: Least-Squares Policy Iteration
Introduction
Least-Squares Policy Iteration (LSPI) is a reinforcement learning
algorithm designed to solve control problems. It uses value function
approximation to cope with large state spaces and batch processing to
make efficient use of training data. LSPI has been used successfully to
solve several large-scale problems using relatively little training
data. This page contains information about LSPI, examples, research
papers, and a code distribution that can be used for academic and/or
research purposes.
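To make the two ideas above concrete (a linear approximation of the action-value function, and repeated reuse of one fixed batch of samples), here is a minimal Python sketch of the LSPI loop. It is not the MATLAB/C code from the distribution. Everything in it is an assumption made for illustration: the four-state deterministic chain, the one-hot basis, the discount factor 0.9, and the function names (features, greedy, solve, lstdq, lspi, chain_samples).

```python
# Toy deterministic chain (hypothetical, not the distribution's chain domain).
N_STATES, N_ACTIONS, GAMMA = 4, 2, 0.9
K = N_STATES * N_ACTIONS  # number of basis functions


def features(s, a):
    """One-hot basis over (state, action) pairs, chosen only for brevity."""
    phi = [0.0] * K
    phi[s * N_ACTIONS + a] = 1.0
    return phi


def q_value(w, s, a):
    """Approximate Q(s, a) as the dot product of weights and features."""
    return sum(wi * fi for wi, fi in zip(w, features(s, a)))


def greedy(w, s):
    """Action maximizing the approximate Q-value; ties go to the lowest index."""
    return max(range(N_ACTIONS), key=lambda a: q_value(w, s, a))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def lstdq(samples, w_policy):
    """Evaluate the greedy policy of w_policy from a batch of (s, a, r, s') samples."""
    A = [[0.0] * K for _ in range(K)]
    b = [0.0] * K
    for s, a, r, s_next in samples:
        phi = features(s, a)
        phi_next = features(s_next, greedy(w_policy, s_next))
        for i in range(K):
            b[i] += phi[i] * r
            for j in range(K):
                A[i][j] += phi[i] * (phi[j] - GAMMA * phi_next[j])
    return solve(A, b)


def lspi(samples, eps=1e-9, max_iterations=50):
    """Alternate evaluation and greedy improvement until the weights stop changing."""
    w = [0.0] * K
    for _ in range(max_iterations):
        w_new = lstdq(samples, w)
        if max(abs(x - y) for x, y in zip(w, w_new)) < eps:
            return w_new
        w = w_new
    return w


def chain_samples():
    """One sample per (state, action): action 0 moves left, action 1 moves right.
    Reward is 1 when the next state is the rightmost state, else 0."""
    samples = []
    for s in range(N_STATES):
        for a in range(N_ACTIONS):
            s_next = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
            samples.append((s, a, 1.0 if s_next == N_STATES - 1 else 0.0, s_next))
    return samples


samples = chain_samples()
w = lspi(samples)
policy = [greedy(w, s) for s in range(N_STATES)]
print(policy)
```

The same eight samples are reused in every iteration, which is the batch aspect mentioned above. On this toy chain the loop should settle on the policy that always moves right. With a one-hot basis the approximation is exact, so the sketch shows only the structure of the algorithm, not its behavior with a compact basis on a large problem.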
Authors
Michail G. Lagoudakis
Ph.D. Candidate,
Department of Computer Science, Duke University
mgl @ cs . duke . edu
Ronald Parr
Assistant Professor,
Department of Computer Science, Duke University
parr @ cs . duke . edu
Papers
This is the paper that introduced LSPI:
Model-Free Least-Squares Policy Iteration
Michail G. Lagoudakis and Ronald Parr
Proceedings of NIPS*2001: Neural Information Processing Systems:
Natural and Synthetic
Vancouver, BC, December 2001, pp. 1547-1554.
A longer journal version is also available:
Least-Squares Policy Iteration
Michail G. Lagoudakis and Ronald Parr
Journal of Machine Learning Research, 4, 2003, pp. 1107-1149.
Several other papers have been published since then. They are available
from Michail's publications page.
LSPI Code Distribution
This is a MATLAB implementation of LSPI with certain parts written in
C. It should run without problems on any Unix or Linux system with
MATLAB installed. It has not been tested on a Windows machine.
At the moment, the distribution includes the core LSPI code and the
chain and pendulum domains. Additional domains will be added soon. Check
back for updates!
Distribution and use of this code is subject to the following agreement:
This Program is provided by Duke University and the
authors as a service to the research community. It is provided without
cost or restrictions, except for the User's acknowledgement that the
Program is provided on an "As Is" basis and User understands that Duke
University and the authors make no express or implied warranty of any
kind. Duke University and the authors specifically disclaim any implied
warranty of merchantability or fitness for a particular purpose, and
make no representations or warranties that the Program will not infringe
the intellectual property rights of others. The User agrees to indemnify
and hold harmless Duke University and the authors from and against any
and all liability arising out of User's use of the Program.
LSPI
- Download: lspi.tar.gz (Last Update: Nov 8, 2002)
- Decompress: % gunzip lspi.tar.gz
- Unpack: % tar -xvf lspi.tar
- Read the README file in the LSPI directory
CHAIN
- Download: chain.tar.gz (Last Update: Nov 8, 2002)
- Decompress: % gunzip chain.tar.gz
- Unpack: % tar -xvf chain.tar
- Read the README file in the CHAIN directory
PENDULUM
- Download: pendulum.tar.gz (Last Update: Jan 16, 2003)
- Decompress: % gunzip pendulum.tar.gz
- Unpack: % tar -xvf pendulum.tar
- Read the README file in the PENDULUM directory
Email Michail at mgl @ cs . duke . edu if you encounter any problems.