Optional Feature Vector Generation for Linear Value Function Approximation with Binary Features

Abstract:

Linear value function approximation with binary features plays an important role in reinforcement learning (RL) research. When the value function is updated, a feature vector containing the features to be updated must first be generated. In high dimensional domains, this generation process takes considerably longer, which substantially degrades the algorithm's performance. This paper therefore introduces the Optional Feature Vector Generation (OFVG) algorithm, an improved method for generating feature vectors that can be combined with any online, value-based RL method that uses and expands binary features. Empirical results show that OFVG performs well in high dimensional domains.
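
To make the setting concrete, below is a minimal Python sketch of linear value function approximation with binary features, where a state is represented only by the indices of its active features and a TD(0) update touches just those entries. The class and parameter names (LinearBinaryV, alpha, gamma) are illustrative assumptions; the sketch shows the baseline update setting described in the abstract, not the OFVG algorithm itself.

import numpy as np

class LinearBinaryV:
    """Linear value function over binary features.

    A state is given as the list of indices of its active (value-1)
    features, so evaluation and updates touch only those weights.
    """

    def __init__(self, num_features, alpha=0.1, gamma=0.99):
        self.w = np.zeros(num_features)  # one weight per binary feature
        self.alpha = alpha               # step size (assumed value)
        self.gamma = gamma               # discount factor (assumed value)

    def value(self, active):
        # With binary features, the dot product w . phi(s) reduces to a
        # sum of the weights at the active indices.
        return self.w[active].sum()

    def td_update(self, active, reward, next_active, done):
        # One-step TD(0) update applied only to the active features.
        target = reward if done else reward + self.gamma * self.value(next_active)
        delta = target - self.value(active)
        self.w[active] += self.alpha * delta  # sparse gradient step

# Usage: states expose a short list of active feature indices instead of
# a full, possibly very high dimensional, feature vector.
v = LinearBinaryV(num_features=10_000)
v.td_update(active=[3, 17, 4096], reward=1.0, next_active=[3, 18, 4097], done=False)
print(v.value([3, 17, 4096]))

The point of the sketch is that each evaluation and update costs time proportional to the number of active features, but only after the set of active features has been generated; it is that generation step which, per the abstract, OFVG optimizes in high dimensional domains.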

Info:

Periodical:

Advanced Materials Research (Volumes 756-759)

Pages:

3967-3971

Online since:

September 2013

Copyright:

© 2013 Trans Tech Publications Ltd. All Rights Reserved
