Point-Based Monte Carlo Online Planning in POMDPs

Abstract:

Online planning and learning in partially observable Markov decision processes (POMDPs) are often intractable because the belief space suffers from two curses: the curse of dimensionality and the curse of history. To address this problem, this paper proposes a point-based Monte Carlo online planning approach for POMDPs. The approach performs value backups at selected reachable belief points, rather than over the entire belief simplex, to speed up computation. A Monte Carlo tree search algorithm is then exploited to share action values across the subtrees of the search tree so as to minimise the mean squared error of the estimates. Experimental results show that the proposed algorithm is effective in real-time systems.
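The abstract's core loop pairs bandit-style action selection in the search tree with incremental value backups at the visited belief nodes. As an illustration only (the paper's exact algorithm is not reproduced here), the sketch below shows the standard UCB1 selection rule and running-mean backup that Monte Carlo tree search planners typically use at each tree node; the node structure, action names, and reward model are hypothetical.

```python
import math
import random

def ucb1_select(node, c=1.0):
    """Pick the action maximizing Q(a) + c*sqrt(ln N / N(a)) (UCB1)."""
    total = sum(stats["n"] for stats in node.values())
    best_a, best_score = None, -float("inf")
    for a, stats in node.items():
        if stats["n"] == 0:
            return a  # try every action at least once
        score = stats["q"] + c * math.sqrt(math.log(total) / stats["n"])
        if score > best_score:
            best_a, best_score = a, score
    return best_a

def backup(node, action, reward):
    """Incrementally update the running-mean Q-value of the chosen action."""
    stats = node[action]
    stats["n"] += 1
    stats["q"] += (reward - stats["q"]) / stats["n"]

# Toy usage: one tree node, two illustrative actions; 'stay' pays more
# on average, so UCB1 should concentrate visits on it over time.
random.seed(0)
node = {"stay": {"n": 0, "q": 0.0}, "go": {"n": 0, "q": 0.0}}
for _ in range(500):
    a = ucb1_select(node)
    r = random.gauss(1.0, 0.5) if a == "stay" else random.gauss(0.2, 0.5)
    backup(node, a, r)
```

In a full point-based online planner, each `node` would be attached to a reachable belief point (e.g. a particle set), and the backed-up Q-values would be shared across the subtree rooted at that belief rather than recomputed over the whole belief simplex.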

Info:

Periodical:

Advanced Materials Research (Volumes 846-847)

Pages:

1388-1391

Online since:

November 2013

Copyright:

© 2014 Trans Tech Publications Ltd. All Rights Reserved
