SHP-VI Method for Solving the DEC-POMDP Problem

Abstract:

The DEC-POMDP (Decentralized Partially Observable Markov Decision Process) model is an important model for multi-agent collaborative decision-making, but the enormous state space and policy space of DEC-POMDP problems make them very difficult to solve. As the agents interact with the environment on the way from the initial state to the goal state, the maximum reward of the system is often obtained through only a small number of high-reward states. This paper searches for a shortest Hamiltonian path from the initial belief state to the goal state, performs a forward search along the corresponding action sequence to obtain a belief-state trajectory, and then iterates the value function backward along that trajectory, yielding the optimal policy corresponding to the belief trajectory with the largest reward. The resulting shortest-Hamiltonian-path-based value iteration (SHP-VI) searches for the optimal belief path and can therefore solve DEC-POMDP problems with larger state spaces.
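The core procedure the abstract describes, forward-simulating a belief trajectory along an action sequence and then backing the value function up in reverse along that trajectory, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the toy random model (T, O, R), the function names (belief_update, forward_trajectory, backward_value_iteration), and the fixed action_sequence standing in for a plan derived from a shortest Hamiltonian path are all hypothetical, and the single-agent POMDP used here is only a stand-in for the decentralized case.

```python
"""Illustrative sketch of the SHP-VI idea: forward belief trajectory,
then backward value iteration along it. All model numbers are made up."""

import numpy as np

# Hypothetical 3-state, 2-action, 2-observation model (random for illustration).
n_states, n_actions, n_obs = 3, 2, 2
rng = np.random.default_rng(0)
T = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))  # T[a, s, s']
O = rng.dirichlet(np.ones(n_obs), size=(n_actions, n_states))     # O[a, s', o]
R = rng.random((n_actions, n_states))                              # R[a, s]
gamma = 0.95

def belief_update(b, a, o):
    """Bayesian update: b'(s') proportional to O[a, s', o] * sum_s T[a, s, s'] * b(s)."""
    b_next = O[a, :, o] * (b @ T[a])
    return b_next / b_next.sum()

def forward_trajectory(b0, action_sequence):
    """Forward search: follow the given action sequence (here a placeholder for
    one obtained from a shortest-Hamiltonian-path heuristic), taking the most
    likely observation at each step, and collect the visited beliefs."""
    beliefs, b = [b0], b0
    for a in action_sequence:
        obs_probs = (b @ T[a]) @ O[a]      # P(o | b, a)
        o = int(np.argmax(obs_probs))
        b = belief_update(b, a, o)
        beliefs.append(b)
    return beliefs

def backward_value_iteration(beliefs, n_sweeps=20):
    """Back up values in reverse order along the trajectory, so value
    information propagates from goal-side beliefs back to the initial belief."""
    V = {i: 0.0 for i in range(len(beliefs))}
    for _ in range(n_sweeps):
        for i in reversed(range(len(beliefs) - 1)):
            b = beliefs[i]
            q = [b @ R[a] + gamma * V[i + 1] for a in range(n_actions)]
            V[i] = max(q)
    return V

if __name__ == "__main__":
    b0 = np.ones(n_states) / n_states      # uniform initial belief
    action_sequence = [0, 1, 0, 1]         # placeholder for the SHP-derived plan
    beliefs = forward_trajectory(b0, action_sequence)
    V = backward_value_iteration(beliefs)
    print("value of initial belief:", V[0])
```

For brevity the backward sweep keeps a scalar value per trajectory point; a point-based solver in the spirit of the paper would instead maintain alpha-vectors at each belief, and the Hamiltonian-path search itself (e.g. via a TSP-style heuristic) is not shown.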

Info:

Periodical:

Advanced Materials Research (Volumes 926-930)

Pages:

3245-3249

Citation:

Online since:

May 2014

Copyright:

© 2014 Trans Tech Publications Ltd. All Rights Reserved
