Modified UCT Algorithm with Risk Dominance Methods in Imperfect Information Game

Article Preview

Abstract:

UCT (Upper confidential bounds on Trees) has been applied quite well as a selection approach in MCTS(Monte Carlo Tree Search) in imperfect information games like poker. By using risk dominance as complementary part of decision method besides payoff dominance, opponent strategies is better characterized as their risk factors, like bluff actions in Texas Hold’em Poker . In this paper, estimation method about the influence of risk factors on computing game equilibrium is provided. A novel algorithm, UCT-risk is proposed as modification about UCT algorithm basing on risk estimation methods. To verify the performance of new algorithm, Texas Hold’em, a popular test-bed for AI research is chosen as the experiment platform. The Agent adopted UCT-risk algorithm performs as well or better as the best previous approaches in experiments. And also it is applied in a poker agent named HITSZ_CS_13 in the 2013 AAAI Computer Poker Competition, which confirms the effectiveness of the UCT-risk provided in this paper.

You might also be interested in these eBooks

Info:

Periodical:

Pages:

367-376

Citation:

Online since:

August 2014

Export:

Price:

Permissions CCC:

Permissions PLS:

Сopyright:

© 2014 Trans Tech Publications Ltd. All Rights Reserved

Share:

Citation:

* - Corresponding Author

[1] Howard James Bampton, Solving imperfect information games using the Monte Carlo heuristic, Knoxville, Master thesis: University of Tennessee ( 1994).

Google Scholar

[2] Denis Papp, Dealing with imperfect information in Poker, MSC. Thesis: University of Alberta (1998).

Google Scholar

[3] Long J R, Sturtevant N R, Buro M, et al, Understanding the success of perfect information monte carlo sampling in game tree search, Proc. Assoc. Adv. Artif. Intell (2010) 134-140.

DOI: 10.1609/aaai.v24i1.7562

Google Scholar

[4] Sturtevant N R, Current challenges in multi-player game search, in Proceedings of the 4th International Conference on Computers and Games, Ramat-Gan, Israel (2004) 285–300.

DOI: 10.1007/11674399_20

Google Scholar

[5] Harsanyi J C, Selten R, A general theory of equilibrium selection in games, Discrete Applied Mathematics, vol. 26(1) (1990) 126-127.

DOI: 10.1016/0166-218x(90)90029-c

Google Scholar

[6] Straub P G. Risk dominance and coordination failures in static games[J]. The Quarterly Review of Economics and Finance, 35(4) (1996) 339-363.

DOI: 10.1016/1062-9769(95)90048-9

Google Scholar

[7] Cooper R, et al, Forward induction in coordination games, Economics Letters, 40(2) (1992) 167—172.

Google Scholar

[8] Schmidt D, Shupp R, Walker J M, et al. Playing safe in coordination games: the roles of risk dominance, payoff dominance, and history of play[J]. Games and Economic Behavior, 42(2) (2003) 281-299.

DOI: 10.1016/s0899-8256(02)00552-3

Google Scholar

[9] Heinemann F, Nagel R, Ockenfels P. Measuring strategic uncertainty in coordination games[J]. The Review of Economic Studies, 76(1) (2009) 181-221.

DOI: 10.1111/j.1467-937x.2008.00512.x

Google Scholar

[10] Farshchi S M R, Kehvazadeh I. Finding risk dominance strategy in imperfect game, a theory perspective[C]/Control (CONTROL), 2012 UKACC International Conference on. IEEE, (2012) 798-802.

DOI: 10.1109/control.2012.6334732

Google Scholar

[11] Billings D, Papp D, Schaeffer J, et al. Opponent modeling in poker[C]/AAAI/IAAI. (1998) 493-499.

Google Scholar

[12] Sturtevant N R. An analysis of UCT in multi-player games[M]/Computers and Games. Springer Berlin Heidelberg, ( 2008) 37-49.

DOI: 10.1007/978-3-540-87608-3_4

Google Scholar

[13] Iolis B, Bontempi G. Comparison of selection strategies in monte carlo tree search for computer poker[C]/Proceedings of the Annual Machine Learning Conference of Belgium and The Netherlands, BeNeLearn (2010).

Google Scholar

[14] Straub P G, Risk dominance and coordination failures in static games, Quart. Rev. Econ. Finance, vol. 35 (1995) 339–363.

DOI: 10.1016/1062-9769(95)90048-9

Google Scholar

[15] Jiajia Zhang, Xuan Wang, The research of risk ananlysis and estimate method for machine game, Journal of High Technology Letters, 9 (2013) 965-972.

Google Scholar

[16] Levente Kocsis, Csaba Szepesvari, Bandit based Monte-Carlo Planning, 15th European Conference on Machine Learning (ECML), (2006) 282-293.

DOI: 10.1007/11871842_29

Google Scholar

[17] Xuan Wang, Jiajia Zhang, Xinxin Xu, et al, Risk dominance strategy in imperfect information multi-player military chess game, International Swaps and Derivatives Association, Gaoxiong, China, (2008) 596-601.

Google Scholar

[18] http: /www. computerpokercompetition. org/index. php/competitions/results/94-2013-results.

Google Scholar