[1]
C. Lochert, H. Hartenstein, J. Tian, et al., "A routing strategy for vehicular ad hoc networks in city environments", Proceedings of the 2003 IEEE International Conference on Intelligent Vehicles Symposium, 2003, pp. 156-161.
DOI: 10.1109/ivs.2003.1212901
[2]
G. Malkin, "RIP version 2: carrying additional information", http://etherpad.tools.ietf.org/html/rfc1723
[3]
K. Kompella, Ed. and Y. Rekhter, Ed., "OSPF Extensions in Support of Generalized Multi-Protocol Label Switching (GMPLS)", http://trac.tools.ietf.org/html/rfc4203
DOI: 10.17487/rfc4203
[4]
R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. Cambridge, Massachusetts; London, England: The MIT Press, 1998.
[5]
Wen Shang and Dong Sun, "Distributed neural network-based policy gradient reinforcement learning for multi-robot formations", Proceedings of the 2008 IEEE International Conference on Information and Automation, pp. 113-118.
DOI: 10.1109/icinfa.2008.4607978
[6]
L. Peshkin and V. Savova, "Reinforcement learning for adaptive routing", International Joint Conference on Neural Networks (IJCNN), 2002, pp. 1825-1830.
DOI: 10.1109/ijcnn.2002.1007796
[7]
J. A. Boyan and M. L. Littman, "Packet routing in dynamically changing networks: a reinforcement learning approach", in Advances in Neural Information Processing Systems, 1994, pp. 671-678.
[8]
Z. H. Hu and D. B. Zhao, "Reinforcement learning for multi-agent patrol policy," IEEE International Conference on Cognitive Informatics, 2010, pp. 530-535.
DOI: 10.1109/coginf.2010.5599681
[9]
C. Watkins, "Q-learning," Machine Learning, 1992, vol. 8, no. 3, pp. 279-292.
[10]
A. G. Barto and T. G. Dietterich, "Reinforcement learning and its relationship to supervised learning," in J. Si, A. Barto, W. Powell, and D. Wunsch, Eds., Handbook of Learning and Approximate Dynamic Programming, IEEE Press, John Wiley & Sons, Inc., 2004, pp. 47-63.
DOI: 10.1109/9780470544785
[11]
D. B. Zhao, Z. Zhang, and Y. J. Dai, "Self-teaching adaptive dynamic programming for Go-Moku," Neurocomputing, vol. 78, 2012, pp. 23-29.
DOI: 10.1016/j.neucom.2011.05.032