Sarjant, S., Pfahringer, B., Driessens, K. & Smith, T. (2011). Using the online cross-entropy method to learn relational policies for playing different games. In Proceedings of the 2011 IEEE Conference on Computational Intelligence and Games, Seoul, South Korea, 31 August - 3 September (pp. 182-189).
Permanent Research Commons link: http://hdl.handle.net/10289/5837
By defining a video-game environment as a collection of objects, relations, actions and rewards, the relational reinforcement learning algorithm presented in this paper generates and optimises a set of concise, human-readable relational rules for achieving maximal reward. Rule learning is achieved using a combination of incremental specialisation of rules and a modified online cross-entropy method, which dynamically adjusts the rate of learning as the agent progresses. The algorithm is tested on the Ms. Pac-Man and Mario environments, with results indicating the agent learns an effective policy for acting within each environment.
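To illustrate the general idea behind a cross-entropy update over a set of candidate rules, here is a minimal sketch in Python. It is not the algorithm from the paper: the rule names, reward values, Gaussian noise, elite fraction and fixed step size are all assumptions made for illustration, and the paper's dynamic adjustment of the learning rate and its incremental rule specialisation are not reproduced.

```python
import random


def cross_entropy_update(probs, elite_samples, step_size):
    """Shift each rule's probability toward its frequency among the elite samples."""
    n = len(elite_samples)
    counts = {rule: 0 for rule in probs}
    for rule in elite_samples:
        counts[rule] += 1
    updated = {
        rule: (1 - step_size) * p + step_size * counts[rule] / n
        for rule, p in probs.items()
    }
    # Renormalise to guard against floating-point drift.
    total = sum(updated.values())
    return {rule: p / total for rule, p in updated.items()}


def learn_rule_weights(mean_reward, iterations=50, samples=20,
                       elite_fraction=0.25, step_size=0.3, seed=0):
    """Toy loop: sample rules, score them with noisy rewards, update on the best.

    `mean_reward` is a hypothetical stand-in for playing the game with a rule;
    a real agent would obtain rewards from the environment instead.
    """
    rng = random.Random(seed)
    rules = list(mean_reward)
    probs = {rule: 1.0 / len(rules) for rule in rules}
    n_elite = max(1, int(samples * elite_fraction))
    for _ in range(iterations):
        weights = [probs[rule] for rule in rules]
        drawn = rng.choices(rules, weights=weights, k=samples)
        # Assumed noise model: Gaussian noise around each rule's mean reward.
        scored = [(mean_reward[rule] + rng.gauss(0, 0.1), rule) for rule in drawn]
        scored.sort(reverse=True)
        elite = [rule for _, rule in scored[:n_elite]]
        probs = cross_entropy_update(probs, elite, step_size)
    return probs


if __name__ == "__main__":
    # Hypothetical rule names and rewards, chosen only for this example.
    toy_rewards = {"eat_dot": 1.0, "flee_ghost": 0.2, "wander": 0.0}
    print(learn_rule_weights(toy_rewards))
```

In this sketch the step size is a constant. A method that adjusts the rate of learning as the agent progresses, as the abstract describes, would replace that constant with a value that changes over time.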
© 2011 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.