
      Policy Search Based Relational Reinforcement Learning using the Cross-Entropy Method

      Sarjant, Samuel
Files
• thesis.pdf (Main text, 2.482 MB)
• CERRLASource.zip (Supplementary material, 60.51 MB)
• CERRLA Videos.zip (Supplementary material, 38.10 MB)
• CERRLA-ExperimentOutputs.7z (Supplementary material, 129.4 MB)
• CERRLA-AgentObservations.7z (Supplementary material, 318.4 KB)
      Citation
      Sarjant, S. (2013). Policy Search Based Relational Reinforcement Learning using the Cross-Entropy Method (Thesis, Doctor of Philosophy (PhD)). University of Waikato, Hamilton, New Zealand. Retrieved from https://hdl.handle.net/10289/7671
      Permanent Research Commons link: https://hdl.handle.net/10289/7671
      Abstract
Relational Reinforcement Learning (RRL) is a subfield of machine learning in which a learning agent seeks to maximise a numerical reward within an environment, represented as a collection of objects and relations, by performing actions that interact with that environment. The relational representation allows more dynamic environment states than the attribute-based representations of standard reinforcement learning, but this flexibility also creates new problems, such as a potentially infinite number of states.
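To make the distinction concrete, the following minimal sketch (illustrative only, not drawn from the thesis; the Blocks World domain and all names here are assumptions) contrasts a relational state, written as a set of ground facts, with a fixed attribute vector:

# A relational state: a set of ground facts over objects and relations.
# New objects can be added without changing the representation itself.
relational_state = {
    ("on", "a", "b"),
    ("on", "b", "table"),
    ("clear", "a"),
}

# An attribute-based state: a fixed-length feature mapping, which must
# be redesigned whenever the set of objects or attributes changes.
attribute_state = {"a_height": 2, "b_height": 1, "a_clear": True}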

This thesis describes an RRL algorithm named Cerrla that creates policies directly from a set of learned relational “condition-action” rules, using the Cross-Entropy Method (CEM) to control policy creation. The CEM assigns each rule a sampling probability and gradually modifies these probabilities so that the randomly sampled policies consist of ‘better’ rules, resulting in larger received rewards. Rule creation is guided by an inferred partial model of the environment that defines: the minimal conditions needed to take an action, the possible specialisation conditions for each rule, and a set of simplification rules for removing redundant and illegal rule conditions. The result is compact, efficient, and comprehensible policies.
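As a rough illustration of the policy-creation loop described above, the following Python sketch implements a generic Cross-Entropy Method over a rule set. This is a simplified, assumption-laden sketch, not the CERRLA implementation: rules are treated as opaque tokens, evaluate is an assumed function returning the reward a sampled policy earns in the environment, and the parameters (sample count, elite fraction, step size alpha) are arbitrary choices.

import random

def cem_policy_search(rules, evaluate, iterations=50,
                      samples=30, elite_frac=0.2, alpha=0.6):
    # Each rule starts with an equal sampling probability (an assumption).
    probs = {r: 0.5 for r in rules}
    n_elite = max(1, int(samples * elite_frac))
    for _ in range(iterations):
        # Sample policies by including each rule independently with its
        # current probability, then evaluate each sampled policy.
        batch = []
        for _ in range(samples):
            policy = [r for r in rules if random.random() < probs[r]]
            batch.append((evaluate(policy), policy))
        # Keep the highest-reward ('elite') policies.
        batch.sort(key=lambda pair: pair[0], reverse=True)
        elite = [policy for _, policy in batch[:n_elite]]
        # Shift each rule's probability towards its frequency among the
        # elite policies, so 'better' rules are sampled more often.
        for r in rules:
            freq = sum(r in policy for policy in elite) / n_elite
            probs[r] = (1 - alpha) * probs[r] + alpha * freq
    return probs

Under this scheme the sampling distribution gradually concentrates on rules that appear in high-reward policies, which is the mechanism the abstract describes.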

Cerrla is evaluated on four separate environments, each with several different goals. The results show that, compared to existing RRL algorithms, Cerrla learns equal or better behaviour in less time on the standard RRL environment. On other larger, more complex environments, it learns behaviour competitive with specialised approaches. The simplified rules and the CEM’s bias towards compact policies yield comprehensible and effective relational policies, created in a relatively short amount of time.
      Date
      2013
      Type
      Thesis
      Degree Name
      Doctor of Philosophy (PhD)
      Supervisors
      Pfahringer, Bernhard
      Driessens, Kurt
      Smith, Tony C.
      Publisher
      University of Waikato
      Rights
      All items in Research Commons are provided for private study and research purposes and are protected by copyright with all rights reserved unless otherwise indicated.
      Collections
      • Higher Degree Theses [1721]