Abstract
The quality of a knowledge representation directly influences an agent's ability to interact with an environment. Temporal-difference (TD) networks, a recently introduced knowledge representation framework, model a world with a set of action-conditional predictions about sensations. Some key characteristics of TD networks are that they: (1) relate knowledge to sensations, (2) allow the agent to make predictions about other predictions (compositionality), and (3) provide a means for abstraction. The focus of this thesis is connecting high-level concepts to data by abstracting over space and time. Spatial abstraction in TD networks helps with scaling issues by grouping situations with similar sets of predictions into abstract states. A set of experiments demonstrates the advantages of using the abstract states as a representation for reinforcement learning. Temporal abstraction is added to TD networks by extending the framework to predict arbitrarily distant future outcomes. This extension is based on the options framework, an approach to including temporal abstraction in reinforcement-learning algorithms. Including options in the TD-network framework brings about a challenging problem: learning about multiple options from a single stream of data (also known as off-policy learning). The first algorithm for the off-policy learning of predictions about option outcomes is introduced in this thesis.
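The spatial-abstraction idea above — treating situations with similar sets of predictions as the same abstract state — can be sketched in a few lines. The function below is an illustrative assumption, not the thesis's algorithm: it simply discretizes a vector of action-conditional prediction values so that nearby prediction vectors map to one abstract state, which could then serve as the state representation for a reinforcement-learning agent.

```python
def abstract_state(predictions, resolution=0.5):
    """Map a vector of action-conditional predictions to an abstract state
    by discretizing each prediction value; situations whose predictions fall
    into the same cells share one abstract state. `resolution` (illustrative)
    controls how coarsely situations are grouped."""
    return tuple(round(p / resolution) * resolution for p in predictions)

# Two situations with similar prediction vectors collapse to one abstract
# state, while a dissimilar one maps elsewhere (values are made up):
s1 = abstract_state([0.1, 0.9, 0.2])
s2 = abstract_state([0.2, 0.8, 0.1])
s3 = abstract_state([0.9, 0.1, 0.7])
```

Here `s1 == s2` while `s3` differs, which is the grouping behavior the abstract describes; any real TD-network implementation would instead derive the grouping from learned prediction targets.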
Rafols, Eddie JR. Temporal abstraction in temporal-difference networks [D]. Canada: University of Alberta, 2006.