Abstract
We present a generalization of temporal-difference networks to include temporally abstract options on the links of the question network. Temporal-difference (TD) networks have been proposed as a way of representing and learning a wide variety of predictions about the interaction between an agent and its environment. These predictions are compositional in that their targets are defined in terms of other predictions, and subjunctive in that they are about what would happen if an action or sequence of actions were taken. In conventional TD networks, the inter-related predictions are at successive time steps and contingent on a single action; here we generalize them to accommodate extended time intervals and contingency on whole ways of behaving. Our generalization is based on the options framework for temporal abstraction. The primary contribution of this paper is to introduce a new algorithm for intra-option learning in TD networks with function approximation and eligibility traces. We present empirical examples of our algorithm's effectiveness and of the greater representational expressiveness of temporally abstract TD networks.
Richard S. Sutton, Eddie J. Rafols, Anna Koop. "Temporal abstraction in temporal-difference networks." In NIPS'05: Proceedings of the 18th International Conference on Neural Information Processing Systems, Vancouver, British Columbia, Canada, December 5-8, 2005. CA: ACM, 2005: 1313-1320.