Abstract
We introduce a generalization of temporal-difference (TD) learning to networks of interrelated predictions. Rather than relating a single prediction to itself at a later time, as in conventional TD methods, a TD network relates each prediction in a set of predictions to other predictions in the set at a later time. TD networks can represent and apply TD learning to a much wider class of predictions than has previously been possible. Using a random-walk example, we show that these networks can be used to learn to predict by a fixed interval, which is not possible with conventional TD methods. Secondly, we show that if the inter-predictive relationships are made conditional on action, then the usual learning-efficiency advantage of TD methods over Monte Carlo (supervised learning) methods becomes particularly pronounced. Thirdly, we demonstrate that TD networks can learn predictive state representations that enable exact solution of a non-Markov problem. A very broad range of inter-predictive temporal relationships can be expressed in these networks. Overall we argue that TD networks represent a substantial extension of the abilities of TD methods and bring us closer to the goal of representing world knowledge in entirely predictive, grounded terms.
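As an illustrative sketch of the chained-prediction idea described in the abstract (not the paper's exact algorithm; the state space, step size, and variable names here are invented for illustration), the following toy example trains a small TD network on a five-state random walk. Prediction `y[0]` estimates the next observation bit, and each `y[i]` targets `y[i-1]` one step later, so `y[i]` estimates the observation `i+1` steps ahead — "predicting by a fixed interval".

```python
import random

N_STATES = 5    # random-walk states 0..4; the observation bit is 1 only on entering state 4
CHAIN_LEN = 3   # number of chained predictions (illustrative choice)
ALPHA = 0.02    # constant step size (illustrative choice)

# one table of predictions per chain level, indexed by state
y = [[0.5] * N_STATES for _ in range(CHAIN_LEN)]

def step(s):
    """One random-walk transition with reflecting boundaries."""
    s2 = min(N_STATES - 1, max(0, s + random.choice((-1, 1))))
    obs = 1.0 if s2 == N_STATES - 1 else 0.0
    return s2, obs

random.seed(0)
s = 2
for _ in range(200_000):
    s2, obs = step(s)
    # TD-network-style update: each prediction's target is the next
    # level down, evaluated at the next state; the bottom level
    # targets the observation itself.
    targets = [obs] + [y[i - 1][s2] for i in range(1, CHAIN_LEN)]
    for i in range(CHAIN_LEN):
        y[i][s] += ALPHA * (targets[i] - y[i][s])
    s = s2
```

After training, `y[0][3]` approaches 0.5 (from state 3 the walk reaches state 4 half the time), and `y[1][2]` approaches 0.25 (the two-step probability from state 2), showing how each chained prediction grounds out in the observation several steps ahead.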
Richard S. Sutton, Brian Tanner. "Temporal-difference networks." In NIPS'04: Proceedings of the 17th International Conference on Neural Information Processing Systems, Vancouver, British Columbia, Canada. Cambridge, MA: MIT Press, 2004: 1377–1384.