Publications
Ludvig, E. A., & Koop, A. (2008). Learning to Generalize through Predictive Representations: A Computational Model of Mediated Conditioning. In From Animals to Animats 10: Proceedings of Simulation of Adaptive Behavior 2008 (pp. 342-351).
Learning when and how to generalize knowledge from past experience to novel circumstances is a challenging problem many agents face. In animals, this generalization can be caused by mediated conditioning—when two stimuli gain a relationship through the mediation of a third stimulus. For example, in sensory preconditioning, if a light is always followed by a tone, and that tone is later paired with a shock, the light will come to elicit a fear reaction, even though the light was never directly paired with shock. In this paper, we present a computational model of mediated conditioning based on reinforcement learning with predictive representations. In the model, animals learn to predict future observations through the temporal-difference algorithm. These predictions are generated using both current observations and other predictions. The model was successfully applied to a range of animal learning phenomena, including sensory preconditioning, acquired equivalence, and mediated aversion. We suggest that animals and humans are fruitfully understood as representing their world as a set of chained predictions and propose that generalization in artificial agents may benefit from a similar approach.
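As a rough sketch of the chained-predictions idea (an illustrative stand-in, not the exact model in the paper), the Python snippet below learns one prediction from the light to the tone and another from a tone feature to the shock. Because the tone feature can be activated by the predicted tone as well as the observed one, the shock prediction generalizes to the light even though light and shock are never paired. The feature construction, learning rate, and trial counts are assumptions made for illustration.

```python
import numpy as np

# Minimal sketch of generalization through a predictive representation.
# The shock prediction is conditioned on a "tone" feature that can be
# activated either by observing the tone or by predicting it from the light.

alpha = 0.1            # learning rate (assumed value)
w_light_tone = 0.0     # prediction: how strongly the light predicts the tone
w_tone_shock = 0.0     # prediction: how strongly the tone feature predicts shock

def tone_feature(tone_observed, light_observed):
    # Tone representation: the observed tone, or the tone predicted from the light.
    return max(tone_observed, w_light_tone * light_observed)

# Phase 1 (sensory preconditioning): light is always followed by the tone, no shock.
for _ in range(200):
    delta = 1.0 - w_light_tone * 1.0        # TD-style error on the tone prediction
    w_light_tone += alpha * delta

# Phase 2: the tone alone is paired with shock (light absent).
for _ in range(200):
    x = tone_feature(tone_observed=1.0, light_observed=0.0)
    delta = 1.0 - w_tone_shock * x
    w_tone_shock += alpha * delta * x

# Test: present the light alone. The predicted tone activates the tone feature,
# so the shock prediction generalizes to the light.
x = tone_feature(tone_observed=0.0, light_observed=1.0)
print("shock prediction to the light:", w_tone_shock * x)  # close to 1
```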
Koop, A. (2007). Investigating Experience: Temporal Coherence and Empirical Knowledge Representation. Master's thesis, University of Alberta.
This thesis investigates the idea of artificial intelligence as an agent making sense of its experience, illustrating some of the benefits of representing knowledge as predictions of future experience. Experience is here defined as the temporal sequence of sensations and actions that are the inputs and outputs of the agent. One characteristic of this sequence is that it can have temporal coherence: what is experienced in a short period of time is likely to be consistent. The first part of this thesis examines how an agent with dynamic memory can take advantage of the temporal coherence of its experience. Results in a simple prediction task and the more complex problem of Computer Go show how such an agent can dramatically improve on the performance of the best stationary solutions. The prediction task is then used to illustrate how temporal coherence can provide a natural testbed for meta-learning.
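A minimal sketch of the tracking idea, assuming a toy temporally coherent binary stream rather than the tasks studied in the thesis: a constant-step-size learner keeps adapting to the current run of observations, while a sample-average estimator converges to the long-run mean and so predicts the next observation much worse.

```python
import numpy as np

# Assumed toy task: binary observations with long runs, so nearby
# observations are highly correlated (temporally coherent experience).

rng = np.random.default_rng(0)

def coherent_stream(n, flip_prob=0.01):
    bit, out = 0, []
    for _ in range(n):
        if rng.random() < flip_prob:
            bit = 1 - bit
        out.append(bit)
    return out

alpha = 0.1                       # constant step size: a tracking learner
track, converge = 0.5, 0.5
track_err = converge_err = 0.0

for t, y in enumerate(coherent_stream(100_000), start=1):
    track_err += (y - track) ** 2
    converge_err += (y - converge) ** 2
    track += alpha * (y - track)         # keeps adapting forever
    converge += (y - converge) / t       # sample average: converges

print("tracking MSE:  ", track_err / t)
print("converging MSE:", converge_err / t)
```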
Sutton, R. S., Koop, A., & Silver, D. (2007). On the Role of Tracking in Stationary Environments. In Proceedings of the 2007 International Conference on Machine Learning.
It is often thought that learning algorithms that track the best solution, as opposed to converging to it, are important only on nonstationary problems. We present three results suggesting that this is not so. First we illustrate in a simple concrete example, the Black and White problem, that tracking can perform better than any converging algorithm on a stationary problem. Second, we show the same point on a larger, more realistic problem, an application of temporal-difference learning to computer Go. Our third result suggests that tracking in stationary problems could be important for meta-learning research (e.g., learning to learn, feature selection, transfer). We apply a meta-learning algorithm for step-size adaptation, IDBD, to the Black and White problem, showing that meta-learning has a dramatic long-term effect on performance whereas, on an analogous converging problem, meta-learning has only a small second-order effect. This small result suggests a way of eventually overcoming a major obstacle to meta-learning research: the lack of an independent methodology for task selection.
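For the step-size adaptation piece, here is a small sketch of the IDBD update rule (Sutton, 1992) applied to a toy linear prediction task; the task, the meta step size, and the initial step sizes are assumed stand-ins, not the Black and White problem from the paper.

```python
import numpy as np

# Sketch of IDBD: each weight gets its own step size, adapted online.

rng = np.random.default_rng(1)
n = 5
w_true = rng.normal(size=n)          # assumed stationary linear target

theta = 0.01                         # meta step size (assumed value)
w = np.zeros(n)                      # prediction weights
beta = np.full(n, np.log(0.05))      # log step sizes
h = np.zeros(n)                      # memory traces used by IDBD

for _ in range(20_000):
    x = rng.normal(size=n)
    y = w_true @ x + 0.1 * rng.normal()     # noisy target
    delta = y - w @ x                       # prediction error
    beta += theta * delta * x * h           # meta-learning: adapt log step sizes
    alpha = np.exp(beta)                    # per-weight step sizes
    w += alpha * delta * x                  # usual LMS-style weight update
    h = h * np.clip(1.0 - alpha * x * x, 0.0, None) + alpha * delta * x

print("learned step sizes:", np.round(np.exp(beta), 4))
print("weight error:", np.linalg.norm(w - w_true))
```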
Tanner, B., Bulitko, V., Koop, A., & Paduraru, C. (2007). Grounding Abstractions in Predictive State Representations. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), pages 1077-1082.
Precup, D., Sutton, R. S., Paduraru, C., Koop, A., & Singh, S. (2006). Off-policy Learning with Recognizers (online proceedings version, Nov 11 2005). Advances in Neural Information Processing Systems 18 (NIPS*05).
Sutton, R. S., Rafols, E. J., & Koop, A. (2006). Temporal abstraction in temporal-difference networks (online proceedings version, Nov 11 2005). Advances in Neural Information Processing Systems 18 (NIPS*05).