AMRL: Aggregated Memory For Reinforcement Learning

Abstract

In many partially observable scenarios, Reinforcement Learning (RL) agents must rely on long-term memory in order to learn an optimal policy. We demonstrate that using techniques from NLP and supervised learning fails at RL tasks due to stochasticity from the environment and from exploration. Utilizing our insights on the limitations of traditional memory methods in RL, we propose AMRL, a class of models that can learn better policies with greater sample efficiency and are resilient to noisy inputs. Specifically, our models use a standard memory module to summarize short-term context, and then aggregate all prior states from the standard model without respect to order. We show that this provides advantages both in terms of gradient decay and signal-to-noise ratio over time. Evaluating in Minecraft and maze environments that test long-term memory, we find that our model improves average return by 19% over a baseline that has the same number of parameters and by 9% over a stronger baseline that has far more parameters.
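The two-stage idea in the abstract (a standard memory module followed by an order-independent aggregation of its states) can be illustrated with a minimal sketch. This is not the authors' implementation: the plain tanh recurrent cell, the function names `recurrent_states` and `aggregate`, the choice of running mean and max as aggregators, and all dimensions are assumptions made here for illustration only.

```python
import numpy as np

def recurrent_states(inputs, w_in, w_rec):
    """Run a plain tanh recurrent cell (a hypothetical stand-in for the
    'standard memory module') and return the hidden state at every step."""
    h = np.zeros(w_rec.shape[0])
    states = []
    for x in inputs:
        h = np.tanh(w_in @ x + w_rec @ h)
        states.append(h)
    return np.stack(states)

def aggregate(states, how="mean"):
    """Running aggregate over all states seen so far; the value at step t
    does not depend on the order of the states up to t."""
    if how == "mean":
        counts = np.arange(1, len(states) + 1)[:, None]
        return np.cumsum(states, axis=0) / counts
    if how == "max":
        return np.maximum.accumulate(states, axis=0)
    raise ValueError(f"unknown aggregator: {how}")

# Illustrative sizes and random weights (assumptions, not from the paper).
rng = np.random.default_rng(0)
T, obs_dim, hid_dim = 50, 4, 8
w_in = rng.normal(scale=0.5, size=(hid_dim, obs_dim))
w_rec = rng.normal(scale=0.5, size=(hid_dim, hid_dim))
observations = rng.normal(size=(T, obs_dim))

states = recurrent_states(observations, w_in, w_rec)
memory = aggregate(states, "mean")  # one aggregated vector per time step

# Shuffling the per-step states leaves the final aggregate unchanged.
shuffled = states[rng.permutation(T)]
final_mean = memory[-1]
final_mean_shuffled = aggregate(shuffled, "mean")[-1]
```

Note that only the aggregation step ignores order; the recurrent states themselves still depend on the order of the observations, which is how short-term context is retained in this sketch.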

Authors
  • Beck, Jacob
  • Ciosek, Kamil
  • Devlin, Sam
  • Tschiatschek, Sebastian
  • Zhang, Cheng
  • Hofmann, Katja
Shortfacts
Category
Paper in Conference Proceedings or in Workshop Proceedings (Poster)
Event Title
International Conference on Learning Representations
Divisions
Data Mining and Machine Learning
Event Location
Addis Ababa, Ethiopia
Event Type
Conference
Event Dates
April 26 to April 30, 2020
Date
26 June 2020
Official URL
https://openreview.net/forum?id=Bkl7bREtDr