How to achieve better performance and faster training using Prioritized Experience Replay?
From DeepMind's original paper, Prioritized Experience Replay:
Experience replay lets a reinforcement learning agent collect, remember, and reuse experiences from the past. These experience transitions are usually sampled uniformly from a replay memory and then used to train the agent.
However, this approach simply replays transitions at the same frequency at which they were originally experienced, regardless of their significance or the magnitude of their temporal-difference (TD) error.
It turns out that, in biological brains, sequences associated with rewards (non-zero rewards) appear to be replayed more frequently (Atherton et al., 2015; Ólafsdóttir et al., 2015; Foster & Wilson, 2006), as do experiences with high-magnitude TD error (Singer & Frank, 2009; McNamara et al., 2014).
The TD error itself provides one way to measure these update priorities, but DeepMind's approach for model-free RL uses a stochastic prioritization, which proved more robust when learning a function approximator from samples.
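As a rough sketch of this idea (not the paper's efficient sum-tree implementation), proportional stochastic prioritization assigns each transition a priority p_i = (|δ_i| + ε)^α, where δ_i is its TD error, and samples transition i with probability P(i) = p_i / Σ_k p_k. The names `alpha` and `eps` below follow the paper's α and ε hyperparameters; everything else is illustrative:

```python
import random

def priorities(td_errors, alpha=0.6, eps=1e-5):
    # Proportional prioritization: p_i = (|delta_i| + eps)^alpha.
    # eps keeps zero-error transitions sampleable; alpha controls
    # how strongly prioritization is applied (alpha=0 -> uniform).
    return [(abs(d) + eps) ** alpha for d in td_errors]

def sample_index(td_errors, alpha=0.6, eps=1e-5):
    # Sample a transition index with probability P(i) = p_i / sum_k p_k.
    p = priorities(td_errors, alpha, eps)
    return random.choices(range(len(p)), weights=p, k=1)[0]
```

With α = 0 this degenerates to uniform replay, which is why α is often annealed or tuned as a trade-off between aggressive prioritization and diversity of samples.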
Based on this paper, we are implementing this type of prioritized experience replay in our agent. This means we want to replay important transitions more frequently, and therefore learn faster and more efficiently.
The key idea is that an RL agent can learn more effectively from some transitions than from others; we therefore want to compare the benefits of stochastic prioritization against a simple greedy TD-error-based prioritization.
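One caveat the paper raises: prioritized sampling biases updates toward high-error transitions, which it corrects with importance-sampling weights w_i = (N · P(i))^(-β), normalized by the maximum weight for stability. A minimal sketch of that correction, assuming `probs` holds the sampling probabilities P(i) (the `beta` parameter mirrors the paper's β):

```python
def is_weights(probs, beta=0.4):
    # Importance-sampling correction: w_i = (N * P(i))^(-beta).
    # Transitions sampled more often than uniform (P(i) > 1/N)
    # get down-weighted; normalizing by max(w) bounds updates.
    n = len(probs)
    w = [(n * p) ** (-beta) for p in probs]
    m = max(w)
    return [x / m for x in w]
```

With β = 1 the bias is fully corrected; the paper anneals β toward 1 over training, since unbiased updates matter most near convergence.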
We are testing prioritized experience replay with Deep Q-Recurrent Networks (DQRNN) in a customized environment, expecting this agent to achieve a new state of the art, outperforming our previous model with uniform replay.