A case for Fixed-Horizon Temporal Difference methods in RL
Introduction

Value learning algorithms usually centre on the infinite-horizon Bellman equation (sketched below). When we estimate the value of an action, we are estimating the value of the entire future, given a current state and a proposed action. However, such value learning approaches are notoriously unstable. Fixed-Horizon Temporal Difference methods were recently reintroduced to shorten…
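For reference, here is a minimal sketch of the two recursions being contrasted, in standard notation (the symbols V^π, γ, and the horizon index h are assumptions of this sketch, not taken from the excerpt). The infinite-horizon Bellman equation bootstraps a value estimate from itself:

\[
V^\pi(s) = \mathbb{E}_\pi\left[ r_{t+1} + \gamma V^\pi(s_{t+1}) \mid s_t = s \right]
\]

whereas a fixed-horizon method learns a family of value functions V_h^π, each bootstrapping only from the strictly shorter horizon below it:

\[
V_h^\pi(s) = \mathbb{E}_\pi\left[ r_{t+1} + \gamma V_{h-1}^\pi(s_{t+1}) \mid s_t = s \right], \qquad V_0^\pi(s) \equiv 0.
\]

Because each V_h depends only on V_{h-1}, the circular self-bootstrapping of the infinite-horizon case disappears, which is one source of the stability argument for these methods.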