Category: Technical
-
Are you sure? Testing ChatGPT’s Confidence
I found that after giving a correct answer, GPT-3.5 will change its mind to an incorrect answer more than 50% of the time if you simply ask it “Are you sure?”
-
A case for Fixed-Horizon Temporal Difference methods in RL
Introduction: Value learning algorithms usually centre around the infinite horizon Bellman equation. When we make estimates of the value of an action, we are estimating the value of the entire future given a current state and proposed action. However, such value learning approaches are notoriously unstable. Finite-Horizon Temporal Difference methods were recently reintroduced to shorten…