Reinforcement learning (RL) has delivered some of AI’s most striking successes, from human-level Atari¹ play to world-class performance in Go². Yet when applied to messy, real-world combinatorial optimisation (CO) problems such as energy grid management or autonomous logistics, even state-of-the-art RL systems can stall. Despite being trained to convergence, policies often hit a performance…

