Tag: RL

Articles

Celebrating multiple firsts at the premier international event for machine learning and computational neuroscience

NeurIPS 2025

The 39th Annual Conference on Neural Information Processing Systems (NeurIPS) was capped off by a series of firsts for the InstaDeep team, with researchers leading the way in exchanging ideas and showcasing innovation within the AI community during our visit to the San Diego Convention Center from 2–7 December. With five workshops, three accepted… Read more »


Breaking the Performance Ceiling in Reinforcement Learning

Reinforcement learning (RL) has delivered some of AI’s most striking successes, from human-level Atari play¹ to world-class performance in Go². Yet when applied to messy, real-world combinatorial optimisation (CO) problems such as energy grid management or autonomous logistics, even state-of-the-art RL systems can stall. Despite being trained to convergence, policies often hit a performance… Read more »


Oryx: InstaDeep’s scalable sequence model for multi-agent coordination in offline settings

Multi-agent reinforcement learning (MARL) holds significant promise across domains such as autonomous driving, warehouse logistics, intelligent rail networks, and satellite alignment. Yet deploying MARL in the real world remains difficult. Training typically requires vast amounts of interactive data, which is both costly and potentially risky, particularly in safety-critical settings where trial and error is not… Read more »


Introducing DEgym: A framework for developing Reinforcement Learning Environments for Dynamical Systems

Reinforcement learning (RL) is increasingly being applied to complex processes across science and engineering, with promising results in manufacturing, biology, and energy systems. By learning through trial and error, RL agents can optimise behaviour without explicit supervision¹. Many of these processes are governed by differential-algebraic equations (DAEs). These combine time-dependent dynamics with algebraic constraints, making… Read more »
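
For readers unfamiliar with the term, a minimal sketch (not drawn from the article itself) of what a DAE looks like in semi-explicit form, with differential states x, algebraic states z, and a control input u:

```latex
% Semi-explicit DAE: an ODE coupled to an algebraic constraint
\begin{aligned}
\dot{x}(t) &= f\big(x(t),\, z(t),\, u(t)\big) && \text{(time-dependent dynamics)} \\
0 &= g\big(x(t),\, z(t)\big) && \text{(algebraic constraint)}
\end{aligned}
```

The algebraic equation 0 = g(x, z) must hold at every instant, which is what distinguishes a DAE from a plain ODE and complicates simulation and control.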