Combinatorial optimisation (CO) problems are among the hardest challenges in computer science, with real-world importance in logistics, transport, and energy systems. These problems involve choosing the “best” option from an astronomically large set of possibilities and are often NP-hard, meaning they become computationally intractable as the problem size grows.
Neural solvers, AI models designed to tackle CO problems, can produce strong results, but even the best existing methods struggle to search efficiently for solutions to new problem instances.
Since exhaustive search is infeasible, solvers instead rely on heuristics, problem-solving strategies that use rules of thumb to find practical solutions efficiently. Reinforcement learning (RL) provides a flexible way to learn such heuristics, typically by constructing solutions step by step. However, as finding an optimal solution in a single attempt is virtually impossible for NP-hard problems, real-world performance depends on how well solvers can make efficient use of their compute budget to adapt and search online.
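To make the step-by-step construction concrete, here is a minimal sketch of a constructive TSP heuristic. A learned RL policy would score the next city with a neural network; as a stand-in, this sketch uses a softmax over negative distances, so nearer cities are more likely to be picked. All names here (`construct_tour`, `tour_length`, `temperature`) are illustrative, not part of MEMENTO.

```python
import math
import random

def tour_length(coords, tour):
    """Total length of a closed tour over 2-D city coordinates."""
    return sum(
        math.dist(coords[tour[i]], coords[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )

def construct_tour(coords, temperature=1.0, rng=None):
    """Build a tour step by step: sample the next city with probability
    decreasing in its distance from the current city (a hand-crafted
    stand-in for a learned policy's action distribution)."""
    rng = rng or random.Random(0)
    unvisited = set(range(1, len(coords)))
    tour = [0]  # start at city 0
    while unvisited:
        current = tour[-1]
        cands = list(unvisited)
        # Softmax over negative distances.
        logits = [-math.dist(coords[current], coords[c]) / temperature for c in cands]
        m = max(logits)
        weights = [math.exp(l - m) for l in logits]
        nxt = rng.choices(cands, weights=weights, k=1)[0]
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour

rng = random.Random(42)
coords = [(rng.random(), rng.random()) for _ in range(10)]
tour = construct_tour(coords, rng=random.Random(0))
```

Because each rollout is stochastic, repeated attempts under a compute budget produce different tours, and the best one is kept. This is exactly the sampling loop that adaptive methods like MEMENTO try to make smarter than blind repetition.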
Most existing neural solvers fall into one of two categories. Diversity-based approaches rely on a collection of complementary strategies that generate very different solutions, betting that at least one of them will be particularly good. However, they generally lack mechanisms to fully exploit past attempts to improve future ones. In contrast, policy improvement approaches do update the solver based on past attempts, but the updates are computationally heavy, irreversible, and often unstable, making them difficult to apply in practice.
MEMENTO has the potential to change how we address these CO problems. By allowing a neural solver to recall similar past situations and adjust its next decisions accordingly, MEMENTO enables adaptation during inference and gives AI something it has long lacked: a memory.
How it works
MEMENTO makes adaptation more efficient and effective by storing and reusing information across attempts. Instead of relying on traditional policy gradient updates that depend heavily on backpropagation and learning rates, it can directly retrieve past information to update action logits with a learned rule.
At its core, the process is a simple feedback loop:
- Attempt: the solver tries to construct a solution.
- Store: key data from the attempt, such as actions, outcomes, trajectory quality, and remaining compute budget, is saved to memory.
- Retrieve: when faced with the next decision, the solver retrieves relevant data from memory.
- Update: the memory module updates the solver’s action distribution, nudging it toward more promising choices.
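The attempt/store/retrieve/update loop above can be sketched in a few lines. In MEMENTO the update rule is learned and the stored features are richer (actions, returns, trajectory quality, remaining budget); as a hedged stand-in, this toy version stores (action, reward) pairs per state and nudges each action's logit by its average observed reward. The `Memory` class, `adjusted_logits`, and `alpha` are illustrative assumptions, not the paper's API.

```python
from collections import defaultdict

class Memory:
    """Toy per-state memory: stores (action, reward) pairs from past attempts."""
    def __init__(self):
        self.data = defaultdict(list)

    def store(self, state, action, reward):
        self.data[state].append((action, reward))

    def retrieve(self, state):
        return self.data[state]

def adjusted_logits(base_logits, memories, alpha=1.0):
    """Hand-crafted stand-in for MEMENTO's learned update rule:
    shift each action's logit by alpha times its average past reward."""
    totals, counts = defaultdict(float), defaultdict(int)
    for action, reward in memories:
        totals[action] += reward
        counts[action] += 1
    return [
        logit + alpha * (totals[a] / counts[a] if counts[a] else 0.0)
        for a, logit in enumerate(base_logits)
    ]

memory = Memory()
memory.store("depot", action=1, reward=2.0)  # a past attempt that went well
adjusted = adjusted_logits([0.0, 0.0, 0.0], memory.retrieve("depot"))
print(adjusted)  # [0.0, 2.0, 0.0]: action 1 is now more likely
```

The key design point this illustrates: the solver's weights never change. Adaptation happens purely through retrieval at decision time, which avoids the cost and instability of gradient updates and is trivially reversible (clear the memory).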
The result is a lightweight, model-agnostic module that enables neural solvers to adapt dynamically and make better use of their search budget.

Results
MEMENTO was evaluated on two standard benchmarks in CO problem solving: the Travelling Salesman Problem (TSP) and the Capacitated Vehicle Routing Problem (CVRP). MEMENTO consistently outperformed both policy-gradient fine-tuning methods, such as Efficient Active Search (EAS), and tree-search approaches like Simulation-Guided Beam Search (SGBS).
When benchmarked against EAS, MEMENTO delivered substantial gains over its base model, POMO, confirming the superiority of its adaptive search strategy over stochastic sampling. The difference was particularly striking on TSP instances of size 100 (TSP100), where MEMENTO achieved 60% search improvement over stochastic sampling, while EAS delivered only a 6% improvement. This margin highlights the effectiveness of MEMENTO’s learned policy updates over traditional gradient-based adaptation methods.
Crucially, MEMENTO is model-agnostic and can be combined zero-shot with pretrained solvers, instantly improving performance without retraining. When integrated with our neural solver COMPASS, it achieved state-of-the-art results across 12 benchmark tasks. Notably, during inference, activating MEMENTO’s adaptation mechanism midway through the search budget produced a clear reduction in tour length, signalling faster convergence toward optimal solutions.
MEMENTO also proved highly scalable, extending to problems with 500 decision points (nodes) for both TSP and CVRP, the first time auto-regressive RL methods like POMO have been trained at this scale. Adaptation performance scaled reliably with instance size, and MEMENTO remained efficient and robust even in low-budget regimes. It further demonstrated strong time-performance scalability, using the available compute budget 20% more efficiently than EAS on large instances (size 1000), while being 40% faster on smaller instances (size 100) when using a deeper decoder.
Finally, on the largest benchmarks (size 500), the zero-shot combination of MEMENTO and COMPASS achieved state-of-the-art results on three of four tasks, reducing the gap to optimality by more than 10% relative to leading prior methods.

What’s next?
MEMENTO shows that giving AI a memory can push combinatorial optimisation into new territory. By combining instinct with experience, it memorises past solving attempts and enables solvers to adapt dynamically while making full use of available compute.
The possible real-world impacts are significant: more efficient delivery routes and supply chains, smarter and more stable energy grids, and faster, more adaptive industrial optimisation tools. Looking ahead, we plan to train MEMENTO on broader distributions with variable instance sizes and simplify the learned update rule to reduce overhead further.
🚀 We’re proud to share that this research has been selected among the top 3% of submitted papers to be a Spotlight at NeurIPS 2025! Discover the science behind the breakthrough: read the paper and explore the module on GitHub.
Disclaimer: All claims made are supported by our research paper: Memory-enhanced Neural Solvers for Routing Problems unless explicitly cited otherwise.
