Decision-Making AI For The Enterprise

InstaDeep delivers AI-powered decision-making systems for the Enterprise. With expertise in both machine intelligence research and concrete business deployments, we provide a competitive advantage to our customers in an AI-first world.


Building AI systems for the industry

Leveraging its expertise in GPU-accelerated computing, deep learning and reinforcement learning, InstaDeep has built AI systems to tackle the most complex challenges across a range of industries and sectors.

Biology
Logistics
Electronic Design
Energy

Latest

Our latest updates from across our channels

Oryx: InstaDeep’s scalable sequence model for multi-agent coordination in offline settings


on Nov 11, 2025 | 01:51pm

Multi-agent reinforcement learning (MARL) holds significant promise across domains such as autonomous driving, warehouse logistics, intelligent rail networks, and satellite alignm...

Genome annotation with SegmentNT


on Oct 29, 2025 | 11:30am

Nucleotides are the fundamental units of DNA, and when linked together by a sugar-phosphate backbone, they form the strands that define our genome. Analysing the precise role o...

AI Day 2025: Powering biology with a full-stack AI ecosystem


on Oct 13, 2025 | 10:11am

BioNTech hosted its annual AI Day at the Science Museum in London, the second event in their Innovation Series. The day brought together investors, analysts, media representatives...

Introducing DEgym: A framework for developing Reinforcement Learning Environments for Dynamical Systems


on Sep 16, 2025 | 02:59pm

Reinforcement learning (RL) is increasingly being applied to complex processes across science and engineering, with promising results in manufacturing, biology, and energy systems...

Advancing AI with Bayesian Flow Networks


on Sep 04, 2025 | 11:57am

By Alex Graves. The story of Bayesian Flow Networks (BFNs) took years to take shape. In 2010, I was working with autoregressive models and repeatedly ran into the same constr...

Hand in Hand for Africa’s AI Future - InstaDeep at Deep Learning Indaba 2025


on Aug 29, 2025 | 12:57pm

Africa's AI community gathers annually for the Deep Learning Indaba to exchange ideas, learn, and dream big. As InstaDeep, we have supported this journey from the start, both throu...

Talking biology with ChatNT


on Jun 06, 2025 | 10:29am

As AI continues to reshape our understanding of the biological landscape, DNA, RNA, and protein sequence models are rapidly emerging—each promising to be faster, more capable, a...

Accelerate molecular simulations with mlip


on May 29, 2025 | 01:29pm

Understanding molecular behaviour allows researchers to predict the physical and chemical properties of complex systems, such as how a protein folds or how a drug binds to its ta...

InstaDeep at IndabaX Tunisia 2025 - Empowering Through Knowledge


on May 14, 2025 | 11:43am

Continuing our commitment to supporting AI talent in the region, InstaDeep was proud to sponsor and take part in the 6th edition of IndabaX Tunisia 2025, held on May 3-4 at the Hi...

Flexible antibody design with AbBFN2


on May 06, 2025 | 10:42am

Antibodies play an important role in the adaptive immune response. By selectively recognising and binding to specific antigens—such as viruses or bacteria—they neutralise thre...

Research

Are genomic language models all you need? Exploring genomic language models on protein downstream tasks

Sam Boshar | Evan Trop | Bernardo P. de Almeida | Liviu Copoiu | Thomas Pierrot

Bioinformatics (2024) | Sep 2025
Motivation: Large language models, trained on enormous corpora of biological sequences, are state-of-the-art for downstream genomic and proteomic tasks. Since the genome contains the information to encode all proteins, genomic language models (gLMs) hold the potential to make downstream predictions not only about DNA sequences, but also about proteins. However, the performance of gLMs on protein tasks remains unknown, because few tasks pair proteins with the coding DNA sequences (CDS) that can be processed by gLMs. Results: In this work, we curated five such datasets and used them to evaluate the performance of gLMs and proteomic language models (pLMs). We show that gLMs are competitive with, and on some tasks even outperform, their pLM counterparts. The best performance was achieved using the retrieved CDS rather than sampling strategies. We found that training a joint genomic-proteomic model outperforms each individual approach, showing that they capture different but complementary sequence representations, as we demonstrate through model interpretation of their embeddings. Lastly, we explored different genomic tokenization schemes to improve downstream protein performance. We trained a new Nucleotide Transformer (50M) foundation model with 3mer tokenization that outperforms its 6mer counterpart on protein tasks while maintaining performance on genomics tasks. The application of gLMs to proteomics offers the potential to leverage rich CDS data, and in the spirit of the central dogma, the possibility of a unified and synergistic approach to genomics and proteomics.
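To make the 3mer versus 6mer comparison concrete, here is a minimal Python sketch of k-mer tokenization. It assumes non-overlapping k-mers read left to right, with any leftover nucleotides emitted as single-character tokens; the function name `kmer_tokenize` and the example sequence are hypothetical, and the tokenizer used for the models in the paper may differ in its details.

```python
def kmer_tokenize(seq: str, k: int) -> list[str]:
    """Split a nucleotide sequence into non-overlapping k-mers.

    Assumption for this sketch: bases left over at the end (when the
    length is not a multiple of k) become single-nucleotide tokens.
    """
    if k < 1:
        raise ValueError("k must be a positive integer")
    seq = seq.upper()
    cut = len(seq) - len(seq) % k            # end of the last full k-mer
    tokens = [seq[i:i + k] for i in range(0, cut, k)]
    tokens.extend(seq[cut:])                 # leftover bases, one token each
    return tokens


# Hypothetical 12-nt coding sequence: Met-Ala-Lys-stop.
cds = "ATGGCCAAGTAA"
print(kmer_tokenize(cds, 3))  # ['ATG', 'GCC', 'AAG', 'TAA']
print(kmer_tokenize(cds, 6))  # ['ATGGCC', 'AAGTAA']
```

On an in-frame coding sequence, each 3mer token in this sketch corresponds to one codon, whereas a 6mer token spans two. That is one plausible reading of why a 3mer vocabulary might suit protein tasks, although the abstract itself does not state the reason.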

Multi-Agent Reinforcement Learning with Selective State-Space Models

Jemma Daniel | Ruan John de Kock | Louay Ben Nessir | Sasha Abramowitz | Omayma Mahjoub | Wiem Khlifi | Juan Claude Formanek | Arnu Pretorius

AAMAS 2025 | Sep 2025
The left-hand plot in Figure 1 compares MAM, MAT, and MAPPO, aggregated over all tasks and environments. MAM achieves performance on par with MAT, the current state-of-the-art, while learning faster.

Sable: a Performant, Efficient and Scalable Sequence Model for MARL

Omayma Mahjoub | Sasha Abramowitz | Ruan de Kock | Wiem Khlifi | Simon du Toit | Jemma Daniel | Louay Ben Nessir | Claude Formanek | Louise Beyers | Liam Clark | Arnu Pretorius

ICML 2025 | Jul 2025
Figure 1. Performance, memory, and scaling properties of Sable compared to the Multi-Agent Transformer (MAT) (Wen et al., 2022), the previous state-of-the-art, aggregated over 45 cooperative MARL tasks. Left: Sable ranks best in 34 out of 45 tasks, outperforming all other MARL algorithms tested across 6 environments: RWARE, LBF, MABrax, SMAX, Connector, and MPE. MAT ranks best in 3 out of 45. Middle: Sable exhibits superior throughput, processing up to 6.5 times more steps per second than MAT as we scale to 512 agents. Right: Sable scales efficiently to thousands of agents, maintaining stable performance while using GPU memory significantly more efficiently than MAT.

Bimodal masked language modeling for bulk RNA-seq and DNA methylation representation learning

Maxence Gélard | Hakim Benkirane | Thomas Pierrot | Guillaume Richard | Paul-Henry Cournède

ICML 2025 Workshop | Jul 2025
Figure 1: MOJO pipeline. (a) Each modality is first tokenized using linear binning. (b) MOJO, whose core architecture combines convolution and attention operations, is first pre-trained through bimodal masked language modeling. (c) Embeddings are probed from MOJO to fine-tune a task-specific head tailored for cancer-type classification or survival analysis.
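As a rough illustration of step (a), the sketch below maps continuous measurements to integer token ids using equal-width bins. The function name `linear_bin_tokenize`, the value range, the number of bins, and the clipping of out-of-range values are all assumptions made for this example; the caption does not specify the bin edges or vocabulary size that MOJO uses.

```python
def linear_bin_tokenize(values, low, high, n_bins):
    """Map each value to a bin index in [0, n_bins - 1] using equal-width bins.

    Assumptions for this sketch: values outside [low, high] are clipped,
    and a value exactly equal to `high` falls into the last bin.
    """
    if n_bins < 1 or high <= low:
        raise ValueError("need n_bins >= 1 and high > low")
    width = (high - low) / n_bins
    tokens = []
    for v in values:
        clipped = min(max(v, low), high)      # keep the value inside the range
        idx = int((clipped - low) / width)    # equal-width bin index
        tokens.append(min(idx, n_bins - 1))   # v == high maps to the last bin
    return tokens


# Hypothetical measurements binned into 4 tokens over the range [0, 10].
measurements = [0.0, 1.2, 4.9, 10.0]
print(linear_bin_tokenize(measurements, 0.0, 10.0, 4))  # [0, 0, 1, 3]
```

Each modality would get its own range and bin count in practice, so that the resulting integer ids can be fed to the model like ordinary tokens.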

InstaNovo-P: A de novo peptide sequencing model for phosphoproteomics

Jesper Lauridsen | Pathmanaban Ramasamy | Rachel Catzel | Vahap Canbay | Amandla Mabona | Kevin Eloff | Paul Fullwood | Jennifer Ferguson | Annekatrine Kirketerp-Møller | Ida Sofie Goldschmidt | Tine Claeys | Sam van Puyenbroeck | Santiago Nicolas Lopez Carranza | Erwin M. Schoof | Lennart Martens | Jeroen Van Goey | Chiara Frankavilla | Timothy Patrick Jenkins | Konstantinos Kalogeropoulos

Jul 2025

ChatNT: A Multimodal Conversational Agent for DNA, RNA and Protein Tasks

Bernardo P. de Almeida | Guillaume Richard | Hugo Dalla-Torre | Christopher Blum | Lorenz Hexemer | Priyanka Pandey | Stefan Laurent | Chandana Rajesh | Marie Lopez | Alexandre Laterre | Maren Lang | Uğur Şahin | Karim Beguir | Thomas Pierrot

Nature Machine Intelligence (2025) | Jul 2025
ChatNT is the first multimodal conversational agent with an advanced understanding of biological sequences. It achieves new state-of-the-art results on the Nucleotide Transformer benchmark while being able to solve all tasks at once, in English, and to generalize to unseen questions.

In the Press

Partners