InstaDeep presents six papers at ICLR 2024

InstaDeep maintains its strong commitment to open research with six papers accepted for presentation at the 2024 ICLR conference being held in Vienna this week.

The accepted papers cover a diverse range of subjects including Decision-making AI and Machine Learning for biology. Detailed summaries of each paper and their respective presentation slots are provided below.

“The research team at InstaDeep has once again demonstrated exemplary work across various domains. The acceptance of these six papers serves as a significant recognition of our contributions to the broader AI research community. We are excited to share our findings at ICLR in Vienna!” commented Alex Laterre, Head of Research.

Publications

Jumanji: a Diverse Suite of Scalable Reinforcement Learning Environments in JAX

Jumanji offers a varied collection of scalable reinforcement learning environments written in JAX. The goal of Jumanji is to help pioneer a new wave of hardware-accelerated research and development in the field of reinforcement learning. Jumanji’s high-speed environments enable faster iteration and large-scale experimentation while simultaneously reducing complexity.
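To illustrate why environments written in JAX lend themselves to hardware acceleration, here is a minimal sketch of a toy environment expressed as pure functions. It does not use Jumanji's actual API: the `State` container, the `reset` and `step` functions, and the `GOAL` and `MAX_STEPS` constants are all hypothetical names chosen for this example. The point is only that a stateless environment can be batched with `jax.vmap` and compiled with `jax.jit`.

```python
from typing import NamedTuple

import jax
import jax.numpy as jnp


class State(NamedTuple):
    """Hypothetical toy state: a position on a 1-D line and a step counter."""
    position: jnp.ndarray
    step_count: jnp.ndarray


GOAL = 5        # assumed goal position for this toy example
MAX_STEPS = 20  # assumed episode horizon


def reset(key):
    """Sample a start position in [-3, 3]; a pure function of the PRNG key."""
    position = jax.random.randint(key, shape=(), minval=-3, maxval=4)
    return State(position=position, step_count=jnp.zeros((), dtype=jnp.int32))


def step(state, action):
    """Move left (action 0) or right (action 1); return (state, reward, done)."""
    delta = jnp.where(action == 1, 1, -1)
    position = state.position + delta
    step_count = state.step_count + 1
    reached_goal = position == GOAL
    reward = jnp.where(reached_goal, 1.0, 0.0)
    done = reached_goal | (step_count >= MAX_STEPS)
    return State(position, step_count), reward, done


# vmap lifts the single-environment functions to a batch; jit compiles them.
batched_reset = jax.jit(jax.vmap(reset))
batched_step = jax.jit(jax.vmap(step))

num_envs = 1024
keys = jax.random.split(jax.random.PRNGKey(0), num_envs)
states = batched_reset(keys)
actions = jnp.ones((num_envs,), dtype=jnp.int32)  # every environment moves right
states, rewards, dones = batched_step(states, actions)
```

Because `reset` and `step` hold no hidden state, the same code runs unchanged on CPU, GPU or TPU, and the number of parallel environments is just the size of the leading batch axis.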


Model-Based Reinforcement Learning for Protein Backbone Design

We propose an AlphaZero algorithm tailored for the design of protein backbones that meet shape and structural scoring requirements. We extend an existing Monte Carlo tree search (MCTS) framework by incorporating a novel threshold-based reward and secondary objectives to increase design precision, improving upon the baseline by more than 100% in top-down protein design tasks.


Machine Learning of Force Fields for Molecular Dynamics Simulations of Proteins at DFT Accuracy

We present a Deep Learning-based molecular force field that combines the MACE architecture with a physics-informed loss function inspired by the PhysNet approach. We demonstrate that our method runs stable and accurate GPU-accelerated Molecular Dynamics simulations and energy minimisations. Moreover, we provide an in-depth discussion of the strengths and limitations of the approach.


Protein binding affinity prediction under multiple substitutions based on eGNNs with residue and atomic graphs and language model information: eGRAL

We apply SE(3)-equivariant graph neural networks (eGNNs) to protein complex structures to score the effects of multiple substitutions on protein-protein interaction binding affinity. eGRAL works with both atom-level and residue-level graphs, as well as with embeddings generated by protein language models, leveraging information from three scales: atomic, residue and evolutionary.


Exploring Genomic Language Models on Protein Downstream Tasks

We apply genomic language models (gLMs) to protein tasks with a view towards a unified approach to modeling genomics and proteomics. Our work shows that gLMs perform better with curated true coding sequences (CDS) than with sequences obtained through sampling strategies. On our curated true CDS benchmark, gLMs match or exceed the performance of protein language models (pLMs) on three of the five tasks. Notably, a hybrid gLM-pLM architecture surpasses the performance of either model alone.


Advancing DNA Language Models: The Genomics Long-Range Benchmark

The Genomics Long-Range Benchmark encompasses a set of tasks that test the ability of DNA language models (LMs) to capture long-range sequence interactions. We evaluated a set of DNA LMs on our benchmark, comparing them against the supervised Enformer baseline. We also studied context length extension methods for one short-range DNA LM, showing that performance increases with larger context length.


Come and meet the InstaDeep team on site during ICLR and learn more about our work. You can also check out all our open opportunities at www.instadeep.com/careers.