Imagine waking to a world where every genetic sequence is a code waiting to be deciphered, each one a piece of a giant puzzle.
That’s everyday life for Maša Roller, a senior computational geneticist at InstaDeep who uses large language models (LLMs) to advance our understanding of genomics. She is an expert at identifying the biological questions, data and models that our team at InstaDeep needs.
What inspired you to work in AI?
I’ve always believed that classical statistical methods and heuristics, which underpin most of bioinformatics and computational genomics, can only take us so far when we try to answer increasingly complex questions about how genomes work. In genomics, the arrangement of genetic “words” matters as much as the words themselves, if not more. I’m intrigued by how we can use large language models to decode the genome’s secrets and communicate in its language. It’s an exciting field, and many researchers, including me, are exploring how these models can be applied to genomics.
What particularly inspires me to keep working in AI is that large language models and genomics have both reached maturity. It’s a marriage made in heaven. Understanding genomics is like understanding language: both have their own alphabet, but the genome uses only four letters, compared with the many letters of spoken languages. What truly matters in genomics, as in language, are the “words” formed by these letters. The complexity lies in how these genetic “words” are arranged, which is vital to how genomes function, from building organisms to affecting our health. Large language models are powerful tools for understanding the syntax of spoken language, which got me interested in using similar methods to understand the entire human genome.
What path led you to where you are today?
At university, I studied molecular biology, but very much from a classic lab scientist’s perspective: lab coats, pipettes and petri dishes. Towards the end of my studies, though, I became interested in the computational side of genomics in particular. I saw the power of using computers to analyse genes and whole genomes. From my PhD onwards, I’ve worked exclusively with computational methods. That was a small pivot in my career: after building a broad background in molecular biology and genomics, I specialised in computation. A year ago, I joined InstaDeep to apply these more advanced models to key questions in genomics and to keep learning about how genomes work.
What advice would you give to someone beginning their journey in the field?
If you’re starting out, or if you’re a woman in AI, don’t hesitate to step out of your comfort zone and explore new fields. I didn’t have much experience in machine learning (ML) when I applied for my job, but I took the leap anyway, and I’m glad I did. Even though I still have gaps in my knowledge, they’re gradually shrinking because I work with a fantastic team who are happy to share what they know. Don’t be afraid to try new things and learn on the job. Working on real projects with experienced ML engineers is one of the best ways to learn. Seeing how ML methods are applied to solve problems you understand helps you grasp the concepts better.
Here’s your chance to explore new fields in AI and learn on the job:
https://www.instadeep.com/careers