ABSTRACT
Predicting the biophysical and functional properties of proteins is essential for in silico protein design. Machine learning has emerged as a promising technique for learning such prediction tasks. However, the relative scarcity of in vitro annotations means that these models often have little or no data specific to the desired fitness prediction task. As a result of this limited data, protein language models (PLMs) are typically trained on general protein sequence modeling tasks and then fine-tuned, or applied zero-shot, to protein fitness prediction. When no task data is available, these models make strong assumptions about the correlation between protein sequence likelihood and fitness scores. In contrast, rather than restricting the representations in this way, we propose meta-learning over a distribution of standard fitness prediction tasks and demonstrate positive transfer to unseen fitness prediction tasks. Our method, called Metalic (Meta-Learning In-Context), uses in-context learning and, when data is available, fine-tuning to adapt to new tasks. Crucially, the fine-tuning enables considerable generalization even though it is not accounted for during meta-training. The fine-tuned models achieve strong results with 18 times fewer parameters than state-of-the-art models. Moreover, our method sets a new state of the art on ProteinGym, an established fitness prediction benchmark. We believe that meta-learning across protein fitness tasks will play a vital role in advancing protein fitness prediction methods.
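To make the in-context component of this setup concrete, the following is a minimal, hypothetical sketch: a model conditions on a small support set of (sequence, fitness) pairs drawn from one task and predicts scores for held-out query sequences, with meta-training performed over many such sampled tasks. All names (`InContextFitnessModel`, `sample_task`, the toy encoding, and all hyperparameters) are illustrative placeholders under our assumptions, not the paper's actual architecture or code, and the fine-tuning stage is omitted.

```python
# Hypothetical sketch of meta-learned in-context fitness prediction (not the paper's code).
import torch
import torch.nn as nn

AA = "ACDEFGHIKLMNPQRSTVWY"
AA_TO_IDX = {a: i for i, a in enumerate(AA)}

def encode(seq: str, max_len: int = 32) -> torch.Tensor:
    """Map an amino-acid string to a fixed-length index tensor (0 is padding)."""
    idx = [AA_TO_IDX[a] + 1 for a in seq[:max_len]]
    return torch.tensor(idx + [0] * (max_len - len(idx)))

class InContextFitnessModel(nn.Module):
    """Scores query sequences conditioned on a support set of (sequence, fitness) pairs."""
    def __init__(self, d_model: int = 64, max_len: int = 32):
        super().__init__()
        self.embed = nn.Embedding(21, d_model)            # 20 amino acids + padding token
        self.score_proj = nn.Linear(1, d_model)           # inject support fitness scores
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, 1)

    def forward(self, support_seqs, support_scores, query_seqs):
        # Pool each sequence into one token, then attend across the task's examples.
        sup = self.embed(support_seqs).mean(dim=1) + self.score_proj(support_scores.unsqueeze(-1))
        qry = self.embed(query_seqs).mean(dim=1)
        tokens = torch.cat([sup, qry], dim=0).unsqueeze(0)   # (1, n_sup + n_qry, d_model)
        out = self.encoder(tokens).squeeze(0)
        return self.head(out[len(support_seqs):]).squeeze(-1)  # predictions for queries only

def sample_task(n_support: int = 8, n_query: int = 4, length: int = 16):
    """Stand-in for sampling one fitness-prediction task (random data, for illustration only)."""
    n = n_support + n_query
    seqs = ["".join(AA[i] for i in torch.randint(0, 20, (length,)).tolist()) for _ in range(n)]
    scores = torch.randn(n)
    x = torch.stack([encode(s) for s in seqs])
    return x[:n_support], scores[:n_support], x[n_support:], scores[n_support:]

model = InContextFitnessModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for step in range(100):                                    # meta-training over sampled tasks
    s_x, s_y, q_x, q_y = sample_task()
    loss = nn.functional.mse_loss(model(s_x, s_y, q_x), q_y)
    opt.zero_grad()
    loss.backward()
    opt.step()
```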