ABSTRACT
Designing sequences with desired properties is a common problem in biology. Relying exclusively on wet-lab experiments to select sequences is costly and time-consuming, so in-silico design is often used as a preliminary step. In-silico design is hard for three reasons. First, the search space is discrete and large. Second, scoring functions that quantify target properties may be inaccurate, especially when fitted on a limited dataset. Third, not all properties can be modeled in silico or measured in vitro, so some require in-vivo experiments for evaluation. Strategies have been developed in the literature to address the first two challenges. As for the third, there is a consensus that concurrently evaluating a batch of sequences expected to be both high-performing and diverse is a good strategy to maximize the chances that at least one design meets all desiderata. Ideally, this is achieved in one shot. We develop a Quality Diversity approach to guarantee diversity for any batch size. We show that our method outperforms existing ones in terms of diversity, performance, and hyperparameter sensitivity on three datasets from the literature.