Advancing AI with Bayesian Flow Networks

By Alex Graves

The story of Bayesian Flow Networks (BFNs) took years to take shape.

In 2010, I was working with autoregressive models and repeatedly ran into the same constraint: their fixed prediction order, in which a single variable is completely revealed at every step. This rigid structure felt unnatural compared to how we typically gather information: gradually, in varying orders, and with varying levels of confidence. I began to wonder whether a model could instead learn how much to predict at each step, adjusting its confidence dynamically.

A decade later, I conceptualised introducing noise to the data and applying Bayesian updates, enabling a generative process that flows smoothly from complete uncertainty to high certainty. This work developed independently of diffusion models, which were first introduced in 2015 1 but, remarkably, not on my radar until 2020. While the final equations sometimes look similar, they arise from fundamentally different starting points.

For me, BFNs sit somewhere between autoregressive and diffusion models. Starting from Bayesian principles opened doors that are difficult to reach from the diffusion perspective, particularly for modelling discrete data in a continuous, probabilistic way. Diffusion focuses on reversing a corruption process, forcing you to think backwards through a predefined noise schedule. BFNs instead move forward, updating beliefs as information arrives—a process that feels almost organic, as if the model learns by absorbing knowledge rather than making rigid, pre-set predictions.

Diffusion models don’t diminish BFNs; they’re part of the story, showing how different questions can sometimes lead to similar answers. Much of AI progresses through incremental refinements of established paradigms, but BFNs emerged from asking a bigger question: could we unify generative modelling and Bayesian inference in a continuous-time framework?

Where autoregressive and masked diffusion approaches predict data one discrete step at a time, BFNs model a continuous flow of belief updates, capturing uncertainty as it evolves. Diffusion models excel in continuous domains, but adapting them to discrete data often introduces discontinuities that make training unstable or inefficient. BFNs aim to overcome these limitations, offering generative processes that are both continuous and probabilistically grounded—even for discrete data.

So, how does it work?

A Bayesian approach

At the heart of BFNs is a simple principle: learning by updating beliefs.

In Bayesian inference, we start with a prior view and refine it as new data arrives 2. This mirrors how humans learn and adapt under uncertainty. Bayes’ theorem formalises the process as:

Posterior ∝ Prior × Likelihood

Or more intuitively as: 

What you believe now = What you believed before × How well the new data fits
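The update can be sketched numerically. This is a toy example with made-up numbers (a three-way categorical belief, not anything from the paper): multiply the prior by the likelihood term by term, then renormalise so the posterior sums to one.

```python
# Toy Bayes update over three hypotheses; all numbers are illustrative.
prior = [0.6, 0.3, 0.1]          # what you believed before
likelihood = [0.2, 0.5, 0.9]     # how well the new data fits each hypothesis

unnormalised = [p * l for p, l in zip(prior, likelihood)]
evidence = sum(unnormalised)     # normalising constant P(data)
posterior = [u / evidence for u in unnormalised]

print(posterior)  # belief shifts toward the hypotheses the data supports
```

Note that the second hypothesis, which the prior only moderately favoured, ends up with the largest posterior mass once the likelihood is folded in.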

This principle defines a natural division of labour within the BFN framework: Bayesian inference provides mathematically grounded updates for individual variables, while deep learning captures the complex dependencies that arise in high-dimensional data.

Alice and Bob: the BFN process

The combination of Bayesian inference and deep learning creates a generative framework that is flexible, continuous, and well-suited to discrete and discretised data.

High-dimensional data, such as images or text, is notoriously hard to model because of the intricate dependencies between variables. In the paper, I illustrate this challenge as a dialogue between two fictional characters: Alice, who knows the true data, and Bob, who is trying to infer it 3.

Training unfolds as a conversation. Bob starts with an uninformative prior, his initial guess about the data. At each time step:

  • Alice adds controlled noise to the true data, creating a sender distribution. Early messages are vague; later, they become sharper.
  • Bob predicts what he expects to receive based on his current belief and the noise schedule, forming a receiver distribution.
  • He then updates his belief using a Bayesian rule, progressively narrowing his distribution around the true data based on the noisy samples sent by Alice.
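The exchange above can be sketched for a single ternary variable. This is a deliberately simplified illustration of the discrete-data case, not the paper's exact equations (the true sender variance follows an accuracy schedule); the symbol index, accuracy value, and random seed here are arbitrary.

```python
import math
import random

random.seed(0)

K = 3        # ternary alphabet, as in the paper's figure
x = 1        # the true symbol Alice holds (hypothetical example)
alpha = 0.5  # per-step accuracy: larger alpha means a sharper message

# Bob starts from an uninformative prior over the K symbols.
theta = [1.0 / K] * K

# Alice: a noisy continuous message centred on the true symbol
# (a simplified Gaussian sender for discrete data).
mean = [alpha * (K * (1.0 if i == x else 0.0) - 1.0) for i in range(K)]
y = [m + math.sqrt(alpha * K) * random.gauss(0.0, 1.0) for m in mean]

# Bob: multiplicative Bayesian update of his belief given the message.
unnorm = [t * math.exp(yi) for t, yi in zip(theta, y)]
total = sum(unnorm)
theta = [u / total for u in unnorm]

print(theta)  # on average, belief concentrates on the true symbol
```

Iterating this step with increasing accuracy is what drives Bob's belief from the uniform prior toward a distribution peaked at the data.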

Throughout this process, the neural network observes all variables jointly, learning patterns and dependencies across them. By the final step, Bob’s belief is highly concentrated near the true data, and the network outputs a final sample that integrates the full context of the entire conversation.

This collaboration allows BFNs to combine the mathematically grounded updates of Bayesian inference with the context-aware power of deep learning, refining beliefs holistically and efficiently.

Figure 1: System Overview. The figure represents one step of the modelling process of a Bayesian Flow Network. The data in this example is a ternary symbol sequence, of which the first two variables (‘B’ and ‘A’) are shown. At each step the network emits the parameters of the output distribution based on the parameters of the previous input distribution. The sender and receiver distributions (both of which are continuous, even when the data is discrete) are created by adding random noise to the data and the output distribution respectively. A sample from the sender distribution is then used to update the parameters of the input distribution, following the rules of Bayesian inference. Conceptually, this is the message sent by Alice to Bob, and its contribution to the loss function is the KL divergence from the receiver to the sender distribution.
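The per-step loss mentioned in the caption can be illustrated with one-dimensional Gaussians. The numbers below are hypothetical, and treating sender and receiver as Gaussians with a shared variance is a simplification; it does, however, show the key property that the loss vanishes as the network's prediction approaches the data.

```python
# Sketch of the per-step KL loss: sender noise is centred on the data,
# receiver noise on the network's prediction. With equal variances, the
# Gaussian KL divergence reduces to a scaled squared error.
def gaussian_kl(mu_p, mu_q, sigma):
    """KL(N(mu_p, sigma^2) || N(mu_q, sigma^2)) in closed form."""
    return (mu_p - mu_q) ** 2 / (2.0 * sigma ** 2)

data_value = 1.0   # what Alice knows (illustrative)
prediction = 0.6   # what Bob's network currently outputs (illustrative)
sigma = 0.5        # noise level set by the accuracy schedule

loss = gaussian_kl(data_value, prediction, sigma)
print(loss)  # shrinks to zero as the prediction approaches the data
```

Summing this quantity over steps (or integrating it in continuous time) gives the training objective that the network minimises.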

What this means

BFNs offer a generative process that works across discrete, discretised, and continuous data, achieving performance on par with existing approaches within a single model. They eliminate the need for a forward corruption step, handle discrete data in a differentiable way, and provide flexibility in the number of sampling steps, sometimes requiring fewer inference steps in practice.

For me, this is more than just another model. It is the realisation of an idea that lingered for a decade, a hope that we could move beyond fixed prediction orders and build generative models that update beliefs as they learn.

The framework is already showing its versatility in biological discovery:

  • ProtBFN for protein sequence modelling
  • AbBFN2 for multi-objective, steerable antibody generation
  • AMix-1 for in silico directed evolution of protein sequences
  • GeoBFN for three-dimensional molecular generation 

Looking ahead, I hope BFNs continue to evolve, perhaps even learning the schedule in which data is predicted and extending to new domains. Over time, I see their potential to move well beyond diffusion-like processes, particularly in settings where the corruption/reversal paradigm is less natural or where discrete structure dominates. By unifying generative modelling and Bayesian inference, BFNs offer a framework that can decide what to predict and when, adaptively updating beliefs without a fixed trajectory. With broader adoption, they could reshape generative modelling itself, especially in high-dimensional, structured, or multimodal data.

This journey began as a question in 2010. Fifteen years later, BFNs represent one possible answer, and the story is still unfolding. The framework is openly available, and I encourage researchers to explore, adapt, and build upon it.

Ready to continue the BFN story? Check out the premiere episode of our new podcast: Let’s Talk Research. Don’t forget to download the paper, and access the model on GitHub.

1 Sohl-Dickstein, J., Weiss, E., Maheswaranathan, N., & Ganguli, S. (2015). Deep unsupervised learning using nonequilibrium thermodynamics. Proceedings of the 32nd International Conference on Machine Learning (ICML), PMLR 37:2256–2265. arXiv:1503.03585

2  Bayes, T. (1763). An Essay towards solving a Problem in the Doctrine of Chances. Communicated by Mr. Price to John Canton. Philosophical Transactions of the Royal Society of London, 53, 370–418. Retrieved from https://bayes.wustl.edu/Manual/an.essay.pdf

3  Graves, A., Srivastava, R. K., Atkinson, T., & Gómez, F. (2023). Bayesian Flow Networks. arXiv preprint arXiv:2308.07037 (page 1) 


Disclaimer: All claims made are supported by our research paper: Bayesian Flow Networks unless explicitly cited otherwise.