So far we've treated the target function of our learning problems as a deterministic mapping from inputs to outputs. To summarize, instead of individual solutions, we are now interested in distributions over possible solutions.
All measurements, models, and discretizations that we are working with exhibit uncertainties. For measurements and observations, they typically appear in the form of measurement errors. Model equations, on the other hand, usually encompass only parts of the system we're interested in (leaving the remainder as an uncertainty), while for numerical simulations we inevitably introduce discretization errors. In the context of machine learning, we additionally have errors introduced by the trained model. Together, these errors make up the uncertainty of the predicted outcomes, the so-called predictive uncertainty. For practical applications, it's crucial to have means for quantifying this uncertainty. This is a central motivation for working with probabilistic models, and for adjacent fields such as uncertainty quantification (UQ).
In many cases, the predictive uncertainty can be distinguished in terms of two types:
- _Aleatoric_ uncertainty denotes uncertainty within the data, e.g., noise in measurements.
- _Epistemic_ uncertainty, on the other hand, describes uncertainty within the model itself, such as a trained neural network.
A word of caution is important here: while this distinction seems clear-cut, both effects overlap and can be difficult to tell apart. E.g., when facing discretization errors, uncertain outcomes could be caused by unknown ambiguities in the data, or by a suboptimal discrete representation. In practice, these aspects can be very difficult to disentangle, as the toy example below illustrates.
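To make the two notions concrete, here is a minimal NumPy sketch (the data, noise level, and ensemble setup are purely illustrative choices): the noise added to the observations is the aleatoric part, while the disagreement between ensemble members trained on different data subsets serves as a simple proxy for the epistemic part.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: y = sin(x) plus observation noise -- the aleatoric part.
x = rng.uniform(-3, 3, size=200)
y = np.sin(x) + 0.1 * rng.standard_normal(200)

# A small "ensemble": each member is fit on a random subset of the data.
# Disagreement between members reflects epistemic (model) uncertainty.
x_test = np.linspace(-3, 3, 50)
preds = []
for _ in range(20):
    idx = rng.choice(len(x), size=50)          # random subset
    coef = np.polyfit(x[idx], y[idx], deg=5)   # one ensemble member
    preds.append(np.polyval(coef, x_test))
preds = np.stack(preds)

print("epistemic std (ensemble spread):", preds.std(axis=0).mean())
print("aleatoric std (data noise):     ", 0.1)
```

With more training data the ensemble spread shrinks, while the noise inherent in the data of course remains.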
Closely aligned, albeit taking a slightly different perspective, are so-called simulation-based inference (SBI) methods. Here the main motivation is to estimate likelihoods in computer-based simulations, so that reliable probability distributions for the solutions can be obtained. The SBI viewpoint provides a methodological approach for working with computer simulations and uncertainties, and will provide a red thread for the following sections.
At this point it's important to revisit the central distinction between forward and inverse ("backward") problems: most classic numerical methods target ➡️ forward ➡️ problems to compute solutions for steady-state or future states of a system.
Forward problems arise in many settings, but across the board, at least as many problems are ⬅️ inverse ⬅️ problems, where a forward simulation still plays a central role, but the main question is not the state it generates, but rather which simulator parameters explain a given measurement or observation. To formalize this, our simulator will be treated probabilistically below.
In the following, we will focus on inverse problems, as these best illustrate the capabilities of probabilistic modeling, but the algorithms discussed are not exclusively applicable to inverse problems (an example will follow).
For inverse problems, it is in practice not sufficient to match a single observation with a single point estimate: typically, many different inputs can explain the same observation equally well, and we'd like to capture all of them.
To formalize these inverse problems, let's consider a vector-valued input $x$ that can contain states and/or the aforementioned parameters (like, e.g., material constants or boundary conditions of a physical model). For a given input $x$, running the simulator produces an observation $y$; due to noise in measurements, and possibly in the simulation itself, we treat $y$ as a sample from a conditional distribution, $y \sim p(y|x)$.
The function for the conditional probability $p(y|x)$ is called the likelihood function, and is a crucial quantity in the following. Note that it does not depend on the prior over the inputs $x$: it is purely a property of the simulator and its noise model.
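As a concrete toy example, consider the following sketch; the quadratic simulator and its Gaussian noise model are illustrative assumptions, chosen so that the likelihood has a closed form (in general it does not):

```python
import numpy as np
from scipy.stats import norm

def simulator(x, rng):
    # Toy forward simulator: y given input x, with additive Gaussian
    # observation noise -- this is what makes y ~ p(y|x) stochastic.
    return x**2 + 0.1 * rng.standard_normal()

def log_likelihood(y, x):
    # log p(y|x) for the toy simulator; the noise model is known here,
    # so the likelihood can be evaluated in closed form.
    return norm.logpdf(y, loc=x**2, scale=0.1)

rng = np.random.default_rng(0)
y_obs = simulator(1.0, rng)  # a single "measurement"
# Both x = 1 and x = -1 explain y_obs equally well:
print(log_likelihood(y_obs, 1.0), log_likelihood(y_obs, -1.0))
```

The fact that $x=1$ and $x=-1$ yield the same likelihood already shows the typical ambiguity of inverse problems mentioned above.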
With a function for the likelihood we can compute the posterior distribution, the main quantity we're after, via Bayes' rule:

$$ p(x|y) = \frac{p(y|x) \, p(x)}{p(y)} $$

Here $p(x)$ denotes the prior over the inputs, and the denominator $p(y) = \int p(y|x) \, p(x) \, dx$ is the so-called evidence.
The evidence can be computed with stochastic methods such as Markov chain Monte Carlo (MCMC). It primarily "normalizes" our posterior distribution, and while it is typically easier to obtain than the likelihood, it nonetheless remains a challenging term.
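For the 1D toy simulator from above we can sidestep MCMC and brute-force the evidence integral on a grid; this sketch is for illustration only, as grid integration becomes infeasible in higher dimensions:

```python
import numpy as np
from scipy.stats import norm

# Toy setup: prior p(x) = N(0,1), likelihood p(y|x) = N(y; x^2, 0.1).
y_obs = 1.0
xs = np.linspace(-3, 3, 2001)
dx = xs[1] - xs[0]
prior = norm.pdf(xs, loc=0.0, scale=1.0)
lik = norm.pdf(y_obs, loc=xs**2, scale=0.1)

evidence = np.sum(lik * prior) * dx        # p(y) = integral p(y|x) p(x) dx
posterior = lik * prior / evidence         # Bayes' rule, now normalized

print("evidence p(y):", evidence)
print("posterior mass at x>0:", np.sum(posterior[xs > 0]) * dx)
```

The resulting posterior is bimodal, with peaks near $x=+1$ and $x=-1$, which no single point estimate could represent.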
This is where deep learning turns out to be extremely useful: we can use it to train a conditional density estimator $q_\theta(x|y)$ for the posterior $p(x|y)$ that allows sampling, and can be trained from simulations $y \sim p(y|x)$ alone.
Deep learning has been instrumental in providing new ways to address the classic challenges of obtaining accurate estimates of posterior distributions, and this is what we'll focus on in this chapter. Previously, our neural networks represented deterministic functions; the conditional density estimator $q_\theta(x|y)$, in contrast, represents a full distribution over possible solutions.
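To illustrate the basic idea, here is a minimal PyTorch sketch of such an estimator; the Gaussian parameterization of $q_\theta$ and the toy simulator are simplifying assumptions, not the method developed later in this chapter:

```python
import torch
import torch.nn as nn

# q_theta(x|y): a network that outputs mean and log-std of a Gaussian
# over x, conditioned on y -- the simplest conditional density estimator.
net = nn.Sequential(nn.Linear(1, 64), nn.ReLU(), nn.Linear(64, 2))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

def simulate(x):
    # Same toy simulator as before, y ~ p(y|x); note that we never
    # evaluate the likelihood, we only sample from it.
    return x**2 + 0.1 * torch.randn_like(x)

for step in range(2000):
    x = torch.randn(256, 1)     # draw inputs from the prior p(x) = N(0,1)
    y = simulate(x)             # run the simulator
    mu, log_std = net(y).chunk(2, dim=1)
    # Negative log-likelihood of x under q_theta(.|y), up to a constant:
    loss = (0.5 * ((x - mu) / log_std.exp()) ** 2 + log_std).mean()
    opt.zero_grad(); loss.backward(); opt.step()

# Inference for a new observation is now a single forward pass:
mu, log_std = net(torch.tensor([[1.0]])).chunk(2, dim=1)
```

Note that a single Gaussian for $q_\theta(x|y)$ cannot represent the bimodal posterior of our toy example; the need for more expressive conditional distributions is exactly what motivates the diffusion models discussed below.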
Looking ahead, the learned SBI methods, i.e., approaches for computing posterior distributions with trained networks, have the following properties:
✅ Pro:
- Fast inference (once trained)
- Less affected by curse of dimensionality
- Can represent arbitrary priors
❌ Con:
- Require costly upfront training
- Lack rigorous theoretical guarantees
In the following we'll explain how to obtain and derive a very popular and powerful family of methods that can be summarized as diffusion models. We could simply provide the final algorithm (which will turn out to be surprisingly simple), but it's actually very interesting to see where it all comes from. We'll focus on the basics, and leave the physics-based extensions (i.e., including differentiable simulators) for a later section. The path towards diffusion models also introduces a few highly interesting concepts from machine learning along the way, and provides a nice "red thread" for discussing seminal papers from the past few years. Here we go...
A classic variant that should be mentioned here is the "Bayesian Neural Network" (BNN). BNNs follow Bayes more closely, and prescribe a prior distribution on the neural network parameters in order to learn a posterior distribution over them. Every weight and bias in the NN is assumed to be Gaussian with its own mean and variance, which are adjusted at training time. For inference, we can then "sample" a network, and use it like any regular NN.
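A minimal sketch of such a layer could look as follows (illustrative only; a full variational BNN additionally needs a KL-divergence term that pulls the weight distributions towards the prior, which is omitted here):

```python
import torch
import torch.nn as nn

class BayesLinear(nn.Module):
    """Minimal Bayesian linear layer: every weight and bias is a
    Gaussian with its own learned mean and (log-)standard deviation."""
    def __init__(self, n_in, n_out):
        super().__init__()
        self.w_mu = nn.Parameter(torch.zeros(n_out, n_in))
        self.w_logsig = nn.Parameter(torch.full((n_out, n_in), -3.0))
        self.b_mu = nn.Parameter(torch.zeros(n_out))
        self.b_logsig = nn.Parameter(torch.full((n_out,), -3.0))

    def forward(self, x):
        # "Sample" a concrete network via the reparameterization trick;
        # each forward pass uses a different draw of weights and biases.
        w = self.w_mu + self.w_logsig.exp() * torch.randn_like(self.w_mu)
        b = self.b_mu + self.b_logsig.exp() * torch.randn_like(self.b_mu)
        return x @ w.T + b

layer = BayesLinear(3, 2)
x = torch.randn(4, 3)
print(layer(x))   # repeated calls draw different weights...
print(layer(x))   # ...and hence give different outputs
```

Each forward pass draws a different network, so repeated calls on the same input yield a distribution of outputs.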
Despite being a very good idea on paper, this method turned out to have problems with learning complex distributions, and to require careful tuning of the hyperparameters involved. Hence, these days it's strongly recommended to use flow matching (or at least a diffusion model) instead.
If you're interested in details, BNNs with a code example can be found, e.g., in v0.3 of PBDL: https://arxiv.org/abs/2109.05237v3 .