Bayesian Inference

When fitting a model with PEtab.jl, the unknown model parameters are estimated within a frequentist framework, and the goal is to find the maximum likelihood estimate. When prior knowledge about the parameters is available, Bayesian inference is an alternative approach to fitting the model to data. The aim of Bayesian inference is to infer the posterior distribution of the unknown parameters given the data, $p(\theta \mid y)$, by running a Markov chain Monte Carlo (MCMC) algorithm that samples from the posterior.

PEtab.jl supports Bayesian inference via three packages:

  • Adaptive Metropolis-Hastings samplers: available in AdaptiveMCMC.jl.
  • Hamiltonian Monte Carlo (HMC) samplers: available in AdvancedHMC.jl. Here the default choice is the NUTS sampler, which is used by Turing.jl and is also the default in Stan. HMC samplers are often more efficient than other methods.
  • Adaptive parallel tempering samplers: available in Pigeons.jl. Parallel tempering is most suitable for multi-modal posteriors (it can jump between modes) or non-identifiable posteriors.

This document covers how to create a PEtabODEProblem with priors, and how to use AdaptiveMCMC.jl, AdvancedHMC.jl, and Pigeons.jl for Bayesian inference.

Note

To use the Bayesian inference functionality in PEtab.jl, the Bijectors, LogDensityProblems, and LogDensityProblemsAD packages must be loaded. For parallel tempering Pigeons and MCMCChains must also be loaded.

Setting up a Bayesian inference problem

If the PEtab problem is in the PEtab standard format, priors for model parameters can be defined in the Parameter table. Here, we show how to set up priors for a PEtabODEProblem defined directly in Julia, using a simple saturated growth model as an example. First, we create the model and simulate some data:

using ModelingToolkit, OrdinaryDiffEq, PEtab, Plots

# Dynamic model
@parameters b1, b2
@variables t x(t)
D = Differential(t)
eqs = [
    D(x) ~ b2*(b1 - x)
]
specie_map = [x => 0]
parameter_map = [b1 => 1.0, b2 => 0.2]
@named sys = ODESystem(eqs)

# Simulate data
using Random, Distributions
Random.seed!(1234)
oprob = ODEProblem(sys, specie_map, (0.0, 2.5), parameter_map)
tsave = collect(range(0.0, 2.5, 101))
dist = Normal(0.0, 0.03)
_sol = solve(oprob, Rodas4(), abstol=1e-12, reltol=1e-12, saveat=tsave, tstops=tsave)
# Normal measurement noise with σ = 0.03
obs = _sol[:x] .+ rand(Normal(0.0, 0.03), length(tsave))
plot(_sol.t, obs, seriestype=:scatter, title = "Observed data")
(Plot: scatter plot of the observed data)

Now we can provide the rest of the information needed to set up a PEtabODEProblem (for an introduction to setting up a PEtabODEProblem, see here):

using DataFrames
# Measurement data
measurements = DataFrame(
    obs_id="obs_X",
    time=_sol.t,
    measurement=obs)
# Observable
@parameters sigma
obs_X = PEtabObservable(x, sigma)
observables = Dict("obs_X" => obs_X)
Dict{String, PEtabObservable{Symbolics.Num, Symbolics.Num}} with 1 entry:
  "obs_X" => PEtabObservable: h = x(t), noise-formula = sigma and normal (Gauss…

When defining the parameters to infer, we can assign a prior using any continuous distribution available in Distributions.jl. For instance, we can set the following priors:

  • $b_1$: Uniform distribution between 0.0 and 5.0; $b_1 \sim \mathcal{U}(0.0, 5.0)$.
  • $\mathrm{log}_{10}(b_2)$ : Uniform distribution between -6.0 and $\mathrm{log}_{10}(5.0)$, $\mathrm{log}_{10}(b_2) \sim \mathcal{U}\big(-6.0, \mathrm{log}_{10}(5.0) \big)$.
  • $\sigma$ : Gamma distribution with shape and scale parameters both set to 1.0, $\sigma \sim \mathcal{G}(1.0, 1.0)$.
_b1 = PEtabParameter(b1, value=1.0, lb=0.0, ub=5.0, scale=:log10, prior_on_linear_scale=true, prior=Uniform(0.0, 5.0))
_b2 = PEtabParameter(b2, value=0.2, scale=:log10, prior_on_linear_scale=false, prior=Uniform(-6, log10(5.0)))
_sigma = PEtabParameter(sigma, value=0.03, lb=1e-3, ub=1e2, scale=:lin, prior_on_linear_scale=true, prior=Gamma(1.0, 1.0))
parameters_ets = [_b1, _b2, _sigma]
3-element Vector{PEtabParameter}:
 PEtabParameter b1. Estimated on log10-scale with bounds [0.0e+00, 5.0e+00] and prior Distributions.Uniform{Float64}(a=0.0, b=5.0)
 PEtabParameter b2. Estimated on log10-scale with bounds [1.0e-03, 1.0e+03] and prior Distributions.Uniform{Float64}(a=-6.0, b=0.6989700043360189)
 PEtabParameter sigma. Estimated on lin-scale with bounds [1.0e-03, 1.0e+02] and prior Distributions.Gamma{Float64}(α=1.0, θ=1.0)

When specifying priors in PEtab.jl, it is important to note that parameters are by default estimated on the $\mathrm{log}_{10}$ scale (this can be changed via the scale argument). When prior_on_linear_scale=false, the prior applies on this parameter scale (default $\mathrm{log}_{10}$); therefore, the prior for b2 is on the $\mathrm{log}_{10}$ scale. If prior_on_linear_scale=true, the prior is on the linear scale, which in this case holds for b1 and sigma. If a prior is not specified, the default prior is a uniform distribution on the parameter scale, with bounds corresponding to the lower and upper bounds specified for the PEtabParameter.
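As a sketch of the default-prior behavior described above (reusing b2 from the model; the bounds are the defaults shown in the printout below), omitting the prior keyword gives a uniform prior on the parameter scale:

```julia
# Sketch: no explicit prior, so the default applies. Since b2 is estimated
# on the log10 scale with default bounds [1e-3, 1e3], the default prior is
# log10(b2) ~ Uniform(-3, 3)
_b2_default = PEtabParameter(b2, value=0.2, lb=1e-3, ub=1e3, scale=:log10)
```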

With the priors defined, we can proceed to create a PEtabODEProblem.

petab_model = PEtabModel(sys, observables, measurements, parameters_ets; state_map=specie_map, verbose=false)
petab_problem = PEtabODEProblem(petab_model; ode_solver=ODESolver(Rodas5(), abstol=1e-6, reltol=1e-6), verbose=false)

Bayesian inference (general setup)

The first step in performing Bayesian inference is to construct a PEtabLogDensity. This structure supports the LogDensityProblems.jl interface, meaning it provides all the methods necessary for performing Bayesian inference. To create it, provide the PEtabODEProblem as an argument.

using Bijectors, LogDensityProblems, LogDensityProblemsAD
target = PEtabLogDensity(petab_problem)
PEtabLogDensity with 3 parameters to infer.

When later performing Bayesian inference, the settings for the ODE solver and gradient computations are those specified in petab_problem. For instance, in this case, we use the default gradient method (ForwardDiff) and simulate the ODE model with the Rodas5 ODE solver.
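Since these settings are inherited from the problem, they are changed by rebuilding the PEtabODEProblem before constructing the PEtabLogDensity. A sketch (the keyword values below are illustrative, not recommendations):

```julia
# Sketch: alter solver and gradient settings used during inference by
# rebuilding the problem, then rebuild the log-density target
petab_problem = PEtabODEProblem(petab_model;
                                ode_solver = ODESolver(Rodas5(), abstol=1e-8, reltol=1e-8),
                                gradient_method = :ForwardDiff,
                                verbose = false)
target = PEtabLogDensity(petab_problem)
```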

Inference can now be performed. Although the choice of a starting point for the inference process is crucial, for simplicity we use the parameter vector used for simulating the data (note that the second parameter is on $\mathrm{log}_{10}$ scale).

xpetab = petab_problem.θ_nominalT
3-element Vector{Float64}:
  0.0
 -0.6989700043360187
  0.03

Lastly, when conducting Bayesian inference with PEtab.jl, an important note is that inference is performed on the prior scale. For instance, if a parameter is set with scale=:log10 but the prior is defined on the linear scale (prior_on_linear_scale=true), then inference is performed on the linear scale. Moreover, Bayesian inference algorithms typically prefer to operate in an unconstrained space; a prior such as $b_1 \sim \mathcal{U}(0.0, 5.0)$, where the parameter is bounded, is not ideal. To address this, bounded parameters are transformed to an unconstrained space.

In summary, a parameter vector on the PEtab parameter scale (xpetab) must, for inference, first be transformed to the prior scale (xprior) and then to the inference scale (xinference). This can be done via:

xprior = to_prior_scale(petab_problem.θ_nominalT, target)
xinference = target.inference_info.bijectors(xprior)
3-element Vector{Float64}:
 -1.3862943611198906
  1.3329014098421619
 -3.506557897319982
Warning

To obtain correct inference results, it is important that the starting value is on the transformed inference scale (as xinference above).

Bayesian inference with AdvancedHMC.jl (NUTS)

Given a starting point, we can run the NUTS sampler with 2000 samples and 1000 adaptation steps:

using AdvancedHMC
# δ=0.8 - acceptance rate (default in Stan)
sampler = NUTS(0.8)
Random.seed!(1234)
res = sample(target, sampler,
             2000;
             nadapts = 1000,
             initial_params = xinference,
             drop_warmup=true,
             progress=false)
[ Info: Found initial step size 0.00625

Any other algorithm found in the AdvancedHMC.jl documentation can also be used.
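For example, a fixed-step HMC sampler can be swapped in for NUTS (a sketch, assuming AdvancedHMC's HMC(ϵ, n_leapfrog) convenience constructor; the step size and number of leapfrog steps are illustrative and not tuned for this problem):

```julia
# Sketch: plain HMC with a fixed step size and 10 leapfrog steps
sampler_hmc = HMC(0.05, 10)
res_hmc = sample(target, sampler_hmc, 2000;
                 initial_params = xinference,
                 progress = false)
```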

To put the output into a format that is easy to interact with, we can convert it to an MCMCChains chain:

using MCMCChains
chain_hmc = PEtab.to_chains(res, target)
Chains MCMC chain (2000×3×1 Array{Float64, 3}):

Iterations        = 1:1:2000
Number of chains  = 1
Samples per chain = 2000
parameters        = b1, b2, sigma

Summary Statistics
  parameters      mean       std      mcse   ess_bulk   ess_tail      rhat   e ⋯
      Symbol   Float64   Float64   Float64    Float64    Float64   Float64     ⋯

          b1    1.8903    0.9385    0.0663   229.4717   265.8819    1.0004     ⋯
          b2   -0.9551    0.2144    0.0145   230.3404   273.5797    1.0004     ⋯
       sigma    0.0316    0.0024    0.0001   850.8247   865.1639    1.0005     ⋯
                                                                1 column omitted

Quantiles
  parameters      2.5%     25.0%     50.0%     75.0%     97.5%
      Symbol   Float64   Float64   Float64   Float64   Float64

          b1    0.8427    1.2146    1.5631    2.3041    4.3950
          b2   -1.4008   -1.1038   -0.9175   -0.7929   -0.6042
       sigma    0.0274    0.0299    0.0313    0.0331    0.0364

which we can also plot:

using Plots, StatsPlots
plot(chain_hmc)
(Plot: trace and density plots for the HMC chain)
Note

When converting the output to an MCMCChains chain, the parameters are transformed back to the prior scale.

Bayesian inference with AdaptiveMCMC.jl

Given a starting point, we can run the robust adaptive MCMC sampler for $200 \, 000$ iterations by:

using AdaptiveMCMC
Random.seed!(123)
# target.logtarget = posterior logdensity
res = adaptive_rwm(xinference, target.logtarget, 200000; progress=false)

and we can convert the output to an MCMCChains chain:

chain_adapt = to_chains(res, target)
plot(chain_adapt)
(Plot: trace and density plots for the adaptive MCMC chain)

Any other algorithm found in the AdaptiveMCMC.jl documentation can also be used.

Bayesian inference with Pigeons.jl

When using a parallel tempering algorithm, a reference distribution that is easy to sample from and covers a wide range of parameter space is needed in order to, for example, jump between modes. For a PEtabODEProblem, this reference is the prior distribution, and it is set up with:

reference = PEtabPigeonReference(petab_problem)

Given a starting point we can now take $2^{10}$ adaptive parallel tempering samples by:

using Pigeons
Random.seed!(123)
target.initial_value .= xinference
pt = pigeons(target = target,
             reference = reference,
             n_rounds=10, # 2^10 samples
             record = [traces; record_default()])
──────────────────────────────────────────────────────────────────────────────────────────────────
  scans        Λ        time(s)    allc(B)  log(Z₁/Z₀)   min(α)     mean(α)    min(αₑ)   mean(αₑ)
────────── ────────── ────────── ────────── ────────── ────────── ────────── ────────── ──────────
        2       2.25       3.95   6.59e+08        189   8.44e-14       0.75          1          1
        4       2.75       0.56   9.24e+08        197      0.013      0.695          1          1
        8          3      0.964   1.65e+09        198      0.338      0.667          1          1
       16       2.89       1.93   3.37e+09        195      0.222      0.679          1          1
       32       3.54       3.82   6.68e+09        196      0.356      0.606          1          1
       64       3.79        7.6   1.32e+10        194      0.429      0.579          1          1
      128       3.46       15.4   2.66e+10        196      0.478      0.615          1          1
      256       3.75       30.4   5.27e+10        195      0.542      0.584          1          1
      512        3.7       60.8   1.05e+11        195      0.564      0.589          1          1
 1.02e+03       3.72        123   2.12e+11        195       0.55      0.587          1          1
──────────────────────────────────────────────────────────────────────────────────────────────────

and we can convert the output to an MCMCChains chain:

pt_chain = to_chains(pt, target)
plot(pt_chain)
(Plot: trace and density plots for the parallel tempering chain)

Here we used the default SliceSampler as the local explorer. Alternative explorers, such as adaptive MALA, can be used instead by setting explorer=AutoMALA() in the pigeons function call. More information can be found in the Pigeons.jl documentation.
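The alternative explorer can be sketched as follows (assuming the target and reference defined earlier; all other settings match the call above):

```julia
# Sketch: parallel tempering with an adaptive MALA local explorer
# instead of the default SliceSampler
pt = pigeons(target = target,
             reference = reference,
             explorer = AutoMALA(),
             n_rounds = 10,
             record = [traces; record_default()])
```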