Bayesian Inference

When performing parameter estimation with PEtab.jl, the unknown model parameters are estimated in a frequentist framework, where the goal is to find the maximum likelihood estimate. When prior knowledge about the parameters is available, Bayesian inference offers an alternative approach to fitting a model to data. The aim is to infer the posterior distribution of the unknown parameters given the data, $\pi(\mathbf{x} \mid \mathbf{y})$, typically by sampling from it with a Markov chain Monte Carlo (MCMC) algorithm. A major challenge, aside from building a good model, is sampling the posterior effectively. PEtab.jl supports Bayesian inference via two packages that implement different sampling algorithms:

  • Adaptive Metropolis-Hastings samplers available in AdaptiveMCMC.jl [12].
  • Hamiltonian Monte Carlo (HMC) samplers available in AdvancedHMC.jl. The default is the No-U-Turn Sampler (NUTS), which is also the default sampler in Stan [13, 14]. HMC samplers are often efficient for continuous targets (models without discrete parameters).

This tutorial covers how to create a PEtabODEProblem with priors and how to use AdaptiveMCMC.jl and AdvancedHMC.jl for Bayesian inference. Note that this part of PEtab.jl is planned to move to a separate package, so the syntax will change and become more user-friendly in the future.

Note

To use the Bayesian inference functionality in PEtab.jl, the Bijectors.jl, LogDensityProblems.jl, and LogDensityProblemsAD.jl packages must be loaded.

Creating a Bayesian Inference Problem

If a PEtab problem is in the PEtab standard format, priors are defined in the parameter table. Here, we focus on the case when the model is defined directly in Julia, using a simple saturated growth model. First, we create the model and simulate some data:

using Distributions, ModelingToolkit, OrdinaryDiffEq, Plots
using ModelingToolkit: t_nounits as t, D_nounits as D
@mtkmodel SYS begin
    @parameters begin
        b1
        b2
    end
    @variables begin
        x(t) = 0.0
    end
    @equations begin
        D(x) ~ b2 * (b1 - x)
    end
end
@mtkbuild sys = SYS()

# Simulate data with normal measurement noise and σ = 0.03
oprob = ODEProblem(sys, [0.0], (0.0, 2.5), [1.0, 0.2])
tsave = range(0.0, 2.5, 101)
dist = Normal(0.0, 0.03)
sol = solve(oprob, Rodas4(), abstol=1e-12, reltol=1e-12, saveat=tsave)
obs = sol[:x] .+ rand(dist, length(tsave))
plot(sol.t, obs, seriestype=:scatter, title = "Observed data")

Given this, we can now create a PEtabODEProblem (for an introduction, see the starting tutorial):

using DataFrames, PEtab
measurements = DataFrame(obs_id="obs_X", time=sol.t, measurement=obs)
@parameters sigma
obs_X = PEtabObservable(:x, sigma)
observables = Dict("obs_X" => obs_X)

When defining parameters to estimate via PEtabParameter, a prior can be assigned using any continuous distribution available in Distributions.jl. For instance, we can set the following priors:

  • b1: Uniform distribution between 0.0 and 5.0 on the linear scale; Uniform(0.0, 5.0).
  • b2: Uniform distribution between -6.0 and log10(5.0) on the log10 scale; Uniform(-6.0, log10(5.0)).
  • sigma: Gamma distribution with shape and scale parameters both set to 1.0; Gamma(1.0, 1.0).

Using the following code:

p_b1 = PEtabParameter(:b1, value=1.0, lb=0.0, ub=5.0, scale=:log10, prior_on_linear_scale=true, prior=Uniform(0.0, 5.0))
p_b2 = PEtabParameter(:b2, value=0.2, scale=:log10, prior_on_linear_scale=false, prior=Uniform(-6, log10(5.0)))
p_sigma = PEtabParameter(:sigma, value=0.03, lb=1e-3, ub=1e2, scale=:lin, prior_on_linear_scale=true, prior=Gamma(1.0, 1.0))
pest = [p_b1, p_b2, p_sigma]
3-element Vector{PEtabParameter}:
 PEtabParameter: b1 estimated on log10-scale with bounds [0.0e+00, 5.0e+00] and prior Uniform(a=0.0, b=5.0)
 PEtabParameter: b2 estimated on log10-scale with bounds [1.0e-03, 1.0e+03] and prior Uniform(a=-6.0, b=0.6989700043360189)
 PEtabParameter: sigma estimated on lin-scale with bounds [1.0e-03, 1.0e+02] and prior Gamma(α=1.0, θ=1.0)

When specifying priors, it is important to keep the parameter scale in mind (log10 is the default). When prior_on_linear_scale=false, the prior applies on the parameter scale, so for b2 above the prior is on the log10 scale. When prior_on_linear_scale=true (the default), the prior is on the linear scale, as for b1 and sigma above. If no prior is specified, the default is a Uniform distribution on the parameter scale, with bounds corresponding to the upper and lower bounds of the PEtabParameter. The sketch below makes the scale bookkeeping concrete.
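As a minimal illustration using plain Distributions.jl (not part of the PEtab API), here is where each prior density is evaluated under the settings above:

# Distributions is already loaded above
b1, b2, sigma = 1.0, 0.2, 0.03
# b1 and sigma have prior_on_linear_scale=true: densities are evaluated
# at the linear-scale values
logpdf(Uniform(0.0, 5.0), b1)
logpdf(Gamma(1.0, 1.0), sigma)
# b2 has prior_on_linear_scale=false: the density is evaluated at the
# parameter-scale (log10) value
logpdf(Uniform(-6, log10(5.0)), log10(b2))

With these priors, we can now create the PEtabODEProblem: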

osolver = ODESolver(Rodas5P(), abstol=1e-6, reltol=1e-6)
model = PEtabModel(sys, observables, measurements, pest)
petab_prob = PEtabODEProblem(model; odesolver=osolver)
PEtabODEProblem: ODESystemModel with ODE-states 1 and 3 parameters to estimate
---------------- Problem options ---------------
Gradient method: ForwardDiff
Hessian method: ForwardDiff
ODE-solver nllh: Rodas5P
ODE-solver gradient: Rodas5P

Bayesian Inference (General Setup)

The first step in running Bayesian inference is to construct a PEtabLogDensity. This structure supports the LogDensityProblems.jl interface, meaning it contains all the necessary methods for running Bayesian inference:

using Bijectors, LogDensityProblems, LogDensityProblemsAD
target = PEtabLogDensity(petab_prob)
PEtabLogDensity with 3 parameters to infer

When performing Bayesian inference, the settings for the ODE solver and gradient computations are those specified in petab_prob. In this case, we use the default gradient method (ForwardDiff) and simulate the ODE model using the Rodas5P ODE solver.
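Because target supports the LogDensityProblems.jl interface, its dimension and log-density can be queried directly with the standard interface functions. A minimal sketch (zeros(3) is just an arbitrary point on the unconstrained inference scale, chosen for illustration):

# LogDensityProblems is already loaded above
LogDensityProblems.dimension(target)             # number of parameters to infer (3)
LogDensityProblems.logdensity(target, zeros(3))  # log posterior density at that point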

One important consideration before running Bayesian inference is the starting point. For simplicity, here we use the parameter vector that was used to simulate the data, but note that inference should typically be performed with at least four chains started from different points [15]:

x = get_x(petab_prob)

Lastly, when performing Bayesian inference with PEtab.jl, note that inference is performed on the prior scale. For instance, if a parameter has scale=:log10 but the prior is defined on the linear scale (prior_on_linear_scale=true), inference is performed on the linear scale. Additionally, Bayesian inference algorithms typically prefer to operate in an unconstrained space, so a bounded prior like Uniform(0.0, 5.0) is not ideal. To address this, bounded parameters are transformed into an unconstrained space.

In summary, a parameter vector on the PEtab parameter scale (x) must first be transformed to the prior scale (xprior) and then to the inference scale (xinference). This can be done via:

xprior = to_prior_scale(petab_prob.xnominal_transformed, target)
xinference = target.inference_info.bijectors(xprior)
3-element Vector{Float64}:
 -1.3862943611198906
  1.3329014098421619
 -3.506557897319982
Warning

To get correct inference results, it is important that the starting value is on the transformed parameter scale (as xinference above).
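As a sanity check, the transformation can be inverted with Bijectors.jl to map back from the inference scale. A minimal sketch, assuming the stored bijector is invertible with the standard Bijectors.jl inverse function:

xback = inverse(target.inference_info.bijectors)(xinference)
xback ≈ xprior  # should hold up to floating-point error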

Bayesian inference with AdvancedHMC.jl (NUTS)

Given a starting point, we can run the NUTS sampler for 2000 samples with 1000 adaptation steps:

using AdvancedHMC
# δ = 0.8: target acceptance rate (the default in Stan)
sampler = NUTS(0.8)
res = sample(target, sampler, 2000; n_adapts = 1000, initial_params = xinference,
             drop_warmup=true, progress=false)
[ Info: Found initial step size 0.00625
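Any other sampler from the AdvancedHMC.jl documentation can be run the same way. For example, a sketch with the HMCDA sampler, assuming AdvancedHMC.jl's HMCDA(δ, λ) convenience constructor with target acceptance rate 0.8 and integration time 1.0:

sampler_hmcda = HMCDA(0.8, 1.0)
res_hmcda = sample(target, sampler_hmcda, 2000; n_adapts = 1000,
                   initial_params = xinference, drop_warmup = true, progress = false)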

To get the NUTS output res into an easy-to-work-with format, we can convert it to an MCMCChains.Chains object:

using MCMCChains
chain_hmc = PEtab.to_chains(res, target)
Chains MCMC chain (2000×3×1 Array{Float64, 3}):

Iterations        = 1:1:2000
Number of chains  = 1
Samples per chain = 2000
parameters        = b1, b2, sigma

Summary Statistics
  parameters      mean       std      mcse   ess_bulk   ess_tail      rhat   ⋯
      Symbol   Float64   Float64   Float64    Float64    Float64   Float64   ⋯

          b1    1.9017    0.9883    0.0574   377.6980   263.1413    0.9996   ⋯
          b2   -0.9545    0.2196    0.0117   381.0287   262.2573    0.9996   ⋯
       sigma    0.0320    0.0023    0.0001   698.3782   561.7823    1.0069   ⋯
                                                                1 column omitted

Quantiles
  parameters      2.5%     25.0%     50.0%     75.0%     97.5% 
      Symbol   Float64   Float64   Float64   Float64   Float64 

          b1    0.8483    1.1982    1.5552    2.2964    4.6526
          b2   -1.4258   -1.1033   -0.9173   -0.7886   -0.6055
       sigma    0.0278    0.0303    0.0318    0.0334    0.0368

which we can also plot:

using Plots, StatsPlots
plot(chain_hmc)
Note

When converting the output to an MCMCChains.Chains, the parameters are reported on the prior scale (for b2, the log10 scale).
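As noted earlier, it is good practice to run several chains from different starting points [15]. A sketch of one simple way to do this (the jittered starting points are an illustrative choice, not a PEtab recommendation):

# MCMCChains is already loaded above
chains = map(1:4) do _
    x0 = xinference .+ 0.1 .* randn(length(xinference))
    res_i = sample(target, NUTS(0.8), 2000; n_adapts = 1000,
                   initial_params = x0, drop_warmup = true, progress = false)
    PEtab.to_chains(res_i, target)
end
chain_all = chainscat(chains...)  # combine into one multi-chain object for diagnostics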

Bayesian inference with AdaptiveMCMC.jl

Given a starting point, we can run the robust adaptive MCMC sampler for $100\,000$ iterations with:

using AdaptiveMCMC
# target.logtarget = posterior logdensity
res = adaptive_rwm(xinference, target.logtarget, 100000; progress=false)

and convert the output to an MCMCChains.Chains:

chain_adapt = to_chains(res, target)
plot(chain_adapt)

Any other algorithm from the AdaptiveMCMC.jl documentation can also be used.
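For example, a sketch of the classic Adaptive Metropolis variant, assuming the algorithm keyword documented in AdaptiveMCMC.jl:

# :am selects the classic Adaptive Metropolis algorithm
res_am = adaptive_rwm(xinference, target.logtarget, 100_000;
                      algorithm = :am, progress = false)
chain_am = to_chains(res_am, target)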

References

[12]
M. Vihola. Ergonomic and reliable Bayesian inference with adaptive Markov chain Monte Carlo. Wiley StatsRef: Statistics Reference Online, 1–12 (2014).
[13]
M. D. Hoffman, A. Gelman and others. The No-U-Turn sampler: adaptively setting path lengths in Hamiltonian Monte Carlo. J. Mach. Learn. Res. 15, 1593–1623 (2014).
[14]
B. Carpenter, A. Gelman, M. D. Hoffman, D. Lee, B. Goodrich, M. Betancourt, M. A. Brubaker, J. Guo, P. Li and A. Riddell. Stan: A probabilistic programming language. Journal of Statistical Software 76 (2017).
[15]
A. Gelman, A. Vehtari, D. Simpson, C. C. Margossian, B. Carpenter, Y. Yao, L. Kennedy, J. Gabry, P.-C. Bürkner and M. Modrák. Bayesian workflow. arXiv preprint arXiv:2011.01808 (2020).