Parameter Estimation Methods

The main function of PEtab.jl is to create parameter estimation problems and provide runtime-efficient gradient and Hessian functions for estimating unknown model parameters using suitable numerical optimization algorithms. Specifically, the parameter estimation problems considered by PEtab.jl are on the form:

\[\min_{\mathbf{x} \in \mathbb{R}^N} -\ell(\mathbf{x}), \quad \text{subject to} \\ \mathbf{lb} \leq \mathbf{x} \leq \mathbf{ub}\]

Where, since PEtab.jl works with likelihoods (see the API documentation on PEtabObservable), $-\ell(\mathbf{x})$ is a negative log-likelihood, and $\mathbf{lb}$ and $\mathbf{ub}$ are the lower and upper parameter bounds. For a good introduction to parameter estimation for ODE models in biology (which is applicable to other fields as well), see [9].

This advanced section of the documentation focuses on PEtab.jl's parameter estimation functionality, and before reading this part, we recommended the starting tutorial. Specifically, this section of the documentation covers available and recommended optimization algorithms, how to plot optimization results, and how to perform automatic model selection. First though, it covers how to perform parameter estimation. While the PEtabODEProblem contains all the necessary information for wrapping a suitable optimizer to solve the problem (see here), manually wrapping optimizers is cumbersome. Therefore, PEtab.jl provides convenient wrappers for:

  • Single-start parameter estimation
  • Multi-start parameter estimation
  • Creating an OptimizationProblem to access the solvers in Optimization.jl

As a working example, we use the Michaelis-Menten enzyme kinetics model from the starting tutorial. Even though the code below presents the model as a ReactionSystem, everything works the same if the model is provided as an ODESystem.

using Catalyst, PEtab

# Create the dynamic model
t = default_t()
rn = @reaction_network begin
    @parameters S0 c3=1.0
    @species S(t)=S0
    c1, S + E --> SE
    c2, SE --> S + E
    c3, SE --> P + E
end
speciemap = [:E => 50.0, :SE => 0.0, :P => 0.0]

# Observables
@unpack E, S = rn
obs_sum = PEtabObservable(S + E, 3.0)
@unpack P = rn
@parameters sigma
obs_p = PEtabObservable(P, sigma)
observables = Dict("obs_p" => obs_p, "obs_sum" => obs_sum)

# Parameters to estimate
p_c1 = PEtabParameter(:c1)
p_c2 = PEtabParameter(:c2)
p_s0 = PEtabParameter(:S0)
p_sigma = PEtabParameter(:sigma)
pest = [p_c1, p_c2, p_s0, p_sigma]

# Simulate measurement data with 'true' parameters
using OrdinaryDiffEq, DataFrames
ps = [:c1 => 1.0, :c2 => 10.0, :c3 => 1.0, :S0 => 100.0]
u0 = [:S => 100.0, :E => 50.0, :SE => 0.0, :P => 0.0]
tspan = (0.0, 10.0)
oprob = ODEProblem(rn, u0, tspan, ps)
sol = solve(oprob, Rodas5P(); saveat = 0:0.5:10.0)
obs_sum = (sol[:S] + sol[:E]) .+ randn(length(sol[:E]))
obs_p = sol[:P] + .+ randn(length(sol[:P]))
df_sum = DataFrame(obs_id = "obs_sum", time = sol.t, measurement = obs_sum)
df_p = DataFrame(obs_id = "obs_p", time = sol.t, measurement = obs_p)
measurements = vcat(df_sum, df_p)

model = PEtabModel(rn, observables, measurements, pest; speciemap = speciemap)
petab_prob = PEtabODEProblem(model)

Single-Start Parameter Estimation

Single-start parameter estimation is an approach where a numerical optimization algorithm is run from a starting point x0 until it hopefully reaches a local minimum. When performing parameter estimation, the objective function generated by a PEtabODEProblem expects the parameters to be in a specific order. The most straightforward way to obtain a correctly ordered vector is via the get_x function:

PEtab.get_xFunction
get_x(prob::PEtabODEProblem; linear_scale = false)::ComponentArray

Get the nominal parameter vector with parameters in the correct order expected by prob for parameter estimation/inference. Nominal values can optionally be specified when creating a PEtabParameter, or in the parameters table if the problem is provided in the PEtab standard format.

For ease of interaction (e.g., changing values), the parameter vector is returned as a ComponentArray. For how to interact with a ComponentArray, see the documentation and the ComponentArrays.jl documentation.

See also PEtabParameter.

Keyword argument

  • linear_scale: Whether to return parameters on the linear scale. By default, parameters are returned on the scale they are estimated, which by default is log10 as this often improves parameter estimation performance.
source

For our working example, we have:

x0 = get_x(petab_prob)
ComponentVector{Float64}(log10_c1 = 2.6989704386302837, log10_c2 = 2.6989704386302837, log10_S0 = 2.6989704386302837, log10_sigma = 2.6989704386302837)

As discussed in the starting tutorial, x0 is a ComponentArray, meaning it holds both parameter values and names. Additionally, parameters like c1 have a log10 prefix, as the parameter (by default) is estimated on the log10 scale, which typically improves performance [2, 3]. Interacting with a ComponentArray is straightforward, for example, to change c1 to 10.0 do:

x0.log10_c1 = log10(10.0)

or alteratively:

x0[:log10_c1] = log10(10.0)
Note

When setting values in the starting point vector x0, the new value should be provided on the parameter's scale, which is log10 by default.

Given a starting point, parameter estimation can be performed with the calibrate function:

PEtab.calibrateFunction
calibrate(prob::PEtabODEProblem, x0, alg; kwargs...)::PEtabOptimisationResult

From starting point x0 using optimization algorithm alg, estimate unknown model parameters for prob, and get results as a PEtabOptimisationResult.

x0 can be a Vector or a ComponentArray, where the individual parameters must be in the order expected by prob. To get a vector in the correct order, see get_x.

A list of available and recommended optimization algorithms (alg) can be found in the documentation. Briefly, supported algorithms are from:

  • Optim.jl: LBFGS(), BFGS(), or IPNewton() methods.
  • Ipopt.jl: IpoptOptimizer() interior-point Newton method.
  • Fides.py: Fides() Newton trust region method.

Different ways to visualize the parameter estimation result can be found in the documentation.

See also PEtabOptimisationResult, Fides and IpoptOptimizer

Keyword Arguments

  • save_trace::Bool = false: Whether to save the optimization trace of the objective function and parameter vector. Only applicable for some algorithms; see the documentation for details.
  • options = DEFAULT_OPTIONS: Configurable options for alg. The type and available options depend on which package alg belongs to. For example, if alg = IPNewton() from Optim.jl, options should be provided as an Optim.Options() struct. A list of configurable options can be found in the documentation.
source

For information and recommendations on algorithms (alg), see this page. For our working example, following the recommendations, we use Optim.jl's Interior-Point Newton method (IPNewton):

using Optim
res = calibrate(petab_prob, x0, IPNewton())
PEtabOptimisationResult
---------------- Summary ---------------
min(f)                = 7.58e+01
Parameters estimated  = 4
Optimiser iterations  = 23
Runtime               = 6.0e-01s
Optimiser algorithm   = Optim_IPNewton

The result from calibrate is returned as a PEtabOptimisationResult which holds the relevant statistics from the optimization:

PEtab.PEtabOptimisationResultType
PEtabOptimisationResult

Parameter estimation statistics from single-start optimization with calibrate.

See also: calibrate

Fields

  • xmin: Minimizing parameter vector found by the optimization.
  • fmin: Minimum objective value found by the optimization.
  • x0: Starting parameter vector.
  • alg: Parameter estimation algorithm used.
  • niterations: Number of iterations for the optimization.
  • runtime: Runtime in seconds for the optimization.
  • xtrace: Parameter vector optimization trace. Empty if save_trace = false was provided to calibrate.
  • ftrace: Objective function optimization trace. Empty if save_trace = false was provided to calibrate.
  • converged: Convergence flag from alg.
  • original: Original result struct returned by alg. For example, if alg = IPNewton() from Optim.jl, original is the Optim return struct.
source

The result from calibrate can also be plotted. For example, to see how well the model fits the data, the fit can be plotted as:

using Plots
plot(res, petab_prob)
Example block output

Information on other available plots can be found on this page. Now, even though the plot above may look good, it is important to remember that the objective function ($-\ell$ above) often has multiple local minima. To ensure the global optimum is found, a global optimization approach is needed. One effective global method is multi-start parameter estimation.

Multi-Start Parameter Estimation

Multi-start parameter estimation is an approach where n parameter estimation runs are initiated from n random starting points. The rationale is that a subset of these runs should, hopefully, converge to the global optimum. While simple, empirical benchmark studies have shown that this method performs well for ODE models in biology [2, 2].

The first step for multi-start parameter estimation is to generate n starting points. While random uniform sampling may initially seem like a good approach, random points tend to cluster. Instead, it's better to use a Quasi-Monte Carlo method, such as Latin hypercube sampling, to generate more spread-out starting points. This approach has been shown to improve performance [2]. The difference can quite clearly be seen generating 100 random points and 50 Latin hypercube-sampled points on the plane.

using Distributions, QuasiMonteCarlo, Plots
s1 = QuasiMonteCarlo.sample(100, [-1.0, -1.0], [1.0, 1.0], Uniform())
s2 = QuasiMonteCarlo.sample(100, [-1.0, -1.0], [1.0, 1.0], LatinHypercubeSample())
p1 = plot(s1[1, :], s1[2, :], title = "Uniform sampling", seriestype=:scatter)
p2 = plot(s2[1, :], s2[2, :], title = "Latin Hypercube Sampling", seriestype=:scatter)
plot(p1, p2)
Example block output

For a PEtabODEProblem, Latin hypercube sampled points within the parameter bounds can be generated with the get_startguesses function:

PEtab.get_startguessesFunction
get_startguesses(prob::PEtabODEProblem, n::Integer; kwargs...)

Generate n random parameter vectors within the parameter bounds in prob.

If n = 1, a single random vector is returned. For n > 1, a vector of random parameter vectors is returned. In both cases, parameter vectors are returned as a ComponentArray. For details on how to interact with a ComponentArray, see the documentation and the ComponentArrays.jl documentation.

See also calibrate and calibrate_multistart.

Keyword Arguments

  • sampling_method = LatinHypercubeSample(): Method for sampling a diverse (spread out) set of parameter vectors. Any algorithm from QuasiMonteCarlo is allowed, but the default LatinHypercubeSample is recommended as it usually performs well.
  • sample_prior::Bool = true: Whether to sample random parameter values from the prior distribution if a parameter has a prior.
  • allow_inf::Bool = false: Whether to return parameter vectors for which the likelihood cannot be computed (typically happens because the ODEProblem cannot be solved). Often it only makes sense to use starting points with a computable likelihood for parameter estimation, hence it typically does not make sense to change this option.
source

For our working example, we can generate 50 starting guesses with:

x0s = get_startguesses(petab_prob, 50)

In principle, x0s can now be used together with calibrate to perform multi-start parameter estimation. But, to further simplify this process, PEtab.jl provides a convenient function, calibrate_multistart, which combines start-guess generation and parameter estimation in one step:

PEtab.calibrate_multistartFunction
calibrate_multistart(prob::PEtabODEProblem, alg, nmultistarts::Integer; nprocs = 1,
                     dirsave=nothing, kwargs...)::PEtabMultistartResult

Perform nmultistarts parameter estimation runs from randomly sampled starting points using the optimization algorithm alg to estimate the unknown model parameters in prob.

A list of available and recommended optimisation algorithms (alg) can be found in the package documentation and in the calibrate documentation.

If nprocs > 1, the parameter estimation runs are performed in parallel using the pmap function from Distributed.jl with nprocs processes. If parameter estimation on a single process (nprocs = 1) takes longer than 5 minutes, we strongly recommend setting nprocs > 1, as this can greatly reduce runtime. Note that nprocs should not be larger than the number of cores on the computer.

If dirsave is provided, intermediate results for each run are saved in dirsave. It is strongly recommended to provide dirsave for larger models, as parameter estimation can take hours (or even days!),and without dirsave, all intermediate results will be lost if something goes wrong.

Different ways to visualize the parameter estimation result can be found in the documentation.

See also PEtabMultistartResult, get_startguesses, and calibrate.

Keyword Arguments

  • sampling_method = LatinHypercubeSample(): Method for sampling a diverse (spread out) set of starting points. See the documentation for get_startguesses, which is the function used for sampling.
  • sample_prior::Bool = true: See the documentation for get_startguesses.
  • seed = nothing: Seed used when generating starting points.
  • options = DEFAULT_OPTIONS: Configurable options for alg. See the documentation for calibrate.
source

Two important keyword arguments for calibrate_multistart are dirsave and nprocs. If nprocs > 1, the parameter estimation runs are performed in parallel using the pmap function from Distributed.jl with nprocs processes. Even though pmap introduces some overhead because it must load and compile the code on each process, setting nprocs > 1 often reduces runtime when the parameter estimation is expected to take longer than 5 minutes. Meanwhile, dirsave specifies an optional directory to continuously save the results from each individual run. We strongly recommend providing such a directory, as parameter estimation for larger models can take hours or even days. If something goes wrong with the computer during that time, it is, to put it mildly, frustrating to lose all the results. For our working example, we can perform 50 multistarts in parallel on two processes with:

ms_res = calibrate_multistart(petab_prob, IPNewton(), 50; nprocs = 2,
                              dirsave="path_to_save_directory")
PEtabMultistartResult
---------------- Summary ---------------
min(f)                = 7.58e+01
Parameters estimated  = 4
Number of multistarts = 50
Optimiser algorithm   = Optim_IPNewton
Results saved at path_to_save_directory

The results are returned as a PEtabMultistartResult, which, in addition to printout statistics, contains relevant information for each run:

PEtab.PEtabMultistartResultType
PEtabMultistartResult

Parameter estimation statistics from multi-start optimization with calibrate_multistart.

See also calibrate_multistart and PEtabOptimisationResult.

Fields

  • xmin: Best minimizer across all runs.
  • fmin: Best minimum across all runs.
  • alg: Parameter estimation algorithm.
  • nmultistarts: Number of parameter estimation runs.
  • sampling_method: Sampling method used for generating starting points.
  • dirsave: Path of directory where parameter estimation run statistics are saved if dirsave was provided to calibrate_multistart.
  • runs: Vector of PEtabOptimisationResult with the parameter estimation results for each run.
PEtabMultistartResult(dirres::String; which_run::String="1")

Import multistart parameter estimation results saved at dirres.

Each time a new optimization run is performed, results are saved with unique numerical endings. Results from a specific run can be retrieved by specifying the numerical ending with which_run.

source

Finally, a common approach to evaluate the result of multi-start parameter estimation is through plotting. One widely used evaluation plot is the waterfall plot, which shows the final objective values for each run:

plot(ms_res; plot_type=:waterfall)
Example block output

In the waterfall plot, each plateau corresponds to different local optima (represented by different colors). Since many runs (dots) are found on the plateau with the smallest objective value, we can be confident that the global optimum has been found. In addition to waterfall plots, more plotting options can be found on this page.

Creating an OptimizationProblem

Optimization.jl is a Julia package that provides a unified interface for over 100 optimization algorithms (see their documentation for the complete list). While Optimization.jl is undoubtedly useful, it is currently undergoing heavy updates, so at the moment we do not recommend it as the default choice for parameter estimation.

The central object in Optimization.jl is the OptimizationProblem, and PEtab.jl directly supports converting a PEtabODEProblem into an OptimizationProblem:

PEtab.OptimizationProblemFunction
OptimizationProblem(prob::PEtabODEProblem; box_constraints::Bool = true)

Create an Optimization.jl OptimizationProblem from prob.

To use algorithms not compatible with box constraints (e.g., Optim.jl NewtonTrustRegion), set box_constraints = false. Note that with this option, optimizers may move outside the bounds, which can negatively impact performance. More information on how to use an OptimizationProblem can be found in the Optimization.jl documentation.

source

For our working example, we can create an OptimizationProblem with:

using Optimization
opt_prob = OptimizationProblem(petab_prob)
OptimizationProblem. In-place: true
u0: ComponentVector{Float64}(log10_c1 = 1.0, log10_c2 = 2.6989704386302837, log10_S0 = 2.6989704386302837, log10_sigma = 2.6989704386302837)

Given a start-guess x0, we can then estimate the parameters using, for example, Optim.jl's ParticleSwarm() method, with:

using OptimizationOptimJL
opt_prob.u0 .= x0
res = solve(opt_prob, Optim.ParticleSwarm())
retcode: Failure
u: ComponentVector{Float64}(log10_c1 = 1.9645946199603268, log10_c2 = 3.0, log10_S0 = 1.9998090306762408, log10_sigma = 0.06291465794443092)

which returns an OptimizationSolution. For more information on options and how to interact with OptimizationSolution, see the Optimization.jl documentation.

References

[2]
A. Raue, M. Schilling, J. Bachmann, A. Matteson, M. Schelke, D. Kaschek, S. Hug, C. Kreutz, B. D. Harms, F. J. Theis and others. Lessons learned from quantitative dynamical modeling in systems biology. PloS one 8, e74335 (2013).
[3]
H. Hass, C. Loos, E. Raimundez-Alvarez, J. Timmer, J. Hasenauer and C. Kreutz. Benchmark problems for dynamic modeling of intracellular processes. Bioinformatics 35, 3073–3082 (2019).
[9]
A. F. Villaverde, D. Pathirana, F. Fröhlich, J. Hasenauer and J. R. Banga. A protocol for dynamic model calibration. Briefings in bioinformatics 23, bbab387 (2022).