Parameter Estimation Methods
The main function of PEtab.jl is to create parameter estimation problems and provide runtime-efficient gradient and Hessian functions for estimating unknown model parameters using suitable numerical optimization algorithms. Specifically, the parameter estimation problems considered by PEtab.jl are on the form:
\[\min_{\mathbf{x} \in \mathbb{R}^N} -\ell(\mathbf{x}), \quad \text{subject to} \\ \mathbf{lb} \leq \mathbf{x} \leq \mathbf{ub}\]
Where, since PEtab.jl works with likelihoods (see the API documentation on PEtabObservable
), $-\ell(\mathbf{x})$ is a negative log-likelihood, and $\mathbf{lb}$ and $\mathbf{ub}$ are the lower and upper parameter bounds. For a good introduction to parameter estimation for ODE models in biology (which is applicable to other fields as well), see [9].
This advanced section of the documentation focuses on PEtab.jl's parameter estimation functionality, and before reading this part, we recommended the starting tutorial. Specifically, this section of the documentation covers available and recommended optimization algorithms, how to plot optimization results, and how to perform automatic model selection. First though, it covers how to perform parameter estimation. While the PEtabODEProblem
contains all the necessary information for wrapping a suitable optimizer to solve the problem (see here), manually wrapping optimizers is cumbersome. Therefore, PEtab.jl provides convenient wrappers for:
- Single-start parameter estimation
- Multi-start parameter estimation
- Creating an
OptimizationProblem
to access the solvers in Optimization.jl
As a working example, we use the Michaelis-Menten enzyme kinetics model from the starting tutorial. Even though the code below presents the model as a ReactionSystem
, everything works the same if the model is provided as an ODESystem
.
using Catalyst, PEtab
# Create the dynamic model
t = default_t()
rn = @reaction_network begin
@parameters S0 c3=1.0
@species S(t)=S0
c1, S + E --> SE
c2, SE --> S + E
c3, SE --> P + E
end
speciemap = [:E => 50.0, :SE => 0.0, :P => 0.0]
# Observables
@unpack E, S = rn
obs_sum = PEtabObservable(S + E, 3.0)
@unpack P = rn
@parameters sigma
obs_p = PEtabObservable(P, sigma)
observables = Dict("obs_p" => obs_p, "obs_sum" => obs_sum)
# Parameters to estimate
p_c1 = PEtabParameter(:c1)
p_c2 = PEtabParameter(:c2)
p_s0 = PEtabParameter(:S0)
p_sigma = PEtabParameter(:sigma)
pest = [p_c1, p_c2, p_s0, p_sigma]
# Simulate measurement data with 'true' parameters
using OrdinaryDiffEq, DataFrames
ps = [:c1 => 1.0, :c2 => 10.0, :c3 => 1.0, :S0 => 100.0]
u0 = [:S => 100.0, :E => 50.0, :SE => 0.0, :P => 0.0]
tspan = (0.0, 10.0)
oprob = ODEProblem(rn, u0, tspan, ps)
sol = solve(oprob, Rodas5P(); saveat = 0:0.5:10.0)
obs_sum = (sol[:S] + sol[:E]) .+ randn(length(sol[:E]))
obs_p = sol[:P] + .+ randn(length(sol[:P]))
df_sum = DataFrame(obs_id = "obs_sum", time = sol.t, measurement = obs_sum)
df_p = DataFrame(obs_id = "obs_p", time = sol.t, measurement = obs_p)
measurements = vcat(df_sum, df_p)
model = PEtabModel(rn, observables, measurements, pest; speciemap = speciemap)
petab_prob = PEtabODEProblem(model)
Single-Start Parameter Estimation
Single-start parameter estimation is an approach where a numerical optimization algorithm is run from a starting point x0
until it hopefully reaches a local minimum. When performing parameter estimation, the objective function generated by a PEtabODEProblem
expects the parameters to be in a specific order. The most straightforward way to obtain a correctly ordered vector is via the get_x
function:
PEtab.get_x
— Functionget_x(prob::PEtabODEProblem; linear_scale = false)::ComponentArray
Get the nominal parameter vector with parameters in the correct order expected by prob
for parameter estimation/inference. Nominal values can optionally be specified when creating a PEtabParameter
, or in the parameters table if the problem is provided in the PEtab standard format.
For ease of interaction (e.g., changing values), the parameter vector is returned as a ComponentArray
. For how to interact with a ComponentArray
, see the documentation and the ComponentArrays.jl documentation.
See also PEtabParameter
.
Keyword argument
linear_scale
: Whether to return parameters on the linear scale. By default, parameters are returned on the scale they are estimated, which by default islog10
as this often improves parameter estimation performance.
For our working example, we have:
x0 = get_x(petab_prob)
ComponentVector{Float64}(log10_c1 = 2.6989704386302837, log10_c2 = 2.6989704386302837, log10_S0 = 2.6989704386302837, log10_sigma = 2.6989704386302837)
As discussed in the starting tutorial, x0
is a ComponentArray
, meaning it holds both parameter values and names. Additionally, parameters like c1
have a log10
prefix, as the parameter (by default) is estimated on the log10
scale, which typically improves performance [2, 3]. Interacting with a ComponentArray
is straightforward, for example, to change c1
to 10.0
do:
x0.log10_c1 = log10(10.0)
or alteratively:
x0[:log10_c1] = log10(10.0)
When setting values in the starting point vector x0
, the new value should be provided on the parameter's scale, which is log10
by default.
Given a starting point, parameter estimation can be performed with the calibrate
function:
PEtab.calibrate
— Functioncalibrate(prob::PEtabODEProblem, x0, alg; kwargs...)::PEtabOptimisationResult
From starting point x0
using optimization algorithm alg
, estimate unknown model parameters for prob
, and get results as a PEtabOptimisationResult
.
x0
can be a Vector
or a ComponentArray
, where the individual parameters must be in the order expected by prob
. To get a vector in the correct order, see get_x
.
A list of available and recommended optimization algorithms (alg
) can be found in the documentation. Briefly, supported algorithms are from:
- Optim.jl:
LBFGS()
,BFGS()
, orIPNewton()
methods. - Ipopt.jl:
IpoptOptimizer()
interior-point Newton method. - Fides.py:
Fides()
Newton trust region method.
Different ways to visualize the parameter estimation result can be found in the documentation.
See also PEtabOptimisationResult
, Fides
and IpoptOptimizer
Keyword Arguments
save_trace::Bool = false
: Whether to save the optimization trace of the objective function and parameter vector. Only applicable for some algorithms; see the documentation for details.options = DEFAULT_OPTIONS
: Configurable options foralg
. The type and available options depend on which packagealg
belongs to. For example, ifalg = IPNewton()
from Optim.jl,options
should be provided as anOptim.Options()
struct. A list of configurable options can be found in the documentation.
For information and recommendations on algorithms (alg
), see this page. For our working example, following the recommendations, we use Optim.jl's Interior-Point Newton method (IPNewton
):
using Optim
res = calibrate(petab_prob, x0, IPNewton())
PEtabOptimisationResult
---------------- Summary ---------------
min(f) = 7.58e+01
Parameters estimated = 4
Optimiser iterations = 24
Runtime = 4.1e-01s
Optimiser algorithm = Optim_IPNewton
The result from calibrate
is returned as a PEtabOptimisationResult
which holds the relevant statistics from the optimization:
PEtab.PEtabOptimisationResult
— TypePEtabOptimisationResult
Parameter estimation statistics from single-start optimization with calibrate
.
See also: calibrate
Fields
xmin
: Minimizing parameter vector found by the optimization.fmin
: Minimum objective value found by the optimization.x0
: Starting parameter vector.alg
: Parameter estimation algorithm used.niterations
: Number of iterations for the optimization.runtime
: Runtime in seconds for the optimization.xtrace
: Parameter vector optimization trace. Empty ifsave_trace = false
was provided tocalibrate
.ftrace
: Objective function optimization trace. Empty ifsave_trace = false
was provided tocalibrate
.converged
: Convergence flag fromalg
.original
: Original result struct returned byalg
. For example, ifalg = IPNewton()
from Optim.jl,original
is the Optim return struct.
The result from calibrate
can also be plotted. For example, to see how well the model fits the data, the fit can be plotted as:
using Plots
plot(res, petab_prob)
Information on other available plots can be found on this page. Now, even though the plot above may look good, it is important to remember that the objective function ($-\ell$ above) often has multiple local minima. To ensure the global optimum is found, a global optimization approach is needed. One effective global method is multi-start parameter estimation.
Multi-Start Parameter Estimation
Multi-start parameter estimation is an approach where n
parameter estimation runs are initiated from n
random starting points. The rationale is that a subset of these runs should, hopefully, converge to the global optimum. While simple, empirical benchmark studies have shown that this method performs well for ODE models in biology [2, 2].
The first step for multi-start parameter estimation is to generate n
starting points. While random uniform sampling may initially seem like a good approach, random points tend to cluster. Instead, it's better to use a Quasi-Monte Carlo method, such as Latin hypercube sampling, to generate more spread-out starting points. This approach has been shown to improve performance [2]. The difference can quite clearly be seen generating 100 random points and 50 Latin hypercube-sampled points on the plane.
using Distributions, QuasiMonteCarlo, Plots
s1 = QuasiMonteCarlo.sample(100, [-1.0, -1.0], [1.0, 1.0], Uniform())
s2 = QuasiMonteCarlo.sample(100, [-1.0, -1.0], [1.0, 1.0], LatinHypercubeSample())
p1 = plot(s1[1, :], s1[2, :], title = "Uniform sampling", seriestype=:scatter)
p2 = plot(s2[1, :], s2[2, :], title = "Latin Hypercube Sampling", seriestype=:scatter)
plot(p1, p2)
For a PEtabODEProblem
, Latin hypercube sampled points within the parameter bounds can be generated with the get_startguesses
function:
PEtab.get_startguesses
— Functionget_startguesses(prob::PEtabODEProblem, n::Integer; kwargs...)
Generate n
random parameter vectors within the parameter bounds in prob
.
If n = 1
, a single random vector is returned. For n > 1
, a vector of random parameter vectors is returned. In both cases, parameter vectors are returned as a ComponentArray
. For details on how to interact with a ComponentArray
, see the documentation and the ComponentArrays.jl documentation.
See also calibrate
and calibrate_multistart
.
Keyword Arguments
sampling_method = LatinHypercubeSample()
: Method for sampling a diverse (spread out) set of parameter vectors. Any algorithm from QuasiMonteCarlo is allowed, but the defaultLatinHypercubeSample
is recommended as it usually performs well.sample_prior::Bool = true
: Whether to sample random parameter values from the prior distribution if a parameter has a prior.allow_inf::Bool = false
: Whether to return parameter vectors for which the likelihood cannot be computed (typically happens because theODEProblem
cannot be solved). Often it only makes sense to use starting points with a computable likelihood for parameter estimation, hence it typically does not make sense to change this option.
For our working example, we can generate 50 starting guesses with:
x0s = get_startguesses(petab_prob, 50)
In principle, x0s
can now be used together with calibrate
to perform multi-start parameter estimation. But, to further simplify this process, PEtab.jl provides a convenient function, calibrate_multistart
, which combines start-guess generation and parameter estimation in one step:
PEtab.calibrate_multistart
— Functioncalibrate_multistart(prob::PEtabODEProblem, alg, nmultistarts::Integer; nprocs = 1,
dirsave=nothing, kwargs...)::PEtabMultistartResult
Perform nmultistarts
parameter estimation runs from randomly sampled starting points using the optimization algorithm alg
to estimate the unknown model parameters in prob
.
A list of available and recommended optimisation algorithms (alg
) can be found in the package documentation and in the calibrate
documentation.
If nprocs > 1
, the parameter estimation runs are performed in parallel using the pmap
function from Distributed.jl with nprocs
processes. If parameter estimation on a single process (nprocs = 1
) takes longer than 5 minutes, we strongly recommend setting nprocs > 1
, as this can greatly reduce runtime. Note that nprocs
should not be larger than the number of cores on the computer.
If dirsave
is provided, intermediate results for each run are saved in dirsave
. It is strongly recommended to provide dirsave
for larger models, as parameter estimation can take hours (or even days!),and without dirsave
, all intermediate results will be lost if something goes wrong.
Different ways to visualize the parameter estimation result can be found in the documentation.
See also PEtabMultistartResult
, get_startguesses
, and calibrate
.
Keyword Arguments
sampling_method = LatinHypercubeSample()
: Method for sampling a diverse (spread out) set of starting points. See the documentation forget_startguesses
, which is the function used for sampling.sample_prior::Bool = true
: See the documentation forget_startguesses
.seed = nothing
: Seed used when generating starting points.options = DEFAULT_OPTIONS
: Configurable options foralg
. See the documentation forcalibrate
.
Two important keyword arguments for calibrate_multistart
are dirsave
and nprocs
. If nprocs > 1
, the parameter estimation runs are performed in parallel using the pmap
function from Distributed.jl with nprocs
processes. Even though pmap
introduces some overhead because it must load and compile the code on each process, setting nprocs > 1
often reduces runtime when the parameter estimation is expected to take longer than 5 minutes. Meanwhile, dirsave
specifies an optional directory to continuously save the results from each individual run. We strongly recommend providing such a directory, as parameter estimation for larger models can take hours or even days. If something goes wrong with the computer during that time, it is, to put it mildly, frustrating to lose all the results. For our working example, we can perform 50 multistarts in parallel on two processes with:
ms_res = calibrate_multistart(petab_prob, IPNewton(), 50; nprocs = 2,
dirsave="path_to_save_directory")
PEtabMultistartResult
---------------- Summary ---------------
min(f) = 7.58e+01
Parameters estimated = 4
Number of multistarts = 50
Optimiser algorithm = Optim_IPNewton
Results saved at path_to_save_directory
The results are returned as a PEtabMultistartResult
, which, in addition to printout statistics, contains relevant information for each run:
PEtab.PEtabMultistartResult
— TypePEtabMultistartResult
Parameter estimation statistics from multi-start optimization with calibrate_multistart
.
See also calibrate_multistart
and PEtabOptimisationResult
.
Fields
xmin
: Best minimizer across all runs.fmin
: Best minimum across all runs.alg
: Parameter estimation algorithm.nmultistarts
: Number of parameter estimation runs.sampling_method
: Sampling method used for generating starting points.dirsave
: Path of directory where parameter estimation run statistics are saved ifdirsave
was provided tocalibrate_multistart
.runs
: Vector ofPEtabOptimisationResult
with the parameter estimation results for each run.
PEtabMultistartResult(dirres::String; which_run::String="1")
Import multistart parameter estimation results saved at dirres
.
Each time a new optimization run is performed, results are saved with unique numerical endings. Results from a specific run can be retrieved by specifying the numerical ending with which_run
.
Finally, a common approach to evaluate the result of multi-start parameter estimation is through plotting. One widely used evaluation plot is the waterfall plot, which shows the final objective values for each run:
plot(ms_res; plot_type=:waterfall)
In the waterfall plot, each plateau corresponds to different local optima (represented by different colors). Since many runs (dots) are found on the plateau with the smallest objective value, we can be confident that the global optimum has been found. In addition to waterfall plots, more plotting options can be found on this page.
Creating an OptimizationProblem
Optimization.jl is a Julia package that provides a unified interface for over 100 optimization algorithms (see their documentation for the complete list). While Optimization.jl is undoubtedly useful, it is currently undergoing heavy updates, so at the moment we do not recommend it as the default choice for parameter estimation.
The central object in Optimization.jl is the OptimizationProblem
, and PEtab.jl directly supports converting a PEtabODEProblem
into an OptimizationProblem
:
PEtab.OptimizationProblem
— FunctionOptimizationProblem(prob::PEtabODEProblem; box_constraints::Bool = true)
Create an Optimization.jl OptimizationProblem
from prob
.
To use algorithms not compatible with box constraints (e.g., Optim.jl NewtonTrustRegion
), set box_constraints = false
. Note that with this option, optimizers may move outside the bounds, which can negatively impact performance. More information on how to use an OptimizationProblem
can be found in the Optimization.jl documentation.
For our working example, we can create an OptimizationProblem
with:
using Optimization
opt_prob = OptimizationProblem(petab_prob)
OptimizationProblem. In-place: true
u0: ComponentVector{Float64}(log10_c1 = 1.0, log10_c2 = 2.6989704386302837, log10_S0 = 2.6989704386302837, log10_sigma = 2.6989704386302837)
Given a start-guess x0
, we can then estimate the parameters using, for example, Optim.jl's ParticleSwarm()
method, with:
using OptimizationOptimJL
opt_prob.u0 .= x0
res = solve(opt_prob, Optim.ParticleSwarm())
retcode: Failure
u: ComponentVector{Float64}(log10_c1 = 1.9645945990258122, log10_c2 = 3.0, log10_S0 = 1.9998090314009143, log10_sigma = 0.06291468275046222)
which returns an OptimizationSolution
. For more information on options and how to interact with OptimizationSolution
, see the Optimization.jl documentation.
References
- [2]
- A. Raue, M. Schilling, J. Bachmann, A. Matteson, M. Schelke, D. Kaschek, S. Hug, C. Kreutz, B. D. Harms, F. J. Theis and others. Lessons learned from quantitative dynamical modeling in systems biology. PloS one 8, e74335 (2013).
- [3]
- H. Hass, C. Loos, E. Raimundez-Alvarez, J. Timmer, J. Hasenauer and C. Kreutz. Benchmark problems for dynamic modeling of intracellular processes. Bioinformatics 35, 3073–3082 (2019).
- [9]
- A. F. Villaverde, D. Pathirana, F. Fröhlich, J. Hasenauer and J. R. Banga. A protocol for dynamic model calibration. Briefings in bioinformatics 23, bbab387 (2022).