Plots Evaluating Parameter Estimation
Following parameter estimation, it is prudent to evaluate the estimation results. The most straightforward approach is to simulate the model with the estimated parameters and inspect the fit. While informative, this kind of plot does not help determine whether:
- There are parameter sets, not found by the optimization, that yield better fits than those it found (indicating that the optimization reached a local minimum).
- There are additional parameter sets yielding equally good fits to those found (suggesting a parameter identifiability problem).
This page demonstrates various plots implemented in PEtab for evaluating parameter estimation results. These plots can be generated by calling plot on the output of calibrate_model (a PEtabOptimisationResult structure) or calibrate_multistart (a PEtabMultistartResult structure), with the plot_type argument allowing you to select the type of plot. There are two main types of plots that can be generated: (i) those that evaluate parameter estimation results (e.g., objective value, model parameter values), and (ii) those that evaluate and visualize how well the model fits the measured data. This tutorial covers both types.
Parameter Estimation Result Plots
As a working example, we use pre-computed parameter estimation results for a published signaling model, which we load into a PEtabMultistartResult struct (the files to load can be found here):
using PEtab, Plots
path_res = joinpath(@__DIR__, "assets", "optimization_results", "boehm")
ms_res = PEtabMultistartResult(path_res)
Objective Function Evaluation Plots
The objective function evaluation plot can be generated by setting plot_type=:objective, and is available for both single-start and multi-start optimization results. This plot shows the objective value at each iteration of the optimization process. For single-start optimization results, a single trajectory of dots is plotted. For multi-start optimization results, each run corresponds to one trajectory of dots.
plot(ms_res; plot_type=:objective)
If the model cannot be successfully simulated for a particular parameter set when evaluating the objective function (which is treated as a poor fit), that run's trajectory is marked with crosses instead of circles.
In this and other plots for multi-start parameter estimation, different runs are separated by color. The color is assigned via a clustering process that identifies runs converging to the same local minimum (details and tunable options can be found here). Additionally, when many multi-starts are performed, plots like the one above can become too cluttered. Therefore, only the 10 best runs are shown by default, but this can be customized.
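For example, the number of displayed runs can be changed via the best_idxs_n keyword, which is described in more detail further below:
plot(ms_res; plot_type=:objective, best_idxs_n = 20)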
Best Objective Function Evaluation Plots
The best objective function evaluation plot can be generated by setting plot_type=:best_objective, and is available for both single-start and multi-start optimization results. This plot is similar to the objective function evaluation plot, but instead shows the best value reached so far during the process (and is therefore a decreasing function). This is the default plot type for single-start parameter estimation.
plot(ms_res; plot_type=:best_objective)
Waterfall Plots
The waterfall plot can be generated by setting plot_type=:waterfall, and is only available for multi-start optimization results. This plot type shows the final objective values of all runs, sorted from best to worst. Typically, local minima can be identified as plateaus in the plot, and in PEtab, runs with similar final objective values are grouped by color. This plot is also the default for multi-start optimization results.
plot(ms_res; plot_type=:waterfall)
We strongly recommend including a waterfall plot when reporting results from a multi-start parameter estimation run. For more information on how to interpret a waterfall plot, see Fig. 5 in [2].
Parallel Coordinates Plots
The parallel coordinates plot can be generated by setting plot_type=:parallel_coordinates, and is available for multi-start optimization results. This plot shows the parameter values for the best parameter estimation runs, with each run represented by a line. The parameter values are normalized, where 0 corresponds to the minimum value encountered for that parameter and 1 to the maximum. If several runs share similar parameter values, the runs in this cluster likely converged to the same local minimum. Meanwhile, if runs in the same cluster show widely different values for a particular parameter, it indicates that the parameter is unidentifiable (i.e., multiple values of that parameter fit the data equally well).
plot(ms_res; plot_type=:parallel_coordinates)
Runtime Evaluation Plots
The runtime evaluation plot can be generated by setting plot_type=:runtime_eval, and is only available for multi-start optimization results. It is a scatter plot that shows the relationship between runtime and final objective value for each run:
plot(ms_res; plot_type=:runtime_eval)
Multi-Start Run Color Clustering
When using the calibrate_multistart function, multiple parameter estimation runs are performed. When plotting results for multiple runs, a clustering function is applied by default to identify runs that have likely converged to the same local minimum, and these runs are assigned the same color. The default clustering method is the objective_value_clustering function, which clusters runs if their objective function values are within 0.1 of each other. Users can define their own clustering function and supply it to the plot command via the clustering_function argument. The custom clustering function should take a Vector{PEtabOptimisationResult} as input and return a Vector{Int64} of the same size, where each index corresponds to the cluster assignment for that run.
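For illustration, below is a minimal sketch of such a function. It assumes that each run's final objective value is stored in the fmin field of PEtabOptimisationResult, and groups runs whose final objective values fall into the same 1.0-wide bin (relative to the best run) into one cluster:
# Sketch of a custom clustering function (assumes run.fmin holds the final objective value)
function bin_clustering(runs::Vector{PEtabOptimisationResult})
    fmins = [run.fmin for run in runs]
    fbest = minimum(fmins)
    # Runs with a non-finite objective (failed simulations) go into their own bin
    bins = [isfinite(f) ? floor(Int64, f - fbest) : typemax(Int64) for f in fmins]
    bin_ids = sort(unique(bins))
    # Map each bin to a cluster index 1, 2, ...
    return Int64[findfirst(==(b), bin_ids) for b in bins]
end
plot(ms_res; plot_type=:waterfall, clustering_function = bin_clustering)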
Sub-selecting Runs to Plot
When plotting multi-start parameter estimation results with the :objective, :best_objective, or :parallel_coordinates plot types, the output becomes hard to read if more than 10 runs are performed. Therefore, for these plot types, only the 10 runs with the best final objective values are plotted by default. This can be adjusted using the best_idxs_n optional argument, an Int64 specifying how many runs to include in the plot (starting with the best one). Alternatively, the idxs optional argument can be used to specify the indexes of the runs to plot.
For the :waterfall and :runtime_eval plot types, all runs are plotted by default. However, both the best_idxs_n and idxs arguments can be provided.
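For example, to show only the 20 best runs in a waterfall plot, or to plot a specific set of runs by index, you can do:
# Show only the 20 best runs in the waterfall plot
plot(ms_res; plot_type=:waterfall, best_idxs_n = 20)
# Show only runs 1, 5, and 7 in the objective trajectory plot
plot(ms_res; plot_type=:objective, idxs = [1, 5, 7])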
Plotting the Model Fit
After fitting the model, it is useful to compare the model output against the measurement data. This can be done by providing both the optimization solution and the PEtabODEProblem to the plot command. By default, the plot shows the model output for all observables under the first simulation condition. However, any subset of observables can be selected using the obsids option, and any simulation condition can be specified using the cid option.
This tutorial covers how to plot the model fit for different observables and simulation conditions. It assumes you are familiar with PEtab simulation conditions; if not, see this tutorial. As a working example, we use the Michaelis-Menten enzyme kinetics model from the starting tutorial, which we fit to data from two simulation conditions (cond1 and cond2) and two observables (obs_e and obs_p). Even though the code below encodes the model as a ReactionSystem, everything works exactly the same if the model is encoded as an ODESystem.
using Catalyst
rn = @reaction_network begin
kB, S + E --> SE
kD, SE --> S + E
kP, SE --> P + E
end
u0 = [:E => 1.0, :SE => 0.0, :P => 0.0]
p_true = [:kB => 1.0, :kD => 0.1, :kP => 0.5]
# Simulate data.
using OrdinaryDiffEq
# cond1
oprob_true_cond1 = ODEProblem(rn, [:S => 1.0; u0], (0.0, 10.0), p_true)
true_sol_cond1 = solve(oprob_true_cond1, Rodas5P())
data_sol_cond1 = solve(oprob_true_cond1, Rodas5P(); saveat=1.0)
cond1_t, cond1_e, cond1_p = (data_sol_cond1.t[2:end], (0.8 .+ 0.1*randn(10)) .*
data_sol_cond1[:E][2:end], (0.8 .+ 0.1*randn(10)) .*
data_sol_cond1[:P][2:end])
# cond2
oprob_true_cond2 = ODEProblem(rn, [:S => 0.5; u0], (0.0, 10.0), p_true)
true_sol_cond2 = solve(oprob_true_cond2, Tsit5())
data_sol_cond2 = solve(oprob_true_cond2, Tsit5(); saveat=1.0)
cond2_t, cond2_e, cond2_p = (data_sol_cond2.t[2:end], (0.8 .+ 0.1*randn(10)) .*
data_sol_cond2[:E][2:end], (0.8 .+ 0.1*randn(10)) .*
data_sol_cond2[:P][2:end])
using PEtab
@unpack E, P = rn
obs_e = PEtabObservable(E, 0.5)
obs_p = PEtabObservable(P, 0.5)
observables = Dict("obs_e" => obs_e, "obs_p" => obs_p)
p_kB = PEtabParameter(:kB)
p_kD = PEtabParameter(:kD)
p_kP = PEtabParameter(:kP)
pest = [p_kB, p_kD, p_kP]
cond1 = Dict(:S => 1.0)
cond2 = Dict(:S => 0.5)
conds = Dict("cond1" => cond1, "cond2" => cond2)
using DataFrames
m_cond1_e = DataFrame(simulation_id="cond1", obs_id="obs_e", time=cond1_t,
measurement=cond1_e)
m_cond1_p = DataFrame(simulation_id="cond1", obs_id="obs_p", time=cond1_t,
measurement=cond1_p)
m_cond2_e = DataFrame(simulation_id="cond2", obs_id="obs_e", time=cond2_t,
measurement=cond2_e)
m_cond2_p = DataFrame(simulation_id="cond2", obs_id="obs_p", time=cond2_t,
measurement=cond2_p)
measurements = vcat(m_cond1_e, m_cond1_p, m_cond2_e, m_cond2_p)
model = PEtabModel(rn, observables, measurements, pest; simulation_conditions = conds,
                   speciemap = u0)
petab_prob = PEtabODEProblem(model)
using Optim
res = calibrate_multistart(petab_prob, IPNewton(), 50)
Following parameter estimation, we can plot the fitted solution for P in the first simulation condition (cond1) as:
using Plots
plot(res, petab_prob; obsids=["obs_p"], cid="cond1", linewidth = 2.0)
To instead plot both observables for the second simulation condition (cond2), do:
plot(res, petab_prob; obsids=["obs_e", "obs_p"], cid="cond2", linewidth = 2.0)
In this example, the obsids option is technically not required, as plotting all observables is the default behavior. Furthermore, by default, the observable formula is shown in the legend. If the observable formula is long (e.g., the sum of all model species), this can make the plot unreadable. To address this, you can display only the observable ID in the legend by setting obsid_label = true:
plot(res, petab_prob; obsids=["obs_e", "obs_p"], cid="cond2", linewidth = 2.0, obsid_label = true)
If, as above, a parameter estimation result (res) is provided, the fit for the best-found parameter vector is plotted. It can also be useful to plot the fit for another parameter vector, such as the initial values x0. This is straightforward, as the plot command also works with any parameter vector that is in the order expected by PEtab.jl (for more on parameter order, see get_x). For example, to plot the fit for the initial values of parameter estimation run 1, do:
x0 = res.runs[1].x0
plot(x0, petab_prob; obsids=["obs_e", "obs_p"], cid="cond2", linewidth = 2.0)
Finally, it is possible to retrieve a dictionary containing plots for all combinations of observables and simulation conditions with:
comp_dict = get_obs_comparison_plots(res, petab_prob; linewidth = 2.0)
Here, comp_dict contains one entry for each simulation condition (with keys corresponding to the condition IDs). Each entry is itself a dictionary containing one entry for each observable (with keys corresponding to the observable IDs). To retrieve the plot for E and cond1, do:
comp_dict["cond1"]["obs_e"]
The input to get_obs_comparison_plots can also be a parameter vector.
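For example, to build the same dictionary of comparison plots for the initial parameter vector x0 from above and then retrieve the plot for obs_p in cond2:
comp_dict_x0 = get_obs_comparison_plots(x0, petab_prob; linewidth = 2.0)
comp_dict_x0["cond2"]["obs_p"]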
References
- [2] A. Raue, M. Schilling, J. Bachmann, A. Matteson, M. Schelker, D. Kaschek, S. Hug, C. Kreutz, B. D. Harms, F. J. Theis and others. Lessons learned from quantitative dynamical modeling in systems biology. PLoS ONE 8, e74335 (2013).