# Plots evaluating parameter estimation
After fitting a model's parameters to data, it is prudent to evaluate the results. The simplest approach is to simulate the model using the optimal parameter set and visually inspect its goodness of fit. However, several techniques exist that attempt to detect whether:
- There exist unfound parameter sets that yield better fits than the one found (suggesting a local minimum was reached).
- There exist additional parameter sets yielding fits as good as the one found (suggesting an identifiability problem).
This section demonstrates various plots implemented in PEtab that can be used to perform this kind of analysis. These plots can be generated by calling `plot` on the output of `calibrate_model` (a `PEtabOptimisationResult` structure) or `calibrate_model_multistart` (a `PEtabMultistartOptimisationResult` structure). In addition, an optional argument (`plot_type`) allows the selection of plot type.
First, we load a multi start optimisation result:
```julia
using PEtab
using Plots
petab_ms_res = PEtabMultistartOptimisationResult(joinpath(@__DIR__, "assets", "optimisation_results", "boehm"))
```
Next, throughout the following sections, we will demonstrate the available plot types using `petab_ms_res` as input.
## Objective function evaluations
The objective function evaluations plot is accessed through the `plot_type=:objective` argument. It is valid for both single start and multi start optimisation results. It plots, for each iteration of the optimisation process, the achieved objective value. For single start optimisation results, a single trajectory of dots is plotted. For multi start optimisation results, each run corresponds to a separate trajectory of dots.
```julia
plot(petab_ms_res; plot_type=:objective)
```
Here, the runs are separated by different colours. First, a clustering process is carried out, attempting to identify runs converging to the same local minimum (the clustering, including how to customise it, is described below). Next, runs from the same cluster are assigned the same colour. A similar scheme is used for all plots involving multi start runs. When a large number of runs have been carried out, it is also possible to select which ones to include in the plot (by default the `10` best runs are used; this can be customised).
Sometimes, the model cannot be successfully simulated for a specific parameter set (this indicates a very poor fit). Such evaluations are marked with crosses (rather than circles) in these plots.
## Best objective function evaluations
The best objective function evaluations plot is accessed through the `plot_type=:best_objective` argument. It is valid for both single start and multi start optimisation results. It is very similar to the objective function evaluations plot; however, it instead plots the best objective value reached so far in the process (and is thus a non-increasing function). The best objective function evaluations plot is the default plot type for single start optimisation results.
```julia
plot(petab_ms_res; plot_type=:best_objective)
```
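The plotted trace is simply the cumulative minimum of the raw objective evaluations. A minimal sketch of this relationship, using made-up objective values (not taken from any actual run):

```julia
# Hypothetical objective values recorded over the iterations of one run.
f_evals = [12.3, 8.7, 9.5, 6.2, 6.9, 5.1]

# The best-value-so-far trace is the cumulative minimum of the evaluations;
# this is the (non-increasing) quantity shown by plot_type=:best_objective.
best_so_far = accumulate(min, f_evals)  # [12.3, 8.7, 8.7, 6.2, 6.2, 5.1]
```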
## Waterfall plots
The waterfall plot is accessed through the `plot_type=:waterfall` argument. It is only valid for multi start optimisation results. It plots all runs' final objective values, sorted from best to worst (local minima can typically be identified as plateaus in the plot). The waterfall plot is the default plot type for multi start optimisation results.
```julia
plot(petab_ms_res; plot_type=:waterfall)
```
## Parallel coordinates plots
The parallel coordinates plot is accessed through the `plot_type=:parallel_coordinates` argument. It is only valid for multi start optimisation results. In it, each run corresponds to a line, drawn through its values in the vector of fitted parameters (starting with the first parameter and ending with the last). The parameter values are normalised, so that `0` corresponds to the minimum value encountered for that parameter and `1` to the maximum. If runs in the same cluster share similar paths across the parameter values, this suggests that they have converged to the same local minimum. If, for one parameter, runs in the same cluster have widely different values, this suggests that the parameter is unidentifiable.
```julia
plot(petab_ms_res; plot_type=:parallel_coordinates)
```
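The normalisation described above is a standard min-max rescaling. A minimal sketch of how one parameter's values across runs could be rescaled (the values are hypothetical):

```julia
# Hypothetical final values of a single parameter across five runs.
p_vals = [0.42, 0.39, 1.30, 0.41, 0.40]

# Min-max normalisation: the smallest encountered value maps to 0 and the
# largest to 1, as in the parallel coordinates plot.
p_min, p_max = extrema(p_vals)
p_norm = (p_vals .- p_min) ./ (p_max - p_min)
```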
## Runtime evaluation plots
The runtime evaluation plot is accessed through the `plot_type=:runtime_eval` argument. It is only valid for multi start optimisation results. It is a scatter plot showing, for each run, how its final objective value depends on its runtime.
```julia
plot(petab_ms_res; plot_type=:runtime_eval)
```
## Multi start run clustering
When the `calibrate_model_multistart` function is used to fit a parameter set, several independent runs are performed. In all the previous plots, a clustering function is applied to identify runs that likely have converged to the same local minimum. Runs in the same cluster are given the same colour in the plots, allowing the clusters to be identified. By default, the `objective_value_clustering` function is used (roughly, it clusters runs together if their objective function values lie within `0.1` of each other). It is possible for the user to define their own clustering function and supply it to the `plot` command through the `clustering_function` argument. The clustering function should take a `Vector{PEtabOptimisationResult}` as input and return a `Vector{Int64}` of the same length (where each index gives an integer corresponding to that run's cluster).
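As an illustration of this interface, here is a sketch of a custom clustering function that bins runs by their final objective value. It assumes each `PEtabOptimisationResult` stores its final objective value in the `fmin` field, and the bin width of `1.0` is an arbitrary choice for demonstration:

```julia
# Cluster runs whose final objective values round to the same integer.
# Takes a Vector{PEtabOptimisationResult}; returns a Vector{Int64} of the
# same length, each entry giving that run's cluster index.
function my_clustering(runs::Vector{PEtabOptimisationResult})
    # Assumes the final objective value is stored in the `fmin` field.
    bins = [round(Int64, run.fmin) for run in runs]
    unique_bins = unique(bins)
    return Int64[findfirst(==(b), unique_bins) for b in bins]
end

plot(petab_ms_res; plot_type=:waterfall, clustering_function=my_clustering)
```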
## Sub-selecting runs to plot
When plotting a multi start output using the `:objective`, `:best_objective`, or `:parallel_coordinates` plot types, if the number of runs is large, it can sometimes be hard to distinguish information in the plot. Hence, for these plot types, only the `10` runs with the best final objective values are plotted. This can be modified through the `best_idxs_n` optional argument, an `Int64` specifying how many runs to include in the plot (starting with the best one). Alternatively, the `idxs` optional argument can be used to give the indexes of the runs to plot.
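For example, to plot the 20 best runs, or a specific set of runs, one could call:

```julia
# Plot the 20 runs with the best final objective values.
plot(petab_ms_res; plot_type=:best_objective, best_idxs_n=20)

# Plot only the runs with indexes 1, 4, and 7.
plot(petab_ms_res; plot_type=:best_objective, idxs=[1, 4, 7])
```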
For the `:waterfall` and `:runtime_eval` plot types, all runs are plotted by default. However, if desired, the `best_idxs_n` and `idxs` arguments can be used for these plot types as well.
## Plots comparing the fitted model to the measurements
After the model has been fitted, it can be useful to compare it to the measurements. This is possible by supplying both the optimisation solution and the `PEtabModel` used to the `plot` command. By default, it will plot, for the first simulation condition, the output solution for all observables. However, any subset of observables can be selected (using the `observable_ids` option). It is also possible to select the condition using the `condition_id` option.
Here, we first fit a simple model (with two observables and two conditions) to simulated data. Next, we will show various ways to plot the fitted solution.
```julia
# Prepare model.
using Catalyst
rn = @reaction_network begin
    kB, S + E --> SE
    kD, SE --> S + E
    kP, SE --> P + E
end
u0 = [:E => 1.0, :SE => 0.0, :P => 0.0]
p_true = [:kB => 1.0, :kD => 0.1, :kP => 0.5]

# Simulate data.
using OrdinaryDiffEq
# Condition 1.
oprob_true_c1 = ODEProblem(rn, [:S => 1.0; u0], (0.0, 10.0), p_true)
true_sol_c1 = solve(oprob_true_c1, Tsit5())
data_sol_c1 = solve(oprob_true_c1, Tsit5(); saveat=1.0)
c1_t, c1_E, c1_P = data_sol_c1.t[2:end], (0.8 .+ 0.4*rand(10)) .* data_sol_c1[:E][2:end], (0.8 .+ 0.4*rand(10)) .* data_sol_c1[:P][2:end]
# Condition 2.
oprob_true_c2 = ODEProblem(rn, [:S => 0.5; u0], (0.0, 10.0), p_true)
true_sol_c2 = solve(oprob_true_c2, Tsit5())
data_sol_c2 = solve(oprob_true_c2, Tsit5(); saveat=1.0)
c2_t, c2_E, c2_P = data_sol_c2.t[2:end], (0.8 .+ 0.4*rand(10)) .* data_sol_c2[:E][2:end], (0.8 .+ 0.4*rand(10)) .* data_sol_c2[:P][2:end]

# Make PEtab problem.
using PEtab
@unpack E, P = rn
obs_E = PEtabObservable(E, 0.5)
obs_P = PEtabObservable(P, 0.5)
observables = Dict("obs_E" => obs_E, "obs_P" => obs_P)

par_kB = PEtabParameter(:kB)
par_kD = PEtabParameter(:kD)
par_kP = PEtabParameter(:kP)
params = [par_kB, par_kD, par_kP]

c1 = Dict(:S => 1.0)
c2 = Dict(:S => 0.5)
simulation_conditions = Dict("c1" => c1, "c2" => c2)

using DataFrames
m_c1_E = DataFrame(simulation_id="c1", obs_id="obs_E", time=c1_t, measurement=c1_E)
m_c1_P = DataFrame(simulation_id="c1", obs_id="obs_P", time=c1_t, measurement=c1_P)
m_c2_E = DataFrame(simulation_id="c2", obs_id="obs_E", time=c2_t, measurement=c2_E)
m_c2_P = DataFrame(simulation_id="c2", obs_id="obs_P", time=c2_t, measurement=c2_P)
measurements = vcat(m_c1_E, m_c1_P, m_c2_E, m_c2_P)

petab_model = PEtabModel(rn, simulation_conditions, observables, measurements, params;
                         state_map=u0, verbose=false)
petab_problem = PEtabODEProblem(petab_model; verbose=false)

# Fit the model.
using Optim
res = calibrate_model_multistart(petab_problem, IPNewton(), 50, nothing)
```
Next, we plot the fitted solution for $P$ for the first simulation condition (`"c1"`):

```julia
using Plots
plot(res, petab_problem; observable_ids=["obs_P"], condition_id="c1")
```
If we instead wish to, for the second simulation condition, plot both observables, we use the following command:
```julia
plot(res, petab_problem; observable_ids=["obs_E", "obs_P"], condition_id="c2")
```
(in this example, the `observable_ids` option is technically not required, as plotting all observables is the default behaviour)
Finally, it is possible to retrieve a dictionary containing plots for all combinations of observables and conditions using:
```julia
comp_dict = get_obs_comparison_plots(res, petab_problem)
```
Here, `comp_dict` contains one entry for each condition (with keys corresponding to their condition ids). These are all dictionaries, which in turn contain one entry for each observable (with keys corresponding to their observable ids). To retrieve the plot for $E$ and `"c1"` we use:

```julia
comp_dict["c1"]["obs_E"]
```
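Since `comp_dict` is an ordinary nested dictionary, it can also be iterated over, e.g. to save every comparison plot to disk (the file naming below is a hypothetical choice for illustration):

```julia
# Save each condition/observable comparison plot to its own image file.
for (cond_id, obs_plots) in comp_dict
    for (obs_id, plt) in obs_plots
        savefig(plt, "comparison_$(cond_id)_$(obs_id).png")
    end
end
```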