# Plots evaluating parameter estimation
After fitting a model's parameters to data, it is prudent to evaluate the results. The simplest approach is to simulate the model using the optimal parameter set and visually inspect its goodness of fit. However, several techniques exist that attempt to detect whether:
- There exist unfound parameter sets that yield better fits than the one found (suggesting a local minimum was reached).
- There exist additional parameter sets yielding fits as good as the one found (suggesting an identifiability problem).
This section demonstrates various plots implemented in PEtab that can be used to perform this kind of analysis. These plots can be generated by calling `plot` on the output of `calibrate_model` (a `PEtabOptimisationResult` structure) or `calibrate_model_multistart` (a `PEtabMultistartOptimisationResult` structure). In addition, an optional argument (`plot_type`) allows the selection of plot type.
First, we load a multi start optimisation result:
```julia
using PEtab
using Plots
petab_ms_res = PEtabMultistartOptimisationResult(joinpath(@__DIR__, "assets", "optimisation_results", "boehm"))
```
Next, throughout the following sections, we will demonstrate the available plot types using `petab_ms_res` as input.
## Objective function evaluations
The objective function evaluations plot is accessed through the `plot_type=:objective` argument. It is valid for both single start and multi start optimisation results. It plots, for each iteration of the optimisation process, the achieved objective value. For single start optimisation results, a single trajectory of dots is plotted. For multi start optimisation results, each run corresponds to a separate trajectory of dots.
```julia
plot(petab_ms_res; plot_type=:objective)
```
Here, the runs are separated by different colours. First, a clustering process is carried out, attempting to identify runs converging to the same local minimum (the clustering, including how to customise it, is described below). Next, runs from the same cluster are assigned the same colour. A similar scheme is used for all plots involving multi start runs. When a large number of runs have been carried out, it is also possible to select which ones to include in the plot (by default the `10` best runs are used; this can be customised).
Sometimes, the model cannot be successfully simulated for a specific parameter set (this indicates a very poor fit). Such evaluations are marked with crosses (rather than circles) in these plots.
## Best objective function evaluations
The best objective function evaluations plot is accessed through the `plot_type=:best_objective` argument. It is valid for both single start and multi start optimisation results. It is very similar to the objective function evaluations plot; however, it instead plots the best objective value reached so far in the process (and is thus a non-increasing function). The best objective function evaluations plot is the default plot type for single start optimisation results.
```julia
plot(petab_ms_res; plot_type=:best_objective)
```
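The plotted trace is simply the cumulative minimum of the raw objective evaluations. A minimal sketch of this relationship, using made-up objective values (not taken from any actual run):

```julia
# Hypothetical objective values recorded over the iterations of one run.
f_evals = [12.3, 8.7, 9.5, 6.2, 6.9, 5.1]

# The best-value-so-far trace is the cumulative minimum of the evaluations;
# this is the (non-increasing) quantity shown by plot_type=:best_objective.
best_so_far = accumulate(min, f_evals)  # [12.3, 8.7, 8.7, 6.2, 6.2, 5.1]
```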
## Waterfall plots
The waterfall plot is accessed through the `plot_type=:waterfall` argument. It is only valid for multi start optimisation results. It plots all runs' final objective values, sorted from best to worst (local minima can typically be identified as plateaus in the plot). The waterfall plot is the default plot type for multi start optimisation results.
```julia
plot(petab_ms_res; plot_type=:waterfall)
```
## Parallel coordinates plots
The parallel coordinates plot is accessed through the `plot_type=:parallel_coordinates` argument. It is only valid for multi start optimisation results. In it, each run corresponds to a line, drawn through its values in the vector of fitted parameters (starting with the first parameter and ending with the last). The parameter values are normalised, so that `0` corresponds to the minimum value encountered for that parameter and `1` to the maximum. If runs in the same cluster share similar paths across the parameter values, this suggests that they have converged to the same local minimum. If, for one parameter, runs in the same cluster have widely different values, this suggests that the parameter is unidentifiable.
```julia
plot(petab_ms_res; plot_type=:parallel_coordinates)
```
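The normalisation described above is a standard min-max rescaling. A minimal sketch of how one parameter's values across runs could be rescaled (the values are hypothetical):

```julia
# Hypothetical final values of a single parameter across five runs.
p_vals = [0.42, 0.39, 1.30, 0.41, 0.40]

# Min-max normalisation: the smallest encountered value maps to 0 and the
# largest to 1, as in the parallel coordinates plot.
p_min, p_max = extrema(p_vals)
p_norm = (p_vals .- p_min) ./ (p_max - p_min)
```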
## Runtime evaluation plots
The runtime evaluation plot is accessed through the `plot_type=:runtime_eval` argument. It is only valid for multi start optimisation results. It is a scatter plot showing, for each run, how its final objective value depends on its runtime.
```julia
plot(petab_ms_res; plot_type=:runtime_eval)
```
## Multi start run clustering
When the `calibrate_model_multistart` function is used to fit a parameter set, several independent runs are performed. In all the previous plots, a clustering function is applied to identify runs that likely have converged to the same local minimum. Runs in the same cluster are given the same colour in the plots, allowing the clusters to be identified. By default, the `objective_value_clustering` function is used (roughly, it clusters runs together if their objective function values lie within `0.1` of each other). It is possible for the user to define their own clustering function and supply it to the `plot` command through the `clustering_function` argument. The clustering function should take a `Vector{PEtabOptimisationResult}` as input and return a `Vector{Int64}` of the same length (where each index gives an integer corresponding to that run's cluster).
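As an illustration of this interface, here is a sketch of a custom clustering function that bins runs by their final objective value. It assumes each `PEtabOptimisationResult` stores its final objective value in the `fmin` field, and the bin width of `1.0` is an arbitrary choice for demonstration:

```julia
# Cluster runs whose final objective values round to the same integer.
# Takes a Vector{PEtabOptimisationResult}; returns a Vector{Int64} of the
# same length, each entry giving that run's cluster index.
function my_clustering(runs::Vector{PEtabOptimisationResult})
    # Assumes the final objective value is stored in the `fmin` field.
    bins = [round(Int64, run.fmin) for run in runs]
    unique_bins = unique(bins)
    return Int64[findfirst(==(b), unique_bins) for b in bins]
end

plot(petab_ms_res; plot_type=:waterfall, clustering_function=my_clustering)
```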
## Sub-selecting runs to plot
When plotting a multi start output using the `:objective`, `:best_objective`, or `:parallel_coordinates` plot types, if the number of runs is large, it can sometimes be hard to distinguish information in the plot. Hence, for these plot types, only the `10` runs with the best final objective values are plotted. This can be modified through the `best_idxs_n` optional argument, an `Int64` specifying how many runs to include in the plot (starting with the best one). Alternatively, the `idxs` optional argument can be used to give the indexes of the runs to plot.
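For example, to plot the 20 best runs, or a specific set of runs, one could call:

```julia
# Plot the 20 runs with the best final objective values.
plot(petab_ms_res; plot_type=:best_objective, best_idxs_n=20)

# Plot only the runs with indexes 1, 4, and 7.
plot(petab_ms_res; plot_type=:best_objective, idxs=[1, 4, 7])
```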
For the `:waterfall` and `:runtime_eval` plot types, all runs are plotted by default. However, if desired, the `best_idxs_n` and `idxs` arguments can be used for these plot types as well.
## Plots comparing the fitted model to the measurements
After the model has been fitted, it can be useful to compare it to the measurements. This is possible by supplying both the optimisation solution and the `PEtabModel` used to the `plot` command. By default, it will plot, for the first simulation condition, the output solution for all observables. However, any subset of observables can be selected (using the `observable_ids` option). It is also possible to select the condition using the `condition_id` option.
Here, we first fit a simple model (with two observables and two conditions) to simulated data. Next, we will show various ways to plot the fitted solution.
```julia
# Prepare model.
using Catalyst
rn = @reaction_network begin
    kB, S + E --> SE
    kD, SE --> S + E
    kP, SE --> P + E
end
u0 = [:E => 1.0, :SE => 0.0, :P => 0.0]
p_true = [:kB => 1.0, :kD => 0.1, :kP => 0.5]

# Simulate data.
using OrdinaryDiffEq
# Condition 1.
oprob_true_c1 = ODEProblem(rn, [:S => 1.0; u0], (0.0, 10.0), p_true)
true_sol_c1 = solve(oprob_true_c1, Tsit5())
data_sol_c1 = solve(oprob_true_c1, Tsit5(); saveat=1.0)
c1_t, c1_E, c1_P = data_sol_c1.t[2:end], (0.8 .+ 0.4*rand(10)) .* data_sol_c1[:E][2:end], (0.8 .+ 0.4*rand(10)) .* data_sol_c1[:P][2:end]
# Condition 2.
oprob_true_c2 = ODEProblem(rn, [:S => 0.5; u0], (0.0, 10.0), p_true)
true_sol_c2 = solve(oprob_true_c2, Tsit5())
data_sol_c2 = solve(oprob_true_c2, Tsit5(); saveat=1.0)
c2_t, c2_E, c2_P = data_sol_c2.t[2:end], (0.8 .+ 0.4*rand(10)) .* data_sol_c2[:E][2:end], (0.8 .+ 0.4*rand(10)) .* data_sol_c2[:P][2:end]

# Make PEtab problem.
using PEtab
@unpack E, P = rn
obs_E = PEtabObservable(E, 0.5)
obs_P = PEtabObservable(P, 0.5)
observables = Dict("obs_E" => obs_E, "obs_P" => obs_P)

par_kB = PEtabParameter(:kB)
par_kD = PEtabParameter(:kD)
par_kP = PEtabParameter(:kP)
params = [par_kB, par_kD, par_kP]

c1 = Dict(:S => 1.0)
c2 = Dict(:S => 0.5)
simulation_conditions = Dict("c1" => c1, "c2" => c2)

using DataFrames
m_c1_E = DataFrame(simulation_id="c1", obs_id="obs_E", time=c1_t, measurement=c1_E)
m_c1_P = DataFrame(simulation_id="c1", obs_id="obs_P", time=c1_t, measurement=c1_P)
m_c2_E = DataFrame(simulation_id="c2", obs_id="obs_E", time=c2_t, measurement=c2_E)
m_c2_P = DataFrame(simulation_id="c2", obs_id="obs_P", time=c2_t, measurement=c2_P)
measurements = vcat(m_c1_E, m_c1_P, m_c2_E, m_c2_P)

petab_model = PEtabModel(rn, simulation_conditions, observables, measurements, params;
                         state_map=u0, verbose=false)
petab_problem = PEtabODEProblem(petab_model; verbose=false)

# Fit the model.
using Optim
res = calibrate_model_multistart(petab_problem, IPNewton(), 50, nothing)
```
Next, we plot the fitted solution for $P$ for the first simulation condition (`"c1"`):

```julia
using Plots
plot(res, petab_problem; observable_ids=["obs_P"], condition_id="c1")
```
If we instead wish to, for the second simulation condition, plot both observables, we use the following command:
```julia
plot(res, petab_problem; observable_ids=["obs_E", "obs_P"], condition_id="c2")
```
(in this example, the `observable_ids` option is technically not required, as plotting all observables is the default behaviour)
Finally, it is possible to retrieve a dictionary containing plots for all combinations of observables and conditions using:
```julia
comp_dict = get_obs_comparison_plots(res, petab_problem)
```
Here, `comp_dict` contains one entry for each condition (with keys corresponding to their condition ids). These are all dictionaries, which in turn contain one entry for each observable (with keys corresponding to their observable ids). To retrieve the plot for $E$ and `"c1"` we use:

```julia
comp_dict["c1"]["obs_E"]
```
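Since `comp_dict` is an ordinary nested dictionary, it can also be iterated over, e.g. to save every comparison plot to disk (the file naming below is a hypothetical choice for illustration):

```julia
# Save each condition/observable comparison plot to its own image file.
for (cond_id, obs_plots) in comp_dict
    for (obs_id, plt) in obs_plots
        savefig(plt, "comparison_$(cond_id)_$(obs_id).png")
    end
end
```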