Gradient and Hessian Methods

PEtab.jl supports several gradient and Hessian computation methods when creating a PEtabODEProblem. This section provides a brief overview of each available method and the corresponding tunable options.

Gradient Methods

PEtab.jl supports three gradient computation methods: forward-mode automatic differentiation (:ForwardDiff), forward-sensitivity equations (:ForwardEquations), and adjoint sensitivity analysis (:Adjoint). A good introduction to the mathematics behind these methods can be found in [16], and to automatic differentiation in [17]. Below is a brief description of each method.

  • :ForwardDiff: This method uses ForwardDiff.jl to compute the gradient via forward-mode automatic differentiation [18]. The only tunable option is the chunksize (the number of directional derivatives computed in a single forward pass). While the default chunksize is typically a good choice, performance can be slightly improved by tuning this parameter, and we plan to add automatic tuning. This method is often the fastest for smaller models [19].
  • :ForwardEquations: This method computes the gradient by solving an expanded ODE system to obtain the forward sensitivities during the forward pass. These sensitivities are then used to compute the gradient. The tunable option is sensealg; the default, sensealg=:ForwardDiff (computing the sensitivities via forward-mode automatic differentiation), is often the fastest choice. PEtab.jl also supports the ForwardSensitivity() and ForwardDiffSensitivity() methods from SciMLSensitivity.jl. For more details and tunable options for these two methods, see the SciMLSensitivity documentation. A usage sketch for selecting a gradient method is shown after this list.
  • :Adjoint: This method computes the gradient via adjoint sensitivity analysis, which involves solving an adjoint ODE backward in time. Several benchmark studies have shown that the adjoint method is the most efficient for larger models [20, 21]. The tunable option is sensealg, which specifies the adjoint algorithm from SciMLSensitivity to use. Available algorithms are InterpolatingAdjoint, GaussAdjoint, and QuadratureAdjoint. For information on their tunable options, see the SciMLSensitivity documentation.
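As a concrete illustration, below is a minimal sketch of how the first two gradient methods might be selected when creating a PEtabODEProblem. The YAML path is a placeholder, and the gradient_method, chunksize, and sensealg keyword names are assumptions based on the options described above; consult the PEtabODEProblem API documentation for the exact signature.

```julia
using PEtab

# Placeholder path to a PEtab problem file
model = PEtabModel("path/to/problem.yaml")

# Forward-mode automatic differentiation (the default); chunksize is optional
prob_forwarddiff = PEtabODEProblem(model; gradient_method = :ForwardDiff,
                                   chunksize = 10)

# Forward sensitivity equations, with sensitivities computed via ForwardDiff
prob_forwardeq = PEtabODEProblem(model; gradient_method = :ForwardEquations,
                                 sensealg = :ForwardDiff)
```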
Note

To use functionality from SciMLSensitivity (e.g., adjoint sensitivity analysis), the package must be loaded with using SciMLSensitivity before creating the PEtabODEProblem.
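For example, a gradient via adjoint sensitivity analysis could be set up as in the sketch below. The model path is again a placeholder, and the gradient_method and sensealg keyword names are assumptions as above.

```julia
using PEtab
using SciMLSensitivity  # must be loaded before the PEtabODEProblem is created

model = PEtabModel("path/to/problem.yaml")  # placeholder path

# Adjoint sensitivity analysis with one of the supported adjoint algorithms
prob_adjoint = PEtabODEProblem(model; gradient_method = :Adjoint,
                               sensealg = InterpolatingAdjoint())
```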

Hessian Methods

PEtab.jl supports three Hessian computation methods: forward-mode automatic differentiation (:ForwardDiff), a block Hessian approximation (:BlockForwardDiff), and the Gauss-Newton Hessian approximation (:GaussNewton). Below is a brief description of each method.

  • :ForwardDiff: This method computes the Hessian via forward-mode automatic differentiation using ForwardDiff.jl. As with the gradient, the only tunable option is the chunksize. This method has quadratic complexity, $O(n^2)$, where $n$ is the number of parameters, making it feasible only for models with up to $n = 20$ parameters. When computationally feasible, however, access to the full Hessian can improve convergence in multi-start parameter estimation.

  • :BlockForwardDiff: This method computes a block Hessian approximation using forward-mode automatic differentiation with ForwardDiff.jl. Specifically, for PEtab problems, there are typically two sets of parameters to estimate: the parameters that are part of the ODE system, $\mathbf{x}_p$, and those that are not, $\mathbf{x}_q$. This block approach computes the Hessian for each block while approximating the cross-terms as zero:

\[\mathbf{H}_{block} = \begin{bmatrix} \mathbf{H}_{p} & \mathbf{0} \\ \mathbf{0} & \mathbf{H}_q \end{bmatrix}\]

  • :GaussNewton: This method approximates the Hessian using the Gauss-Newton method, which often performs better than an (L-)BFGS approximation [20]. However, it requires access to forward sensitivities (similar to :ForwardEquations above), and computing these is often not computationally feasible for models with more than 75 parameters; for such larger models, an (L-)BFGS approximation is often the only feasible option. For more details on the computations, see [22].
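To summarize the options above, the sketch below shows how each Hessian method might be requested. As with the gradient sketches, the model path is a placeholder and the hessian_method keyword name is an assumption based on the options described in this section.

```julia
using PEtab

model = PEtabModel("path/to/problem.yaml")  # placeholder path

# Full Hessian via forward-mode automatic differentiation (small models)
prob_full = PEtabODEProblem(model; hessian_method = :ForwardDiff)

# Block approximation: cross-terms between ODE and non-ODE parameters set to zero
prob_block = PEtabODEProblem(model; hessian_method = :BlockForwardDiff)

# Gauss-Newton approximation (requires forward sensitivities)
prob_gn = PEtabODEProblem(model; gradient_method = :ForwardEquations,
                          hessian_method = :GaussNewton)
```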

References

[16]
F. Sapienza, J. Bolibar, F. Schäfer, B. Groenke, A. Pal, V. Boussange, P. Heimbach, G. Hooker, F. Pérez, P.-O. Persson and others. Differentiable Programming for Differential Equations: A Review, arXiv preprint arXiv:2406.09699 (2024).
[17]
M. Blondel and V. Roulet. The elements of differentiable programming, arXiv preprint arXiv:2403.14606 (2024).
[18]
J. Revels, M. Lubin and T. Papamarkou. Forward-mode automatic differentiation in Julia, arXiv preprint arXiv:1607.07892 (2016).
[19]
R. Mester, A. Landeros, C. Rackauckas and K. Lange. Differential methods for assessing sensitivity in biological models. PLoS computational biology 18, e1009598 (2022).
[20]
F. Fröhlich, B. Kaltenbacher, F. J. Theis and J. Hasenauer. Scalable parameter estimation for genome-scale biochemical reaction networks. PLoS computational biology 13, e1005331 (2017).
[21]
Y. Ma, V. Dixit, M. J. Innes, X. Guo and C. Rackauckas. A comparison of automatic differentiation and continuous sensitivity analysis for derivatives of differential equation solutions. In: 2021 IEEE High Performance Extreme Computing Conference (HPEC) (IEEE, 2021); pp. 1–9.
[22]
A. Raue, B. Steiert, M. Schelker, C. Kreutz, T. Maiwald, H. Hass, J. Vanlier, C. Tönsing, L. Adlung, R. Engesser and others. Data2Dynamics: a modeling environment tailored to parameter estimation in dynamical systems. Bioinformatics 31, 3558–3560 (2015).