[{"content":"Statistical Consultant ","date":null,"permalink":"https://arnostrouwen.com/","section":"","summary":"\u003ch2 id=\"statistical-consultant\" class=\"relative group\"\u003eStatistical Consultant \u003c/h2\u003e","title":""},{"content":"Blog Archive ","date":null,"permalink":"https://arnostrouwen.com/posts/","section":"","summary":"\u003ch1 id=\"blog-archive\" class=\"relative group\"\u003eBlog Archive \u003c/h1\u003e","title":""},{"content":"I\u0026rsquo;m excited to announce that Sebastian Micluța-Câmpeanu and I have published \u0026ldquo;Experimental Design for Missing Physics\u0026rdquo; at DYCOPS 2025.\nWhen building models of real-world systems like bioreactors, we often don\u0026rsquo;t know the complete physics governing the process. This \u0026ldquo;missing physics\u0026rdquo; needs to be discovered from experimental data. Modern machine learning offers promising tools for this: neural networks can flexibly represent unknown dynamics, while symbolic regression can translate these black-box models into interpretable equations.\nHowever, these techniques are data-hungry—they need high-quality measurements to reliably uncover the true underlying physics. Random experiments won\u0026rsquo;t cut it.\nIn this paper, we tackle this challenge by developing a sequential experimental design technique. The idea is to intelligently choose which experiments to run next based on the candidate model structures that symbolic regression suggests. By designing experiments that best discriminate between competing hypotheses, we can efficiently guide the discovery process toward the correct physics. 
We demonstrate our approach on a bioreactor system, showing how smart experimentation accelerates scientific discovery.\n","date":"30 December 2025","permalink":"https://arnostrouwen.com/posts/experimental-design-for-missing-physics/","section":"","summary":"\u003cp\u003eI\u0026rsquo;m excited to announce that Sebastian Micluța-Câmpeanu and I have published \u003ca href=\"https://www.sciencedirect.com/science/article/pii/S240589632500552X\" target=\"_blank\" rel=\"noreferrer\"\u003e\u003cstrong\u003e\u0026ldquo;Experimental Design for Missing Physics\u0026rdquo;\u003c/strong\u003e\u003c/a\u003e at DYCOPS 2025.\u003c/p\u003e","title":"New Conference Paper: Experimental Design for Missing Physics"},{"content":"Announcing the registration of MixedModelsSmallSample.jl in the Julia General Registry!\nMixed effect regression models are statistical models that not only contain fixed effects (non-random quantities) but also random effects (random variables). Both of these effects must be estimated from data, and a popular method for this is restricted maximum likelihood (REML).\nThe Julia programming language already has a state-of-the-art package, MixedModels.jl, for estimating these effects using REML. Inference for the effects is, however, based on large sample approximations or on bootstrapping. When working with a small sample, the asymptotic approximation might not hold, and the resampling in the bootstrapping procedure might make the model inestimable.\nAlternative small sample inference approaches have been suggested by Kenward and Roger [1] and by Fai and Cornelius [2]. In these methods, the asymptotic results for confidence intervals and hypothesis tests are adjusted to account for finite sample sizes.\nMixedModelsSmallSample.jl implements these small sample adjustments for models estimated by MixedModels.jl.\nThese adjustment methods have already been incorporated in many statistical software programs, such as SAS [3], JMP [4], and lmerTest [5]. 
However, these packages differ in some statistical details. For example, SAS and JMP use the observed Fisher information matrix for variance components, while lmerTest uses the expected Fisher information matrix.\nMixedModelsSmallSample.jl provides user options to configure these technical details, such that results from SAS, JMP, and lmerTest can be exactly reproduced and easily compared.\nReferences [1]: Kenward, Michael G., and James H. Roger. “Small sample inference for fixed effects from restricted maximum likelihood.” Biometrics (1997): 983-997.\n[2]: Hrong-Tai Fai, Alex, and Paul L. Cornelius. “Approximate F-tests of multiple degree of freedom hypotheses in generalized least squares analyses of unbalanced split-plot experiments.” Journal of Statistical Computation and Simulation 54.4 (1996): 363-378.\n[3]: SAS Institute Inc. 2015. SAS/STAT® 14.1 User’s Guide: The MIXED Procedure. Cary, NC: SAS Institute Inc.\n[4]: JMP Statistical Discovery LLC. 2024. JMP® 18 Fitting Linear Models. Cary, NC: JMP Statistical Discovery LLC.\n[5]: Kuznetsova, Alexandra, Per B. Brockhoff, and Rune H. B. Christensen. “lmerTest package: tests in linear mixed effects models.” Journal of Statistical Software 82 (2017): 1-26.\n","date":"18 July 2025","permalink":"https://arnostrouwen.com/posts/mmss/","section":"","summary":"\u003cp\u003eAnnouncing the registration of \u003ca href=\"https://github.com/ArnoStrouwen/MixedModelsSmallSample.jl\" target=\"_blank\" rel=\"noreferrer\"\u003eMixedModelsSmallSample.jl\u003c/a\u003e in the Julia General Registry!\u003c/p\u003e","title":"MixedModelsSmallSample.jl: Small Sample Inference for Mixed Models in Julia"},{"content":" Continuing from Bayesian experimental design and adaptive Bayesian experimental design, we can make precise why the nested adaptive criterion is equivalent to an optimal policy problem.\nConsider 3 sequential design choices \\(D_1,D_2,D_3\\) and outcomes \\(Y_1,Y_2,Y_3\\). 
Let $$ u(d_1,d_2,d_3,y_1,y_2,y_3) = -S(\\theta \\mid y_{1:3},d_{1:3}), $$ so maximizing \\(u\\) is the same as minimizing posterior entropy.\nThe adaptive nested objective can be written as $$ V =\\max_{d_1\\in D_1} \\mathbb E_{Y_1\\mid d_1}\\Big[ \\max_{d_2\\in D_2} \\mathbb E_{Y_2\\mid Y_1,d_1,d_2}\\Big[ \\max_{d_3\\in D_3} \\mathbb E_{Y_3\\mid Y_1,Y_2,d_1,d_2,d_3} \\big[u(d_1,d_2,d_3,Y_1,Y_2,Y_3)\\big] \\Big] \\Big]. $$\nThis is not a myopic rule. The stage-1 choice is optimized for expected utility over all future observation paths, while explicitly anticipating that stages 2 and 3 will adapt optimally to whatever data are observed. Equivalently, we are solving for a policy that maps observations to decisions.\nA deterministic non-anticipative policy is $$ \\pi=(\\pi_1,\\pi_2,\\pi_3) $$ with $$ \\pi_1\\in D_1,\\qquad \\pi_2:D_1\\times \\mathcal Y_1\\to D_2,\\qquad \\pi_3:D_1\\times \\mathcal Y_1\\times D_2\\times \\mathcal Y_2\\to D_3. $$\nGiven \\(\\pi\\), the policy value is $$ J(\\pi)= \\mathbb E\\Big[ u\\big( \\pi_1, \\pi_2(\\pi_1,Y_1), \\pi_3(\\pi_1,Y_1,\\pi_2(\\pi_1,Y_1),Y_2), Y_1,Y_2,Y_3 \\big) \\Big]. $$ Let \\(\\Pi\\) be the set of admissible policies. The key equivalence is $$ V = \\max_{\\pi\\in\\Pi} J(\\pi). $$\nThe proof has two parts.\nFirst, define the continuation values once: $$ G_3(d_1,y_1,d_2,y_2)= \\max_{a_3\\in D_3} \\mathbb E\\big[u(d_1,d_2,a_3,y_1,y_2,Y_3)\\mid d_1,y_1,d_2,y_2,a_3\\big], $$ $$ G_2(d_1,y_1)= \\max_{a_2\\in D_2} \\mathbb E\\big[G_3(d_1,y_1,a_2,Y_2)\\mid d_1,y_1,a_2\\big], $$ $$ G_1=\\max_{a_1\\in D_1}\\mathbb E\\big[G_2(a_1,Y_1)\\mid a_1\\big]. $$ By definition of the nested objective, \\(G_1=V\\).\nPart 1: no policy can score higher than V Fix any policy \\(\\pi\\). For each realized history \\(h_2=(d_1,y_1,d_2,y_2)\\), define the stage-3 value of choosing action \\(a_3\\) as $$ q_3(a_3;h_2)=\\mathbb E\\big[u(d_1,d_2,a_3,y_1,y_2,Y_3)\\mid h_2,a_3\\big]. 
$$ The policy chooses \\(a_3=\\pi_3(h_2)\\), so it gets \\(q_3(\\pi_3(h_2);h_2)\\). Since this is one feasible action in \\(D_3\\), $$ q_3(\\pi_3(h_2);h_2)\\le \\max_{a_3\\in D_3}q_3(a_3;h_2)=G_3(d_1,y_1,d_2,y_2). $$\nNow move one step earlier. For each \\((d_1,y_1)\\), define $$ q_2(a_2;d_1,y_1)=\\mathbb E\\big[G_3(d_1,y_1,a_2,Y_2)\\mid d_1,y_1,a_2\\big]. $$ Again, the policy uses one feasible action \\(a_2=\\pi_2(d_1,y_1)\\), so $$ q_2(\\pi_2(d_1,y_1);d_1,y_1)\\le \\max_{a_2\\in D_2}q_2(a_2;d_1,y_1)=G_2(d_1,y_1). $$\nAt stage 1, define $$ q_1(a_1)=\\mathbb E\\big[G_2(a_1,Y_1)\\mid a_1\\big]. $$ Because \\(\\pi_1\\) is one feasible action in \\(D_1\\), $$ q_1(\\pi_1)\\le \\max_{a_1\\in D_1}q_1(a_1)=G_1, $$ and from the definitions above, \\(G_1=V\\).\nTo put these stage-wise inequalities together, let \\(H_2=(\\pi_1,Y_1,\\pi_2(\\pi_1,Y_1),Y_2)\\) be the history generated by \\(\\pi\\). Taking expectations preserves each inequality, and the tower property links consecutive stages: $$ J(\\pi)=\\mathbb E\\big[q_3(\\pi_3(H_2);H_2)\\big]\\le \\mathbb E\\big[G_3(H_2)\\big]=\\mathbb E\\big[q_2(\\pi_2(\\pi_1,Y_1);\\pi_1,Y_1)\\big]\\le \\mathbb E\\big[G_2(\\pi_1,Y_1)\\big]=q_1(\\pi_1)\\le G_1=V. $$ Since \\(\\pi\\) was arbitrary, $$ \\max_{\\pi\\in\\Pi}J(\\pi)\\le V. $$\nPart 2: there exists a policy with value exactly V Using those same continuation-value definitions, construct \\(\\pi^\\star\\) by choosing argmax actions at every history (this assumes the maxima are attained, for example when the design sets are finite). Then \\(\\pi^\\star\\) makes exactly the same choices as the nested objective definition of \\(V\\). Evaluating \\(J(\\pi^\\star)\\) and applying iterated expectation from stage 3 back to stage 1 yields $$ J(\\pi^\\star)=G_1=V, $$ so $$ V\\le \\max_{\\pi\\in\\Pi}J(\\pi). $$\nCombining the two parts yields $$ V=\\max_{\\pi\\in\\Pi}J(\\pi). $$\nFor Bayesian experimental design this matters because it clarifies what is computed offline and what is computed online. The optimization is over decision rules, not over a single static sequence. Once an approximate optimal policy is available, online adaptation is cheap: new measurements are plugged into \\(\\pi_2\\) and \\(\\pi_3\\) without re-solving the full nested problem.\nIn practice, for continuous observations \\(Y\\), tabular dynamic programming is not available. 
A common approximation is to parameterize the policy with neural networks, for example \\(d_2=\\pi_{\\phi,2}(d_1,y_1)\\) and \\(d_3=\\pi_{\\phi,3}(d_1,y_1,d_2,y_2)\\), where \\(\\phi\\) denotes the network weights. Conceptually, the exact problem is optimization in function space (choose the best measurable mapping from histories to actions). After parameterization, that infinite-dimensional search is replaced by finite-dimensional nonlinear programming over \\(\\phi\\). Then we simulate trajectories from the Bayesian model, estimate \\(J(\\pi_\\phi)\\) by Monte Carlo, and optimize \\(\\phi\\) with policy gradient methods.\n","date":"1 April 2024","permalink":"https://arnostrouwen.com/posts/policy-equivalence-in-adaptive-bayesian-experimental-design/","section":"","summary":"\u003cp\u003e\n\nContinuing from \u003ca href=\"https://arnostrouwen.com/posts/bayesian-experimental-design/\"\u003eBayesian experimental design\u003c/a\u003e and\n\u003ca href=\"https://arnostrouwen.com/posts/adaptive-bayesian-experimental-design/\"\u003eadaptive Bayesian experimental design\u003c/a\u003e,\nwe can make precise why the nested adaptive criterion is equivalent to an optimal policy problem.\u003c/p\u003e","title":"Notes on policy equivalence in adaptive Bayesian experimental design"},{"content":" Continuing from Bayesian experimental design, we now consider adaptivity.\nTypically, we choose designs \\(D_1\\), \\(D_2\\), and \\(D_3\\) sequentially to gather measurements. Let\u0026rsquo;s focus on 3 observations \\(y_1\\), \\(y_2\\) and \\(y_3\\). We could simply minimize the expected entropy of the posterior after all 3 measurements, once, at the start of the experiment: $$ \\argmin_{D_{1:3}} \\iiint S\\left (\\theta | y_{1:3}, D_{1:3} \\right) p(y_{1:3}|D_{1:3}) dy_{1:3}. $$ This is called a static experiment. 
This technique forgoes the opportunity to adapt the experimental design after each measurement is gathered.\nA popular adaptive experimental design technique minimizes the expected entropy of the posterior given one additional measurement: $$ D_1^\\ast = \\argmin_{D_1} \\int S\\left (\\theta |y_1, D_1 \\right) p(y_1|D_1) dy_1, $$ $$ D_2^\\ast = \\argmin_{D_2} \\int S\\left (\\theta | y_1^\\ast, D_1^\\ast, y_2, D_2 \\right) p(y_2|y_1^\\ast, D_1^\\ast, D_2) dy_2. $$ Here \\(D_1^\\ast\\) is the optimized and actually performed experiment, and \\(y_1^\\ast\\) is the observed measurement value. $$ D_3^\\ast = \\argmin_{D_3} \\int S\\left (\\theta | y_{1:2}^\\ast, D_{1:2}^\\ast, y_3, D_3 \\right) p(y_3|y_{1:2}^\\ast, D_{1:2}^\\ast, D_3) dy_3. $$ Similarly, in the third experiment, the optimal values \\(D_{1:2}^\\ast\\) and observed \\(y_{1:2}^\\ast\\) are used.\nThis method thus requires us to keep track of the posterior after every new measurement is gathered. Performing the outer integration is also not straightforward. To perform Monte Carlo integration, we need to sample from \\(p(y_2|y_1, D_{1:2})\\), which, assuming \\(y_2\\) is conditionally independent of \\(y_1\\) given \\(\\theta\\), is: $$ p(y_2|y_1, D_{1:2}) = \\int p(y_2, \\theta|y_1, D_{1:2}) d\\theta = \\int p(y_2|\\theta, D_2)p(\\theta|y_1, D_1)d\\theta. $$ This method is called myopic adaptive Bayesian experimental design, because it only looks one observation ahead. Its advantage is that it uses the information in \\(y_1\\) to plan \\(D_2\\). 
Its drawback, compared to the static method, is that it is greedy: it does not plan several experiments ahead, as the static method does.\nWe can combine the advantages of both by choosing the 3 experiments as follows, where \\([\\cdot]_1\\) takes the first design of the optimized sequence: $$ D_1^\\ast = \\left\\lbrack \\argmin_{D_{1:3}} \\iiint S\\left (\\theta | y_{1:3}, D_{1:3} \\right) p(y_{1:3}|D_{1:3}) dy_{1:3} \\right\\rbrack_1, $$ $$ D_2^\\ast = \\left\\lbrack \\argmin_{D_{2:3}} \\iint S\\left (\\theta | y_1^\\ast, D_1^\\ast, y_{2:3}, D_{2:3} \\right) p(y_{2:3}|y_1^\\ast, D_1^\\ast, D_{2:3}) dy_{2:3} \\right\\rbrack_1, $$ $$ D_3^\\ast = \\argmin_{D_3} \\int S\\left (\\theta | y_{1:2}^\\ast, D_{1:2}^\\ast, y_3, D_3 \\right) p(y_3|y_{1:2}^\\ast, D_{1:2}^\\ast, D_3) dy_3. $$ Here we use the same entropy of the posterior after all 3 measurements have been gathered, except with the known optimal values \\(D_i^\\ast\\) and observations \\(y_i^\\ast\\) already filled in.\nWe will call this fully adaptive Bayesian experimental design. The major downside of this method is that a very challenging optimization problem has to be solved in between measurements. Even if this challenge could be overcome, there is still one imperfection in this method: the experiment is adapted after every measurement, but the optimization criteria do not \u0026ldquo;know\u0026rdquo; this. Each criterion selects the best experiments under the assumption that the remaining experiments will be run in a static fashion.\nTo remedy this we need the following criterion: $$ D_1^\\ast = \\argmin_{D_1} \\int \\left ( \\min_{D_2} \\int \\left ( \\min_{D_3} \\int S (\\theta | y_{1:3}, D_{1:3}) p(y_3|y_{1:2}, D_{1:3}) dy_3\\right ) p(y_2|y_1, D_{1:2}) dy_2\\right ) p(y_1|D_1) dy_1. $$ The difference here is that the optimizations are pushed inside the expectations, so the optimization of \\(D_2\\) can condition on \\(y_1\\) and \\(D_1\\). 
After \\(y_1^\\ast\\) has been observed and \\(D_1^\\ast\\) performed, we get: $$ D_2^\\ast = \\argmin_{D_2} \\int \\left ( \\min_{D_3} \\int S (\\theta | y_1^\\ast, D_1^\\ast, y_{2:3}, D_{2:3}) p(y_3|y_1^\\ast, D_1^\\ast, y_2, D_{2:3}) dy_3\\right ) p(y_2|y_1^\\ast, D_1^\\ast, D_2) dy_2. $$ After \\(y_2^\\ast\\) has been observed and \\(D_2^\\ast\\) performed, we get: $$ D_3^\\ast = \\argmin_{D_3} \\int S (\\theta | y_{1:2}^\\ast, D_{1:2}^\\ast, y_3, D_3) p(y_3|y_{1:2}^\\ast, D_{1:2}^\\ast, D_3) dy_3. $$\nNotice that the later criteria are nested inside the earlier ones. To find \\(D_1^\\ast\\) we need to evaluate the criterion used to find \\(D_2^\\ast\\) after \\(y_1^\\ast\\) has been gathered. In fact, we need to evaluate this criterion for every value of \\(y_1\\) that could be observed. We call this policy-based Bayesian experimental design. The optimization problem has the structure of a dynamic programming problem. Once an optimal policy has been computed, the online computations of fully adaptive Bayesian experimental design are no longer needed, because the pre-computed dynamic programming solution already specifies the next design for every possible observation.\nHowever, when observations \\(y\\) are continuous, we cannot use tabular dynamic programming: the state space of possible observations is infinite. Approximate dynamic programming methods are required to solve these optimization problems in practice, which is the focus of much current experimental design literature, e.g. 
Deep Adaptive Design.\nFor a direct continuation that formalizes why this nested criterion is exactly a policy-optimization problem, see policy equivalence in adaptive Bayesian experimental design.\n","date":"1 February 2024","permalink":"https://arnostrouwen.com/posts/adaptive-bayesian-experimental-design/","section":"","summary":"\u003cp\u003e\n\nContinuing from \u003ca href=\"https://arnostrouwen.com/posts/bayesian-experimental-design/\"\u003eBayesian experimental design\u003c/a\u003e,\nwe  now consider adaptivity.\u003c/p\u003e","title":"Notes on adaptive Bayesian experimental design"},{"content":" For much of the experimental design literature, Bayesian experimental design is synonymous with the expected Kullback-Leibler divergence. It is not obvious at first glance why this criterion leads to informative experiments, so let\u0026rsquo;s dive a bit deeper into Bayesian experimental design.\nIn Bayesian experimental design, we start with a prior, \\(p(\\theta)\\). This prior represents our state of belief in possible values of the uncertain parameters, \\(\\theta\\). We then have to pick a design \\(D\\) to learn as much as possible about \\(\\theta\\). After the design has been chosen, a new measurement \\(y\\) is gathered. This measurement is then used to update our state of belief to the posterior distribution \\( p(\\theta |y, D)\\). We want this posterior distribution to be as narrow as possible. But what does it mean for a distribution to be narrow? The most popular tool to quantify the spread of a distribution in Bayesian experimental design is entropy. 
The entropy of a random variable \\(X\\) with density \\(p\\) is defined as: $$ S(X) = -\\int \\ln p(x)\\, p(x)\\, dx $$\nI might write a future post about an intuitive understanding of entropy, but for now it suffices to know which distributions have high or low entropy.\nFor univariate distributions supported on the interval \\([a,b]\\), the uniform distribution on \\([a,b]\\) is the distribution with the maximum entropy. The uniform distribution corresponds to the state of belief where no value is preferred over any other.\nThis is different from other measures to quantify spread, such as variance. The distribution with maximal variance would be the 50%-50% mixture distribution of two Dirac delta distributions at \\(a\\) and \\(b\\).\nThe distribution with the lowest entropy, and also the lowest variance, is any Dirac delta distribution \\(\\delta(\\theta - c)\\) with \\( c \\in [a,b]\\). This Dirac delta distribution corresponds to a state of belief where you are absolutely certain \\(c\\) is the correct value.\nA successful experiment thus achieves a posterior with low entropy. However, at the planning stage of the experiment, we do not have access to the measurement \\(y\\). We can only judge the quality of the design \\(D\\) on average over all possible measurements \\(y\\), weighted by how likely those measurements are under our prior predictive distribution \\(p(y|D)\\): $$ \\argmin_D \\left [ \\int \\left (-\\int \\ln p(\\theta |y, D) p(\\theta |y, D) d\\theta \\right) p(y|D) dy \\right ]. $$ Substituting Bayes\u0026rsquo; rule, \\(\\ln p(\\theta|y,D) = \\ln p(y|\\theta,D) + \\ln p(\\theta) - \\ln p(y|D)\\), dropping the prior term, which does not depend on \\(D\\), and flipping the sign, this can be rewritten as the expression in the Wikipedia article mentioned above: $$ \\argmax_D \\left [ \\iint \\ln p(y|\\theta, D) p(\\theta,y| D) d\\theta dy - \\int \\ln p(y|D) p(y|D) dy\\right ]. $$ I am unsure why entropy is the default choice to quantify spread in Bayesian experimental design. It is definitely not the default in frequentist experimental design, where variance is used. 
I have heard arguments that the above rewritten expression is easier to calculate, since it does not rely on the posterior anymore. For a criterion based on the variance, algebraic manipulations to get rid of the posterior are not (known to be?) possible. However, I do not find this convincing since you still need to evaluate \\(\\ln p(y|D)\\) in the rewritten expression, which will still require solving the integral: $$ p(y|D) = \\int p(y, \\theta|D) d\\theta. $$ Thus, unfortunately, a nested numerical integration technique will still be required to generate Bayesian optimal designs.\nFor continuations that add sequential decision-making and the policy interpretation, see adaptive Bayesian experimental design and policy equivalence in adaptive Bayesian experimental design.\n","date":"1 August 2023","permalink":"https://arnostrouwen.com/posts/bayesian-experimental-design/","section":"","summary":"\u003cp\u003e\n\nFor much of the experimental design literature, Bayesian experimental design is synonymous with the\n\u003ca href=\"https://en.wikipedia.org/wiki/Bayesian_experimental_design#Gain_in_Shannon_information_as_utility\" target=\"_blank\" rel=\"noreferrer\"\u003eexpected Kullback-Leibler divergence\u003c/a\u003e.\nIt is not obvious at first glance why this criterion leads to informative experiments,\nso let\u0026rsquo;s dive a bit deeper into Bayesian experimental design.\u003c/p\u003e","title":"Notes on Bayesian experimental design"},{"content":"I am happy to announce the publication of our paper \u0026ldquo;Adaptive and robust experimental design for linear dynamical models using Kalman filter\u0026rdquo; in Statistical Papers.\nCurrent experimental design techniques for dynamical systems often only incorporate measurement noise, while dynamical systems also involve process noise.\nTo construct experimental designs we need to quantify their information content. The Fisher information matrix is a popular tool to do so. 
Calculating the Fisher information matrix for linear dynamical systems with both process and measurement noise involves estimating the uncertain dynamical states using a Kalman filter. The Fisher information matrix, however, depends on the true but unknown model parameters. In this paper we combine two methods to solve this issue and develop a robust experimental design methodology. First, Bayesian experimental design averages the Fisher information matrix over a prior distribution of possible model parameter values. Second, adaptive experimental design allows for this information to be updated as measurements are being gathered. This updated information is then used to adapt the remainder of the design.\nRead the full article here.\n","date":"31 March 2023","permalink":"https://arnostrouwen.com/posts/adaptive-robust-experimental-design/","section":"","summary":"\u003cp\u003eI am happy to announce the publication of our paper \u0026ldquo;Adaptive and robust experimental design for linear dynamical models using Kalman filter\u0026rdquo; in \u003cem\u003eStatistical Papers\u003c/em\u003e.\u003c/p\u003e\n\u003cp\u003eCurrent experimental design techniques for dynamical systems often only incorporate measurement noise, while dynamical systems also involve process noise.\u003c/p\u003e","title":"New paper about Experimental Design using the Kalman Filter"},{"content":"Most experimental design focuses on parameter precision, where the model structure is assumed known and fixed. Arguably, however, finding the correct model structure is the part of the modelling process that takes the most effort. 
In this post we will look at automating this process using symbolic regression, without gathering too much data.\nThe Julia packages that we will use:\nusing SymbolicRegression using Symbolics using Distributions using Optimization, OptimizationBBO using Plots using Random; Random.seed!(12345) We will try to discover the equation: $$y(x) = \\exp(-x)\\sin(2\\pi x) + \\cos(\\frac{\\pi}{2}x), \\qquad 0 \\leq x \\leq 10,$$ automatically from data. Translated into Julia code:\ny(x) = exp(-x)*sin(2π*x) + cos(π/2*x) y(0.0) 1.0\rAs a baseline design let us gather 10 points randomly from the design space.\nn_obs = 10 design_region = Uniform(0.0,10.0) X = rand(design_region,n_obs) Y = y.(X) plot(0.0:0.1:10.0,y.(0.0:0.1:10.0),label=\u0026#34;true model\u0026#34;,lw=5,ls=:dash); scatter!(X,Y,ms=5,label=\u0026#34;data\u0026#34;); plot!(xlabel=\u0026#34;x\u0026#34;,ylabel=\u0026#34;y\u0026#34;, ylims=(-1.2,1.8)); plot!(tickfontsize=12, guidefontsize=14, legendfontsize=8, grid=false, dpi=600) Now let us perform symbolic regression on this dataset. 
We will look for 10 model structures that fit the data.\noptions = SymbolicRegression.Options( unary_operators = (exp, sin, cos), binary_operators=(+, *, /, -), seed=123, deterministic=true, verbosity=false, save_to_file=false, defaults=v\u0026#34;0.24.5\u0026#34; ) hall_of_fame = EquationSearch(X\u0026#39;, Y, options=options, niterations=100, runtests=false, parallelism=:serial) n_best_max = 10 # in case fewer than 10 model structures were returned n_best = min(length(hall_of_fame.members),n_best_max) best_models = sort(hall_of_fame.members,by=member-\u0026gt;member.loss)[1:n_best] 10-element Vector{PopMember{Float64, Float64, Expression{Float64, Node{Float64}, @NamedTuple{operators::DynamicExpressions.OperatorEnumModule.OperatorEnum{Tuple{typeof(+), typeof(*), typeof(/), typeof(-)}, Tuple{typeof(exp), typeof(sin), typeof(cos)}}, variable_names::Vector{String}}}}}:\rPopMember(tree = ((cos(x1 / 0.556922741892079) / (((sin(x1) + exp(x1)) + x1) + exp(-0.18983334693465292))) + cos(x1 / 0.6366135607609045)), loss = 1.353844447287992e-7, score = 0.064000242074528)\rPopMember(tree = ((cos(x1 / 0.556922741892079) / ((x1 + exp(x1)) + 1.7500726412136087)) + (cos(x1 / 0.6366135607609045) + 0.0004135556561892)), loss = 2.945164517643709e-7, score = 0.05760053521668158)\rPopMember(tree = (cos(x1 / 0.6366314628584495) - ((((-2.0209446503113644 - x1) - (-4.128483985539092 / x1)) + x1) / exp(x1))), loss = 9.307105294996532e-7, score = 0.05440169147743159)\rPopMember(tree = (cos(x1 / 0.6366314628584495) - ((-2.0209446503113644 - (-4.128483985539092 / x1)) / exp(x1))), loss = 9.307105294996631e-7, score = 0.04160169459475451)\rPopMember(tree = (cos(x1 / 0.6366314628584495) - sin((-2.0209446503113644 - (-4.128483985539092 / x1)) / (exp(x1) - 0.021122856968733148))), loss = 9.369243827736577e-7, score = 0.051201704501225895)\rPopMember(tree = (cos(x1 / 0.6366314628584495) - sin((-2.0209446503113644 - (-4.128483985539092 / x1)) / exp(x1))), loss = 9.382499642509582e-7, score = 
0.04480170661063845)\rPopMember(tree = (cos(x1 / 0.6366314628584495) - sin(sin((-2.0209446503113644 - (-4.128483985539092 / x1)) / exp(x1)))), loss = 9.583132968993814e-7, score = 0.04800174515322376)\rPopMember(tree = (sin(cos(x1 / 0.556922741892079) / (((x1 + exp(x1)) + x1) - -0.18983334693465292)) + cos(x1 / 0.6366135607609045)), loss = 1.2465540540794797e-6, score = 0.06080226681628824)\rPopMember(tree = (cos(x1 / 0.6366486132211249) - (cos(x1 + -0.6342960332363662) / exp(x1))), loss = 5.174645144914829e-6, score = 0.038409419223890455)\rPopMember(tree = (sin((x1 - -1.0267283293605511) / 0.6384117132960604) / cos(-0.7430014086131702 / x1)), loss = 1.2723455785931058e-5, score = 0.03522316454183701)\rNote\nI ordered the model structures purely by the mean squared error loss of the fits. Symbolic regression usually also incorporates a penalty for model complexity. I have not yet found a good way to incorporate this into the experimental design workflow, but it is definitely worth investigating.\nNow let us turn these symbolic expressions back into executable functions. 
Let us try it for the first suggested model structure:\n@syms x eqn = node_to_symbolic(best_models[1].tree, options,varMap=[\u0026#34;x\u0026#34;]) (cos(x / 0.556922741892079) / (((sin(x) + exp(x)) + x) + 0.8270969607022577)) + cos(x / 0.6366135607609045)\rf = build_function(eqn, x, expression=Val{false}) f.(X) 10-element Vector{Float64}:\r0.9917134120127088\r-0.9202812756848684\r0.541456165631766\r0.9838841433137335\r0.9985475538293273\r-0.27639066814560703\r0.206101568512253\r0.7390207485567821\r-0.8612497963096721\r0.0005192718672333284\rNow we do it for all the others and plot them:\nplot(0.0:0.1:10.0,y.(0.0:0.1:10.0),lw=5,label=\u0026#34;true model\u0026#34;,ls=:dash); model_structures = Function[] for i = 1:n_best eqn = node_to_symbolic(best_models[i].tree, options,varMap=[\u0026#34;x\u0026#34;]) fi = build_function(eqn, x, expression=Val{false}) x_plot = Float64[] y_plot = Float64[] for x_try in 0.0:0.1:10.0 try y_try = fi(x_try) append!(x_plot,x_try) append!(y_plot,y_try) catch end end plot!(x_plot, y_plot,label=\u0026#34;model $i\u0026#34;); push!(model_structures,fi) end scatter!(X,Y,ms=5,label=\u0026#34;data\u0026#34;,ls=:dash); plot!(xlabel=\u0026#34;x\u0026#34;,ylabel=\u0026#34;y\u0026#34;, ylims=(-1.2,1.6)); plot!(tickfontsize=12, guidefontsize=14, legendfontsize=8, grid=false) We see that none of the suggested model structures approximates the true model well between 0 and 2.5, whereas between 2.5 and 10 the models agree. In this case it is thus probably a good idea to gather more data for small \\(x\\). Can we formalize this in mathematical terms? We will do this by creating a variant of T-optimal designs. T-optimal designs are model discrimination designs, where design points are sought that maximize the distance between a model thought to be correct (T for true) and plausible alternative model structures. 
Though perhaps it is better to think of the \u0026ldquo;true\u0026rdquo; model as a null hypothesis model. Design points are chosen such that the alternative models predict different values than the \u0026ldquo;true\u0026rdquo; model at these points. If the \u0026ldquo;true\u0026rdquo; model is then not correct after all, this should be easily discernible from the data.\nIn our situation, we do not have a model structure that can serve as the \u0026ldquo;true\u0026rdquo; model. We will instead work with all pairwise distances between the plausible model structures suggested by symbolic regression. Collecting measurements where the model structures differ greatly in their predictions will cause at least some of the model structures to become unlikely, allowing new model structures to enter the top 10. We call this S-optimal, with S for Symbolics. $$N = \\text{number of measurements}$$ $$M = \\text{number of models}$$ $$f_i = \\text{ith model structure}$$ $$x_k = \\text{kth design point}$$\n$$\\max_x \\frac{2}{M(M-1)}\\sum_{i=1}^{M}\\sum_{j=i+1}^{M} \\max_{k=1,\\dots,N}\\set{(f_i(x_k) - f_j(x_k))^2}$$ Here the outer maximization is over the new design points only, while \\(k\\) runs over all \\(N\\) measurements, existing and new.\nNote\nThe average over the pairwise model comparisons could be replaced with the minimum. This would lead to a max-min-max strategy instead of a max-expected-max strategy. In my experiments, this did not work well when two of the suggested model structures are very similar or identical. This often occurs because terms like \\(\\sin(x-x)\\) appear in the output of symbolic regression. 
Penalizing complexity might remedy this.\nNow let us apply this criterion to gather 3 new measurements:\nfunction S_criterion(x,model_structures) n_structures = length(model_structures) n_obs = length(x) if length(model_structures) == 1 # sometimes only a single model structure comes out of the equation search return 0.0 end y = zeros(n_obs,n_structures) for i in 1:n_structures y[:,i] .= model_structures[i].(x) end squared_differences = Float64[] for i in 1:n_structures for j in i+1:n_structures push!(squared_differences, maximum((y[:,i] .- y[:,j]).^2)) end end -mean(squared_differences) # minus sign to minimize instead of maximize end function S_objective(x_new,(x_old,model_structures)) S_criterion([x_old;x_new],model_structures) end n_batch = 3 X_new_ini = rand(design_region,n_batch) S_objective(X_new_ini,(X,model_structures)) -0.0020153204305020994\rNote\nCan this be reformulated as a differentiable optimization problem using slack variables?\nlb = fill(minimum(design_region),n_batch) ub = fill(maximum(design_region),n_batch) prob = OptimizationProblem(S_objective,X_new_ini,(X,model_structures),lb = lb, ub = ub) X_new = solve(prob,BBO_adaptive_de_rand_1_bin_radiuslimited(),maxtime=10.0) ┌ Warning: Using arrays or dicts to store parameters of different types can hurt performance.\r│ Consider using tuples instead.\r└ @ SciMLBase C:\\Users\\arno\\.julia\\packages\\SciMLBase\\XzPx0\\src\\performance_warnings.jl:33\rretcode: MaxTime\ru: 3-element Vector{Float64}:\r8.749678023720595e-154\r1.5708411540103517\r1.1455117200197213\rWe see that the 3 new observations are indeed all smaller than 2.5. 
Let us plot this:\nY_new = y.(X_new) plot(0.0:0.1:10.0,y.(0.0:0.1:10.0),lw=5,label=\u0026#34;true model\u0026#34;,ls=:dash); for i = 1:n_best x_plot = Float64[] y_plot = Float64[] for x_try in 0.0:0.01:10.0 try y_try = model_structures[i](x_try) append!(x_plot,x_try) append!(y_plot,y_try) catch end end plot!(x_plot, y_plot,label=\u0026#34;model $i\u0026#34;); end scatter!(X,Y,ms=5,label=\u0026#34;data old\u0026#34;); scatter!(X_new,Y_new,ms=5,label=\u0026#34;data new\u0026#34;); plot!(xlabel=\u0026#34;x\u0026#34;,ylabel=\u0026#34;y\u0026#34;, ylim=(-1.2,1.8)); plot!(tickfontsize=12, guidefontsize=14, legendfontsize=8, grid=false, dpi=600)\nNow, we run symbolic regression on our combined dataset:\nX = [X;X_new] Y = [Y;Y_new] hall_of_fame = EquationSearch(X\u0026#39;, Y, options=options, niterations=100, runtests=false, parallelism=:serial) n_best = min(length(hall_of_fame.members),n_best_max) best_models = sort(hall_of_fame.members,by=member-\u0026gt;member.loss)[1:n_best] plot(0.0:0.01:10.0,y.(0.0:0.01:10.0),lw=5,label=\u0026#34;true model\u0026#34;,ls=:dash); model_structures = Function[] for i = 1:n_best eqn = node_to_symbolic(best_models[i].tree, options,varMap=[\u0026#34;x\u0026#34;]) println(eqn) fi = build_function(eqn, x, expression=Val{false}) x_plot = Float64[] y_plot = Float64[] for x_try in 0.0:0.01:10.0 try y_try = fi(x_try) append!(x_plot,x_try) append!(y_plot,y_try) catch end end plot!(x_plot, y_plot,label=\u0026#34;model $i\u0026#34;); push!(model_structures,fi) end scatter!(X,Y,ms=5,label=\u0026#34;data\u0026#34;); plot!(xlabel=\u0026#34;x\u0026#34;,ylabel=\u0026#34;y\u0026#34;, ylims=(-1.2,1.8)); plot!(tickfontsize=12, guidefontsize=14, legendfontsize=8, grid=false, dpi=600) cos(1.5707963268004939 * x) - (sin(x * -6.283185344000339) / (exp(x) - (-3.6820752424157317e-8 * (exp(x) + 1.1011581435364508))))\rcos(1.5707963268004939 * x) - (sin(-6.283185344000339 * x) / (exp(x) - (-3.6820752424157317e-8
* exp(x))))\rcos(1.5707963268004939 * x) - (sin(x * -6.283185344000339) / (exp(x) - ((x + 1.1011581435364508) * -3.6820752424157317e-8)))\rcos(x * 1.5707963268004939) - (sin(x * -6.283185344000339) / ((exp(x) - 1.5752699418768323) + 1.5752701705815984))\rcos(1.5707963268004939 * x) - (sin(x * -6.283185344000339) / (exp(x) - (-3.6820752424157317e-8 * x)))\rcos(1.5707963268004939 * x) - (sin(x * -6.283185344000339) / (exp(x) - -3.6820752424157317e-8))\rcos(x * 1.5707963268004939) - (sin(-6.283185344000339 * x) / exp(x))\rcos((x - -2.973035511273684e-6) * 1.5707957537647865) - (sin(x * -6.2831665710382785) / exp(x))\rcos(x * 1.5707791141382517) - sin(sin(-6.286309274467023 * x) / exp(x))\rcos(x * 1.5718434218299882) - (-0.030041072681216516 / (1.2656975994705448 - x))\nEt voilà, we found the correct model structure with only 3 new observations!\nNote\nIn fact, we found it multiple times, with expressions like \\(\\sin(x-x)\\). Again, punishing needless complexity would be of added value here.\n","date":"1 January 2023","permalink":"https://arnostrouwen.com/posts/soptimal/","section":"","summary":"\u003cp\u003eMost experimental design focuses on parameter precision,\nwhere the model structure is assumed known and fixed.\nBut arguably finding the correct model structure\nis the part of the modelling process that takes the most effort.\nIn this blog post we will look at automating this process using symbolic regression,\nwithout gathering too much data.\u003c/p\u003e","title":"Design for model discrimination using symbolic regression"},{"content":"Some notes on Bayesian inference for stochastic differential equations in Julia. Specifically, inference for θ and σ of the Ornstein\u0026ndash;Uhlenbeck process. 
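\nFor reference, the drift f! and diffusion g! defined in the code below (with p = [θ, σ]) describe the scalar Ornstein\u0026ndash;Uhlenbeck SDE $$du = -\\theta u\\,dt + \\sigma\\,dW,$$ whose transition over a step \\(\\Delta t\\) is known in closed form, $$u(t+\\Delta t) \\mid u(t) \\sim \\mathcal{N}\\left(u(t)e^{-\\theta\\Delta t}, \\frac{\\sigma^2}{2\\theta}\\left(1-e^{-2\\theta\\Delta t}\\right)\\right).$$ This closed form is only stated here as a reference for what θ and σ control. The approach below does not use it and simulates the SDE numerically instead.\n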
The explanation is quite terse since, in the end, I was unable to get this to work on larger problems.\nGenerating the true data:\nusing DifferentialEquations using DiffEqNoiseProcess using Turing using Distributions using LinearAlgebra using Plots, StatsPlots using Random; Random.seed!(85455) function f!(du, u, p, t) du[1] = -p[1]*u[1] end function g!(du, u, p, t) du[1] = p[2] end u0 = [10.0] t0,tend = 0.0,1.0 Δt = 0.1 tspan = (t0,tend) tmeasure = t0:Δt:tend p = [1.0, 3.0] W = WienerProcess(0.0, zeros(length(u0))) prob = SDEProblem(f!,g!,u0,tspan,p,saveat=tmeasure,noise=W) sol = solve(prob) data = sol[1,:] # needs to be vectorized for Turing plot(sol)\nWe see that each solve is indeed random:\nplot(solve(prob))\nplot() for _ in 1:1000 sol = solve(prob) plot!(sol) end plot!(legend=false)\nThe SDE solvers discretize the process noise (W) in time. However, by default, the SDE solvers are adaptive and thus evaluate the process noise at different time points for each solve. Turing can, however, only perform inference if the number of random variables and their meaning are the same in each iteration of the MCMC algorithm.\nLuckily, DifferentialEquations.jl allows you to provide an already discretized version of the process noise to the solver, using NoiseGrid. The price you pay is that the process noise is interpolated linearly between the grid points, which is less accurate than if you let the solver choose its own discretization.\nTo generate the noise, we use the matrix normal distribution. 
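\nWith the arguments used in the code below, MatrixNormal(M, U, V) has a zero mean M of size n × K (n = length(u0) states, K = length(tmeasure) - 1 intervals), row covariance U = Δt I_n and column covariance V = I_K. Since vec(X) of a matrix normal X has covariance V ⊗ U, this gives independent Brownian increments with variance Δt, which cumsum then accumulates into the Brownian path: $$\\operatorname{vec}(\\Delta W) \\sim \\mathcal{N}\\left(0, I_K \\otimes \\Delta t\\, I_n\\right) = \\mathcal{N}\\left(0, \\Delta t\\, I_{nK}\\right), \\qquad W(t_0) = 0, \\quad W(t_k) = \\sum_{j=1}^{k} \\Delta W_{:,j}.$$\n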
Different columns in this matrix represent independent realizations of the process noise for different time intervals.\nnoise_per_interval = MatrixNormal(zeros(length(u0),length(tmeasure)-1), diagm(Δt*ones(length(u0))), diagm(ones(length(tmeasure)-1))) noise_per_interval = rand(noise_per_interval) brownian_noise = hcat(zeros(length(u0)), cumsum(noise_per_interval,dims=2)) brownian_noise_aoa = [collect(c) for c in eachcol(brownian_noise)] # needs to be an array of arrays for DifferentialEquations.jl W = NoiseGrid(vcat(tmeasure,tend+Δt),vcat(brownian_noise_aoa,[rand(length(u0))])) # For some reason NoiseGrid needs an additional time point after the simulation has already ended. prob = remake(prob,noise=W) sol = solve(prob) plot(sol)\nplot() for _ in 1:1000 noise_per_interval = MatrixNormal(zeros(length(u0),length(tmeasure)-1), diagm(Δt*ones(length(u0))), diagm(ones(length(tmeasure)-1))) noise_per_interval = rand(noise_per_interval) brownian_noise = hcat(zeros(length(u0)), cumsum(noise_per_interval,dims=2)) brownian_noise_aoa = [collect(c) for c in eachcol(brownian_noise)] W = NoiseGrid(vcat(tmeasure,tend+Δt),vcat(brownian_noise_aoa,[rand(length(u0))])) global prob = remake(prob,noise=W) sol = solve(prob) plot!(sol) end plot!(legend=false)\nNow let us do inference using the Metropolis-Hastings algorithm.\n@model function model(data, prob) # Prior distributions for parameters of interest θ ~ Uniform(0.1,10.0) σ ~ Uniform(0.1,10.0) p = [θ,σ] # Prior distribution for process noise at measurement times noise_per_interval ~ MatrixNormal(zeros(length(u0), length(tmeasure)-1), diagm(Δt*ones(length(u0))), diagm(ones(length(tmeasure)-1))) brownian_noise = hcat(zeros(length(u0)), cumsum(noise_per_interval,dims=2)) brownian_noise_aoa = [collect(c) for c in eachcol(brownian_noise)] W = NoiseGrid(vcat(tmeasure,tend+Δt),vcat(brownian_noise_aoa,[rand(length(u0))])) # simulating the system 
prob = remake(prob,p=p,noise=W) sol = solve(prob) # likelihood data ~ MvNormal(sol[1,:],fill(sqrt(0.1), length(sol))) # Chain gets stuck without extra noise. return nothing end chain = sample(model(data, prob), MH(), MCMCThreads(), 10_000, 8, progress=false); plot(chain[:,[:θ,:σ],:])\nWe see that θ is recovered quite nicely, but that the chains do not completely agree on σ.\nI was unable to use more modern MCMC algorithms, such as NUTS, since these require derivatives of the likelihood with respect to both the model parameters θ and σ and the discretized process noise. Such derivatives cannot yet be calculated by StochasticDiffEq.jl. Thus, this method will not scale to more complicated problems with more states, parameters, and measurement times.\n","date":"1 August 2022","permalink":"https://arnostrouwen.com/posts/pplsde/","section":"","summary":"\u003cscript src=\"https://cdnjs.cloudflare.com/ajax/libs/require.js/2.3.6/require.min.js\" integrity=\"sha512-c3Nl8+7g4LMSTdrm621y7kf9v3SDPnhxLNhcjFJbKECVnmZHTdo+IRO05sNLTH/D3vA6u1X32ehoLC7WFVdheg==\" crossorigin=\"anonymous\"\u003e\u003c/script\u003e\r\n\u003cscript src=\"https://cdnjs.cloudflare.com/ajax/libs/jquery/3.5.1/jquery.min.js\" integrity=\"sha512-bLT0Qm9VnAYZDflyKcBaQ2gg0hSYNQrJ8RilYldYQ1FxQYoCLtUjuuRuZo+fjqhx/qtq/1itJ0C2ejDxltZVFg==\" crossorigin=\"anonymous\" data-relocate-top=\"true\"\u003e\u003c/script\u003e\r\n\u003cscript type=\"application/javascript\"\u003edefine('jquery', [],function() {return window.jQuery;})\u003c/script\u003e\r\n\u003cp\u003eSome notes on Bayesian inference for stochastic differential equations in Julia.\nSpecifically, inference for θ and σ of the\n\u003ca href=\"https://en.wikipedia.org/wiki/Ornstein%E2%80%93Uhlenbeck_process\" target=\"_blank\" rel=\"noreferrer\"\u003eOrnstein\u0026ndash;Uhlenbeck process\u003c/a\u003e.\nThe explanation is quite terse since, in the end, I was unable to get this to
work on larger problems.\u003c/p\u003e","title":"Notes on Probabilistic Programming for Stochastic Differential Equations"},{"content":" We continue from part 1 with a more rigorous version of the derivation of adjoint sensitivity analysis for continuous time systems,\n$$\\begin{align*} u(0) \u0026amp;= f_0(p)\\\\ u(t) \u0026amp;= u(0) + \\int_0^{t} f(u(q),p,q)dq\\\\ c(t) \u0026amp;= g(u(t),p,t)\\\\ G(c) \u0026amp;= \\int_0^{t_e } c(s)ds, \\end{align*}$$ where \\(u(t)\\) is the dynamic state, whose evolution in time is described by the function \\(f\\). \\(c(t)\\) is the cost at time \\(t\\), described by the function \\(g\\), and \\(G\\) is the total accumulated cost. Both \\(g\\) and \\(f\\) depend on the parameters \\(p\\) and the time \\(t\\). We want to calculate the effect \\(p\\) has on \\(G\\) using backpropagation.\nLet us assume that we have already pulled back from time \\(t_e\\) to time \\(t\\). We reparametrize \\(G\\) in terms of \\(p\\), \\(u(t)\\) and \\(c_{[0,t]}\\), which is the cost function restricted to the interval \\([0,t]\\), $$\\begin{align*} G(c_{[0,t]},u(t),p) = \\int_0^t c(s)ds + \u0026amp;\\int_t^{t_e} g(u(s),p,s)ds\\\\ \u0026amp;\\text{with}\\qquad u(s) = u(t) + \\int_t^s f(u(q),p,q)dq. 
\\end{align*}$$ If we assume that the partial derivative of \\(G(c_{[0,t]},u(t),p)\\) with respect to \\(u(t)\\) is equal to \\(\\lambda(t)\\),\n$$\\frac{\\partial G(c_{[0,t]},u(t),p)}{\\partial u(t)} = \\frac{\\partial \\int_t^{t_e} g(u(s),p,s)ds}{\\partial u(t)} = \\lambda(t),$$\nthen we can calculate the same partial derivative at the slightly earlier time point \\(t-\\Delta t\\).\n$$\\begin{align*} \\frac{\\partial G(c_{[0,t-\\Delta t]},u(t-\\Delta t),p)}{\\partial u(t-\\Delta t)} \u0026amp;=\\frac{\\partial \\left( \\int_0^{t-\\Delta t} c(s) ds + \\int_{t-\\Delta t}^{t_e} g(u(s),p,s)ds\\right)}{\\partial u(t-\\Delta t)} \\\\ \u0026amp;=\\frac{\\partial \\left( \\int_{t-\\Delta t}^tg(u(s),p,s)ds + \\int_t^{t_e} g(u(s),p,s)ds\\right)}{\\partial u(t-\\Delta t)} \\\\ \u0026amp;=\\frac{\\partial \\int_{t-\\Delta t}^tg(u(s),p,s)ds}{\\partial u(t-\\Delta t)} + \\frac{\\partial \\int_t^{t_e} g(u(s),p,s)ds}{\\partial u(t)}\\frac{\\partial u(t)}{\\partial u(t-\\Delta t)} \\\\ \u0026amp;=\\frac{\\partial \\int_{t-\\Delta t}^tg(u(s),p,s)ds}{\\partial u(t-\\Delta t)} + \\lambda(t)\\frac{\\partial u(t)}{\\partial u(t-\\Delta t)} \\\\ \\end{align*}$$\nUsing the mean value theorem, we can write the factor \\(\\frac{\\partial u(t)}{\\partial u(t-\\Delta t)}\\) in the second term as,\n$$\\begin{align*} \\frac{\\partial u(t)}{\\partial u(t-\\Delta t)} \u0026amp;= \\frac{\\partial \\left( u(t-\\Delta t) + \\int_{t-\\Delta t}^t f(u(q),p,q)dq\\right) }{\\partial u(t-\\Delta t)}\\\\ \u0026amp; = 1 + \\frac{\\partial \\int_{t-\\Delta t}^tf(u(q),p,q)dq}{\\partial u(t-\\Delta t)}\\\\ \u0026amp; = 1 + \\frac{\\partial f(u(t-\\Delta t_f),p,t-\\Delta t_f)}{\\partial u(t-\\Delta t)}\\Delta t \\qquad \\Delta t_f \\in [0,\\Delta t]\\\\ \u0026amp; = 1 + \\frac{\\partial f(u(t-\\Delta t_f),p,t-\\Delta t_f)}{\\partial u(t-\\Delta t_f)}\\frac{\\partial u(t-\\Delta t_f)}{\\partial u(t-\\Delta t)}\\Delta t\\\\ \u0026amp; = 1 + \\frac{\\partial f(u(t-\\Delta t_f),p,t-\\Delta t_f)}{\\partial u(t-\\Delta t_f)}\\Delta t + \\\\ \u0026amp; \\qquad 
\\frac{\\partial f(u(t-\\Delta t_f),p,t-\\Delta t_f)}{\\partial u(t-\\Delta t_f)}\\frac{\\partial f(u(t-\\Delta t_{f_2}),p,t-\\Delta t_{f_2})}{\\partial u(t-\\Delta t)}\\Delta t(\\Delta t - \\Delta t_f) \\qquad \\Delta t_{f_2} \\in [\\Delta t_f,\\Delta t], \\end{align*}$$\nand the first term as,\n$$\\begin{align*} \\frac{\\partial \\int_{t-\\Delta t}^tg(u(s),p,s)ds}{\\partial u(t-\\Delta t)} \u0026amp;= \\frac{\\partial g(u(t-\\Delta t_g),p,t-\\Delta t_g)}{\\partial u(t-\\Delta t)}\\Delta t \\qquad \\Delta t_g \\in [0,\\Delta t] \\\\ \u0026amp;= \\frac{\\partial g(u(t-\\Delta t_g),p,t-\\Delta t_g)}{\\partial u(t-\\Delta t_g)}\\frac{\\partial u(t-\\Delta t_g)}{\\partial u(t-\\Delta t)}\\Delta t\\\\ \u0026amp;= \\frac{\\partial g(u(t-\\Delta t_g),p,t-\\Delta t_g)}{\\partial u(t-\\Delta t_g)}\\Delta t + \\\\ \u0026amp; \\qquad \\frac{\\partial g(u(t-\\Delta t_g),p,t-\\Delta t_g)}{\\partial u(t-\\Delta t_g)}\\frac{\\partial f(u(t-\\Delta t_{g_2}),p,t-\\Delta t_{g_2})}{\\partial u(t-\\Delta t)}\\Delta t(\\Delta t - \\Delta t_g) \\qquad \\Delta t_{g_2} \\in [\\Delta t_g,\\Delta t]. \\end{align*}$$\nWe can obtain a differential equation for \\(\\lambda\\) by taking the limit,\n$$\\begin{align*} \\frac{d\\lambda}{dt} \u0026amp; = \\lim_{\\Delta t \\to 0} \\frac{\\frac{\\partial G(c_{[0,t]},u(t),p)}{\\partial u(t)} -\\frac{\\partial G(c_{[0,t-\\Delta t]},u(t-\\Delta t),p)}{\\partial u(t-\\Delta t)}}{\\Delta t} \\\\ \u0026amp; = \\lim_{\\Delta t \\to 0} \\left( -\\frac{\\partial g(u(t-\\Delta t_g),p,t-\\Delta t_g)}{\\partial u(t-\\Delta t_g)} - \\lambda(t)\\frac{\\partial f(u(t-\\Delta t_f),p,t-\\Delta t_f)}{\\partial u(t-\\Delta t_f)} + \\right. \\\\ \u0026amp; \\qquad \\left. \\vphantom{\\frac{\\partial}{\\partial}}\\ldots (\\Delta t - \\Delta t_g) + \\ldots (\\Delta t - \\Delta t_f) \\right)\\\\ \u0026amp; = -\\frac{\\partial g(u(t),p,t)}{\\partial u(t)} - \\lambda(t)\\frac{\\partial f(u(t),p,t)}{\\partial u(t)} \\end{align*}$$ This differential equation is the same as the one found in part 1. 
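\nAs a quick consistency check, consider a scalar example chosen here purely for illustration (it is not part of the original derivation): take \\(f(u,p,t) = a u\\) with \\(a \\neq 0\\) and \\(g(u,p,t) = u\\), so that \\(u(s) = u(t)e^{a(s-t)}\\). Differentiating the remaining cost directly gives $$\\lambda(t) = \\frac{\\partial}{\\partial u(t)}\\int_t^{t_e} u(t)e^{a(s-t)}ds = \\frac{e^{a(t_e-t)}-1}{a},$$ and indeed $$\\frac{d\\lambda}{dt} = -e^{a(t_e-t)} = -1 - a\\,\\frac{e^{a(t_e-t)}-1}{a} = -\\frac{\\partial g}{\\partial u} - \\lambda\\frac{\\partial f}{\\partial u},$$ with terminal condition \\(\\lambda(t_e) = 0\\).\n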
It is a good exercise to try the same technique on \\(\\frac{\\partial G(c_{[0,t]},u(t),p)}{\\partial p}\\).\n","date":"30 April 2022","permalink":"https://arnostrouwen.com/posts/adjoint-sensitivity2/","section":"","summary":"\u003cp\u003e\n\nWe continue from \u003ca href=\"https://arnostrouwen.com/posts/adjoint-sensitivity/\"\u003epart 1\u003c/a\u003e with a more rigorous version of the derivation of adjoint sensitivity analysis for continuous time systems,\u003c/p\u003e","title":"Notes on adjoint sensitivity analysis of dynamic systems part 2"},{"content":" Gradients are useful for efficient parameter estimation and optimal control of dynamic systems. Calculating these gradients requires sensitivity analysis. Sensitivity analysis for dynamic systems comes in two flavors, forward mode and adjoint (reverse). For systems with a large number of parameters, adjoint sensitivity analysis is often more efficient [1]. I find that the traditional way of deriving adjoints for ordinary differential equations, such as [3], leaves me with little intuition about what these equations represent. The goal of this blog post is to gain some intuition about these equations by deriving the adjoint equations in a different way.\nA prerequisite for understanding this post is being comfortable with the concept of backpropagation used in machine learning. If you are not familiar with this, I recommend first reading [2] until you are comfortable with the backpropagation example of logistic regression.\nTo better understand adjoint sensitivity analysis for continuous time systems, we start from the simpler case of discrete-time dynamic systems. We want to backpropagate the influence the parameters \\(p\\) have on the total cost \\(C\\) in the following computation,\n$$\\begin{align*} u_0 \u0026amp;= f_0(p)\\\\ u_1 \u0026amp;= f_1(u_0,p)\\\\ c_1 \u0026amp;= g_1(u_1,p)\\\\ u_2 \u0026amp;= f_2(u_1,p)\\\\ c_2 \u0026amp;= g_2(u_2,p)\\\\ C \u0026amp;= G(c_1, c_2) = c_1+c_2. 
\\end{align*}$$\nIn these equations, \\(u_0\\) is the initial state of the dynamic system, which is a function of the parameters. The transition from the initial state to the next state \\(u_1\\) is described by the function \\(f_1\\), which is a function of the initial state and the parameters. Similarly, there is another transition \\(f_2\\) to \\(u_2\\). For simplicity, we only consider a dynamic system with two steps. For each of the two states \\(u_1\\) and \\(u_2\\) there is an associated cost, \\(c_1\\) and \\(c_2\\), given by the cost functions \\(g_1\\) and \\(g_2\\); these costs can also depend on the parameters. The total cost \\(C\\) is the sum of the two individual costs. Backpropagation will eventually lead to the gradient of \\(G\\) with respect to \\(p\\).\nNOTE\nIn the following equations we will write the function \\(G\\) with different inputs. \\(G\\) is the only function for which this is done; all other functions, such as \\(f_1\\), are only considered as functions of the inputs they are written with in the above system.\nThe gradient of \\(G\\) with respect to \\(c_1\\) and \\(c_2\\) is not difficult to calculate,\n$$\\nabla G(c_1, c_2) = \\nabla(c_1+c_2) = [1,1]^T.$$\nNow let us take a step back and substitute \\(c_2\\) with \\(u_2\\) and \\(p\\),\n$$ \\nabla G(c_1, u_2, p) = \\nabla(c_1+g_2(u_2,p)) = \\left[1, \\frac{\\partial g_2}{\\partial u_2}, \\frac{\\partial g_2}{\\partial p} \\right]^T. 
$$\nCall the second to last element of this vector,\n$$\\lambda_2 = \\frac{\\partial g_2}{\\partial u_2},$$\nand call the last element of this vector,\n$$\\phi_2 = \\frac{\\partial g_2}{\\partial p}.$$\nNow we pull back some more and substitute \\(c_1\\) with \\(u_1\\) and \\(p\\), and also substitute \\(u_2\\) with \\(u_1\\) and \\(p\\),\n$$\\begin{align*} \\nabla G(u_1, p) \u0026amp;= \\nabla(g_1(u_1,p)+g_2(f_2(u_1,p),p)) \\\\ \u0026amp;= \\left[ \\frac{\\partial g_1}{\\partial u_1} + \\frac{\\partial g_2}{\\partial u_2}\\frac{\\partial f_2}{\\partial u_1}, \\frac{\\partial g_1}{\\partial p} + \\frac{\\partial g_2}{\\partial p} + \\frac{\\partial g_2}{\\partial u_2}\\frac{\\partial f_2}{\\partial p} \\right]^T. \\end{align*}$$\nNow note that the second to last element of this vector can be written as,\n$$\\lambda_1 = \\frac{\\partial g_1}{\\partial u_1} + \\lambda_2 \\frac{\\partial f_2}{\\partial u_1},$$\nand the last element as,\n$$\\phi_1 = \\lambda_2 \\frac{\\partial f_2}{\\partial p} + \\frac{\\partial g_1}{\\partial p} + \\phi_2.$$\nSubstituting \\(u_1\\) with \\(u_0\\) and \\(p\\) works the same way,\n$$\\begin{align*} \\nabla G(u_0, p) \u0026amp;= \\nabla(g_1(f_1(u_0,p),p)+g_2(f_2(f_1(u_0,p),p),p)) \\\\ \u0026amp;=\\left[ \\frac{\\partial g_1}{\\partial u_1}\\frac{\\partial f_1}{\\partial u_0}+ \\frac{\\partial g_2}{\\partial u_2}\\frac{\\partial f_2}{\\partial u_1}\\frac{\\partial f_1}{\\partial u_0}, \\frac{\\partial g_1}{\\partial p} + \\frac{\\partial g_1}{\\partial u_1}\\frac{\\partial f_1}{\\partial p} + \\frac{\\partial g_2}{\\partial p} + \\frac{\\partial g_2}{\\partial u_2}\\frac{\\partial f_2}{\\partial p} + \\frac{\\partial g_2}{\\partial u_2}\\frac{\\partial f_2}{\\partial u_1}\\frac{\\partial f_1}{\\partial p} \\right]^T, \\end{align*}$$\n$$\\lambda_0 = \\lambda_1 \\frac{\\partial f_1}{\\partial u_0},$$\n$$\\phi_0 = \\lambda_1 \\frac{\\partial f_1}{\\partial p} + \\phi_1.$$\nSubstituting \\(u_0\\) with \\(p\\) gives the desired 
gradient,\n$$\\begin{align*} \\nabla G(p) \u0026amp;= \\nabla(g_1(f_1(f_0(p),p),p)+g_2(f_2(f_1(f_0(p),p),p),p))\\\\ \u0026amp;=\\left[ \\frac{\\partial g_1}{\\partial p} + \\frac{\\partial g_1}{\\partial u_1}\\frac{\\partial f_1}{\\partial p} + \\frac{\\partial g_1}{\\partial u_1}\\frac{\\partial f_1}{\\partial u_0}\\frac{\\partial f_0}{\\partial p} + \\frac{\\partial g_2}{\\partial p} + \\frac{\\partial g_2}{\\partial u_2}\\frac{\\partial f_2}{\\partial p} + \\frac{\\partial g_2}{\\partial u_2}\\frac{\\partial f_2}{\\partial u_1}\\frac{\\partial f_1}{\\partial p} + \\frac{\\partial g_2}{\\partial u_2}\\frac{\\partial f_2}{\\partial u_1}\\frac{\\partial f_1}{\\partial u_0}\\frac{\\partial f_0}{\\partial p} \\right]^T\\\\ \u0026amp;=\\left[ \\lambda_0 \\frac{\\partial f_0}{\\partial p} + \\phi_0 \\right]^T. \\end{align*}$$\nYou should check that, for a high-dimensional \\(p\\), this recursion based on \\(\\lambda\\) and \\(\\phi\\) involves cheaper matrix multiplications than forward-mode sensitivity analysis:\n$$\\begin{align*} \\frac{\\partial u_0}{\\partial p} \u0026amp;= \\frac{\\partial f_0}{\\partial p}\\\\ \\frac{\\partial u_1}{\\partial p} \u0026amp;= \\frac{\\partial f_1}{\\partial u_0}\\frac{\\partial u_0}{\\partial p} + \\frac{\\partial f_1}{\\partial p}\\\\ \\frac{\\partial c_1}{\\partial p} \u0026amp;= \\frac{\\partial g_1}{\\partial u_1}\\frac{\\partial u_1}{\\partial p} + \\frac{\\partial g_1}{\\partial p}\\\\ \\frac{\\partial u_2}{\\partial p} \u0026amp;= \\frac{\\partial f_2}{\\partial u_1}\\frac{\\partial u_1}{\\partial p} + \\frac{\\partial f_2}{\\partial p}\\\\ \\frac{\\partial c_2}{\\partial p} \u0026amp;= \\frac{\\partial g_2}{\\partial u_2}\\frac{\\partial u_2}{\\partial p} + \\frac{\\partial g_2}{\\partial p}\\\\ \\frac{\\partial G}{\\partial p} \u0026amp;= \\frac{\\partial c_1}{\\partial p} + \\frac{\\partial c_2}{\\partial p}. \\end{align*}$$\nNow we know how to calculate the pullback of a difference equation. 
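\nA rough operation count supports the claim above. Assume a state of dimension n, a parameter vector of dimension n_p (both symbols introduced here for illustration) and a scalar cost. Forward mode propagates the n × n_p matrix \\(\\frac{\\partial u_k}{\\partial p}\\), so each step multiplies an n × n Jacobian by an n × n_p matrix. The adjoint recursion only propagates the 1 × n row vector \\(\\lambda_k\\): $$\\underbrace{\\frac{\\partial f_{k+1}}{\\partial u_k}\\frac{\\partial u_k}{\\partial p}}_{\\mathcal{O}(n^2 n_p)} \\quad\\text{versus}\\quad \\underbrace{\\lambda_{k+1}\\frac{\\partial f_{k+1}}{\\partial u_k}}_{\\mathcal{O}(n^2)} + \\underbrace{\\lambda_{k+1}\\frac{\\partial f_{k+1}}{\\partial p}}_{\\mathcal{O}(n\\,n_p)}.$$ This count ignores the cost of forming the Jacobians themselves. It also ignores that the adjoint pass needs the forward trajectory \\(u_k\\) to be stored or recomputed.\n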
Can we extend this intuition to continuous time systems? If we assume that \\(f_k\\) comes from a forward Euler discretization of a function \\(f^c\\),\n$$f_{k+1}(u_k,p) = u_k + \\Delta t f^c(u_k,p),$$\nand we assume that \\(g_k\\) is the accumulation of a continuous cost function \\(g^c\\), which can be considered constant in a short time window \\(\\Delta t\\),\n$$g_k(u_k,p) = \\Delta t g^c(u_k,p),$$\nthen we can calculate the recursion for \\(\\lambda\\) as,\n$$ \\lambda_{k} = \\frac{\\partial g_k}{\\partial u_k} + \\lambda_{k+1} \\frac{\\partial f_{k+1}}{\\partial u_k} = \\Delta t\\frac{\\partial g^c}{\\partial u_k} + \\lambda_{k+1} \\left(1+\\Delta t\\frac{\\partial f^c}{\\partial u_k}\\right), $$\nwhich is an Euler step, taken backwards in time, of the differential equation,\n$$\\begin{align*} \\frac{\\lambda_{k+1} - \\lambda_{k}}{\\Delta t} \u0026amp;= -\\frac{\\partial g^c}{\\partial u_k}- \\lambda_{k+1} \\frac{\\partial f^c}{\\partial u_k},\\\\ \\frac{d\\lambda}{dt} \u0026amp;= -\\frac{\\partial g^c}{\\partial u} - \\lambda \\frac{\\partial f^c}{\\partial u}. \\end{align*}$$\nSimilarly, we can calculate the recursion for \\(\\phi\\) as:\n$$\\begin{align*} \\phi_k \u0026amp;= \\lambda_{k+1} \\frac{\\partial f_{k+1}}{\\partial p} + \\frac{\\partial g_k}{\\partial p} + \\phi_{k+1},\\\\ \\frac{\\phi_{k+1}-\\phi_k }{\\Delta t} \u0026amp;= -\\lambda_{k+1} \\frac{\\partial f^c}{\\partial p} - \\frac{\\partial g^c}{\\partial p},\\\\ \\frac{d\\phi}{dt} \u0026amp;= -\\lambda \\frac{\\partial f^c}{\\partial p} - \\frac{\\partial g^c}{\\partial p}. \\end{align*}$$\nThese are the same equations you find in the documentation of DifferentialEquations.jl [4]. This, however, only gives us an intuition for the equations of continuous time systems. In part 2 we will see if a more rigorous version of this argument can be made for continuous time systems.\nReferences\n[1] Y. Ma, V. Dixit, M. J. Innes, X. Guo and C.
Rackauckas, \u0026ldquo;A Comparison of Automatic Differentiation and Continuous Sensitivity Analysis for Derivatives of Differential Equation Solutions,\u0026rdquo; 2021 IEEE High Performance Extreme Computing Conference (HPEC), 2021, pp. 1-9.\n[2] Lecture 10 of the Parallel Computing and Scientific Machine Learning course.\n[3] Lecture 11 of the Parallel Computing and Scientific Machine Learning course.\n[4] SciML sensitivity analysis documentation.\n","date":"31 January 2022","permalink":"https://arnostrouwen.com/posts/adjoint-sensitivity/","section":"","summary":"\u003cp\u003e\n\nGradients are useful for efficient parameter estimation and optimal control of dynamic systems.\nCalculating these gradients requires sensitivity analysis.\nSensitivity analysis for dynamic systems comes in two flavors, forward mode and adjoint (reverse).\nFor systems with a large number of parameters adjoint sensitivity analysis is often more efficient\n\u003ca href=\"https://ieeexplore.ieee.org/abstract/document/9622796\" target=\"_blank\" rel=\"noreferrer\"\u003e[1]\u003c/a\u003e.\nI find that the traditional way of deriving adjoints for ordinary differential equations, such as\n\u003ca href=\"https://book.sciml.ai/notes/11-Differentiable_Programming_and_Neural_Differential_Equations/\" target=\"_blank\" rel=\"noreferrer\"\u003e[3]\u003c/a\u003e,\nleaves me with little intuition about what these equations represent.\nThe goal of this blog post is to gain some intuition about these equations by deriving the adjoint equations in a different way.\u003c/p\u003e","title":"Notes on adjoint sensitivity analysis of dynamic systems part 1"},{"content":" Optimal experimental design is an area of statistics focused on constructing informative experiments. In this tutorial we introduce the necessary tools to construct such informative experiments for dynamic systems using only 100 lines of Julia code. We will work with a well-mixed fed-batch bioreactor as an example system. 
We have quite a bit of domain knowledge about how to model the behavior of such a reactor. The reactor has three dynamic states: the substrate concentration \\(C_s\\), the biomass concentration \\(C_x\\) and the volume of the reactor \\(V\\). The evolution in time of these states is governed by the following differential equations:\n$$\\begin{align*} \\frac{dC_s}{dt} \u0026amp;= -\\sigma C_x + \\frac{Q_{in}(t)}{V}(C_{S,in} - C_s)\\\\ \\frac{dC_x}{dt} \u0026amp;= \\mu C_x - \\frac{Q_{in}(t)}{V}C_x\\\\ \\frac{dV}{dt} \u0026amp;= Q_{in}(t), \\end{align*}$$ where\n$$\\begin{align*} \\mu \u0026amp;= \\frac{\\mu_{max}C_s}{K_s + C_s}\\\\ \\sigma \u0026amp;= \\frac{\\mu}{y_{x,s}} + m. \\end{align*}$$\nNot all parameters in these equations are exactly known. We are unsure of the values of the maximal growth rate \\(\\mu_{max}\\) and the half-saturation constant \\(K_s\\). We want to construct an experiment to learn about these parameters. To reach this goal, we can control the volume of the reactor through the volumetric flow rate \\(Q_{in}(t)\\), and we will measure the two states \\(C_s\\) and \\(C_x\\). The other parameters in the equations, namely the substrate feed concentration \\(C_{S,in}\\), the maintenance factor \\(m\\) and the yield \\(y_{x,s}\\), are considered to be exactly known. 
All of the numerical values we will use are taken from [1].\nThe packages that we will use:\nusing OrdinaryDiffEq using DiffEqFlux using ForwardDiff using Distributions using Quadrature using Optim using LinearAlgebra using Plots using Random Defining the system First, we define the dynamics of the fed-batch reactor.\nfunction dynamics!(du,u,p,t) C_s, C_x, V = u μ_max, K_s = @view p[1:n_θ] control_par = @view p[n_θ+1:end] Q_in = control_network(t, control_par)[1] C_s_in, y_x_s, m = 50.0, 0.777, 0.0 μ = μ_max*C_s/(K_s + C_s) σ = μ/y_x_s + m du[1] = -σ*C_x + Q_in/V*(C_s_in - C_s) du[2] = μ*C_x - Q_in/V*C_x du[3] = Q_in return nothing end Finding the optimal controls is an infinite-dimensional optimization problem. To reduce it to a finite-dimensional nonlinear optimization problem, we use a parametrized control function. We call this parametrization \\(x\\), and in this case we use a neural network with three hidden layers, each with five neurons. Time \\(t\\) is the only input to this neural network and the flow rate \\(Q_{in}\\) is the only output. Note that the last activation function constrains the control input to lie between 0 and 10. You can fine-tune the number of hidden layers and neurons, as well as the activation functions, to achieve even better experiments than those presented in this tutorial.\nRandom.seed!(45415) const control_network = FastChain(FastDense(1, 5,gelu), FastDense(5, 5,swish), FastDense(5, 5,relu), FastDense(5, 1,z-\u0026gt;10.0 .*sigmoid(z))) const x_ini = randn(length(initial_params(control_network))) const n_x = length(x_ini) Next, we define the true values of the two uncertain parameters \\(\\theta\\). We will use the true values for a simulation study to test the performance of our experiment. Both the uncertain parameters and the control parameters will have to be passed together as the \\(p\\) argument of dynamics!.\nconst θ_t = [0.421, 0.439] const n_θ = length(θ_t ) const p = vcat(θ_t,x_ini) Our experiment will last 15 hours. 
The initial conditions of the dynamics, \\(u_0\\), are fixed.\nconst tspan = (0.0, 15.0) const u_0 = [3.0, 0.25, 7.0] const n_u = length(u_0) We continue by simulating the true system.\nprob = ODEProblem(dynamics!,u_0,tspan,p) sol = solve(prob,Tsit5(),reltol=1e-5,abstol=1e-5) plot(sol;label=[\u0026#34;Cₛ(g/L)\u0026#34; \u0026#34;Cₓ(g/L)\u0026#34; \u0026#34;V(L)\u0026#34;],xlabel=\u0026#34;t(h)\u0026#34;,lw=3) plot!(tickfontsize=12,guidefontsize=14,legendfontsize=14,grid=false, dpi=600) Some optimal experimental design theory Generally, we do not measure all the dynamic states continuously, but instead we take measurements \\(y_k\\), which are a function of the states \\(u\\) at discrete time points,\n$$\\begin{align*} \\frac{du}{dt} \u0026amp;= f(u,\\theta,x,t)\\\\ y_k(\\theta,x) \u0026amp;= h(u(\\theta,x,t_k)). \\end{align*}$$\nIn our case we measure the substrate and biomass concentrations every hour. These measurements should inform us about the true values of the uncertain parameters. Often these measurements are noisy, with covariance \\(\\Sigma\\).\nconst tmeasurement = 0.0:1.0:15.0 const Σ = [0.001 0.0; 0.0 0.000625] The Fisher information matrix (FIM) \\(F\\) is a popular way to quantify the information content of an experiment,\n$$F(\\theta,x) = \\sum_k \\left(\\frac{\\partial y_k}{\\partial\\theta}\\right)^T\\Sigma^{-1} \\frac{\\partial y_k}{\\partial\\theta}.$$\nThe intuition behind this formula is that in a good experiment the measurements should be sensitive to the values of the uncertain model parameters. If the experiment is such that the measurements are similar no matter what the value of the true parameter is, then it will be hard to precisely determine that parameter value. 
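\nBecause \\(\\Sigma\\) here is diagonal, \\(\\Sigma^{-1} = \\operatorname{diag}(1000, 1600)\\), and the Fisher information matrix splits into a weighted sum over the two measured states. In the notation below, introduced here for illustration, \\(\\nabla_\\theta y_{k,1}\\) and \\(\\nabla_\\theta y_{k,2}\\) are the column vectors of sensitivities of the substrate and biomass measurements at time \\(t_k\\): $$F(\\theta,x) = \\sum_k \\left( 1000\\, \\nabla_\\theta y_{k,1} \\nabla_\\theta y_{k,1}^T + 1600\\, \\nabla_\\theta y_{k,2} \\nabla_\\theta y_{k,2}^T \\right).$$ Biomass sensitivities therefore count 1.6 times as heavily as substrate sensitivities, reflecting their smaller measurement variance.\n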
These sensitivities of the measurements with respect to the uncertain parameters can be further expanded,\n$$\\frac{\\partial y_k(\\theta,x)}{\\partial\\theta} = \\frac{\\partial h}{\\partial u} \\frac{\\partial u(\\theta,x,t_k)}{\\partial\\theta}.$$\nSince we only measure the first two states, the first factor, \\(\\frac{\\partial h}{\\partial u}\\), is equal to the matrix [1 0 0; 0 1 0]. The second factor can be calculated from the forward sensitivity equations of the differential equation system,\n$$\\begin{align*} \\frac{d}{dt}\\frac{\\partial u(\\theta,x,t)}{\\partial\\theta} \u0026amp;= \\frac{\\partial}{\\partial\\theta}\\frac{du(\\theta,x,t)}{dt} = \\frac{\\partial f(u,\\theta,x,t)}{\\partial u}\\frac{\\partial u}{\\partial \\theta} + \\frac{\\partial f(u,\\theta,x,t)}{\\partial \\theta}\\\\ \\frac{\\partial u(\\theta,x,t=0)}{\\partial \\theta} \u0026amp;= 0. \\end{align*}$$\nIn Julia this can be done using forward-mode automatic differentiation. We want the Fisher information matrix to be as large as possible, but what constitutes a large matrix? The inverse of the FIM approximates the covariance matrix of the parameter estimates, which defines the confidence ellipsoids. We want these ellipsoids to have as small a volume as possible, which is equivalent to maximizing the determinant of the FIM. In the experimental design literature, experiments that maximize this determinant are called D-optimal. 
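\nTo see why the determinant measures volume, assume the usual large-sample approximation in which the estimate \\(\\hat\\theta\\) is approximately normal with covariance \\(F^{-1}\\). The confidence ellipsoid \\(\\{\\theta : (\\theta-\\hat\\theta)^T F (\\theta-\\hat\\theta) \\le r^2\\}\\) in \\(n_\\theta\\) dimensions then has volume $$\\operatorname{vol} = \\frac{\\pi^{n_\\theta/2}}{\\Gamma(n_\\theta/2+1)}\\, r^{n_\\theta} \\det(F)^{-1/2},$$ so for a fixed confidence level (fixed r), maximizing \\(\\det F\\) minimizes the volume. For the two parameters here, \\(n_\\theta = 2\\) and the area is \\(\\pi r^2/\\sqrt{\\det F}\\).\n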
In Julia, the D-criterion can be computed as follows:\nfunction solve_wrap(θ,x) p = [θ;x] prob = ODEProblem(dynamics!,u_0,tspan,p) sol = solve(prob,Tsit5(),saveat=tmeasurement,reltol=1e-5,abstol=1e-5) u = convert(Array,sol) end function D_criterion(θ,x) FIM = zeros(eltype(x),n_θ,n_θ) jac = ForwardDiff.jacobian(θ-\u0026gt;solve_wrap(θ,x),θ) for k in 1:length(tmeasurement) du_dθ = jac[(k-1)*n_u+1:k*n_u,:] dy_θ = du_dθ[1:2,:] FIM += dy_θ\u0026#39;*inv(Σ)*dy_θ end return det(FIM) end The major issue with experimental design based on the Fisher information matrix is that the optimal design depends on the true model parameter values, which are exactly what the experiment aims to learn about. We can robustify the design by averaging the D-criterion over a distribution of plausible parameter values. We specify the uncertainty on the model parameters as probability distributions provided by Distributions.jl. For both uncertain parameters, we use a normal distribution with mean \\(0.4\\) and standard deviation \\(0.1\\), truncated to the interval \\([0.1, 0.7]\\). To keep the tutorial fast, the expectation is computed with loose tolerances and is therefore rather inaccurate.\nfunction robust_D_criterion(x) θ_dist = product_distribution([truncated(Normal(0.4,0.1),0.1,0.7),truncated(Normal(0.4,0.1),0.1,0.7)]) integrand(θ,x) = D_criterion(θ,x)*pdf(θ_dist,θ) prob_quadrature = QuadratureProblem(integrand,minimum(θ_dist),maximum(θ_dist),x) sol_quadrature = solve(prob_quadrature,HCubatureJL(),reltol = 1e-2, abstol = 1e-2)[1] end Optimal experiment Finally, we get to optimizing the controls. A first-order optimization technique, BFGS, is used. 
Optim.jl can calculate the required gradient using forward-mode automatic differentiation.\nx_opt = optimize(x-\u0026gt; -robust_D_criterion(x), x_ini, BFGS(), Optim.Options(iterations=10), autodiff = :forward).minimizer t_plot = 0.0:0.1:15 plot(t_plot,[control_network(t, x_ini)[1] for t in t_plot ],label=\u0026#34;initial experiment\u0026#34;,lw=3) plot!(t_plot,[control_network(t, x_opt)[1] for t in t_plot],label=\u0026#34;optimized experiment\u0026#34;,lw=3) plot!(xlabel=\u0026#34;t(h)\u0026#34;, ylabel=\u0026#34;Q(l/h)\u0026#34;) plot!(tickfontsize=12,guidefontsize=14,legendfontsize=14,grid=false,ylim=(-1,11), dpi=600) Now all that remains is to show the added value of the optimal experiment. Using the true parameter values, we simulate 100 experiments with the initial design and 100 with the optimized design. We then check how precisely we can recover the true parameters from each simulated data set. The parameter estimates from the optimized experiment are generally closer to the true parameter values than those from the initial experiment.\nRandom.seed!(454154) function SSE(θ,x) prob = ODEProblem(dynamics!,u_0,tspan,vcat(θ,x)) sol = solve(prob,Tsit5(),saveat=tmeasurement,reltol=1e-5) y_est = sol[1:2,:] errors = y_measure - y_est errors_scaled = sqrt(inv(Σ))*errors sum(errors_scaled.^2) end y_true = solve(ODEProblem(dynamics!,u_0,tspan,vcat(θ_t,x_ini)),Tsit5(), saveat=tmeasurement,reltol=1e-5,abstol=1e-5)[1:2,:] θ_estimates = zeros(n_θ,100) for j = 1:100 global y_measure = y_true + rand(MvNormal(Σ),length(tmeasurement)) θ_estimates[:,j] = optimize(θ-\u0026gt;SSE(θ,x_ini), θ_t).minimizer end Plots.scatter(θ_estimates[1,:],θ_estimates[2,:],label=\u0026#34;unoptimized experiment\u0026#34;,ms=5) y_true = solve(ODEProblem(dynamics!,u_0,tspan,vcat(θ_t,x_opt)),Tsit5(), saveat=tmeasurement,reltol=1e-5,abstol=0.0)[1:2,:] for j = 1:100 global y_measure = y_true + rand(MvNormal(Σ),length(tmeasurement)) θ_estimates[:,j] = 
optimize(θ-\u0026gt;SSE(θ,x_opt), θ_t).minimizer end Plots.scatter!(θ_estimates[1,:],θ_estimates[2,:],label=\u0026#34;optimized experiment\u0026#34;,ms=5) plot!(xlabel=\u0026#34;μₘₐₓ\u0026#34;,ylabel=\u0026#34;Kₛ\u0026#34;) plot!(tickfontsize=12,guidefontsize=14,legendfontsize=14,grid=false, dpi=600) The precisely estimated parameters can subsequently be used to control the bioreactor better, for example to grow as much biomass as possible. Currently, the design optimization is implemented using forward-over-forward automatic differentiation. It would be more efficient to use reverse-over-forward automatic differentiation instead, but this is not yet easy to do in Julia.\nReferences Telen, Dries, et al. \u0026ldquo;Robustifying optimal experiment design for nonlinear, dynamic (bio)chemical systems.\u0026rdquo; Computers \u0026amp; Chemical Engineering 71 (2014): 415-425. ","date":"30 September 2021","permalink":"https://arnostrouwen.com/posts/dynamic-experimental-design/","section":"","summary":"\u003cp\u003e\n\nOptimal experimental design is an area of statistics focused on constructing informative experiments.\nIn this tutorial, we introduce the necessary tools to construct such informative experiments for dynamic systems using only 100 lines of Julia code.\nWe will work with a well-mixed fed-batch bioreactor as an example system. 
We have quite a bit of domain knowledge about how to model the behavior of such a reactor.\nThe reactor has three dynamic states: the substrate concentration \\(C_s\\), the biomass concentration \\(C_x\\), and the volume of the reactor \\(V\\).\nThe evolution in time of these states is governed by the following differential equations:\u003c/p\u003e","title":"Dynamic experimental design in 100 lines of Julia code"},{"content":"I am a statistical consultant focused on experimental design and other optimal data-gathering techniques, particularly for dynamic systems.\nOften, researchers perform experiments and only talk to a statistician afterwards. But statisticians are not only useful for data analysis. Much time and effort can be saved by carefully planning an experiment. Every measurement setup has its own unique details. I help other researchers by constructing optimal experiments, custom-made for their own measurement setup, goals, and budgetary constraints.\nI developed much of my expertise in experimental design during my Research Foundation Flanders (FWO) PhD fellowship at KU Leuven. There, I designed experiments to precisely characterize the respiration and fermentation of fruit and vegetables, leading to better storage solutions.\nAfter graduating, I joined the manufacturing and applied statistics department of Johnson \u0026amp; Johnson to valorize the ideas developed in my PhD. I worked on designing accelerated stability studies to precisely predict the shelf life of vaccines and other drug products. Some of my other responsibilities included designing high-throughput experiments to optimize the manufacturing conditions of chemical reactions, Bayesian mixed-effects modelling to determine the mechanical properties of powders, and optimally blocking experiments when full randomization is not an option.\nI then felt ready for a new challenge and started my own consulting company, Strouwen Statistics. 
My first client is JuliaHub, where I work on the statistical aspects of the quantitative systems pharmacology software, PumasQSP, as well as the battery simulation software, JuliaSimBatteries. I am also responsible for the continuous integration and delivery of the documentation of the Scientific Machine Learning Project (SciML).\n","date":null,"permalink":"https://arnostrouwen.com/about/","section":"","summary":"\u003cp\u003eI am a statistical consultant focused on experimental design and other optimal data-gathering techniques, particularly for dynamic systems.\u003c/p\u003e\n\u003cp\u003eOften, researchers perform experiments and only talk to a statistician afterwards.\nBut statisticians are not only useful for data analysis.\nMuch time and effort can be saved by carefully planning an experiment.\nEvery measurement setup has its own unique details.\nI help other researchers by constructing optimal experiments, custom-made for their own measurement setup, goals, and budgetary constraints.\u003c/p\u003e","title":"About me"},{"content":"","date":null,"permalink":"https://arnostrouwen.com/categories/","section":"Categories","summary":"","title":"Categories"},{"content":"","date":null,"permalink":"https://arnostrouwen.com/tags/","section":"Tags","summary":"","title":"Tags"}]