
¹ Los Alamos National Laboratory, Los Alamos, NM 87544, USA ({rbparker,mjgarcia,rbent}@lanl.gov)
² Dowson Farms ([email protected])
³ Texas A&M University, College Station, TX 77843, USA ([email protected])

Formulations and scalability of neural network surrogates in nonlinear optimization problems

Robert B. Parker¹, Oscar Dowson², Nicole LoGiudice³, Manuel Garcia¹, Russell Bent¹
Abstract

We compare full-space, reduced-space, and gray-box formulations for representing trained neural networks in nonlinear optimization problems. We test these formulations on a transient stability-constrained, security-constrained alternating current optimal power flow (SCOPF) problem where the transient stability criteria are represented by a trained neural network surrogate. Optimization problems are implemented in JuMP and trained neural networks are embedded using a new Julia package: MathOptAI.jl. To study the bottlenecks of the three formulations, we use neural networks with up to 590 million trained parameters. The full-space formulation is bottlenecked by the linear solver used by the optimization algorithm, while the reduced-space formulation is bottlenecked by the algebraic modeling environment and derivative computations. The gray-box formulation is the most scalable and is capable of solving with the largest neural networks tested. It is bottlenecked by evaluation of the neural network’s outputs and their derivatives, which may be accelerated with a graphics processing unit (GPU). Leveraging the gray-box formulation and GPU acceleration, we solve our test problem with our largest neural network surrogate in 2.5× the time required for a simpler SCOPF problem without the stability constraint.

Keywords:
Surrogate modeling · Neural networks · Nonlinear optimization

1 Introduction

Nonlinear local optimization is a powerful tool for engineers and operations researchers for its ability to handle accurate physical models, respect explicit constraints, and solve large-scale problems [2]. However, it is often the case that practitioners wish to include components that do not easily fit into the differentiable and algebraic frameworks of nonlinear optimization. Examples include when a mechanistic model is not available [21], is time-consuming to simulate [7], or renders the optimization problem too complicated to solve reliably [5].

A recent trend has been to replace troublesome components by a trained neural network surrogate and then embed the trained neural network into a nonlinear optimization model [16]. Several open-source software packages, e.g., OMLT [7] and gurobi-machinelearning [12], make it easy to embed neural network models in Python-based modeling environments for nonlinear optimization. A trained neural network may be represented in an optimization problem with different formulations, e.g., full-space and reduced-space formulations [7, 20], which have been compared by Kilwein [13] for a security-constrained AC optimal power flow (SCOPF) problem. A third approach, first suggested by Casas [6], is called a gray-box formulation, in which function and derivative evaluations of the surrogate are handled by the neural network modeling software (here, PyTorch [18]), rather than the algebraic modeling environment (here, JuMP [15]).

While full-space, reduced-space, and gray-box formulations have been compared in [6], the bottlenecks of these formulations have not been carefully identified. This paper profiles all three formulations on an SCOPF problem in which a neural network predicts transient feasibility. We demonstrate that the gray-box formulation is the most scalable, and that it can naturally take advantage of GPU acceleration built into PyTorch. We find that the full-space formulation is bottlenecked by the solution of a linear system of equations, and the reduced-space formulation is bottlenecked by JuMP and its automatic differentiation system. Our work provides a clear benchmark and direction for future work. Additionally, we provide MathOptAI.jl, a new open-source library for embedding trained machine learning predictors into optimization models built with JuMP [15]. MathOptAI.jl is available at https://github.com/lanl-ansi/MathOptAI.jl under a BSD-3 license.

2 Background

2.1 Nonlinear optimization

We study nonlinear optimization problems in the form given by Equation (1):

\min_{x} f(x) \quad \text{s.t.} \quad g(x) = 0, \; x \geq 0. (1)

We consider interior-point methods, such as IPOPT [22], for solving (1). These methods iteratively compute search directions d by solving the linear system (2):

\begin{bmatrix} \nabla^{2}\mathcal{L}(x) + \alpha & \nabla g(x)^{T} \\ \nabla g(x) & 0 \end{bmatrix} d = -\begin{bmatrix} \nabla f(x) + \nabla g(x)^{T}\lambda + \beta \\ g(x) \end{bmatrix}, (2)

where α and β are additional interior-point terms whose details are omitted for simplicity. The matrix on the left-hand side is referred to as the Karush-Kuhn-Tucker (KKT) matrix. To construct this system, solvers rely on callbacks that provide the Jacobian ∇g and, optionally, the Hessian of the Lagrangian, ∇²ℒ. These are typically provided by the automatic differentiation system of an algebraic modeling environment. If the Hessian ∇²ℒ is not available, a limited-memory quasi-Newton approximation is used [17].
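
To make the step computation in (2) concrete, the following minimal Julia sketch (illustrative only; not part of any solver discussed here) assembles and solves the KKT system for a toy equality-constrained problem, with the interior-point terms α and β omitted:

```julia
using LinearAlgebra

# Toy problem: min x₁² + x₂²  s.t.  x₁ + x₂ - 1 = 0.
f(x) = x[1]^2 + x[2]^2
g(x) = [x[1] + x[2] - 1.0]
∇f(x) = [2x[1], 2x[2]]
∇g(x) = [1.0 1.0]                       # 1×2 constraint Jacobian
∇²L(x, λ) = [2.0 0.0; 0.0 2.0]          # Hessian of the Lagrangian

x, λ = [2.0, -3.0], [0.0]
K = [∇²L(x, λ) ∇g(x)'; ∇g(x) zeros(1, 1)]    # KKT matrix from (2), with α = 0
rhs = -vcat(∇f(x) + ∇g(x)' * λ, g(x))        # right-hand side of (2), with β = 0
d = K \ rhs                                   # search direction (Δx, Δλ)
```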

2.2 Neural network predictors

A neural network predictor is a function denoted y = NN(x). We consider neural networks defined by the repeated application of an affine transformation and a nonlinear activation function σ over L layers:

y_{l} = \sigma_{l}(W_{l} y_{l-1} + b_{l}), \quad l \in \{1,\dots,L\}, (3)

where y_0 = x and y = y_L. The weights W_l and biases b_l are parameters that are optimized to minimize error on a set of training data representing desired inputs and outputs of the neural network. To a nonlinear optimization solver using a trained neural network, W_l and b_l are constants. To fit the assumptions made by these solvers, we consider only smooth activation functions, e.g., the sigmoid and hyperbolic tangent functions.
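
As a concrete illustration of (3), the following minimal Julia sketch (with random placeholder weights, sized to match the 117-input, 37-output surrogate used later in this paper) evaluates the layered predictor; for brevity the same activation is applied at every layer, although in practice the output layer is often linear:

```julia
# Forward pass y_l = σ_l(W_l y_{l-1} + b_l), l = 1, …, L, with y_0 = x.
function nn_forward(W, b, σ, x)
    y = x
    for l in 1:length(W)
        y = σ.(W[l] * y .+ b[l])
    end
    return y
end

# Random placeholder parameters: 117 inputs, one hidden layer of 50 neurons, 37 outputs.
W = [randn(50, 117), randn(37, 50)]
b = [randn(50), randn(37)]
y = nn_forward(W, b, tanh, randn(117))    # y = NN(x)
```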

2.3 Algebraic representations of a neural network

In this section, we explain the three ways in which we encode pre-trained neural network predictors into the constraints of a nonlinear optimization model of the form (1). We refer to the three approaches as full-space, reduced-space, and gray-box.

2.3.1 Full-space

In the full-space formulation, we add an intermediate vector-valued decision variable z_l to represent the output of the affine transformation in each layer l, and we add a vector-valued decision variable y_l to represent the output of each nonlinear activation function. We then add a linear equality constraint to enforce the relationship between y_{l-1} and z_l and a nonlinear equality constraint to enforce the relationship between z_l and y_l. Thus, the neural network in (3) is encoded by the constraints:

W_{l} y_{l-1} - z_{l} = -b_{l}, \quad l \in \{1,\dots,L\}, (4)
y_{l} - \sigma_{l}(z_{l}) = 0, \quad l \in \{1,\dots,L\}.

The full-space approach prioritizes small expressions and small, sparse nonlinear constraints at the cost of introducing additional variables and constraints for each layer of the neural network. This formulation conforms to the assumptions of JuMP’s reverse-mode automatic differentiation algorithm: 1) nonlinear constraints can be written as a set of scalar-valued functions, and 2) they are sparse in the sense that each scalar constraint contains relatively few variables and has a simple functional form.
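
The following Julia sketch shows one way the full-space encoding (4) could be written with JuMP; it is illustrative only and is not MathOptAI.jl's implementation. Here W and b hold the constant trained parameters, x is a vector of existing JuMP variables, and σ is a smooth activation function; the weights in the usage line are random placeholders.

```julia
using JuMP, Ipopt

function add_full_space!(model, x, W, b, σ)
    y = x
    for l in 1:length(W)
        n = length(b[l])
        z = @variable(model, [1:n], base_name = "z_$l")        # affine outputs z_l
        y_next = @variable(model, [1:n], base_name = "y_$l")   # activation outputs y_l
        @constraint(model, W[l] * y .- z .== -b[l])            # W_l y_{l-1} - z_l = -b_l
        @constraint(model, y_next .- σ.(z) .== 0)              # y_l - σ_l(z_l) = 0
        y = y_next
    end
    return y
end

model = Model(Ipopt.Optimizer)
@variable(model, x[1:117])
# Placeholder weights; in practice these come from the trained network.
y = add_full_space!(model, x, [randn(50, 117), randn(37, 50)], [randn(50), randn(37)], tanh)
```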

2.3.2 Reduced-space

In the reduced-space formulation, we add a single vector-valued decision variable y to represent the output of the final activation function, and we add a single vector-valued nonlinear equality constraint that encodes the complete network. Thus, the neural network in (3) is encoded as the vector-valued constraint:

y - \sigma_{L}(W_{L}(\cdots\sigma_{l}(W_{l}(\cdots\sigma_{1}(W_{1}x + b_{1})\cdots) + b_{l})\cdots) + b_{L}) = 0. (5)

The benefit of the reduced-space approach is that we add only a single vector-valued decision variable y and a single vector-valued nonlinear equality constraint (each with the dimension of the final layer's output). The downside is that the nonlinear constraint is a complicated expression with a very large number of terms. This is made worse by the fact that JuMP scalarizes vector-valued expressions in nonlinear constraints. Thus, instead of efficiently representing the affine relationship W_1 x + b_1 by storing the matrix and vector, JuMP represents the expression as a sum of scalar products.
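
For comparison, a hedged sketch of the reduced-space encoding (5) is shown below; again, this is illustrative rather than MathOptAI.jl's implementation. Writing the affine step as an explicit sum of scalar products mirrors how JuMP scalarizes the expression.

```julia
using JuMP

function add_reduced_space!(model, x, W, b, σ)
    expr = x
    for l in 1:length(W)
        # Each row of W_l becomes a sum of scalar products in the expression graph.
        affine = [sum(W[l][i, j] * expr[j] for j in axes(W[l], 2)) + b[l][i]
                  for i in axes(W[l], 1)]
        expr = σ.(affine)                    # σ_l applied symbolically, layer by layer
    end
    y = @variable(model, [1:length(expr)], base_name = "y")
    @constraint(model, y .== expr)           # single vector-valued constraint (5)
    return y
end
```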

2.3.3 Gray-box

In the gray-box formulation, we do not attempt to encode the neural network algebraically. Instead, we exploit the fact that nonlinear local solvers such as IPOPT require only callback oracles to evaluate the constraint function g(x) and the Jacobian ∇g(x) (and, optionally, the Hessian ∇²ℒ). Using JuMP’s support for user-defined nonlinear operators, we implement the evaluation of the neural network as a nonlinear operator NN(x), and we use PyTorch’s built-in automatic differentiation support to compute the Jacobian ∇NN(x) and Hessians ∇²NN(x). Thus, the neural network in (3) is encoded as the vector-valued constraint:

y - {\rm NN}(x) = 0. (6)

Like the reduced-space formulation, the gray-box approach adds only a small number of variables and constraints to the optimization problem. Unlike the reduced-space formulation, however, it uses the automatic differentiation system of the neural network modeling software, which is better suited to the dense, nested, vector-valued expressions that define the neural network. Because an explicit algebraic representation of the constraints is not exposed to the solver, the gray-box formulation cannot support the relaxations used by nonlinear global optimization solvers.
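
A hedged sketch of how the gray-box constraint (6) can be wired up through JuMP's user-defined nonlinear operators is shown below. The callbacks nn_output(i, v) and nn_jacobian_row(i, v) are hypothetical placeholders standing in for PyTorch's forward pass and automatic differentiation; this is not MathOptAI.jl's implementation, which handles these details internally.

```julia
using JuMP

function add_gray_box!(model, x, nn_output, nn_jacobian_row, n_outputs)
    n_inputs = length(x)
    y = @variable(model, [1:n_outputs], base_name = "y")
    for i in 1:n_outputs
        # Scalar-valued operator for the i-th network output and its gradient.
        f = (args...) -> nn_output(i, collect(args))
        ∇f = (g, args...) -> (copyto!(g, nn_jacobian_row(i, collect(args))); nothing)
        # A Hessian callback may be supplied as an additional positional argument;
        # without one, the solver falls back to a quasi-Newton approximation.
        op = add_nonlinear_operator(model, n_inputs, f, ∇f; name = Symbol("op_nn_$i"))
        expr = op(x...)                          # expression wrapping the operator call
        @constraint(model, y[i] == expr)         # row i of y - NN(x) = 0
    end
    return y
end
```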

2.4 MathOptAI.jl

Encoding a trained neural network into an optimization model using the forms described in Section 2.3 is tedious and error-prone. To simplify experimentation, we developed a new Julia package, MathOptAI.jl, which is a JuMP extension for embedding a range of machine learning models into a JuMP model. In addition to supporting neural networks trained using PyTorch, which are the focus of this paper, MathOptAI.jl also supports Julia-based deep-learning libraries, as well as other machine learning models such as decision trees and Gaussian Processes. MathOptAI.jl is provided as an open-source package at https://github.com/lanl-ansi/MathOptAI.jl under a BSD-3 license.
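
As a brief illustration of the intended workflow, the sketch below embeds a small Flux.jl network into a JuMP model with MathOptAI.add_predictor. It is based on the package documentation; the exact return values and the keyword options that select the full-space, reduced-space, or gray-box formulation may differ between versions, and the paper's experiments embed PyTorch models rather than Flux models.

```julia
using JuMP, Flux, MathOptAI

# Small Flux.jl predictor standing in for the trained stability surrogate.
predictor = Flux.Chain(Flux.Dense(117 => 50, tanh), Flux.Dense(50 => 37))

model = Model()
@variable(model, x[1:117])
# add_predictor encodes the trained network and returns JuMP variables for its
# outputs, along with an object describing the added variables and constraints.
y, formulation = MathOptAI.add_predictor(model, predictor, x)
@constraint(model, y .>= 59.4)    # e.g., a minimum-frequency requirement on the outputs
```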

3 Test problem

We compare the three different neural network formulations on a transient stability-constrained, security-constrained ACOPF problem [11].

Stability-constrained optimal power flow  Security-constrained optimal power flow (SCOPF) is a well-established problem for dispatching generators in an electric power network in which feasibility of the network (i.e., the ability to meet demand) is enforced for a set of contingencies [1]. Each contingency k represents the loss of a set of generators and/or power lines. We consider a variant of this problem where, in addition to enforcing steady-state feasibility, we enforce feasibility of the transient response resulting from the contingency. In particular, we enforce that the transient frequency at each bus is at least η = 59.4 Hz for the 30-second interval following each contingency. This problem is given by Equation (7):

\min_{S^{g},V} c(\mathbb{R}(S^{g})) \quad \text{s.t.} \quad \begin{cases} F_{k}(S^{g}, V, \mathbf{S^{d}}) \leq 0, & k \in \{0,\dots,K\} \\ G_{k}(S^{g}, \mathbf{S^{d}}) \geq \eta\mathbb{1}, & k \in \{1,\dots,K\}. \end{cases} (7)

Here, S^g is a vector of complex AC power generations for each generator in the network, V is a vector of complex bus voltages, c is a quadratic cost function, and S^d is a constant vector of complex power demands. The constraints F_k ≤ 0 enforce feasibility of the power network for contingency k, where k = 0 refers to the base network, and G_k maps generations and demands to the minimum frequency at each bus over the interval considered.

In this work, we consider an instance of Problem (7) defined on a 37-bus synthetic test network [4, 3]. In this case, G_k has 117 inputs and 37 outputs. We consider a single contingency that outages generator 5 on bus 23. We choose a small network model with a single contingency because our goal is to test the different neural network formulations, not the SCOPF formulation itself.

Stability surrogate model  Instead of considering the differential equations describing the transient behavior of the power network directly in the optimization problem, we approximate G_k with a neural network trained on data from 110 high-fidelity simulations in PowerWorld [19], with generations and loads uniformly sampled from within ±20% of each nominal value. We use sequential neural networks with tanh activation functions, between two and 20 layers, and between 50 and 4,000 neurons per layer. These networks have between 7,000 and 592 million trained parameters. The networks are trained to minimize mean squared error using the Adam optimizer [14] until the training loss is below 0.01 for 1,000 consecutive epochs. We use a simple training procedure and a small amount of data because our goal is to test optimization formulations with embedded neural networks, rather than the neural networks themselves.
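
For reference, the sketch below reproduces the shape of this training setup as a Flux.jl analogue (the actual surrogates in this paper were trained in PyTorch). The data matrices are random placeholders, a two-hidden-layer network is shown, and a fixed epoch count replaces the stopping rule described above.

```julia
using Flux

# Placeholder data: 117 sampled generation/load inputs and 37 minimum bus
# frequencies per simulation, for 110 simulations (columns are samples).
X, Y = rand(Float32, 117, 110), rand(Float32, 37, 110)

surrogate = Flux.Chain(
    Flux.Dense(117 => 50, tanh),
    Flux.Dense(50 => 50, tanh),
    Flux.Dense(50 => 37),
)

opt_state = Flux.setup(Flux.Adam(), surrogate)
for epoch in 1:5_000                       # fixed epoch count for brevity
    Flux.train!(surrogate, [(X, Y)], opt_state) do m, x, y
        Flux.Losses.mse(m(x), y)           # mean squared error loss
    end
end
```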

4 Results

4.1 Computational setting

We model the SCOPF problem using PowerModels [9], PowerModelsSecurityConstrained [8], and JuMP [15]. Neural networks are modeled using PyTorch [18] and embedded into the optimization problem using MathOptAI.jl. Optimization problems are solved using the IPOPT nonlinear optimization solver [22] with MA27 [10] as the linear solver. The full-space and reduced-space models support evaluation on a CPU but not on a GPU. Because gray-box models use PyTorch, they can be evaluated on a CPU or GPU. We run our experiments on the Venado supercomputer. CPU-only experiments use compute nodes with two 3.4 GHz NVIDIA Grace CPUs and 240 GB of RAM, while CPU+GPU experiments use nodes with a Grace CPU with 120 GB of RAM and an NVIDIA H100 GPU.

4.2 Structural results

Table 1 shows the numbers of variables, constraints, and nonzero entries of the derivative matrices for the optimization problem with different neural networks and formulations. We note that the reduced-space and gray-box formulations have approximately the same numbers of constraints and variables as the original problem, but more nonzero entries in the Jacobian and Hessian matrices due to the dense, nonlinear stability constraints. With these formulations, the structure of the optimization problem does not change as the neural network surrogate adds more interior layers. By contrast, the full-space formulation grows in numbers of variables, constraints, and nonzeros as the neural network gets larger. These problem structures suggest that the full-space formulation will lead to expensive KKT matrix factorizations, while this will not be an issue for reduced-space and gray-box formulations.

Table 1: Numbers of variables, constraints, and nonzeros for different networks and formulations
| Parameters | Formulation | N. Variables | N. Constraints | Jacobian NNZ | Hessian NNZ |
| — | No surrogate | 1155 | 1460 | 5822 | 1398 |
| 7k | Full-space | 1292 | 1634 | 11796 | 1448 |
| 25k | Full-space | 1592 | 1934 | 27996 | 1598 |
| 578k | Full-space | 4192 | 4534 | 567896 | 2898 |
| 7M | Full-space | 17192 | 17534 | 7144896 | 9398 |
| All networks | Reduced-space | 1155 | 1497 | 8708 | 4479 |
| All networks | Gray-box | 1192 | 1534 | 8782 | 4479 |

4.3 Runtime results

Table 2: Solve times with different neural networks and formulations
| Parameters | Formulation | Hessian | Platform | Build time | Solve time | Iterations | Time/iter. |
| — | No surrogate | Exact | CPU | 45 ms | 0.4 s | 41 | 9 ms |
| 7k | Full-space | Exact | CPU | 0.1 s | 2 s | 468 | 4 ms |
| 25k | Full-space | Exact | CPU | 0.3 s | 5 s | 642 | 8 ms |
| 578k | Full-space | Exact | CPU | 0.3 s | 699 s | 755 | 0.9 s |
| 7M | Full-space | Exact | CPU | — (a) | — | — | — |
| 7k | Reduced-space | Exact | CPU | 0.1 s | 7 s | 49 | 0.1 s |
| 25k | Reduced-space | Exact | CPU | 1 s | 1125 s | 41 | 27 s |
| 578k | Reduced-space | Exact | CPU | — (b) | — | — | — |
| 7M | Reduced-space | Exact | CPU | — (b) | — | — | — |
| 7k | Gray-box | Exact | CPU | 0.1 s | 8 s | 41 | 0.2 s |
| 25k | Gray-box | Exact | CPU | 0.1 s | 9 s | 42 | 0.2 s |
| 578k | Gray-box | Exact | CPU | 0.1 s | 11 s | 42 | 0.3 s |
| 7M | Gray-box | Exact | CPU | 0.1 s | 22 s | 42 | 0.5 s |
| 68M | Gray-box | Exact | CPU | 0.1 s | 140 s | 42 | 3 s |
| 592M | Gray-box | Exact | CPU | 0.6 s | 748 s | 42 | 18 s |
| 7k | Gray-box | Exact | CPU+GPU | 0.1 s | 7 s | 41 | 0.2 s |
| 25k | Gray-box | Exact | CPU+GPU | 0.1 s | 7 s | 42 | 0.2 s |
| 578k | Gray-box | Exact | CPU+GPU | 0.1 s | 8 s | 42 | 0.2 s |
| 7M | Gray-box | Exact | CPU+GPU | 0.1 s | 8 s | 42 | 0.2 s |
| 68M | Gray-box | Exact | CPU+GPU | 0.2 s | 9 s | 42 | 0.2 s |
| 592M | Gray-box | Exact | CPU+GPU | 0.7 s | 15 s | 42 | 0.3 s |
| 7k | Gray-box | Approx. | CPU | 0.1 s | 0.3 s | 61 | 6 ms |
| 25k | Gray-box | Approx. | CPU | 48 ms | 0.3 s | 57 | 6 ms |
| 578k | Gray-box | Approx. | CPU | 0.1 s | 1 s | 66 | 15 ms |
| 7M | Gray-box | Approx. | CPU | 0.1 s | 6 s | 57 | 0.1 s |
| 68M | Gray-box | Approx. | CPU | 0.1 s | 7 s | 56 | 0.1 s |
| 592M | Gray-box | Approx. | CPU | 0.9 s | 17 s | 56 | 0.3 s |
| 7k | Gray-box | Approx. | CPU+GPU | 50 ms | 0.5 s | 63 | 7 ms |
| 25k | Gray-box | Approx. | CPU+GPU | 48 ms | 0.4 s | 58 | 7 ms |
| 578k | Gray-box | Approx. | CPU+GPU | 0.1 s | 0.5 s | 62 | 8 ms |
| 7M | Gray-box | Approx. | CPU+GPU | 0.1 s | 0.5 s | 57 | 9 ms |
| 68M | Gray-box | Approx. | CPU+GPU | 0.2 s | 1 s | 56 | 21 ms |
| 592M | Gray-box | Approx. | CPU+GPU | 0.7 s | 1 s | 56 | 23 ms |
(a) Fails with a segfault, possibly due to the memory requirements of MA27.
(b) Exceeds the resource manager’s memory limits or the ten-hour time limit.

Runtimes for the different formulations with neural network surrogates of increasing size are given in Table 2. For gray-box formulations, we compare optimization solves using exact and approximate Hessian evaluations and different hardware platforms. The results immediately show that full-space and reduced-space formulations are not scalable to neural networks with more than a few million trained parameters. The full-space formulation fails with a segmentation fault—likely due to memory issues in MA27—while the reduced-space formulation exceeds time and memory limits building the constraint expressions in JuMP. A breakdown of solve times, given in Table 3, confirms the bottlenecks in these formulations. The full-space formulation spends almost all of its solve time in the IPOPT algorithm, which we assume is dominated by KKT matrix factorization, while the reduced-space formulation spends most of its solve time evaluating the Hessian.

By contrast, the gray-box formulation is capable of solving the optimization problem with the largest neural network surrogates tested. While a CPU-only solve with exact Hessian matrices takes an unacceptably long 748 s, a GPU-accelerated solve with approximate Hessian matrices completes in only one second. This is slower than the original SCOPF problem (with no stability constraint) by a factor of 2.5, which may be acceptable for some applications. In all cases, the solve time with the gray-box formulation is dominated by function and Hessian evaluations, which explains the large speed-ups obtained with the GPU (50× and 17× for the 592M-parameter network with exact and approximate Hessians).

Approximating the Hessian matrix also speeds up the solve significantly. Hessian approximation is not a common approach when exact Hessians are available because it can lead to slow and unreliable convergence. In the 592M-parameter case, approximating the Hessian increases the iteration count by 14, but it makes up for this by decreasing the time per iteration by a factor of 60. These results suggest that this is an appropriate trade-off for optimization problems constrained by large neural networks.

Table 3: Solve time breakdowns for selected neural networks and formulations
| Formulation | Parameters | Hessian | Platform | Solve time | Function (%) | Jacobian (%) | Hessian (%) | Solver (%) | Other (%) |
| Full-space | 578k | Exact | CPU | 699 s | 0.1 | <0.1 | 0.2 | 99+ | <0.1 |
| Reduced-space | 25k | Exact | CPU | 1125 s | 2 | 0.5 | 97 | 0.4 | 0.3 |
| Gray-box | 592M | Exact | CPU | 748 s | 97 | 2 | 1 | <0.1 | <0.1 |
| Gray-box | 592M | Exact | CPU+GPU | 15 s | 54 | 1 | 42 | 2 | 0.6 |
| Gray-box | 592M | Approx. | CPU | 17 s | 96 | 2 | — | 2 | <0.1 |
| Gray-box | 592M | Approx. | CPU+GPU | 1 s | 76 | 6 | — | 17 | 0.1 |

5 Conclusion

This work demonstrates that nonlinear local optimization problems may incorporate neural networks with hundreds of millions of trained parameters, with modest overhead, using a gray-box formulation that exploits efficient automatic differentiation, Hessian approximation, and GPU acceleration. A disadvantage of the gray-box formulation is that it is not suitable for global optimization, as the non-convex neural network constraints cannot be relaxed. Additionally, the relative performance of the formulations may change in different applications. This motivates future research and development to improve the performance of the full-space and reduced-space formulations. The full-space formulation may be improved by decomposing the KKT matrix to exploit the structure of the neural network’s Jacobian, while the reduced-space formulation may be improved by exploiting vector-valued functions and common subexpressions in JuMP.

References

  • [1] Aravena, I., Molzahn, D.K., Zhang, S., Petra, C.G., Curtis, F.E., Tu, S., Wächter, A., Wei, E., Wong, E., Gholami, A., Sun, K., Sun, X.A., Elbert, S.T., Holzer, J.T., Veeramany, A.: Recent developments in security-constrained AC optimal power flow: Overview of challenge 1 in the ARPA-E grid optimization competition. Operations Research 71(6), 1997–2014 (2023). https://doi.org/10.1287/opre.2022.0315
  • [2] Biegler, L.T.: Nonlinear Programming: Concepts, Algorithms, and Applications to Chemical Processes. Society for Industrial and Applied Mathematics, USA (2010)
  • [3] Birchfield, A.: Hawaii synthetic grid – 37 buses (2023), https://electricgrids.engr.tamu.edu/hawaii40/, accessed 2024-12-10
  • [4] Birchfield, A.B., Xu, T., Gegner, K.M., Shetye, K.S., Overbye, T.J.: Grid structural characteristics as validation criteria for synthetic networks. IEEE Transactions on Power Systems 32(4), 3258–3265 (2017). https://doi.org/10.1109/TPWRS.2016.2616385
  • [5] Bugosen, S.I., Laird, C.D., Parker, R.B.: Process flowsheet optimization with surrogate and implicit formulations of a Gibbs reactor. Systems and Control Transactions 3, 113–120 (2024). https://doi.org/10.69997/sct.148498
  • [6] Casas, C.A.E.: Robust NMPC of Large-Scale Systems and Surrogate Embedding Strategies for NMPC. Master’s thesis, University of Waterloo, Waterloo, Ontario, Canada (2024)
  • [7] Ceccon, F., Jalving, J., Haddad, J., Thebelt, A., Tsay, C., Laird, C.D., Misener, R.: OMLT: Optimization & machine learning toolkit. Journal of Machine Learning Research 23(349),  1–8 (2022), http://jmlr.org/papers/v23/22-0277.html
  • [8] Coffrin, C.: PowerModelsSecurityConstrained.jl (2022), https://github.com/lanl-ansi/PowerModelsSecurityConstrained.jl, accessed 2024-12-10
  • [9] Coffrin, C., Bent, R., Sundar, K., Ng, Y., Lubin, M.: PowerModels.jl: An open-source framework for exploring power flow formulations. In: 2018 Power Systems Computation Conference (PSCC). pp. 1–8 (June 2018). https://doi.org/10.23919/PSCC.2018.8442948
  • [10] Duff, I.S., Reid, J.K.: The multifrontal solution of indefinite sparse symmetric linear equations. ACM Transactions on Mathematical Software (TOMS) 9(3), 302–325 (1983)
  • [11] Gan, D., Thomas, R., Zimmerman, R.: Stability-constrained optimal power flow. IEEE Transactions on Power Systems 15(2), 535–540 (2000). https://doi.org/10.1109/59.867137
  • [12] Gurobi Optimization: Gurobi Machine Learning Manual (December 2024), https://gurobi-machinelearning.readthedocs.io/en/stable/
  • [13] Kilwein, Z., Jalving, J., Eydenberg, M., Blakely, L., Skolfield, K., Laird, C., Boukouvala, F.: Optimization with neural network feasibility surrogates: Formulations and application to security-constrained optimal power flow. Energies 16(16) (2023). https://doi.org/10.3390/en16165913, https://www.mdpi.com/1996-1073/16/16/5913
  • [14] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization (2017), https://arxiv.org/abs/1412.6980
  • [15] Lubin, M., Dowson, O., Garcia, J.D., Huchette, J., Legat, B., Vielma, J.P.: JuMP 1.0: Recent improvements to a modeling language for mathematical optimization. Mathematical Programming Computation 15(3), 581–589 (2023). https://doi.org/10.1007/s12532-023-00239-3, https://doi.org/10.1007/s12532-023-00239-3
  • [16] López-Flores, F.J., Ramírez-Márquez, C., Ponce-Ortega, J.M.: Process systems engineering tools for optimization of trained machine learning models: Comparative and perspective. Industrial & Engineering Chemistry Research 63(32), 13966–13979 (2024). https://doi.org/10.1021/acs.iecr.4c00632
  • [17] Nocedal, J.: Updating quasi-Newton matrices with limited storage. Mathematics of Computation 35(151), 773–782 (1980). https://doi.org/10.2307/2006193, http://www.jstor.org/stable/2006193
  • [18] Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., et al.: Pytorch: An imperative style, high-performance deep learning library. Advances in neural information processing systems 32 (2019)
  • [19] PowerWorld Corporation, Champaign, IL, USA: PowerWorld Simulator Manual, https://www.powerworld.com/WebHelp/, accessed 2024-12-10
  • [20] Schweidtmann, A.M., Mitsos, A.: Deterministic global optimization with artificial neural networks embedded. Journal of Optimization Theory and Applications 180(3), 925–948 (2019)
  • [21] Thebelt, A., Wiebe, J., Kronqvist, J., Tsay, C., Misener, R.: Maximizing information from chemical engineering data sets: Applications to machine learning. Chemical Engineering Science 252, 117469 (2022). https://doi.org/10.1016/j.ces.2022.117469
  • [22] Wächter, A., Biegler, L.T.: On the implementation of an interior-point filter line-search algorithm for large-scale nonlinear programming. Mathematical Programming 106(1), 25–57 (2006)