
Deep-learning-based Early Fixing for Gas-lifted Oil Production Optimization: Supervised and Weakly-supervised Approaches

Bruno M. Pacheco    Laio O. Seman    Eduardo Camponogara Department of Automation and Systems Engineering, Federal University of Santa Catarina (UFSC), Florianópolis, Brazil (e-mail: bruno.m.pacheco@grad.ufsc.br, laio@ieee.org, eduardo.camponogara@ufsc.br).
Abstract

Maximizing oil production from gas-lifted oil wells entails solving Mixed-Integer Linear Programs (MILPs). As the parameters of the wells, such as the basic-sediment-to-water ratio and the gas-oil ratio, are updated, the problems must be repeatedly solved. Instead of relying on costly exact methods or the accuracy of general approximate methods, in this paper, we propose a tailor-made heuristic solution based on deep learning models trained to provide values for all integer variables given varying well parameters, early-fixing the integer variables and, thus, reducing the original problem to a linear program (LP). We propose two approaches for developing the learning-based heuristic: a supervised learning approach, which requires the optimal integer values for several instances of the original problem in the training set, and a weakly-supervised learning approach, which requires only solutions for the early-fixed linear problems with random assignments for the integer variables. Our results show a runtime reduction of 71.11%. Furthermore, the weakly-supervised learning model provided effective values for early fixing, despite never seeing the optimal values during training.

keywords:
Mixed-integer optimization, Deep learning, Weakly-supervised learning, Early fixing, Oil production systems
thanks: This research was partly funded by Petróleo Brasileiro S.A. under grant 2018/00217-3, FAPESC under grant 2021TR2265, CNPq and CAPES.

1 Introduction

The maximization of oil production in an offshore platform is a challenging problem due to the physical models’ complexity and various operational constraints (Luguesi et al., 2023; Müller et al., 2022). Furthermore, the variations in the oil wells and the multitude of technologies that can be employed account for the many configurations this problem presents. Therefore, it becomes necessary to rely on optimization models to reach optimal conditions for the operation.

Due to the nonlinearities of the oil output stream from gas-lifted wells, the formulation of the optimization model is not straightforward. The approach of Müller et al. (2022) uses a piecewise linear model of the wells and formulates the problem as a Mixed-Integer Linear Program (MILP). More specifically, the relationship between the liquid production of the wells and the lift-gas injection, together with the wellhead pressure, is modeled as a piecewise linear function. The piecewise-linear functions are defined by studying the behavior of the relations through simulation runs or field tests and defining breakpoints between which the relation is considered to be linear. Special Ordered Set of type 2 (SOS2) constraints are then used to select the operational region of each piecewise-linear function, that is, the active boundaries of the region in the state space within which the approximation is linear. The final model is linear with binary variables and SOS2 variable sets.

Solving MILPs exactly requires algorithms such as branch-and-bound and branch-and-cut (Lee and Mitchell, 2009). These algorithms usually require long runtimes due to the many iterations of solving linear relaxations of the original problem. One can use approximate methods instead of relying solely on exact algorithms to optimize oil production. Such methods can improve the efficiency of the optimization process but may provide suboptimal solutions. Derivative-free methods like genetic algorithms, simulated annealing, and particle swarm optimization can also be used to find near-optimal solutions efficiently (Seman et al., 2020). Another approximate method is to early fix the variables, reducing the dimension of the problem. In the case of MILPs, one can develop a heuristic to fix all integer variables, reducing the problem to a linear program, which can be solved very efficiently with algorithms such as the simplex. As pointed out by Bengio et al. (2021), such heuristics can be very hard to handcraft, which makes machine learning (ML) models natural candidates for the task.

In this paper, we propose two deep-learning approaches to reduce the runtime of the gas-lifted oil production MILP through early fixing. One is a supervised learning approach that requires that the optimal set of binary variables is known for each problem instance; instances of the problem must be solved exactly to build the training data. With this data, the model is then trained to provide the optimal binary variables given the parameters of the MILP. The other is a weakly-supervised approach that requires just the solutions to the early-fixed problem, i.e., after fixing the integer variables to any (binary) value. In other words, the training data is generated from solving linear problems, resulting from randomly fixing the integer variables in the original MILP. To the best of our knowledge, this is the first work to successfully implement learning-based heuristics to speed up the solution of oil production optimization.

1.1 Related work

In this section, we provide a brief overview of related work in approximate methods for MILP, focusing on deep-learning-based methods.

The use of heuristics in MILP solvers is common. For instance, the SCIP solver (Vigerske and Gleixner, 2018) uses primal heuristics to find feasible solutions. However, recent studies have highlighted the development of heuristics based on ML techniques. Particularly, Bengio et al. (2021) have suggested that the potential benefits of using ML include reduced computational time and improved solution quality.

Ding et al. (2019) presented a learning-based heuristic to accelerate the solution-finding process. The authors propose a Graph Neural Network (GNN) model that predicts solution values for branching. The developed heuristic is used to guide a branch-and-bound tree search.

Li and Wu (2022) train a learning-based heuristic for early fixing MILPs within an Alternating Direction Method of Multipliers (ADMM) algorithm. By formulating the early fixing as a Markov decision process, the authors use reinforcement learning to train the heuristic. The authors showed that the proposed heuristic significantly improves the solving speed and can even improve the solution quality.

Pacheco et al. (2023) explore using GNNs as early-fixing heuristics for the Offline Nanosatellite Task Scheduling (ONTS) problem. In this direction, the authors implement a GNN-based solution that uses bipartite graphs, feature aggregation, and message-passing techniques. The results suggest that GNNs can be an effective method for aiding in the decision-making process of MILPs.

Finally, Anderson et al. (2022) proposed a weakly-supervised approach for warm-starting gas network problems. The authors present a model that generates feasible initial solutions which are used to accelerate the MILP solver’s convergence. The results show a 60.5% decrease in the runtime.

In summary, the literature offers promising approaches for accelerating MILP solvers using ML techniques, particularly through early fixing. However, to the best of our knowledge, no one has applied such techniques to oil production optimization. In this context, this paper is the first to explore the use of supervised and weakly-supervised learning approaches to the oil production maximization problem, leveraging surrogate models for the liquid production and offering insights into this growing area that hybridizes optimization and deep learning.

2 Problem Statement

In this section, we present the problem formulation, which is based on Müller et al. (2022), with only gas–lifted oil wells connected to manifolds. Nevertheless, it is easy to see that our methodological approach can be applied to variations of this problem, e.g., platforms with satellite wells, wells with electrical submersible pump systems, and subsea manifolds.

2.1 Well model

A single production platform can extract from multiple oil wells. Each well $n\in\mathcal{N}$ has its liquid production $q_{\rm l}^{n}$ induced by the wellhead pressure ${\rm whp}^{n}$ and a lift-gas injection rate $q_{\rm gl}^{n}$. The relationship between $q_{\rm l}^{n}$, ${\rm whp}^{n}$, and $q_{\rm gl}^{n}$ is modeled based on the natural characteristics of the well and the gas-oil ratio (GOR, referred to as ${\rm gor}^{n}$) and basic-sediment-to-water ratio (BSW, referred to as ${\rm bsw}^{n}$) of its liquid production. Both ${\rm bsw}^{n}$ and ${\rm gor}^{n}$ are measured through separation tests and are considered static during the optimization. However, as they change with time, their updates drive new executions of the optimization process to keep the results reliable.

2.2 Piecewise linearization

We use the Marlim simulator (Seman et al., 2020), a proprietary software from Petrobras, to model the liquid output of each well,

$q_{\rm l}^{n}=\textrm{Marlim}(q_{\rm gl}^{n},{\rm whp}^{n},{\rm bsw}^{n},{\rm gor}^{n}).$ (1)

An example of the liquid flow function of a real well can be seen in Figure 1, with fixed values for ${\rm bsw}^{n}$ and ${\rm gor}^{n}$. As both $q_{\rm gl}^{n}$ and ${\rm whp}^{n}$ can be controlled, but have a nonlinear relationship with the outcome, we apply piecewise linearization to $q_{\rm l}^{n}$ as a function of both. More precisely, let $\mathcal{K}_{\rm gl}=\{1,\ldots,k_{\rm gl}\}$ and $\mathcal{J}_{\rm whp}=\{1,\ldots,j_{\rm whp}\}$ be sets of indices for lift-gas injection and wellhead pressure. Let also $\mathcal{Q}_{\rm gl}^{n}=\{q_{\rm gl}^{n,k}\in\mathbb{R}_{+}:k\in\mathcal{K}_{\rm gl}\}$ and $\mathcal{W}^{n}_{\rm whp}=\{{\rm whp}^{n,j}\in\mathbb{R}_{+}:j\in\mathcal{J}_{\rm whp}\}$ be the respective breakpoint values for well $n$, and

$\mathcal{Q}_{\rm liq}^{n}=\left\{q_{\rm l}^{n,k,j}:q_{\rm l}^{n,k,j}=\hat{q}_{\rm l}^{n}(q_{\rm gl}^{n,k},{\rm whp}^{n,j},{\rm bsw}^{n},{\rm gor}^{n}),\;k\in\mathcal{K}_{\rm gl},\,j\in\mathcal{J}_{\rm whp}\right\}$ (2)

be the liquid flow rates for the well at the breakpoints. $\mathcal{Q}_{\rm liq}^{n}$ is obtained by using Marlim as in Eq. (1) with the adjusted parameters ${\rm gor}^{n}$ and ${\rm bsw}^{n}$.

The piecewise approximation is then given by

$q_{\rm gl}^{n} = \sum_{k\in\mathcal{K}_{\rm gl}}\sum_{j\in\mathcal{J}_{\rm whp}}\theta^{n}_{k,j}\,q_{\rm gl}^{n,k}$ (3a)
${\rm whp}^{n} = \sum_{k\in\mathcal{K}_{\rm gl}}\sum_{j\in\mathcal{J}_{\rm whp}}\theta^{n}_{k,j}\,{\rm whp}^{n,j}$ (3b)
$q_{\rm l}^{n} = \sum_{k\in\mathcal{K}_{\rm gl}}\sum_{j\in\mathcal{J}_{\rm whp}}\theta^{n}_{k,j}\,q_{\rm l}^{n,k,j}$ (3c)
$1 = \sum_{k\in\mathcal{K}_{\rm gl}}\sum_{j\in\mathcal{J}_{\rm whp}}\theta^{n}_{k,j}$ (3d)
$\theta^{n}_{k,j} \geq 0,\quad k\in\mathcal{K}_{\rm gl},\,j\in\mathcal{J}_{\rm whp}.$ (3e)

To ensure piecewise linearization, SOS2 constraints are necessary for the values of $\theta$ that correspond to $q_{\rm gl}$ and ${\rm whp}$, as follows

$\eta_{k}^{n} = \sum_{j\in\mathcal{J}_{\rm whp}}\theta^{n}_{k,j},\quad k\in\mathcal{K}_{\rm gl}$ (4a)
$\eta_{j}^{n} = \sum_{k\in\mathcal{K}_{\rm gl}}\theta^{n}_{k,j},\quad j\in\mathcal{J}_{\rm whp}$ (4b)
$\left\{\eta_{k}^{n}\right\}_{k\in\mathcal{K}_{\rm gl}}~\text{is SOS2}$ (4c)
$\left\{\eta_{j}^{n}\right\}_{j\in\mathcal{J}_{\rm whp}}~\text{is SOS2}.$ (4d)

SOS2 constraints imply that at most two variables in the ordered set are nonzero, and any two nonzero variables must be consecutive (Beale and Tomlin, 1969).
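To make the piecewise-linear formulation concrete, the sketch below builds constraints (3)-(4) for a single well with gurobipy, the solver used later in our experiments. The breakpoint arrays and the simulated liquid-flow values are illustrative placeholders, not the data used in the paper.

```python
import gurobipy as gp
from gurobipy import GRB

# Illustrative breakpoint data for one well (placeholders, not the paper's data).
Q_GL = [0.0, 2500.0, 5000.0, 7500.0, 10000.0, 12500.0]   # lift-gas injection breakpoints
WHP = [10.0, 24.0, 38.0, 52.0, 66.0, 80.0]                # wellhead-pressure breakpoints
Q_LIQ = [[100.0 + 10.0 * k + 5.0 * j for j in range(6)]   # simulated liquid flow at each breakpoint pair
         for k in range(6)]

m = gp.Model("single-well-pwl")
K, J = range(len(Q_GL)), range(len(WHP))

theta = m.addVars(K, J, lb=0.0, name="theta")   # convex-combination weights, (3e)
eta_k = m.addVars(K, lb=0.0, name="eta_k")      # row sums of theta, (4a)
eta_j = m.addVars(J, lb=0.0, name="eta_j")      # column sums of theta, (4b)
q_gl, whp, q_l = m.addVar(name="q_gl"), m.addVar(name="whp"), m.addVar(name="q_l")

m.addConstr(q_gl == gp.quicksum(theta[k, j] * Q_GL[k] for k in K for j in J))     # (3a)
m.addConstr(whp == gp.quicksum(theta[k, j] * WHP[j] for k in K for j in J))       # (3b)
m.addConstr(q_l == gp.quicksum(theta[k, j] * Q_LIQ[k][j] for k in K for j in J))  # (3c)
m.addConstr(theta.sum() == 1)                                                      # (3d)
m.addConstrs(eta_k[k] == gp.quicksum(theta[k, j] for j in J) for k in K)           # (4a)
m.addConstrs(eta_j[j] == gp.quicksum(theta[k, j] for k in K) for j in J)           # (4b)

# SOS2 constraints (4c)-(4d): at most two eta variables are nonzero, and they must be consecutive.
m.addSOS(GRB.SOS_TYPE2, [eta_k[k] for k in K], list(range(1, len(Q_GL) + 1)))
m.addSOS(GRB.SOS_TYPE2, [eta_j[j] for j in J], list(range(1, len(WHP) + 1)))
```

The phase split (5), the lift-gas availability limit (6), and the oil-maximization objective (7) from the next subsection would be added on top of this block, one copy per well.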

2.3 Problem formulation

All oil wells are connected to a hub that directs their liquid productions $q_{\rm l}^{n}$ to separators. The separators then split the liquid flow into the oil, gas, and water phase flows. This separation depends on the BSW and GOR of each well $n\in\mathcal{N}$,

$q_{\rm oil}^{n} = q_{\rm l}^{n}\cdot(1-{\rm bsw}^{n})$ (5a)
$q_{\rm water}^{n} = q_{\rm l}^{n}\cdot{\rm bsw}^{n}$ (5b)
$q_{\rm gas}^{n} = q_{\rm l}^{n}\cdot(1-{\rm bsw}^{n})\cdot{\rm gor}^{n}.$ (5c)

The total lift-gas injection is limited by the maximum lift-gas flow available, $\bar{q}_{\rm gl}$, through

$\sum_{n\in\mathcal{N}}q_{\rm gl}^{n}\leq\bar{q}_{\rm gl}.$ (6)

Finally, the objective is to maximize the total volume of oil extracted

$\sum_{n\in\mathcal{N}}q_{\rm oil}^{n}.$ (7)

We can express the problem as

$\max_{\boldsymbol{q}_{\rm gl},\,\boldsymbol{\rm whp}}~(7)\quad\textrm{s.t.}~(3)\textrm{--}(6),$ (8)

where $\boldsymbol{q}_{\rm gl},\boldsymbol{\rm whp}\in\mathbb{R}^{|\mathcal{N}|}$ are the flow of lift-gas injected and the wellhead pressure of each well.

Note that the problem, in this case using piecewise linearization of the liquid flow, is an MILP parameterized by ${\rm bsw}^{n}$, ${\rm gor}^{n}$, $\bar{q}_{\rm gl}$, and $\mathcal{Q}_{\rm liq}^{n}$. Therefore, let $\pi\in\Pi$ be the vector of problem parameters. We can write the problem as

$\begin{aligned}P(\pi)=\max_{x,z}~&c^{T}x\\ \text{s.t.}~&A(\pi)x+C(\pi)z\leq b(\pi)\\ &x\in X\subset\mathbb{R}^{n_{x}},\,z\in Z\subset\{0,1\}^{n_{z}},\end{aligned}$ (9)

where $x$ is the vector of continuous variables (e.g., $q_{\rm l}^{n}$, $q_{\rm gl}^{n}$, $\theta^{n}_{k,j}$) and $z$ is the vector of binary variables necessary for the SOS2 constraints. More precisely, $z=(\boldsymbol{z}_{\rm gl},\boldsymbol{z}_{\rm whp})$ is such that, for all $n\in\mathcal{N}$,

$\eta^{n}_{k} \leq z_{\rm gl}^{n,k},\quad k=1$ (10a)
$\eta^{n}_{j} \leq z_{\rm whp}^{n,j},\quad j=1$ (10b)
$\eta^{n}_{k} \leq \max(z_{\rm gl}^{n,k},z_{\rm gl}^{n,k-1}),\quad k\in\mathcal{K}_{\rm gl}\setminus\{1,k_{\rm gl}\}$ (10c)
$\eta^{n}_{j} \leq \max(z_{\rm whp}^{n,j},z_{\rm whp}^{n,j-1}),\quad j\in\mathcal{J}_{\rm whp}\setminus\{1,j_{\rm whp}\}$ (10d)
$\eta^{n}_{k} \leq z_{\rm gl}^{n,k-1},\quad k=k_{\rm gl}$ (10e)
$\eta^{n}_{j} \leq z_{\rm whp}^{n,j-1},\quad j=j_{\rm whp}$ (10f)
$\sum_{k\in\mathcal{K}_{\rm gl}\setminus\{k_{\rm gl}\}}z_{\rm gl}^{n,k}=1,\quad \sum_{j\in\mathcal{J}_{\rm whp}\setminus\{j_{\rm whp}\}}z_{\rm whp}^{n,j}=1$ (10g)
$z_{\rm gl}^{n}\in\{0,1\}^{k_{\rm gl}-1}$ (10h)
$z_{\rm whp}^{n}\in\{0,1\}^{j_{\rm whp}-1}$ (10i)

in which $z_{\rm gl}^{n,k}=1$ if the $k$-th interval is selected, meaning that $q_{\rm gl}^{n}\in[q_{\rm gl}^{n,k},q_{\rm gl}^{n,k+1}]$ and only $\eta^{n}_{k}$ and $\eta^{n}_{k+1}$ can be nonzero; otherwise $z_{\rm gl}^{n,k}$ assumes value 0. The semantics of the binary variables $z_{\rm whp}^{n,j}$ are analogous. Notice that, because $z_{\rm gl}^{n,k}$ is binary, the operator $\max(z_{\rm gl}^{n,k},z_{\rm gl}^{n,k-1})$ in (10c) and (10d) can be equivalently represented by the linear form $(z_{\rm gl}^{n,k}+z_{\rm gl}^{n,k-1})$. The formulation (10) is equivalent to the SOS2 constraints in (4c) and (4d).

3 Methodology

3.1 Early fixing

Suppose we can determine which linearization region of $q_{\rm l}^{n}$ is to be selected (i.e., which variables $\eta_{k}^{n}$ and $\eta_{j}^{n}$ can have nonzero values). In that case, the SOS2 constraints can be removed; thus, the problem becomes completely linear. This is equivalent to fixing the $z$ variables in the standard formulation (9). Therefore, let us write $P(\pi,z)$ as the problem (9) but with fixed $z$ values, i.e., with the integer variables $z$ treated as parameters.

An early fixing heuristic provides an assignment $\hat{z}$ to the integer variables. Ideally, the assignment will be such that $P(\pi,\hat{z})=P(\pi)$. Since the early-fixed problem is an LP, it can be solved much faster than the original MILP. The total cost of the early-fixing approach is, therefore, the cost of solving the LP plus the cost of running the heuristic. In practice, however, a trade-off between the cost of running the heuristic and the gap between $P(\pi,\hat{z})$ and $P(\pi)$ is expected.

Our proposed approach is to develop an early fixing heuristic based on a deep-learning model. We want a deep learning model $N:\Pi\to[0,1]^{n_{z}}$ with which we can compute $\hat{z}=\lfloor N(\pi)\rceil$, i.e., the model's output rounded to the nearest binary vector. Two distinct approaches to training such a model are proposed.
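As an illustration of the early-fixing pipeline, the sketch below assumes a trained heuristic network and a gurobipy formulation of (9) whose binary variables are collected, in the heuristic's output order, in a list `z_vars`; all names are hypothetical.

```python
import torch
import gurobipy as gp
from gurobipy import GRB

def solve_early_fixed(milp: gp.Model, z_vars: list, heuristic: torch.nn.Module,
                      pi: torch.Tensor) -> float:
    """Fix the binary variables predicted by the heuristic and solve the remaining LP.

    `milp` is assumed to be the formulation (9) built for the parameters `pi`, with its
    binary variables listed in `z_vars` in the same order as the heuristic's output.
    """
    with torch.no_grad():
        z_hat = heuristic(pi).round()           # \hat{z} = round(N(pi))
    for var, val in zip(z_vars, z_hat.tolist()):
        var.lb = val                            # fixing lb = ub pins the variable to \hat{z}
        var.ub = val
    milp.optimize()                             # only the continuous variables remain free
    if milp.Status != GRB.OPTIMAL:
        return float("-inf")                    # the chosen fixing made the problem infeasible
    return milp.ObjVal
```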

3.2 Supervised learning approach

We can train a model for early fixing by feeding it with instances of the MILP problem and optimizing the model’s parameters such that its output approximates the optimal binary assignment. Let us define a dataset

$D_{\rm sup}=\left\{(\pi,z^{\star})\in\Pi\times Z:P(\pi,z^{\star})=P(\pi)\right\},$

which associates instances' parameter vectors $\pi$ with the optimal binary assignment $z^{\star}$. Note that this dataset requires us to solve to optimality all MILP instances available.

Let us define a deep learning model

$\begin{aligned}N_{\rm sup}:\Pi\times\Theta_{\rm sup}&\to[0,1]^{n_{z}}\\ \pi;\theta_{\rm sup}&\mapsto N_{\rm sup}(\pi;\theta_{\rm sup})\end{aligned}$ (11)

for which $\theta_{\rm sup}$ is the vector of model parameters that can be trained. Then, it is possible to optimize the model's parameters such that, for each vector of parameters in $D_{\rm sup}$, the model's output approximates $z^{\star}$. Namely,

$\min_{\theta_{\rm sup}}\sum_{(\pi,z^{\star})\in D_{\rm sup}}\mathcal{L}_{\rm sup}(N_{\rm sup}(\pi;\theta_{\rm sup}),z^{\star})$ (12)

where $\mathcal{L}_{\rm sup}$ can be, for example, the binary cross entropy between the elements of both vectors.
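A minimal PyTorch sketch of one training step of (12), assuming mini-batches of parameter vectors `pi` and optimal assignments `z_star` extracted from $D_{\rm sup}$ (hypothetical names):

```python
import torch
import torch.nn.functional as F

def supervised_step(model: torch.nn.Module, optimizer: torch.optim.Optimizer,
                    pi: torch.Tensor, z_star: torch.Tensor) -> float:
    """One gradient step of (12) with binary cross-entropy as L_sup."""
    z_pred = model(pi)                               # N_sup(pi; theta_sup), values in [0, 1]
    loss = F.binary_cross_entropy(z_pred, z_star.float())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```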

3.3 Weakly-supervised learning approach

We propose an alternative learning approach that does not require solving MILP problems. First, we recall that the target of the deep learning model is to provide $\hat{z}$ such that $P(\pi,\hat{z})$ is maximized (since (9) is a maximization problem). Our proposed approach is to train a surrogate model that approximates $P(\cdot,\cdot)$, and differentiate through this surrogate to train the early fixing heuristic using gradient descent methods.

This approach requires only a dataset of assignments for the integer variables paired with the objective of the respective early-fixed problem. Let us define

$D_{\rm weak}=\left\{(\pi,\hat{z},p)\in\Pi\times Z\times\mathbb{R}:p=P(\pi,\hat{z})\right\},$

which is built with (random) samples of $\hat{z}\in Z$. Then, we train a model

$\begin{aligned}S:\Pi\times[0,1]^{n_{z}}\times\Theta_{S}&\to\mathbb{R}\\ \pi,\hat{z};\theta_{S}&\mapsto S(\pi,\hat{z};\theta_{S})\end{aligned}$ (13)

in a supervised manner, such that its output approximates $P(\pi,\hat{z})$, that is,

$\min_{\theta_{S}}\sum_{(\pi,\hat{z},p)\in D_{\rm weak}}\mathcal{L}_{S}(S(\pi,\hat{z};\theta_{S}),p),$ (14)

where $\mathcal{L}_{S}$ can be, for example, the mean squared error.

Now, let us define

$\begin{aligned}N_{\rm weak}:\Pi\times\Theta_{\rm weak}&\to[0,1]^{n_{z}}\\ \pi;\theta_{\rm weak}&\mapsto N_{\rm weak}(\pi;\theta_{\rm weak}),\end{aligned}$ (15)

which can be trained in an unsupervised manner to maximize the surrogate model’s output given the candidate values for fixing

$\max_{\theta_{\rm weak}}\sum_{\pi\in_{R}\Pi}S(\pi,N_{\rm weak}(\pi;\theta_{\rm weak});\theta_{S}),$ (16)

where $\pi\in_{R}\Pi$ denotes that $\pi$ is chosen randomly from $\Pi$, and the summation is iterated over as many random samples as desired. Note that only $\theta_{\rm weak}$ is optimized during the training of $N_{\rm weak}$, i.e., $S$ is unchanged.
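The following PyTorch sketch illustrates one training step of (16); `heuristic` and `surrogate` are assumed to be networks of the forms (15) and (13), respectively, with the surrogate already trained and the optimizer holding only the heuristic's parameters (all names hypothetical). The key point is that the heuristic's continuous output is fed directly to the surrogate, so gradients flow back to $\theta_{\rm weak}$.

```python
import torch

def weakly_supervised_step(heuristic: torch.nn.Module, surrogate: torch.nn.Module,
                           optimizer: torch.optim.Optimizer,
                           pi_batch: torch.Tensor) -> float:
    """One gradient step of (16): maximize the surrogate objective of the predicted fixing."""
    surrogate.eval()                              # disable dropout in the surrogate
    surrogate.requires_grad_(False)               # S is frozen; only theta_weak is updated
    z_soft = heuristic(pi_batch)                  # N_weak(pi) in [0, 1]^{n_z}, kept continuous
    pred_obj = surrogate(torch.cat([pi_batch, z_soft], dim=-1))
    loss = -pred_obj.mean()                       # gradient ascent on S via descent on -S
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return -loss.item()
```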

4 Experiments and Results (code available at github/brunompacheco/early-fixing-oil-production)

4.1 Data

For our experiments, we consider the problem of optimizing oil production from a single well ($|\mathcal{N}|=1$). We use data from a real subsea oil well, provided by Petrobras (the data is not made available as it is intellectual property of Petrobras). We set a target for the early fixing models to generalize to different values of ${\rm bsw}$, ${\rm gor}$, and $\bar{q}_{\rm gl}$. Therefore, we describe the parameter space of our problem as

$\pi=({\rm bsw},{\rm gor},\bar{q}_{\rm gl})\in\Pi=[0.5,1.0]\times[0,300]\times[4000,12500],$

assuming that the BSW can be no lower than 0.5, the GOR is always smaller than 300, and the maximum lift-gas flow available always lies within $[4000,12500]$. More precisely, we assume that, in practice, ${\rm bsw}\sim U(0.5,1.0)$, ${\rm gor}\sim N(100,25)$, and $\bar{q}_{\rm gl}\sim U(4000,12500)$. Note that we omit the liquid flow function from the parameter vector, since we deal with a single well and, thus, $\mathcal{Q}_{\rm liq}$ can be uniquely determined by the other parameters.

The liquid flow function is always linearized with the same breakpoints for both $q_{\rm gl}$ and ${\rm whp}$. This makes the domain of the binary variables consistent across instances. The selected breakpoints are

${\rm whp}^{j} = 14j-4,\quad j=1,\dots,6$ (17a)
$q_{\rm gl}^{k} = 2500k-2500,\quad k=1,\dots,6.$ (17b)

An example of $\mathcal{Q}_{\rm liq}$ with these breakpoints can be seen in Figure 1. With the fixed breakpoints, we can describe the domain of the binary variables as

$Z=\left\{(z_{\rm gl},z_{\rm whp})\in\{0,1\}^{5}\times\{0,1\}^{5}:\sum_{i=1}^{5}z_{{\rm gl},i}=1,\;\sum_{i=1}^{5}z_{{\rm whp},i}=1\right\},$ (18)

in which $z_{\rm gl}$ indicates the pair of $\eta_{k}$ variables that can take nonzero values, while $z_{\rm whp}$ indicates the pair of $\eta_{j}$ variables that can take nonzero values.

Figure 1: Simulated liquid flow $\mathcal{Q}_{\rm liq}$ (see Eq. (2)) based on data of a real well using Marlim. The color scale is directly proportional to the z-axis. We adopted $q_{\rm l}^{n,k,j}=-1$ for combinations of $q_{\rm gl}^{n,k}$ and ${\rm whp}^{n,j}$ that were considered infeasible by the simulator.

For the supervised learning task, we build $D_{\rm sup}$ with 500 instances of the MILP problem from different combinations of ${\rm bsw}$, ${\rm gor}$, and $\bar{q}_{\rm gl}$. Gurobi was used to solve the MILPs, from which $z^{\star}=(z^{\star}_{\rm gl},z^{\star}_{\rm whp})$, the optimal value for the binary variables, was extracted. We use the same $\pi$ vectors as in $D_{\rm sup}$ for the weakly-supervised learning task. For each $\pi$ we draw 12 random candidates $\hat{z}\in Z$, that is, we ensure that each $\hat{z}$ respects the constraints in (10). Then, $D_{\rm weak}$ is built by solving the LPs using Gurobi to compute $p=P(\pi,\hat{z})$. Whenever $\pi$ and $\hat{z}$ resulted in an infeasible problem, we added them to $D_{\rm weak}$ with $p=-1$. In total, $D_{\rm sup}$ contains 500 data points of the form $(({\rm bsw},{\rm gor},\bar{q}_{\rm gl}),z^{\star})$, while $D_{\rm weak}$ contains 6000 data points of the form $(({\rm bsw},{\rm gor},\bar{q}_{\rm gl}),\hat{z},p)$.
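The dataset construction can be sketched as follows; `evaluate_fixing` stands for the Gurobi routine that solves the early-fixed LP and returns $-1$ for infeasible fixings (a hypothetical name), and the sampling distributions follow the assumptions stated above.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_instance() -> tuple:
    """Draw parameters pi = (bsw, gor, q_gl_max) following the assumed distributions."""
    return rng.uniform(0.5, 1.0), rng.normal(100.0, 25.0), rng.uniform(4000.0, 12500.0)

def sample_fixing(n_intervals: int = 5) -> tuple:
    """Draw a random assignment (z_gl, z_whp) in Z: exactly one interval selected per axis."""
    z_gl, z_whp = np.zeros(n_intervals), np.zeros(n_intervals)
    z_gl[rng.integers(n_intervals)] = 1.0
    z_whp[rng.integers(n_intervals)] = 1.0
    return z_gl, z_whp

# D_weak: 12 random fixings per instance, each evaluated by solving the early-fixed LP.
D_weak = []
for _ in range(500):
    pi = sample_instance()
    for _ in range(12):
        z_hat = sample_fixing()
        p = evaluate_fixing(pi, z_hat)   # hypothetical Gurobi routine; returns -1 if infeasible
        D_weak.append((pi, z_hat, p))
```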

4.2 Supervised learning experiments

For supervised learning, using $D_{\rm sup}$, we choose

$\begin{aligned}N_{\rm sup}:\Pi\times\Theta_{\rm sup}&\to\Delta^{5}\times\Delta^{5}\\ {\rm bsw},{\rm gor},\bar{q}_{\rm gl};\theta_{\rm sup}&\mapsto(\hat{z}_{\rm gl},\hat{z}_{\rm whp})=N_{\rm sup}({\rm bsw},{\rm gor},\bar{q}_{\rm gl};\theta_{\rm sup}),\end{aligned}$ (19)

where $\Delta^{5}$ is the 5-dimensional simplex set, as a neural network with 2 hidden layers of 25 neurons each and a 10-dimensional output. The inputs are normalized before the first layer. We use ReLU activation for all layers except the last one. The last layer's output is divided into two vectors of dimension 5, each passing through the softmax function, thus mapping both into 5-dimensional simplexes. The softmax in the output of the network ensures that, after rounding, the output respects the binary constraints, that is, $\lfloor\hat{z}_{\rm gl}\rceil$ and $\lfloor\hat{z}_{\rm whp}\rceil$ respect (10).
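A possible PyTorch realization of $N_{\rm sup}$ is sketched below; the normalization constants are placeholders, and the two softmax heads are returned concatenated as a single vector in $[0,1]^{10}$.

```python
import torch
import torch.nn as nn

class EarlyFixingNet(nn.Module):
    """MLP with 2 hidden layers of 25 neurons and two 5-dimensional softmax heads."""

    def __init__(self):
        super().__init__()
        # Placeholder normalization constants (mean, std) for (bsw, gor, q_gl_max).
        self.register_buffer("mean", torch.tensor([0.75, 100.0, 8250.0]))
        self.register_buffer("std", torch.tensor([0.14, 25.0, 2450.0]))
        self.body = nn.Sequential(
            nn.Linear(3, 25), nn.ReLU(),
            nn.Linear(25, 25), nn.ReLU(),
            nn.Linear(25, 10),
        )

    def forward(self, pi: torch.Tensor) -> torch.Tensor:
        h = self.body((pi - self.mean) / self.std)
        z_gl = torch.softmax(h[..., :5], dim=-1)    # \hat{z}_gl on a 5-dimensional simplex
        z_whp = torch.softmax(h[..., 5:], dim=-1)   # \hat{z}_whp on a 5-dimensional simplex
        return torch.cat([z_gl, z_whp], dim=-1)     # concatenation of (\hat{z}_gl, \hat{z}_whp)
```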

We randomly split $D_{\rm sup}$ into training and test sets with an 80-20 ratio. The training data was used to select the optimal network architecture and hyperparameters (learning rate, batch size, and number of epochs). After this adjustment, the entirety of the training set was used to train 5 $N_{\rm sup}$ models with random initialization. Adam is used to optimize the parameters of the models on the training set such that $\mathcal{L}_{\rm sup}$ is minimized (see equation (12)). We use

$\mathcal{L}_{\rm sup}(\hat{z}_{\rm gl},\hat{z}_{\rm whp},z_{\rm gl}^{\star},z_{\rm whp}^{\star})=\sum_{i=1}^{k_{\rm gl}-1}{\rm BCE}(\hat{z}_{{\rm gl},i},z_{{\rm gl},i}^{\star})+\sum_{j=1}^{j_{\rm whp}-1}{\rm BCE}(\hat{z}_{{\rm whp},j},z_{{\rm whp},j}^{\star}),$

where BCE is the binary cross-entropy function. We use batches of 64 elements and an initial learning rate of 0.001. Each model is trained for 100 epochs. The performance on the test set is summarized in Table 1.

Table 1: Early fixing performance on the test set. We measure the models’ accuracy in predicting the binary variables’ optimal values. Infeasible is the ratio of instances that became infeasible after early fixing. Objective gap is the mean relative decrease of objective value by performing early fixing, for all instances that remained feasible after early fixing. The values reported are the average of 5 runs (models with random initialization), except for Baseline.
Model Accuracy Infeasible Objective gap
Supervised 99.78% 0.11% 0.01%
Weakly-supervised 32.31% 4.19% 3.64%
Baseline 0% 0% 21.42%

4.3 Weakly-supervised learning experiments

We build the surrogate model

$\begin{aligned}S:\Pi\times[0,1]^{10}\times\Theta_{S}&\to\mathbb{R}\\ {\rm bsw},{\rm gor},\bar{q}_{\rm gl},z_{\rm gl},z_{\rm whp};\theta_{S}&\mapsto\hat{p}=S({\rm bsw},{\rm gor},\bar{q}_{\rm gl},z_{\rm gl},z_{\rm whp};\theta_{S})\end{aligned}$ (20)

as a neural network with 3 hidden layers of 10 neurons each. ReLU is used as the activation function at each layer except the last one, which has no activation function. Inputs are normalized and the output is scaled by a factor of 2000. Dropout is applied during training at each hidden layer with a probability of 20% for each neuron.
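A corresponding PyTorch sketch of the surrogate $S$ (again with placeholder normalization constants):

```python
import torch
import torch.nn as nn

class SurrogateNet(nn.Module):
    """MLP with 3 hidden layers of 10 neurons, 20% dropout, and output scaled by 2000."""

    def __init__(self):
        super().__init__()
        # Placeholder normalization constants; the 10 binary entries are already in [0, 1].
        self.register_buffer("mean", torch.tensor([0.75, 100.0, 8250.0] + [0.0] * 10))
        self.register_buffer("std", torch.tensor([0.14, 25.0, 2450.0] + [1.0] * 10))
        self.body = nn.Sequential(
            nn.Linear(13, 10), nn.ReLU(), nn.Dropout(0.2),
            nn.Linear(10, 10), nn.ReLU(), nn.Dropout(0.2),
            nn.Linear(10, 10), nn.ReLU(), nn.Dropout(0.2),
            nn.Linear(10, 1),
        )

    def forward(self, pi_and_z: torch.Tensor) -> torch.Tensor:
        x = (pi_and_z - self.mean) / self.std
        return 2000.0 * self.body(x)                 # predicted early-fixed objective \hat{p}
```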

The dataset $D_{\rm weak}$ is split randomly into training and test sets following an 80-20 ratio. The optimal architecture and hyperparameters (learning rate, batch size, number of epochs) were determined in the same way as in Sec. 4.2. We use the squared error as the loss function

$\mathcal{L}_{S}(\hat{p},p)=|\hat{p}-p|^{2}.$

We use Adam to minimize the loss function on the training set, with an initial learning rate of 0.001 and mini-batches of 1024 samples. 5 models with random initialization are trained for 20 epochs each. On the test set, the surrogate models correctly predicted feasibility (i.e., whether the output was lower than 0) an average of 78.23% of the time. The average MAE on the instances that were not infeasible was 60.05. Figure 2 shows an example of the surrogate model's performance compared to the real objective values.

Figure 2: Comparison of the surrogate model $S(\pi,z)$ (in blue) and the target function $P(\pi,z)$ (in red) for all early fixing possibilities ($\forall z\in Z$) for a sample instance of the problem. Values of $-1$ in $P$ indicate that the problem is infeasible for that given combination of the variables.

The early fixing model $N_{\rm weak}$ has the same architecture as $N_{\rm sup}$. 5 models with random initialization are trained as described in Section 3.3, but using the parameters $\pi$ from the training set of $D_{\rm sup}$ described above. Each $N_{\rm weak}$ model is trained with a different surrogate model $S$. The models are trained using Adam with an initial learning rate of 0.01 and a batch size of 64 for 100 epochs each. The performance on the $D_{\rm sup}$ test set is reported in Table 1.

4.4 Baseline Model

As a reference for the deep learning models' performance, we compute the results of always fixing the same values for the binary variables. In our problem, the safest option in this approach is always to pick the region with the smallest values possible for $q_{\rm gl}$ and ${\rm whp}$. This always results in a feasible problem. The performance of this baseline approach on the test set of $D_{\rm sup}$ can be seen in Table 1.

4.5 Early Fixing Impacts

To evaluate the impacts of early fixing on the optimization, we measure the runtime of solving the original MILP, the runtime of the early-fixed problem (which is an LP), and the runtime of the early fixing models. We perform these experiments on all instances of $D_{\rm sup}$, i.e., with the problems defined by the parameters $\pi$ in the $D_{\rm sup}$ dataset used in the experiments above.

We found that the original problem can be solved, on average, in 0.90 ms, while the early-fixed problem is solved in an average of 0.18 ms. Considering the 0.08 ms the early-fixing models took, on average, during the experiments, the early-fixing approach takes, on average, 0.26 ms, representing a 71.11% runtime reduction.

5 Conclusions

Our experiments show that deep-learning-based early fixing models successfully speed up the optimization of the offshore gas-lifted oil production problem, with a 71.11% runtime reduction. Training in a supervised learning setting, although with a significantly higher cost for collecting the training data, is clearly superior to the weakly-supervised setting. Nevertheless, the experiments with the weakly-supervised approach indicate that it is possible to develop an early fixing heuristic when optimal solutions to the MILPs are unavailable or too hard to obtain. Still, the weakly-supervised approach needs further refinement to achieve competitive results.

Further research is still necessary on the suitability of deep-learning-based early fixing for harder problems, e.g., multiple wells connected to manifolds, with more operational constraints and more integer variables. Moreover, the early fixing approaches presented in this paper apply, in theory, to any MILP; however, the performance and practical viability of the models in such settings remain open research questions.

References

  • Anderson et al. (2022) Anderson, L., Turner, M., and Koch, T. (2022). Generative deep learning for decision making in gas networks. Mathematical Methods of Operations Research, 95(3), 503–532. 10.1007/s00186-022-00777-x.
  • Beale and Tomlin (1969) Beale, E. and Tomlin, J. (1969). Special facilities in a general mathematical programming system for nonconvex problems using ordered sets of variables. Operational Research, 69, 447–454.
  • Bengio et al. (2021) Bengio, Y., Lodi, A., and Prouvost, A. (2021). Machine learning for combinatorial optimization: A methodological tour d’horizon. European Journal of Operational Research, 290(2), 405–421. 10.1016/j.ejor.2020.07.063.
  • Ding et al. (2019) Ding, J.Y., Zhang, C., Shen, L., Li, S., Wang, B., Xu, Y., and Song, L. (2019). Accelerating primal solution findings for mixed integer programs based on solution prediction. 10.48550/arXiv.1906.09575.
  • Lee and Mitchell (2009) Lee, E.K. and Mitchell, J.E. (2009). Integer Programming: Branch and Bound Methods. In C.A. Floudas and P.M. Pardalos (eds.), Encyclopedia of Optimization, 1634–1643. Springer US, Boston, MA. 10.1007/978-0-387-74759-0_286.
  • Li and Wu (2022) Li, L. and Wu, B. (2022). Learning to accelerate approximate methods for solving integer programming via early fixing. 10.48550/arXiv.2207.02087.
  • Luguesi et al. (2023) Luguesi, C., Camponogara, E., Seman, L.O., González, J.T., and Leithardt, V.R.Q. (2023). Derivative-free optimization with proxy models for oil production platforms sharing a subsea gas network. IEEE Access, 11, 8950–8967. 10.1109/ACCESS.2023.3239421.
  • Müller et al. (2022) Müller, E.R., Camponogara, E., Seman, L.O., Hülse, E.O., Vieira, B.F., Miyatake, L.K., and Teixeira, A.F. (2022). Short-term steady-state production optimization of offshore oil platforms: wells with dual completion (gas-lift and ESP) and flow assurance. TOP, 30(1), 152–180. 10.1007/s11750-021-00604-2.
  • Pacheco et al. (2023) Pacheco, B.M., Seman, L.O., Rigo, C.A., Camponogara, E., Bezerra, E.A., and dos Santos Coelho, L. (2023). Graph neural networks for the offline nanosatellite task scheduling problem. ArXiv:2303.13773 [cs.LG].
  • Seman et al. (2020) Seman, L.O., Miyatake, L.K., Camponogara, E., Giuliani, C.M., and Vieira, B.F. (2020). Derivative-free parameter tuning for a well multiphase flow simulator. Journal of Petroleum Science and Engineering, 192, 107288. 10.1016/j.petrol.2020.107288.
  • Vigerske and Gleixner (2018) Vigerske, S. and Gleixner, A. (2018). SCIP: global optimization of mixed-integer nonlinear programs in a branch-and-cut framework. Optimization Methods and Software, 33(3), 563–593. 10.1080/10556788.2017.1335312.