
Graph Neural Networks for the Offline Nanosatellite Task Scheduling Problem

Bruno Machado Pacheco [email protected] Laio Oriel Seman [email protected] Cezar Antônio Rigo [email protected] Eduardo Camponogara [email protected] Eduardo Augusto Bezerra [email protected] Leandro dos Santos Coelho [email protected] Department of Automation and Systems Engineering, Federal University of Santa Catarina (UFSC), Florianópolis, Brazil Department of Electrical Engineering, Federal University of Santa Catarina (UFSC), Florianópolis, Brazil Department of Electrical Engineering, Federal University of Parana, Curitiba, Brazil Industrial and Systems Engineering Graduate Program, Pontifical Catholic University of Parana, Curitiba, Brazil
Abstract

This study investigates how to schedule nanosatellite tasks more efficiently using Graph Neural Networks (GNNs). In the Offline Nanosatellite Task Scheduling (ONTS) problem, the goal is to find the optimal schedule for tasks to be carried out in orbit while taking into account Quality-of-Service (QoS) considerations such as priority, minimum and maximum activation events, execution time-frames, periods, and execution windows, as well as constraints on the satellite's power resources and the complexity of energy harvesting and management. The ONTS problem has been approached using conventional mathematical formulations and exact methods, but their applicability to challenging cases of the problem is limited. This study examines the use of GNNs in this context, as they have been effectively applied to optimization problems such as the traveling salesman, scheduling, and facility location problems. More specifically, we investigate whether GNNs can learn the complex structure of the ONTS problem with respect to the feasibility and optimality of candidate solutions. Furthermore, we evaluate GNN-based heuristics for providing better solutions (w.r.t. the objective value) to the ONTS problem at a reduced optimization cost. Our experiments show that GNNs are not only able to learn feasibility and optimality for instances of the ONTS problem, but can also generalize to harder instances than those seen during training. Furthermore, the GNN-based heuristics improved the expected objective value of the best solution found under the time limit by 45%, and reduced the expected time to find a feasible solution by 35%, when compared to the SCIP (Solving Constraint Integer Programs) solver in its off-the-shelf configuration.

keywords:
Scheduling, Graph Neural Network, Combinatorial Optimization, Nanosatellite, Deep Learning.

1 Introduction

Nanosatellites have attracted attention from the industry and academia for over a decade, primarily because of their applications at reduced development and launch costs [1, 2, 3, 4]. Due to limited computational and energy resources, this spacecraft standard is associated with difficulties in mission planning. Optimizing task scheduling is key for guaranteeing a return on investment, as it maximizes resource usage, increases data quality, and brings about cost savings and mission success. Therefore, the Offline Nanosatellite Task Scheduling (ONTS) problem is crucial in developing, deploying, and operating nanosatellites in orbit. The problem aims at finding a schedule for task execution in orbit that maximizes Quality-of-Service (QoS), taking into account factors such as priority, minimum and maximum activation events, execution time-frames, periods, and execution windows, as well as the limitations of the satellite’s power resources and the complexities of energy harvesting and management systems.

During the lifetime of a mission (from launch to disposal), the ONTS problem must be solved recurrently. New schedules must be generated (and deployed) over time, and the optimal schedule is found by iteratively determining how many, and which, tasks fit in the scheduled timespan. Traditional mathematical formulations and exact algorithms have been proposed to solve the ONTS problem, from Integer Programming (IP) [5] to Mixed-Integer Linear Programming (MILP) [6, 7] and continuous-time techniques [8]. However, due to the NP-hard nature of the problem, efficiently solving large instances, i.e., fitting many tasks and planning over a long time horizon, remains an open area of research.

Meanwhile, several recent investigations [9, 10, 11, 12] have considered machine learning tools to address combinatorial optimization problems, such as the single machine problem [13], resource-constrained project scheduling [14], and knapsack problems [15]. Graph Neural Networks (GNNs), in particular, have gained popularity in recent years to solve combinatorial optimization problems when the underlying structure can be represented as a graph [16], which is the case for MILP [17].

This paper investigates the novel application of GNNs for the ONTS problem. More specifically, we consider two research questions:

  • 1. Can a graph neural network learn the structure of the ONTS problem?

  • 2. Is a GNN-based heuristic effective (fast and accurate) for the ONTS problem?

To address these two questions, we propose two sets of experiments. First, we train GNNs to predict the feasibility and optimality of candidate solutions given an instance of the problem. In the second set of experiments, we train GNNs to generate candidate solutions given problem instances and use the resulting models to build matheuristics [18]. Our results indicate that the proposed GNN can learn the feasibility and optimality of candidate solutions, even when given instances larger (and harder) than those seen during training. Furthermore, GNN-based matheuristics effectively provide high-quality solutions for large problems, outperforming the MILP solver (SCIP) both in the time to generate feasible solutions and in the quality of the solutions found within a limited time. In summary, the main contributions of this paper are as follows:

  • 1. The introduction of SatGNN, a pioneering GNN architecture suited to the ONTS problem;

  • 2. A showcase of SatGNN's impressive performance in accurately classifying feasibility, predicting optimality, and generating effective heuristic solutions;

  • 3. A demonstration of GNNs' robust generalization abilities for optimization problems, particularly when dealing with larger ONTS instances;

  • 4. An indication of the potential for a transformative impact on complex space mission scheduling.

The remainder of this paper is organized as follows: Section 2 provides an overview of the related literature. Section 3 describes the problem in detail, providing context and background information and formulating the optimization problem. Section 4 provides the necessary theoretical background on graph neural networks and their application to linear optimization problems. The data acquisition, the proposed GNN architecture, and the heuristics for optimization problems are presented in Section 5. The experiments that tackle the two research questions posed above along with the results are presented in Section 6. Finally, Sections 7 and 8 discuss the key findings and provide concluding remarks and future research directions.

2 Related Work

The ONTS problem has received significant contributions in recent years, with different approaches and formulations. At the same time, machine learning applications for optimization problems have become a prolific area of research. This section briefly overviews the works in both areas that most influenced our paper.

Mathematical programming is consolidated as a practical approach for satellite scheduling problems [19, 20, 21]. Targeting small satellites, Rigo et al. [5] formulated the problem of maximizing task allocation under power availability constraints as an integer programming (IP) problem. The formulation considers discrete time steps and information such as the maximum power input at each time step along the planning horizon, power consumption of tasks, execution periods, and task priorities. This approach was able to provide energy-efficient allocations that improved mission value.

Seman et al. [7] extended the approach by [5] and proposed a Mixed-Integer Linear Programming (MILP) formulation to tackle the task scheduling in constellations of nanosatellites. Camponogara et al. [8] applied the Multi-Operation Sequencing with Synchronized Start Times (MOS-SST) representation to the problem, maintaining an MILP formulation but extending the problem to continuous time. The continuous-time feature allowed for greater flexibility and versatility, fundamental characteristics for decision-making during the design phase. The evolution of formulations and methodologies highlights the continuous efforts to find the most efficient and effective solutions to the ONTS problem.

More recently, given the difficulty of solving complex instances of the ONTS problem, Rigo et al. [22] proposed a Dantzig-Wolfe decomposition coupled with a branch-and-price (B&P) methodology. This approach was designed to build a unique column-based formulation that produces feasible and optimal schedules. In addition, they leveraged Dynamic Programming (DP) to identify optimal columns and speed up the whole process. Their computational experiments showed a significant improvement in overall solution time compared to a commercial MILP solver, with a 70% reduction in computation time. Despite these promising results, the B&P method introduces specific challenges and limitations that preclude its universal application. One significant drawback is its heavy reliance on the suitability of the decomposition, which is notoriously problem-specific. Currently, no known method allows practitioners to ascertain a priori the presence of a decomposable structure in a given problem [23]. This lack of a deterministic approach to identifying suitable problems adds a layer of complexity to the application of B&P. Additionally, the complexity inherent in the decomposition and the optimization algorithm can entail a substantial development effort. The intricate nature of these processes requires specialized knowledge and expertise, potentially rendering the B&P method prohibitive for particular applications or settings. Consequently, while the B&P methodology offers substantial improvements in computational efficiency for specific problem instances, its applicability may be limited by these challenges, emphasizing the need for caution in considering it a universal solution for all optimization problems.

Machine learning applications for optimization problems were first proposed in 1985, with Hopfield neural networks applied to the Travelling Salesperson Problem (TSP) [24]. Due to the hardware limitations of the time [25], neural network applications to optimization problems stayed out of the spotlight until recent years, when novel architectures, more powerful hardware, and promising published results attracted the research community's attention back to this area [9]. One driver of this change was the proposition of structured artificial neural networks able to handle the symmetries that optimization problems exhibit [26], such as invariance to permutation of constraints. A significant example of such a property for Combinatorial Optimization (CO) is the work of Khalil et al. [27], who developed a greedy heuristic for CO problems on graphs using a GNN. Gasse et al. [28] proposed to embed linear programming problems as a bipartite graph, generalizing the use of GNNs to any MILP. The results presented in these works indicate that GNNs can learn the structures of MILP problems.

Many authors have proposed using GNNs to predict solution values, aiming to use the learned model in heuristics. Ding et al. [29] proposed using such GNNs to guide the Branch and Bound (B&B) algorithm towards the variables the model had the most difficulty learning. Khalil et al. [17] use GNNs trained to predict biases of the binary variables to perform node selection and warm starting. Nair et al. [30], in their approach named Neural Diving, proposed training the model to provide partial solutions, which generate sub-problems through variable fixing. The multiple sub-problems are then solved using off-the-shelf solvers. Han et al. [12] compare the fixing approach of [30] to their proposed approach of defining a trust region around the partial solution generated by the GNN, showing significant improvements over off-the-shelf solvers like SCIP and Gurobi. These works highlight how promising GNN-based heuristic approaches to MILP can be.

To the best of our knowledge, our work is the first to apply deep learning models to satellite task scheduling. Furthermore, machine learning applications to MILP have not been extensively tested, as evidenced by the novelty of the works presented above and the lack of learning-based components in widely used solvers such as SCIP and CPLEX. In other words, although promising, the application of GNNs to MILP is not yet well understood and is not guaranteed to yield positive results. In this context, the present work also contributes to the machine learning and combinatorial optimization research area by validating novel techniques.

3 Offline Nanosatellite Task Scheduling (ONTS)

Nanosatellite scheduling problems concern the decisions on each task's start and finish times. Tasks usually require periodic execution, and only during restricted moments along the orbit. Besides time, energy availability along the orbit is a crucial resource to be considered. Figure 1 shows an example of an optimal schedule, in which each job is represented by a different color and the activations and deactivations are shown through the steps of the signals. A proper schedule must account for energy management such that the tasks do not draw more energy than the system can provide and the battery is not depleted before the end of the mission. Energy management is difficult since the nanosatellite draws power from its solar panels, with energy availability depending on the attitude of the nanosatellite (which affects the orientation of the solar panels) and on its trajectory with respect to Earth's shadow, as illustrated in Figure 2.

Refer to caption
Figure 1: Illustration of an optimal schedule for 9 tasks over a horizon of 125 time steps. Each color represents the execution of a different task.
Refer to caption
Figure 2: Illustration of a nanosatellite’s orbit around Earth. Image from Rigo et al. [22].

3.1 MILP Formulation

A formulation that considers the realistic constraints and objectives of ONTS was proposed by Rigo et al. [5] and is presented here. Given a set of jobs $\mathcal{J}=\{0,\ldots,J\}$ that represents the tasks, and a set of time units $\mathcal{T}=\{0,\ldots,T\}$ that represents the scheduling horizon, variables

\boldsymbol{x}=(x_{1,1},\ldots,x_{1,T},\ldots,x_{J,1},\ldots,x_{J,T})

represent the allocation of jobs $j\in\mathcal{J}$ at times $t\in\mathcal{T}$, i.e., $x_{j,t}=1$ indicates that job $j$ is scheduled to execute at time $t$. Naturally, the $x_{j,t}$ are binary variables, i.e., $\boldsymbol{x}\in\{0,1\}^{JT}$.

It is assumed that there exists a priority for every job, which is defined by $\boldsymbol{u}=(u_{1},\ldots,u_{J})$. The goal is to maximize the mission's Quality of Service (QoS) (1), which is the sum of the allocations weighted by the priorities,

{\rm QoS}(\boldsymbol{x};\boldsymbol{u})=\sum_{j=1}^{J}\sum_{t=1}^{T}u_{j}x_{j,t} (1)
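Numerically, the QoS objective is just a priority-weighted sum of the allocation matrix. A minimal sketch (the toy data below is invented for illustration):

```python
import numpy as np

# Hypothetical toy instance: J=3 jobs, T=4 time steps.
u = np.array([3.0, 1.0, 2.0])   # priorities u_j
x = np.array([[1, 1, 0, 0],     # x_{j,t} = 1 iff job j runs at time t
              [0, 0, 1, 0],
              [1, 0, 0, 1]])

def qos(x, u):
    """QoS(x; u) = sum_j sum_t u_j * x_{j,t}, as in Eq. (1)."""
    return float(u @ x.sum(axis=1))

print(qos(x, u))  # 3*2 + 1*1 + 2*2 = 11.0
```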

To formalize the ONTS problem of maximizing QoS, we follow the formulation proposed by Rigo et al. [5]. The following constraints are added to ensure that the scheduling respects the requirements and specificities of each job:

\phi_{j,t}\geq x_{j,t}, \forall j\in\mathcal{J},\,t=1 (2a)
\phi_{j,t}\geq x_{j,t}-x_{j,(t-1)}, \forall j\in\mathcal{J},\,\forall t\in\mathcal{T}:t>1 (2b)
\phi_{j,t}\leq x_{j,t}, \forall j\in\mathcal{J},\,\forall t\in\mathcal{T} (2c)
\phi_{j,t}\leq 2-x_{j,t}-x_{j,(t-1)}, \forall j\in\mathcal{J},\,\forall t\in\mathcal{T}:t>1 (2d)
\sum_{t=1}^{w^{\min}_{j}}x_{j,t}=0, \forall j\in\mathcal{J} (2e)
\sum_{t=w^{\max}_{j}+1}^{T}x_{j,t}=0, \forall j\in\mathcal{J} (2f)
\sum_{l=t}^{t+t^{\min}_{j}-1}x_{j,l}\geq t^{\min}_{j}\phi_{j,t}, \forall t\in\{1,\ldots,T-t^{\min}_{j}+1\},\,\forall j\in\mathcal{J} (2g)
\sum_{l=t}^{t+t^{\max}_{j}}x_{j,l}\leq t^{\max}_{j}, \forall t\in\{1,\ldots,T-t^{\max}_{j}\},\,\forall j\in\mathcal{J} (2h)
\sum_{l=t}^{T}x_{j,l}\geq(T-t+1)\phi_{j,t}, \forall t\in\{T-t^{\min}_{j}+2,\ldots,T\},\,\forall j\in\mathcal{J} (2i)
\sum_{l=t}^{t+p^{\min}_{j}-1}\phi_{j,l}\leq 1, \forall t\in\{1,\ldots,T-p^{\min}_{j}+1\},\,\forall j\in\mathcal{J} (2j)
\sum_{l=t}^{t+p^{\max}_{j}-1}\phi_{j,l}\geq 1, \forall t\in\{1,\ldots,T-p^{\max}_{j}+1\},\,\forall j\in\mathcal{J} (2k)
\sum_{t=1}^{T}\phi_{j,t}\geq y^{\min}_{j}, \forall j\in\mathcal{J} (2l)
\sum_{t=1}^{T}\phi_{j,t}\leq y^{\max}_{j}, \forall j\in\mathcal{J} (2m)
\phi_{j,t}\in\{0,1\}, \forall j\in\mathcal{J},\,t\in\mathcal{T} (2n)
x_{j,t}\in\{0,1\}, \forall j\in\mathcal{J},\,t\in\mathcal{T}. (2o)

Note that auxiliary binary variables

\boldsymbol{\phi}=(\phi_{1,1},\ldots,\phi_{1,T},\ldots,\phi_{J,1},\ldots,\phi_{J,T})

are used, which take a positive value if, and only if, job $j$ was not running at time $t-1$ but started running at time $t$. Constraints (2a) to (2d) ensure this behavior. We also consider that jobs may run only during a time window defined by parameters $w^{\min}_{j}$ and $w^{\max}_{j}$, which is ensured by constraints (2e) and (2f). Such behavior is necessary to ensure that a payload, for instance, runs only when passing above a certain territory.

Jobs may also have limits on continuous execution. If a job $j$ starts running at time $t$, then it must run for at least $t^{\min}_{j}$ time steps and at most $t^{\max}_{j}$ time steps, which is ensured by constraints (2g) and (2h). Through constraint (2i), the formulation also allows a job to start at the last time steps and keep running until the end, assuming it will continue running at the start of the following schedule.

A job may require periodic execution, at least every $p^{\min}_{j}$ time steps and at most every $p^{\max}_{j}$ time steps, which is ensured through constraints (2j) and (2k) over the $\phi_{j,t}$ variables. A job may also require multiple executions over the planning horizon, starting at least $y^{\min}_{j}$ times and at most $y^{\max}_{j}$ times, which is ensured through constraints (2l) and (2m).
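To make the job-specific constraints concrete, the sketch below checks a candidate activation vector against the execution window (2e)-(2f) and the minimum continuous execution implied by (2g). It is a simplified illustration under our own naming: it ignores the end-of-horizon relaxation that constraint (2i) grants.

```python
import numpy as np

def check_window(x_j, w_min, w_max):
    """(2e)-(2f): job j may only run at (1-indexed) times w_min < t <= w_max."""
    return bool(x_j[:w_min].sum() == 0 and x_j[w_max:].sum() == 0)

def check_min_duration(x_j, t_min):
    """(2g), informally: each start-up event (phi_{j,t} = 1) must be
    followed by at least t_min consecutive active steps. Simplification:
    the end-of-horizon relaxation of (2i) is not modeled here."""
    T, prev = len(x_j), 0
    for t in range(T):
        if x_j[t] == 1 and prev == 0:  # 0 -> 1 transition: start-up event
            if t + t_min > T or (x_j[t:t + t_min] != 1).any():
                return False
        prev = x_j[t]
    return True

x_j = np.array([0, 1, 1, 1, 0, 0])           # one run of length 3
print(check_window(x_j, w_min=1, w_max=5))   # True
print(check_min_duration(x_j, t_min=3))      # True
print(check_min_duration(x_j, t_min=4))      # False
```

Checkers like these are useful for validating candidate solutions outside the solver, e.g., when labeling data for a feasibility classifier.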

Beyond job-specific restrictions in equation (2), the formulation also covers energy management,

\sum_{j=1}^{J}q_{j}x_{j,t}\leq r_{t}+\gamma V_{b}, \forall t\in\mathcal{T} (3a)
b_{t}=r_{t}-\sum_{j\in\mathcal{J}}q_{j}x_{j,t}, \forall t\in\mathcal{T} (3b)
i_{t}=\frac{b_{t}}{V_{b}}, \forall t\in\mathcal{T} (3c)
{\rm SoC}_{t+1}={\rm SoC}_{t}+\frac{i_{t}\,e}{60\,Q}, \forall t\in\mathcal{T} (3d)
{\rm SoC}_{t}\leq 1, \forall t\in\mathcal{T} (3e)
{\rm SoC}_{t}\geq\rho, \forall t\in\mathcal{T} (3f)

Parameter $r_{t}$ provides the power available at time $t$ from the solar panels, and $q_{j}$ the power required for executing job $j$. Thus, constraint (3a) limits the power consumption, with $\gamma V_{b}$ being the maximum power the battery can provide. Constraints (3b) to (3d) update the auxiliary variables $b_{t}$ and ${\rm SoC}_{t}$, which represent the exceeding power and the State of Charge (SoC) at time $t$, based on the battery capacity $Q$ and the discharge efficiency $e$. The SoC of the battery must stay within the limits given in constraints (3e) and (3f), with $\rho$ being a lower limit, usually greater than zero as a safety measure.
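Constraints (3b)-(3d) amount to a simple battery roll-out, which the sketch below simulates for a candidate schedule; parameter values are invented for illustration, and `simulate_soc` is our name, not part of the formulation.

```python
import numpy as np

# Illustrative parameters following the formulation's symbols.
Q, Vb, e, gamma, rho = 5.0, 3.6, 0.9, 5.0, 0.5
r = np.array([10.0, 2.0, 2.0, 8.0])   # available solar power r_t
q = np.array([4.0, 1.5])              # power draw q_j of each job
x = np.array([[1, 1, 0, 0],           # candidate schedule x_{j,t}
              [0, 1, 1, 0]])

def simulate_soc(x, r, q, soc0=0.7):
    """Roll out (3b)-(3d); return None if (3a), (3e), or (3f) is violated."""
    soc = soc0
    for t in range(x.shape[1]):
        load = float(q @ x[:, t])
        if load > r[t] + gamma * Vb:       # (3a): power availability limit
            return None
        b = r[t] - load                    # (3b): exceeding power b_t
        i = b / Vb                         # (3c): battery current i_t
        soc += i * e / (60 * Q)            # (3d): SoC update
        if not rho <= soc <= 1.0:          # (3e)-(3f): SoC bounds
            return None
    return soc

print(simulate_soc(x, r, q))  # feasible roll-out; final SoC ~ 0.7092
```

This also illustrates why the continuous variables are determined by the binaries: given $\boldsymbol{x}$, the whole SoC trajectory follows deterministically.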

Thus, the MILP formulation is the maximization of (1) subject to constraints (2) and (3). In other words, we can write any instance $I\in\mathcal{I}$, where $\mathcal{I}$ is the set of all possible instances of the ONTS problem, as

I:\;\max_{\boldsymbol{x},\boldsymbol{\phi},{\rm SoC}}\;\underbrace{\sum_{j=1}^{J}\sum_{t=1}^{T}u_{j}x_{j,t}}_{\rm QoS}\quad\text{s.t.}\;(2),(3),\;\boldsymbol{x},\boldsymbol{\phi}\in\{0,1\}^{JT},\;{\rm SoC}\in[0,1]^{T}. (4)

Note that the continuous variables ${\rm SoC}$ are completely determined by the binary variables $\boldsymbol{x}$ and $\boldsymbol{\phi}$, so our problem can be reduced to finding an assignment $\boldsymbol{z}=(\boldsymbol{x},\boldsymbol{\phi})\in\{0,1\}^{2JT}$.

Any instance $I\in\mathcal{I}$ is parameterized by the number of tasks $J$, the number of time units $T$, and the parameters $\boldsymbol{u}$, $\boldsymbol{q}$, $\boldsymbol{y}^{\min}$, $\boldsymbol{y}^{\max}$, $\boldsymbol{t}^{\min}$, $\boldsymbol{t}^{\max}$, $\boldsymbol{p}^{\min}$, $\boldsymbol{p}^{\max}$, $\boldsymbol{w}^{\min}$, $\boldsymbol{w}^{\max}$, $\boldsymbol{r}$, $\rho$, $e$, $Q$, $\gamma$, and $V_{b}$. We denote by $\Pi_{J,T}$ the parameter space through which any instance $I\in\mathcal{I}$ can be uniquely determined by some parameter vector $\pi\in\Pi_{J,T}$ (given adequate $J$ and $T$).
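For concreteness, an instance parameter vector $\pi\in\Pi_{J,T}$ can be held in a simple container. This is only an illustrative sketch (the class and field names are ours), with no validation of the parameter values:

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class ONTSInstance:
    """Parameter vector pi identifying one ONTS instance (Eq. (4))."""
    J: int                  # number of jobs
    T: int                  # number of time units
    u: np.ndarray           # priorities, shape (J,)
    q: np.ndarray           # power draw per job, shape (J,)
    y_min: np.ndarray       # min/max number of activations, shape (J,)
    y_max: np.ndarray
    t_min: np.ndarray       # min/max continuous execution, shape (J,)
    t_max: np.ndarray
    p_min: np.ndarray       # min/max execution period, shape (J,)
    p_max: np.ndarray
    w_min: np.ndarray       # execution window bounds, shape (J,)
    w_max: np.ndarray
    r: np.ndarray           # available power, shape (T,)
    rho: float              # SoC lower bound
    e: float                # discharge efficiency
    Q: float                # battery capacity
    gamma: float            # gamma * Vb = max power the battery provides
    Vb: float               # battery voltage
```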

4 Graph Neural Networks (GNNs)

Graph neural networks are generalizations of convolutional neural networks from grid-structured data to graphs. The input to a GNN is a graph and a set of feature vectors associated with the graph nodes. GNNs work by propagating feature information between neighboring nodes, which can be seen as message-passing. More specifically, at each layer, the GNN computes the messages each node sends based on its feature vector and then updates each node’s feature vectors given the messages its neighbors received. We will refer to the feature update as a graph convolution, due to its similarities to the operations in convolutional neural networks.

At each layer $l\in\{1,\ldots,L\}$, the graph convolution operation has two major components: message functions $M_{l}(\cdot)$, which compute the messages propagated by each node, and update functions $U_{l}(\cdot)$, which update the node-associated features based on the messages. Let $G=(V,E)$ be a graph and $\boldsymbol{h}^{(0)}_{v}$ be feature vectors for every node $v\in V$. The messages sent, $\boldsymbol{m}_{u}^{(l)}$, are computed based on the node features

\boldsymbol{m}_{u}^{(l)}=M_{l}(\boldsymbol{h}_{u}^{(l-1)}),\;\forall u\in V, (5)

where $M_{l}(\cdot)$ is the message function of layer $l$. After computing the messages, the features of node $v$ are updated through

\boldsymbol{h}_{v}^{(l)}=U_{l}\left(\boldsymbol{h}_{v}^{(l-1)},{\tt Aggregation}\left(\left\{\boldsymbol{m}^{(l)}_{u}:u\in\mathcal{N}(v)\right\}\right)\right), (6)

where $\mathcal{N}(v)$ denotes the neighbors of $v$, $U_{l}(\cdot)$ is the update function of layer $l$, and ${\tt Aggregation}$ is a function that receives multiple message vectors and returns a single vector.

A common choice for graph convolutions is to use feedforward neural networks as message and update functions. For these to be well-defined, ${\tt Aggregation}$ must return a fixed-length vector regardless of how many messages are received, hence the name. A natural choice is to sum the messages, but many alternatives exist, such as

{\tt Aggregation}=\begin{cases}\frac{1}{|\mathcal{N}(v)|}\sum_{u\in\mathcal{N}(v)}\boldsymbol{m}_{u}^{(l)},&\textrm{if mean}\\ \max_{u\in\mathcal{N}(v)}\boldsymbol{m}_{u}^{(l)},&\textrm{if max}\\ \sum_{u\in\mathcal{N}(v)}\boldsymbol{m}_{u}^{(l)},&\textrm{if sum}\\ \sum_{u\in\mathcal{N}(v)}\alpha_{u,v}\boldsymbol{m}_{u}^{(l)},&\textrm{if attention}\end{cases} (7)

where $\alpha_{u,v}$ is an attention weight for node $u$ given node $v$, which can be learned along with the model parameters. Furthermore, the message function can easily be extended to consider the edge weight (or even edge features) along with the feature vectors of the neighbors.
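The four aggregation choices in Eq. (7) can be sketched in a few lines of NumPy; the message matrix and attention weights below are arbitrary illustrative values:

```python
import numpy as np

# Messages m_u^{(l)} from three neighbors of some node v (one per row).
msgs = np.array([[1.0, 2.0],
                 [3.0, 0.0],
                 [2.0, 4.0]])

agg_mean = msgs.mean(axis=0)   # mean:      [2.0, 2.0]
agg_max  = msgs.max(axis=0)    # max:       [3.0, 4.0]
agg_sum  = msgs.sum(axis=0)    # sum:       [6.0, 6.0]

# Attention: weights alpha_{u,v}, here normalized over the neighborhood.
alpha = np.array([0.5, 0.25, 0.25])
agg_att = alpha @ msgs         # attention: [1.75, 2.0]
```

All four return a fixed-length vector no matter how many rows `msgs` has, which is exactly the property the update function requires.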

A common approach, exemplified by the model of Kipf and Welling [31], is to define the message functions $M_{l}$, $l=1,\ldots,L$, as linear operators over the hidden features of the neighbors, aggregate the messages by summation, and use a single-layer neural network with ReLU (Rectified Linear Unit) activation as the update functions $U_{l}$, $l=1,\ldots,L$. More precisely, their convolution operation can be described as

\boldsymbol{m}_{u,v}^{(l)}=\frac{1}{c_{vu}}W^{(l)}\boldsymbol{h}_{u}^{(l-1)},\;u\in\mathcal{N}(v) (8)
\boldsymbol{h}_{v}^{(l)}={\rm ReLU}\left(\boldsymbol{b}^{(l)}+\sum_{u\in\mathcal{N}(v)}\boldsymbol{m}_{u,v}^{(l)}\right)

where $c_{vu}=\sqrt{|\mathcal{N}(u)|}\sqrt{|\mathcal{N}(v)|}$, with $|\mathcal{N}(v)|$ denoting the number of neighbors of $v$, and $W^{(l)}\in\mathbb{R}^{d_{l}\times d_{l}}$ and $\boldsymbol{b}^{(l)}\in\mathbb{R}^{d_{l}}$ (layer $l$ has size $d_{l}$) are the weights and biases of the model, i.e., its learnable parameters.
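In matrix form, one layer of Eq. (8) can be sketched in plain NumPy; `gcn_layer` is our name, and the sketch assumes every node has at least one neighbor (so no degree is zero):

```python
import numpy as np

def gcn_layer(H, adj, W, b):
    """Kipf-Welling convolution (Eq. (8)): messages (1/c_vu) W h_u,
    summed over neighbors, then bias and ReLU. adj is a {0,1} adjacency
    matrix; every node is assumed to have at least one neighbor."""
    deg = adj.sum(axis=1)                        # |N(v)| per node
    norm = adj / np.sqrt(np.outer(deg, deg))     # 1/c_vu on each edge
    return np.maximum(0.0, norm @ H @ W.T + b)   # sum-aggregate, ReLU update

# Path graph 0-1-2 with one-hot features and identity weights.
adj = np.array([[0., 1., 0.], [1., 0., 1.], [0., 1., 0.]])
out = gcn_layer(np.eye(3), adj, W=np.eye(3), b=np.zeros(3))
print(out[0])  # node 0 receives h_1 scaled by 1/sqrt(1*2)
```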

Another convolution operator, named SAGE (SAmple and aGgrEgate), was proposed by Hamilton et al. [32]. The authors propose to aggregate the features of the neighbors directly, i.e., to use the identity as the message function, and to use a single-layer network with ReLU activation as the update function. In these terms, the proposed graph convolution is

\boldsymbol{m}_{u,v}^{(l)}=\boldsymbol{h}_{u}^{(l-1)},\;u\in\mathcal{N}(v) (9)
\boldsymbol{h}_{v}^{(l)}={\rm ReLU}\left(\boldsymbol{b}^{(l)}+W^{(l)}_{1}\boldsymbol{h}_{v}^{(l-1)}+W^{(l)}_{2}{\tt Aggregation}\left(\boldsymbol{m}_{u,v}^{(l)},\;u\in\mathcal{N}(v)\right)\right)

where $W^{(l)}_{1},W^{(l)}_{2}\in\mathbb{R}^{d_{l}\times d_{l}}$ and $\boldsymbol{b}^{(l)}\in\mathbb{R}^{d_{l}}$ are the parameters. The authors suggest using more complex aggregation operators, such as an LSTM (Long Short-Term Memory) network or a fully connected single-layer neural network followed by a pooling operation (element-wise maximum). Note that the SAGE convolution with summation as the aggregation function is equivalent to the one proposed by Kipf and Welling.
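Eq. (9) can likewise be sketched directly; `sage_layer` is our name, and the node-by-node loop (rather than a vectorized form) is chosen for clarity:

```python
import numpy as np

def sage_layer(H, adj, W1, W2, b, agg=np.mean):
    """SAGE convolution (Eq. (9)): identity messages, neighborhood
    aggregation, then ReLU(b + W1 h_v + W2 Agg(...))."""
    out = np.zeros((H.shape[0], W1.shape[0]))
    for v in range(H.shape[0]):
        nbr_feats = H[adj[v].astype(bool)]   # m_{u,v} = h_u for u in N(v)
        a = agg(nbr_feats, axis=0)           # Aggregation over neighbors
        out[v] = np.maximum(0.0, b + W1 @ H[v] + W2 @ a)
    return out

# Same path graph 0-1-2, identity weights, mean aggregation.
adj = np.array([[0., 1., 0.], [1., 0., 1.], [0., 1., 0.]])
out = sage_layer(np.eye(3), adj, np.eye(3), np.eye(3), np.zeros(3))
print(out[0])  # h_0 plus the mean of its neighbors' features
```

Swapping `agg=np.max` gives the max-pooling variant; replacing the mean with a sum recovers a Kipf-Welling-style update, as noted above.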

After repeated graph convolutions through the $L$ layers of a GNN, we can either use the resulting node features $H^{(L)}=\left[\boldsymbol{h}_{v}^{(L)}\right]_{v\in V}$ directly, which is suitable for, e.g., node classification tasks, or aggregate them further into a single feature vector, suitable for, e.g., graph classification tasks. The GNN can be trained end-to-end by minimizing a prediction loss based on its outputs, optimizing its parameters (e.g., $W^{(l)}$ and $\boldsymbol{b}^{(l)}$ of (8)) in the same way as a traditional deep learning model.

4.1 MILP Problems as Inputs for GNNs

We can feed MILP problem instances to GNNs by representing the relationship between problem variables and constraints through a bipartite graph [28, 29, 17, 30, 12]. More precisely, given a problem of the form

\max_{\boldsymbol{x}}\;\boldsymbol{c}^{T}\boldsymbol{x}\quad\text{s.t.}\;A\boldsymbol{x}\leq\boldsymbol{b}, (10)

where $\boldsymbol{x}\in X\subseteq\mathbb{R}^{n}$ and $\boldsymbol{b}\in\mathbb{R}^{m}$, we can build a graph $G=(V_{\rm var}\cup V_{\rm con},E)$, in which $|V_{\rm var}|=n$, $|V_{\rm con}|=m$, and $E=\{(v_{{\rm con},i},v_{{\rm var},j}):A_{i,j}\neq 0\}$. Intuitively, the graph represents the structure of the problem at hand, with edges capturing the relationship between variables and constraints. Note that this approach yields a bipartite graph, that is, a graph in which the nodes are separated into two disjoint sets, $V_{\rm var}$ and $V_{\rm con}$, with edges connecting only nodes from one set to the other.

To fully represent an MILP instance, however, the graph has to be enhanced with edge and node weights. Let $w:V\cup E\mapsto\mathbb{R}$ be the weight function (in a slight abuse of notation, we use a single function to weigh both nodes and edges). Then, for every node $v_{i}\in V_{\rm con}$, associated with the $i$-th constraint, the weight is the constraint's bound, i.e., $w(v_{i})=b_{i}$. Similarly, for $v_{j}\in V_{\rm var}$, associated with the $j$-th variable, $w(v_{j})=c_{j}$, where $\boldsymbol{c}=(c_{j})_{j=1,\ldots,n}$. Edge weights are given by the coefficient matrix $A$; that is, given $e=(v_{{\rm con},i},v_{{\rm var},j})\in E$, then $w(e)=A_{i,j}$. This graph representation establishes a bijective relationship to the instance space, i.e., every MILP instance is uniquely determined by a weighted graph and vice-versa. (Strictly, for the weighted graph to identify an MILP instance uniquely, it would also need to encode which variables are integer. It would be more precise to restrict the bijection statement to MILP problems with the same set of integer variables, or, alternatively, to assume, without loss of generality, that the first $k\leq n$ variables, $x_{1}$ to $x_{k}$, are the integer variables and use $k$ as a weight for the whole graph.)

In practice, the graph fed to a GNN is associated with feature vectors $\boldsymbol{h}_{v}^{(0)},\;\forall v\in V$, of arbitrary size. In other words, the information contained in the features provided to the network is a design choice. It can contain the weights described above, but many other features might also help the model learn the graph-related task [28, 30].

For illustration purposes, consider an optimization problem in the form of Eq. (10) with three variables $\boldsymbol{x}=[x_{1},x_{2},x_{3}]^{T}$, three constraints $C_{1}$, $C_{2}$, and $C_{3}$, and

\boldsymbol{c}=\begin{bmatrix}1\\2\\3\end{bmatrix};\quad A=\begin{bmatrix}1&2&0\\0&1&-1\\3&0&1\end{bmatrix};\quad\boldsymbol{b}=\begin{bmatrix}2\\1\\4\end{bmatrix}. (11)

Then, a bipartite graph representation is illustrated in Figure 3. For the features, one may define

\boldsymbol{h}_{x_{j}}^{(0)}=\begin{bmatrix}c_{j}\\ \frac{1}{|\mathcal{N}(x_{j})|}\sum_{i:A_{ij}\neq 0}A_{ij}\\ \max_{i}A_{ij}\end{bmatrix}\quad{\rm and}\quad\boldsymbol{h}_{C_{i}}^{(0)}=\begin{bmatrix}b_{i}\\ \frac{1}{|\mathcal{N}(C_{i})|}\sum_{j:A_{ij}\neq 0}A_{ij}\\ \max_{j}A_{ij}\end{bmatrix},

such that

H^{(0)}=\begin{bmatrix}\boldsymbol{h}_{x_{1}}^{(0)T}\\ \boldsymbol{h}_{x_{2}}^{(0)T}\\ \boldsymbol{h}_{x_{3}}^{(0)T}\\ \boldsymbol{h}_{C_{1}}^{(0)T}\\ \boldsymbol{h}_{C_{2}}^{(0)T}\\ \boldsymbol{h}_{C_{3}}^{(0)T}\end{bmatrix}=\begin{bmatrix}1&2&3\\ 2&1.5&2\\ 3&0&1\\ 2&1.5&2\\ 1&0&1\\ 4&2&3\end{bmatrix}.
Figure 3: Bipartite graph representation of an MILP with three variables (x_{1}, x_{2}, and x_{3}), three constraints (C_{1}, C_{2}, and C_{3}), and A as in Eq. (11).
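The matrix H^{(0)} above can be reproduced mechanically. In the sketch below (our own code, not part of any library), averages are taken over the non-null coefficients, i.e., over each node's neighbors:

```python
def node_features(c, A, b):
    """Initial features per node: weight, mean of non-null coefficients,
    and max coefficient; variable nodes first, then constraint nodes."""
    m, n = len(A), len(A[0])
    feats = []
    for j in range(n):  # variable nodes x_1, ..., x_n
        col = [A[i][j] for i in range(m) if A[i][j] != 0]
        feats.append([c[j], sum(col) / len(col), max(col)])
    for i in range(m):  # constraint nodes C_1, ..., C_m
        row = [a for a in A[i] if a != 0]
        feats.append([b[i], sum(row) / len(row), max(row)])
    return feats

H0 = node_features(c=[1, 2, 3],
                   A=[[1, 2, 0], [0, 1, -1], [3, 0, 1]],
                   b=[2, 1, 4])
```

Running this on the example instance yields exactly the rows of H^{(0)} shown above.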

5 Methodology

In this section, we provide a detailed description of our methodological approach to tackle the two research questions proposed in the introduction. Namely, we describe our approach to generating ONTS problem data used to train GNNs, our proposed GNN architecture (SatGNN), and how we embed our trained models in heuristics.

5.1 Data

High-quality data is necessary to learn the tasks of interest and to perform reliable evaluations. To achieve a significant quantity of data, we generate new instances of the ONTS problem with variations of all task parameters and of the energy availability. However, randomly sampling parameter values suitable for the problem formulation presented in Sec. 3.1 would not reflect the distribution of instances we expect to see in practice. Therefore, we sample the parameter values methodically, following Rigo et al. [33]. The procedure for generating pseudo-random realistic parameters is described in detail in Appendix A.

The algorithm for constructing our dataset is described in Algorithm 1. We optimize every new instance I\in\mathcal{I} using a commercial solver under a time limit, during which we collect the best candidate solutions found (represented by Z^{\star}). Note that we collect only the values of the binary variables \boldsymbol{z}, as they completely determine the continuous variables (see Sec. 3.1). The instance is rejected if no feasible solution is found within the time budget (or if the solver proves infeasibility).

Data: Time horizon T, number of jobs J, number of instances (final dataset size) n.
Result: Dataset \mathcal{D}=\{(I,Z^{\star}):Z^{\star}\subset Z\ {\rm given}\ I\}.
while |\mathcal{D}|<n do
       \pi\leftarrow{\tt RandomInstance}(T,J)
       I\leftarrow{\tt BuildInstance}(\pi)
       Z^{\star}\leftarrow{\tt Solver}(I)
       if |Z^{\star}|>0 then
             \mathcal{D}.add(I,Z^{\star})
       end if
end while
Algorithm 1 Dataset generation. \pi is the parameter vector described in Sec. 3.1, Z is the set of all feasible solutions, and Z^{\star}\subset Z is the set of candidate solutions found by the solver. For a description of {\tt RandomInstance}, see Appendix A.
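Algorithm 1 is essentially a rejection-sampling loop. The sketch below makes the control flow concrete; {\tt RandomInstance}, {\tt BuildInstance}, and {\tt Solver} are stubbed out (in our experiments they correspond to the parameter sampler of Appendix A and a solver run under a time budget), so the names and return values here are illustrative only.

```python
import random

def random_instance(T, J):      # stub for RandomInstance (see Appendix A)
    return {"T": T, "J": J, "draw": random.random()}

def build_instance(pi):         # stub for BuildInstance
    return pi

def solver(instance):           # stub for Solver
    # a real implementation would run the solver under a time budget and
    # return the pool of best candidate solutions found; an empty pool
    # means no feasible solution was found (or infeasibility was proven)
    return [] if instance["draw"] < 0.2 else ["z_star"]

def generate_dataset(T, J, n):
    dataset = []
    while len(dataset) < n:
        pi = random_instance(T, J)
        instance = build_instance(pi)
        z_star = solver(instance)
        if len(z_star) > 0:     # reject instances without solutions
            dataset.append((instance, z_star))
    return dataset

dataset = generate_dataset(T=125, J=9, n=5)
```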

5.2 SatGNN

We name SatGNN the base model proposed for all experiments. An overview of the components of our model can be seen in the diagram of Figure 4. We follow the approach of embedding the ONTS problem instances as bipartite graphs and feeding them to a GNN (see Sec. 4). Because of the convolutional nature of the message-passing iterations of the GNNs, the model can deal with arbitrary-sized graphs, which enables the same GNN to handle optimization problems with varying numbers of variables and constraints. In other words, the GNN can handle ONTS instances with varying numbers of jobs, which is useful, e.g., for iterating over the number of jobs during the mission design to maximize QoS.

The input graph is enhanced with feature vectors for every node. Let G=(V,E,w) be the graph representation of an instance I\in\mathcal{I} as described in Sec. 4.1, with V=V_{\rm var}\cup V_{\rm con} the set of nodes and w:E\longrightarrow\mathbb{R} the edge weight function based on the variables' coefficients in the constraints. We denote by \boldsymbol{f}_{v} the feature vector associated with node v. We add four features to constraint nodes and six to variable nodes, as described in Table 1.

Features of constraint nodes (\boldsymbol{f}_{v_{\rm con}}):
- Constraint's upper bound (b_{i})
- Constraint's average coefficient (mean of A_{i*})
- Number of neighbors/non-null coefficients (|\mathcal{N}(v_{\rm con})|)
- Whether it is an equality or an inequality constraint

Features of variable nodes (\boldsymbol{f}_{v_{\rm var}}):
- Variable's coefficient in the objective (c_{j})
- Variable's average coefficient in the constraints (mean of A_{*j})
- Number of neighbors/non-null coefficients (|\mathcal{N}(v_{\rm var})|)
- Largest coefficient in the constraints (\max(A_{*j}))
- Smallest coefficient in the constraints (\min(A_{*j}))
- Whether it is a continuous or binary variable

Table 1: Description of input features for SatGNN.

The feature vectors are encoded into the first layer of hidden features by fully-connected neural networks. More precisely, the hidden feature vectors H^{(0)}\in\mathbb{R}^{(n+m)\times d} (where n is the number of variables, m is the number of constraints, and d is the dimension of the hidden feature vectors) are computed by single-layer neural networks with ReLU activation

{\rm NN}_{\rm var}:\mathbb{R}^{6}\longrightarrow\mathbb{R}^{d}_{+},\quad \boldsymbol{f}_{v_{\rm var}}\longmapsto\boldsymbol{h}^{(0)}_{v_{\rm var}}={\rm NN}_{\rm var}(\boldsymbol{f}_{v_{\rm var}})

and

{\rm NN}_{\rm con}:\mathbb{R}^{4}\longrightarrow\mathbb{R}^{d}_{+},\quad \boldsymbol{f}_{v_{\rm con}}\longmapsto\boldsymbol{h}^{(0)}_{v_{\rm con}}={\rm NN}_{\rm con}(\boldsymbol{f}_{v_{\rm con}}),

such that the initial hidden features associated with the variable nodes, H^{(0)}_{\rm var}\in\mathbb{R}^{n\times d}, are computed through {\rm NN}_{\rm var}, and the initial hidden features associated with the constraint nodes, H^{(0)}_{\rm con}\in\mathbb{R}^{m\times d}, are computed through {\rm NN}_{\rm con}, with H^{(0)}=[H^{(0)}_{\rm var},H^{(0)}_{\rm con}].

At each layer of SatGNN, the graph convolutions are performed in two steps, as in Gasse et al. [28]. First, a graph convolution is applied to update the hidden features of the constraint nodes. Then, another graph convolution is performed, now considering the updated features of the constraint nodes, to update the hidden features of the variable nodes. These two operations are illustrated in Figure 4 through the GraphConv blocks.

Figure 4: SatGNN architecture overview, where G is the bipartite graph representation of an MILP problem instance, the H variables represent the sets of hidden features of the nodes, and {\rm NN}_{\cdot} are neural networks. The connection of G to the GraphConv operators represents both the weights of the edges and the neighborhood information.
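In simplified form, each SatGNN layer thus amounts to two half-convolutions over the bipartite graph. The sketch below is a deliberate simplification (sum aggregation weighted by A_{ij}, no learnable transformations), meant only to show the two-step update order of Gasse et al. [28]:

```python
def half_convolution(H_src, H_dst, edges):
    """Add to each destination node the edge-weighted sum of its
    neighbors' features; edges maps (src, dst) index pairs to A_ij."""
    H_new = [row[:] for row in H_dst]
    for (i, j), a_ij in edges.items():
        for k in range(len(H_src[i])):
            H_new[j][k] += a_ij * H_src[i][k]
    return H_new

def satgnn_layer(H_var, H_con, var_to_con):
    """One layer: update constraint nodes first, then variable nodes."""
    H_con = half_convolution(H_var, H_con, var_to_con)
    con_to_var = {(i, j): a for (j, i), a in var_to_con.items()}
    H_var = half_convolution(H_con, H_var, con_to_var)
    return H_var, H_con

# toy graph: two variables connected to one constraint, unit edge weights
H_var, H_con = satgnn_layer([[1.0], [2.0]], [[0.0]],
                            {(0, 0): 1.0, (1, 0): 1.0})
```

Because the variable update already sees the updated constraint features, information propagates between variables sharing a constraint within a single layer.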

Finally, after L iterations, the last layer's hidden features associated with the variable nodes are fed to a neural network

{\rm NN}_{\rm out}:\mathbb{R}^{d}_{+}\longrightarrow\mathbb{R},\quad \boldsymbol{h}^{(L)}_{v_{\rm var}}\longmapsto\hat{z}_{v_{\rm var}}={\rm NN}_{\rm out}(\boldsymbol{h}^{(L)}_{v_{\rm var}}).

This network combines the features extracted by the graph convolution operations into the desired shape of the output. It contains two layers of d nodes with ReLU activation. For node classification tasks (Sec. 5.3.2), the output of {\rm NN}_{\rm out} passes through a sigmoid activation function, resulting in a probability prediction for each variable node. For graph classification tasks (Sec. 5.3.1), in which the output is a single probability value for the entire graph, the outputs of {\rm NN}_{\rm out} for all variable nodes are aggregated into a single value before the sigmoid.

5.3 Training

Two learning tasks are proposed to investigate the research questions that are the focus of this paper. To investigate whether a GNN can learn the structure of the ONTS problem (first research question), we train the SatGNN model on the feasibility classification of candidate solutions. To evaluate the effectiveness of a GNN-based heuristic (second research question), we train SatGNN to generate candidate solutions given instances of the problem.

5.3.1 Feasibility Classification

Given an instance of the ONTS problem I\in\mathcal{I} and a candidate solution \hat{\boldsymbol{z}}, we want the model to determine whether \hat{\boldsymbol{z}} is a feasible solution or not, that is, to estimate p(\hat{\boldsymbol{z}}\in Z|I), where Z\subset\{0,1\}^{2JT} is the set of feasible solutions. The candidate solution is fed to the model as a feature, adding an extra dimension to \boldsymbol{f}_{v_{\rm var}}. As briefly described in Sec. 5.2, feasibility classification is a graph classification task; thus, the outputs of {\rm NN}_{\rm out} for all variable nodes are aggregated into a single predicted logit, to which the sigmoid is applied to obtain the predicted probability \hat{p}(\hat{\boldsymbol{z}}\in Z|I). More precisely, we can write

\hat{p}(\hat{\boldsymbol{z}}\in Z|I)=\sigma\left(\frac{1}{|V_{\rm var}|}\sum_{v_{\rm var}\in V_{\rm var}}{\rm NN}_{\rm out}\left(\boldsymbol{h}^{(L)}_{v_{\rm var}}\right)\right),

where \sigma denotes the sigmoid function, \boldsymbol{h}^{(L)}_{v_{\rm var}} is computed as described in Sec. 5.2, and V_{\rm var} is the set of variable nodes, as described in Sec. 4.1. The model is trained to minimize the binary cross-entropy (BCE) between the predicted probability and the actual label

\min_{\theta}\mathcal{L}_{\rm feas}(\theta)=\sum_{(I,\hat{\boldsymbol{z}},y)\in\mathcal{D}_{\rm feas}}{\rm BCE}({\rm SatGNN}(I,\hat{\boldsymbol{z}};\theta),y),

where y\in\{0,1\} indicates whether \hat{\boldsymbol{z}} is feasible (y=1) or not (y=0), and \theta encapsulates all trainable parameters of SatGNN, which include the parameters of the graph convolutions and of {\rm NN}_{\rm con}, {\rm NN}_{\rm var}, and {\rm NN}_{\rm out}.

The training data is built with instances from \mathcal{D} (see Sec. 5.1). For each instance I such that (I,Z^{\star})\in\mathcal{D}, besides Z^{\star} as a set of feasible solutions, we also associate with the instance two other sets of candidate solutions, Z_{R} and Z_{N}, such that the dataset can be described as

\mathcal{D}_{\rm feas}=\left\{(I,\hat{\boldsymbol{z}},y)\in\mathcal{I}\times\{0,1\}^{2JT}\times\{0,1\}:(I,Z^{\star})\in\mathcal{D},\ \hat{\boldsymbol{z}}\in Z^{\star}\cup Z_{R}\cup Z_{N},\ y=p(\hat{\boldsymbol{z}}\in Z|I)\right\}.

Note that p(\hat{\boldsymbol{z}}\in Z|I)\in\{0,1\}, as it is computed analytically by evaluating the constraints.
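Computing the label y analytically amounts to evaluating the constraints at the candidate solution. For an instance in the inequality form Az\leq b, a sketch (our own helper, with a small tolerance for numerical safety) is:

```python
def feasibility_label(z_hat, A, b, tol=1e-6):
    """y = 1 if the candidate satisfies every constraint A z <= b, else 0."""
    for row, b_i in zip(A, b):
        lhs = sum(a * z for a, z in zip(row, z_hat))
        if lhs > b_i + tol:
            return 0  # at least one violated constraint: infeasible
    return 1

# the example instance of Eq. (11)
A = [[1, 2, 0], [0, 1, -1], [3, 0, 1]]
b = [2, 1, 4]
```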

The first set is built with randomly sampled candidate solutions,

Z_{R}=\left\{\hat{\boldsymbol{z}}\in\{0,1\}^{2JT}:\hat{\boldsymbol{z}}\sim\mathcal{U}\left[\{0,1\}^{2JT}\right]\right\},

where \mathcal{U}\left[\{0,1\}^{2JT}\right] denotes the uniform distribution over all possible assignments of the binary variables. The second set is built with candidate solutions from the neighborhood of the solutions in Z^{\star},

Z_{N}=\left\{\hat{\boldsymbol{z}}\in\{0,1\}^{2JT}:\|\hat{\boldsymbol{z}}-\boldsymbol{z}^{\star}\|_{0}\leq\eta,\ \boldsymbol{z}^{\star}\in Z^{\star}\right\},

where \eta\in\mathbb{N} limits the size of the neighborhood. In practice, we build candidate solutions in Z_{N} by randomly flipping at most \eta values of a randomly selected candidate solution from Z^{\star}. Intuitively, Z^{\star} and Z_{N} are used to teach the model the limits of the feasible region, while Z_{R} carries general information about the domain of the problem.
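The two sets can be sampled with a few lines of code. The sketch below is illustrative (candidate solutions are plain 0/1 lists and the helper names are our own):

```python
import random

def sample_z_r(n_vars, n_samples, rng):
    """Z_R: candidates drawn uniformly over all binary assignments."""
    return [[rng.randint(0, 1) for _ in range(n_vars)]
            for _ in range(n_samples)]

def sample_z_n(z_star_set, n_samples, eta, rng):
    """Z_N: flip at most eta entries of a randomly chosen solution in Z*."""
    candidates = []
    for _ in range(n_samples):
        z = list(rng.choice(z_star_set))
        for k in rng.sample(range(len(z)), rng.randint(1, eta)):
            z[k] = 1 - z[k]  # flip a binary value
        candidates.append(z)
    return candidates

rng = random.Random(0)
Z_R = sample_z_r(n_vars=10, n_samples=5, rng=rng)
Z_N = sample_z_n([[0] * 10], n_samples=5, eta=3, rng=rng)
```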

5.3.2 Optimal Solution Prediction

We want to train the SatGNN model to predict the variables' biases in optimal solutions of instances of the ONTS problem I\in\mathcal{I}. More precisely, we want the model to estimate the probability that each binary variable takes a positive value in the optimal solution, which we denote

\boldsymbol{\hat{p}}(\boldsymbol{z}^{\star}=1|I)=\begin{bmatrix}\hat{p}(z^{\star}_{1,1}=1|I)\\ \vdots\\ \hat{p}(z^{\star}_{J,T}=1|I)\end{bmatrix}.

Therefore, we have a node classification task for which we apply the sigmoid function at the output of SatGNN for each variable node such that

{\rm SatGNN}:\mathcal{I}\times\Theta\longrightarrow[0,1]^{2JT},\quad (I;\theta)\longmapsto\hat{\boldsymbol{p}}(\boldsymbol{z}^{\star}=1|I)={\rm SatGNN}(I;\theta),

where \theta\in\Theta is the vector of trainable parameters. Note that the most probable candidate solution \hat{\boldsymbol{z}}=(\hat{\boldsymbol{x}},\hat{\boldsymbol{\phi}})\in\{0,1\}^{2JT} is easily obtained from the model's prediction by rounding,

\hat{x}_{j,t}=\lceil\hat{p}(x^{\star}_{j,t}=1|I)\rfloor,\ \forall(j,t)\in\mathcal{J}\times\mathcal{T},

where \lceil\cdot\rfloor denotes rounding to the nearest integer; \hat{\phi}_{j,t} is obtained similarly.

One way to train the model is to minimize the BCE between the predicted probabilities and the single best solution known for each instance, i.e., the loss

\min_{\theta}\mathcal{L}_{\rm opt\text{-}b}(\theta)=\sum_{(I,\boldsymbol{z}^{\star})\in\mathcal{D}_{\rm opt\text{-}b}}{\rm BCE}({\rm SatGNN}(I;\theta),\boldsymbol{z}^{\star}),

where the dataset is built such that \boldsymbol{z}^{\star} is the best solution available for each instance in \mathcal{D}, i.e.,

\mathcal{D}_{\rm opt\text{-}b}=\left\{(I,\boldsymbol{z}^{\star})\in\mathcal{I}\times Z:(I,Z^{\star})\in\mathcal{D},\ \boldsymbol{z}^{\star}=\arg\max_{\hat{\boldsymbol{z}}\in Z^{\star}}{\rm QoS}(\hat{\boldsymbol{z}};\boldsymbol{u}_{I})\right\},

in which \boldsymbol{u}_{I} is the vector of priorities of instance I (see Sec. 3.1).

Another way is the approach proposed by Nair et al. [30], in which multiple solutions are used as targets, weighted by their objective values. Note that the best-known solution is also used in this approach, along with sub-optimal solutions. The optimization can then be described as

\min_{\theta}\mathcal{L}_{\rm opt\text{-}m}(\theta)=\sum_{(I,Z^{\star})\in\mathcal{D}_{\rm opt\text{-}m}}\sum_{\boldsymbol{z}^{\star}\in Z^{\star}}w(\boldsymbol{z}^{\star};Z^{\star},I)\cdot{\rm BCE}({\rm SatGNN}(I;\theta),\boldsymbol{z}^{\star}),

where

w(\boldsymbol{z}^{\star};Z^{\star},I)=\frac{\exp({\rm QoS}(\boldsymbol{z}^{\star};\boldsymbol{u}_{I}))}{\sum_{\boldsymbol{z}\in Z^{\star}}\exp({\rm QoS}(\boldsymbol{z};\boldsymbol{u}_{I}))}

is the weight associated with each candidate solution found by the solver, constructed such that the better the objective, the higher the weight, and, for each instance, the weights add up to 1. Note that we can directly take \mathcal{D}_{\rm opt\text{-}m}\subseteq\mathcal{D}.
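The weights above are a softmax of the objective values over the solution pool of each instance. A sketch (subtracting the maximum before exponentiating is a standard numerical-stability trick and does not change the weights):

```python
import math

def solution_weights(qos_values):
    """Softmax weights over the QoS values of the solutions in Z*."""
    q_max = max(qos_values)                  # for numerical stability
    exps = [math.exp(q - q_max) for q in qos_values]
    total = sum(exps)
    return [e / total for e in exps]

w = solution_weights([3.0, 2.0, 1.0])
```

Better solutions receive larger weights, and the weights of each instance sum to 1, as required.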

5.4 SatGNN-based Heuristic

The output of SatGNN trained for optimal solution prediction, that is, the predicted variables' biases, can be rounded to the closest integer (either 0 or 1) to generate the most probable candidate solution. However, this is not expected to perform well, as we have no feasibility guarantees; any error in the network may generate an infeasible solution. Given that ONTS problems of moderate size already have thousands of binary variables (the smallest instance evaluated by Rigo et al. [22] has 1746 binary variables, considering both x and \phi), infeasibility is likely to occur. Therefore, we propose using the deep learning model to aid the optimization in three approaches: warm-starting, early-fixing, and trust region.

5.4.1 Warm-starting

Another straightforward approach to using the information learned by the model is to provide (partial) solutions to a solver, warm-starting the optimization. For example, the SCIP solver [34] accepts complete and partial solutions to the problem, which are used to guide the inner heuristics during the optimization process. We use the output of the model to determine which variables will compose the partial solution provided to the solver based on the confidence of the model’s prediction. In other words, the closer the model’s probability output is to the extreme values (0 or 1), the more confident it is of that assignment.

Formally, let \hat{\boldsymbol{p}}(\boldsymbol{z}^{\star}=1|I) be the output of SatGNN given an instance I\in\mathcal{I}, as defined in Sec. 5.3.2. We denote by \boldsymbol{\kappa}(\boldsymbol{z}^{\star}|I)\in[0,1]^{2JT} the confidence the model has in each binary variable's prediction. For variable x_{j,t} of instance I, given the model's output \hat{p}(x_{j,t}^{\star}=1|I), the respective confidence \kappa(x_{j,t}|I) is

\kappa(x_{j,t}|I)=\begin{cases}\hat{p}(x_{j,t}^{\star}=1|I),&\text{if }\hat{p}(x_{j,t}^{\star}=1|I)\geq 0.5\\ \hat{p}(x_{j,t}^{\star}=0|I)=1-\hat{p}(x_{j,t}^{\star}=1|I),&\text{otherwise}.\end{cases}

The confidence associated with the \phi_{j,t} variables is defined similarly. Note that the confidence is always between 0.5 and 1.
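In code, the confidence is simply the probability of the predicted class, \kappa=\max(\hat{p},1-\hat{p}):

```python
def confidence(p_hat):
    """kappa for each binary variable: probability of the predicted
    class, always in [0.5, 1]."""
    return [max(p, 1.0 - p) for p in p_hat]

kappa = confidence([0.9, 0.25, 0.55])
```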

Let us write \boldsymbol{z}=(\boldsymbol{x},\boldsymbol{\phi})=(z_{1},\ldots,z_{k},\ldots,z_{2JT}). We define \mathcal{C}^{N}_{I}\subseteq\{1,\ldots,2JT\} as the set of indices of the N variables the model is most confident about, that is, |\mathcal{C}^{N}_{I}|=N and

\mathcal{C}^{N}_{I}=\left\{k\in\{1,\ldots,2JT\}:\kappa(z^{\star}_{k}|I)\geq\kappa(z^{\star}_{k^{\prime}}|I),\ \forall k^{\prime}\not\in\mathcal{C}^{N}_{I}\right\}.

Then, a partial solution \bar{\boldsymbol{z}}^{(N)} of size N can be written as

\bar{\boldsymbol{z}}^{(N)}=\left[\hat{z}_{k}\right]_{k\in\mathcal{C}^{N}_{I}},

which contains the predicted assignments of the highest confidence and can be provided to the solver.
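Extracting the partial solution then amounts to sorting the variables by confidence and rounding the N most confident predictions. A sketch (indices play the role of the set C_I^N):

```python
def partial_solution(p_hat, N):
    """Map from variable index k in C_I^N to its predicted value z_hat_k."""
    kappa = [max(p, 1.0 - p) for p in p_hat]  # confidence per variable
    most_confident = sorted(range(len(p_hat)),
                            key=lambda k: kappa[k], reverse=True)[:N]
    return {k: round(p_hat[k]) for k in most_confident}

partial = partial_solution([0.95, 0.40, 0.05, 0.60], N=2)
```

For warm-starting, such a dictionary of index-value pairs is what is handed to the solver as a partial solution.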

Note that warm-starting does not constitute a heuristic solution to the problem, as it merely modifies the behavior of the heuristics already present in the solver. In other words, warm-starting only changes the order in which the B&B tree is explored. Therefore, optimality and feasibility guarantees are maintained, even in the worst case, in which the model provides the solver with poor, infeasible solutions.

5.4.2 Early-fixing

A heuristic can be constructed by fixing the variable assignments the model is most confident about, similar to the Neural Diving approach of Nair et al. [30]. In contrast to warm-starting, instead of merely informing the solver, we modify the problem instance, fixing the values of the variables the model is most confident about before calling the solver. More precisely, early-fixing is equivalent (in principle) to adding to the problem constraints of the form

z_{k}=\hat{z}_{k},\ \forall k\in\mathcal{C}^{N}_{I},

which fix N variables to the values predicted by the model. In practice, however, no constraints are added; instead, the predicted values are substituted into the existing constraints and the objective. The resulting sub-problem therefore contains fewer binary variables and is expected to be faster to solve.

Note that every feasible solution to the early-fixed problem, together with the fixed assignments, is also feasible in the original instance. However, an optimal solution to the sub-problem might not be optimal for the original instance. Even worse, early-fixing might render the problem infeasible if the model predicts a poor variable assignment. Therefore, early-fixing is a matheuristic, as it uses mathematical optimization to find heuristic solutions.

5.4.3 Trust Region

Instead of fixing the variables that the model is most confident of, we may allow a slight deviation from the partial candidate solution. In other words, we can use SatGNN’s prediction to define a trust region [12]. A new constraint is added to the problem of the form

\sum_{k\in\mathcal{C}^{N}_{I}}\left|z_{k}-\hat{z}_{k}\right|\leq\Delta,

where \Delta\in\mathbb{N} is the parameter that defines the size of the trust region. This limits the solution space to vectors in the vicinity of the partial candidate solution.

From the definition above, it is easy to see that early-fixing is a particular case of the trust region method with \Delta=0. In fact, just like early-fixing, allowing a trust region around the (partial) candidate solution also constitutes a matheuristic, as neither optimality nor feasibility is guaranteed.
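Because the z_k are binary and the \hat{z}_k are constants, the absolute values linearize exactly: |z_k-\hat{z}_k| equals z_k when \hat{z}_k=0 and 1-z_k when \hat{z}_k=1, so the trust-region constraint is a single linear inequality. A sketch of the resulting coefficients (our own solver-agnostic helper; partial maps variable indices to predicted values):

```python
def trust_region_row(partial, delta):
    """Coefficients and right-hand side of the linearized constraint
    sum_k |z_k - z_hat_k| <= Delta over the index set of partial."""
    # |z_k - 0| = z_k  and  |z_k - 1| = 1 - z_k
    coefs = {k: (1 if z_hat == 0 else -1) for k, z_hat in partial.items()}
    rhs = delta - sum(partial.values())  # move the constant 1's to the rhs
    return coefs, rhs

coefs, rhs = trust_region_row({0: 1, 2: 0, 5: 1}, delta=1)
```

With delta=0 and partial={0: 1}, the row reads -z_0 <= -1, i.e., z_0 = 1, recovering early-fixing as a special case.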

6 Experiments and Results

The experiments for this paper were conducted in Python, using PyTorch, the DGL library, and the SCIP solver, on a server with an Intel i7-12700 (12 cores, 20 threads), an NVIDIA RTX A4000, 16 GB of memory, and Ubuntu 22.04.1 LTS 64-bit. Implementation details are available in the paper's accompanying repository (https://github.com/brunompacheco/sat-gnn).

In the following sections, we present three experiments addressing the two research questions raised in the introduction (Sec. 1). In the first experiment, we train the proposed SatGNN model to classify the feasibility of candidate solutions for problem instances, considering three scenarios of increasing difficulty with respect to generalization. In the second experiment, we train SatGNN models to predict the biases of the binary variables of problem instances, effectively learning the probability that each binary variable assumes a positive value in an optimal solution. We compare two training approaches, using either the best solution or multiple feasible solutions of each instance as targets. Finally, we use the models trained for optimality prediction to generate candidate solutions, with which we construct heuristics for the ONTS problem. More specifically, we use the trained models in three approaches (as presented in Sec. 5.4) to improve on the SCIP solver in its off-the-shelf setting: warm-starting, early-fixing, and trust region.

6.1 Dataset

The data for the following experiments is generated according to the procedure described in Sec. 5.1. The dataset construction requires verifying feasibility and solving instances of the ONTS problem to (quasi-)optimality. In other words, dataset construction imposes a high computational cost, which scales with the size of the problem.

To circumvent the cost of dataset generation, we build a training set with many instances of smaller size, and validation and test sets with fewer-but-larger instances. This way, we can afford a sufficient volume of data for training while evaluating on the most relevant instances, which are the ones with a higher solving cost. At the same time, we test the model’s generalization capability by evaluating its performance on instances harder than those seen during training.

We base our data generation on the instances proposed by Rigo et al. [33], which take the FloripaSat-I mission as a reference [35] (see Appendix A). More precisely, we focus on instances with scheduling horizon T=125.

We generate 200 instances for each J\in\{9,13,18\} jobs and 40 instances for each J\in\{20,22,24\} jobs, totaling 720 instances.

The dataset is generated through Algorithm 1. We solve each instance and gather the 500 best solutions found, limited to a time budget of 5 minutes. More precisely, every instance (I,Z^{\star})\in\mathcal{D} is such that |Z^{\star}|=500. An example of the optimal schedule for one of the generated instances with J=9 can be seen in Figure 1. All instances and solutions are made publicly available [36].

6.2 Feasibility Classification

As described in Sec. 5.3.1, we build the dataset for feasibility classification \mathcal{D}_{\rm feas} by adding random candidate solutions to Z^{\star} for every (I,Z^{\star})\in\mathcal{D}. More precisely, for every instance of the ONTS problem, we generate Z_{R} and Z_{N} such that |Z_{R}|=250 and |Z_{N}|=250, totaling 1000 candidate solutions per instance.

SatGNN is modified for graph classification as described in Sec. 5.3.1. We use the graph convolution proposed by Kipf and Welling [31], as presented in Eq. (8), and a single layer (L=1). The hidden features all have a fixed size d=8. Both {\rm NN}_{\rm con} and {\rm NN}_{\rm var} are neural networks with a single layer and ReLU activation, while {\rm NN}_{\rm out} is a neural network with 3 layers, ReLU activation in the hidden layers, and sigmoid activation at the last layer. For all feasibility classification experiments, SatGNN is trained using Adam [37] with a learning rate of 10^{-3} to minimize the binary cross-entropy between the prediction and the feasibility of the candidate solution. The model is trained until the training loss becomes smaller than 10^{-2}, limited to a maximum of 200 epochs.

First, SatGNN is trained on one particular instance. Given an instance I, we build \mathcal{D}_{\rm feas}^{I}=\{(\hat{\boldsymbol{z}},y):(I,\hat{\boldsymbol{z}},y)\in\mathcal{D}_{\rm feas}\}. This dataset is randomly divided into training and test sets in an 80-20 split, so 800 candidate solutions are used to train the model. For each number of jobs J\in\{9,13,18,20,22,24\}, we randomly selected 5 different instances. In all experiments (all instances of all sizes), SatGNN achieved 100% accuracy, that is, the model could perfectly distinguish between feasible and infeasible candidate solutions.

Then, SatGNN is trained to generalize across instances with the same number of jobs. For each number of jobs JJ, we build the dataset by selecting only instances (and the respective candidate solutions) with the same number of jobs, i.e.,

\mathcal{D}_{\rm feas}^{J}=\{(I,\hat{\boldsymbol{z}},y)\in\mathcal{D}_{\rm feas}:J_{I}=J\}, (12)

where J_{I} denotes the number of jobs of instance I. Twenty of the generated instances are randomly selected for training, totaling 20,000 training samples, while 10 different instances are randomly selected for testing. SatGNN achieved very high performance, with over 90% accuracy in all settings. The complete test set performance for the different numbers of jobs can be seen in Figure 5.

Figure 5: Test set performance of SatGNN trained for feasibility classification of candidate solutions given a fixed number of jobs. All instances used for testing were not seen by the model beforehand.

Finally, SatGNN is trained solely on candidate solutions from small instances (i.e., those with J\in\{9,13,18\}) and tested on candidate solutions from large instances (i.e., those with J\in\{20,22,24\}). Following the notation of Eq. (12), we use \mathcal{D}_{\rm feas}^{9}\cup\mathcal{D}_{\rm feas}^{13}\cup\mathcal{D}_{\rm feas}^{18} for training and \mathcal{D}_{\rm feas}^{20}\cup\mathcal{D}_{\rm feas}^{22}\cup\mathcal{D}_{\rm feas}^{24} for testing. In total, 60,000 samples are used for training and 30,000 for testing. SatGNN achieved 94.15% accuracy on the test set. The performance across instance sizes and groups of candidate solutions is presented in Table 2. The model had more difficulty with candidate solutions from the Z_{N} sets, which lie at the edge of the feasible region. Performance also decreased with problem size (number of jobs), indicating increasing difficulty, as expected.

Table 2: Feasibility classification performance of SatGNN on instances larger than those used for training. The test set is discriminated into the three sets of candidate solutions that compose it (see Sec. 5.3.1). Each row indicates the set of 10 instances with the same size (number of jobs).
Accuracy
# of jobs Z^{\star} Z_{N} Z_{R} Total
20 100% 90% 100% 97.5%
22 100% 80% 100% 95%
24 90% 90% 89.84% 89.96%

6.3 Optimal Solution Prediction

Following the methodology presented in Sec. 5.3.2, we first generate the datasets \mathcal{D}_{\rm opt\text{-}b} and \mathcal{D}_{\rm opt\text{-}m} from the dataset \mathcal{D} (described in Sec. 6.1). Having both datasets allows us to compare the two training strategies: training with a single target (the quasi-optimal solution) and training with the best solutions found (which include the quasi-optimal solutions). The data is divided into training, validation, and test sets. The training set contains 200 small instances from \mathcal{D} (with 9, 13, or 18 jobs), while the validation and test sets each contain 20 large instances from \mathcal{D} (with 20, 22, or 24 jobs), such that no instance is used in both validation and test.

The models are trained according to the description in Sec. 5.3.2. Both when training with the best solution for each instance and when training with multiple solutions, the models are trained using Adam to minimize the BCE between the predictions and the targets. The validation set is used for hyperparameter tuning. More specifically, the learning rate, the number of hidden features (d), the graph convolution (from the two presented in Sec. 4), whether the convolutions share parameters, and the number of layers (L) are determined through random search. In our experiments, when training with the best solution, SatGNN performed best on the validation set with 2 layers, 64 hidden features at each layer, and a learning rate of 10^{-2}. When training with multiple solutions, SatGNN performed best with 3 layers, 256 hidden features, and a learning rate of 10^{-3}. In both scenarios, sharing the parameters across graph convolutions and using the SAGE graph convolution were the best choices.

We use the validation set to perform early stopping, i.e., within the training budget (100 epochs), we pick the model that performed best on the validation set. The training curves for both approaches can be seen in Figure 6. After training with the best solution, the average BCE was 0.2887 on the validation set and 0.2873 on the test set. After training with multiple solutions, the average BCE was 0.2451 on the validation set and 0.2482 on the test set. Although the BCE values are not comparable across training approaches, the small difference between validation and test sets for both approaches indicates no sign of overfitting.

(a) Best Solution
(b) Multiple Solutions
Figure 6: Training curves for SatGNN models trained with (a) the best solution or (b) multiple solutions. The best average BCE on the validation set (highlighted in red) is used for early-stopping the training.

A deeper analysis is performed on the models’ confidence when predicting variable assignments κ(𝒛=1|I), as presented in Sec. 5.4.1. The average confidence of both models on instances of the test set is depicted in Figure 7 with respect to the binary variables of the optimization problem. Training with multiple solutions seems to result in a more confident model overall. Furthermore, both models provide significantly more confident predictions for the ϕ variables.

(a) x_{j,t}
(b) ϕ_{j,t}
Figure 7: Average confidence (probability of the predicted class) of the predicted values for the (a) 𝒙 and (b) ϕ variables, for the model trained only with the best solution and the model trained with multiple solutions. The average is taken over all jobs of all instances in the test set.

6.4 SatGNN-based Heuristic

The two models presented in Sec. 6.3, namely the SatGNN trained with the best solution available and the SatGNN trained with multiple solutions, are each used to build three matheuristics, as described in Sec. 5.3.2: warm-starting, early-fixing, and trust region. We use SCIP as the solver both for the baseline results and for the optimization within the matheuristics.

Although all model parameters and hyperparameters are fixed by the time the heuristics are assembled, the heuristics have their own hyperparameters, which require adjustment. More specifically, all three heuristic approaches have the hyperparameter N ∈ ℕ, the size of the partial solution extracted from SatGNN. In addition, the trust-region matheuristic has the size of the trust region, Δ ∈ ℕ.
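A natural way to extract a size-N partial solution from the predicted probabilities is to keep the N variables the model is most confident about and round them to the predicted class. The sketch below illustrates this under that assumption; the function and variable names are ours, not from the paper.

```python
def partial_solution(probs: dict, N: int) -> dict:
    """Select the N binary variables with the highest confidence.
    For a Bernoulli prediction p, the confidence is max(p, 1 - p);
    the kept variables are rounded to their predicted class."""
    ranked = sorted(probs, key=lambda v: max(probs[v], 1 - probs[v]), reverse=True)
    return {v: int(probs[v] >= 0.5) for v in ranked[:N]}
```

For example, with `probs = {"x1": 0.95, "x2": 0.40, "x3": 0.10}` and `N = 2`, the variables `x1` (confidence 0.95) and `x3` (confidence 0.90) are kept, while the less certain `x2` (confidence 0.60) is left to the solver.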

We use the validation set to select the best values for the hyperparameters, aiming at two performance metrics. First, we select the hyperparameter values that maximize the expected objective value. The objective values are normalized, i.e., we divide them by the known optimum of each instance, resulting in a value between 0 (no QoS) and 1 (maximum QoS possible for the instance). In parallel, we select the hyperparameters that minimize the time required to find a feasible solution. The two tunings are performed both for the SatGNN model trained with the best solution available and for the model trained with multiple solutions. Table 3 summarizes the best hyperparameters found for each model and each heuristic approach in both evaluations.

Table 3: Best values for partial solution size (N) and trust region size (Δ) for each heuristic approach and each SatGNN model, as evaluated on the validation set. Objective corresponds to the values that maximize the objective value of the best solution found. Feasibility indicates the values that minimize the time to find a feasible solution.

                                       Objective        Feasibility
SatGNN model         Heuristic        N       Δ        N       Δ
Best Solution        Warm-start       750     -        1000    -
                     Early-fix        500     -        750     -
                     Trust region     1000    5        1000    1
Multiple Solutions   Warm-start       1750    -        1500    -
                     Early-fix        1000    -        1250    -
                     Trust region     1250    1        1750    1

The heuristics are then evaluated on the test set, which is used neither for tuning the deep learning models’ hyperparameters nor the heuristics’ hyperparameters. The final performance is measured through the expected relative objective value given the time budget and through the time it takes the heuristics to find a feasible solution. Furthermore, we also analyze the progress of the lower bound over time, which indicates the expected performance under more restricted budgets. Figure 8 illustrates the results.

(a) Best solution found within the time budget.
(b) Time to find a feasible solution.
Figure 8: Test set performance of the SatGNN-based matheuristics. On the left, we have the distribution of the evaluation metric of interest over the instances of the test set for the multiple approaches, in which the triangle indicates the mean value and the circles indicate outliers. MS indicates that the SatGNN model trained with multiple solutions was used, whereas OS indicates that the model trained solely with the optimal solution was used instead. On the right is the average progress of the lower bound over all test set instances. The objective value is relative to the known optimal value; thus, it always lies in the unit interval. The heuristics’ hyperparameters N and Δ are defined through experiments on the validation set, as presented in Table 3.

To assess the significance of the results presented in Figure 8, we apply the Wilcoxon signed-rank test [38], a non-parametric analog of the t-test for matched pairs. We apply the test pairwise, comparing each matheuristic approach to every other matheuristic approach and to the baseline. The results are summarized in the critical difference diagram of Figure 9. Through early-fixing, both SatGNN models provided statistically significant (p-value < 0.05) improvements over the baseline in both performance metrics. Moreover, the results show a significant advantage of the model trained with multiple solutions, which provides better solutions (objective value) through early-fixing and trust region and speeds up finding a feasible solution through all heuristic approaches.
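As a reference, the statistic underlying the test can be computed as follows. This minimal sketch (ours) uses average ranks for ties and drops zero differences; it omits the p-value computation that a library such as SciPy (`scipy.stats.wilcoxon`) would provide.

```python
def wilcoxon_statistic(a, b):
    """Wilcoxon signed-rank statistic W = min(W+, W-) for paired samples.
    Zero differences are dropped; tied |d| values get average ranks."""
    diffs = [x - y for x, y in zip(a, b) if x != y]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        # group indices whose |d| are tied
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return min(w_plus, w_minus)
```

For example, for paired samples with differences (1, -2, 3, 4), the ranks of |d| are 1–4, so W+ = 8 and W- = 2, giving W = 2; small W values correspond to small p-values under the null hypothesis of symmetric differences.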

(a) Best solution found within the time budget.
(b) Time to find a feasible solution.
Figure 9: Critical difference diagram presenting the average test set performance of the SatGNN-based matheuristics (round markers on the axis). A crossbar between two (or more) approaches indicates that their performances (distributions on the test set) were not deemed significantly different (p-value > 0.05) by the paired Wilcoxon signed-rank test [38].

With the SatGNN model trained with multiple solutions, early-fixing the partial solution achieved a 43% increase in the expected objective value within the time budget, while defining a trust region around that same partial solution resulted in an expected 45% increase. Although the results are close, we see from the progress of the lower bound (in Fig. 8(a), the plot on the right) that the early-fix heuristic was able to find better solutions more quickly than the trust region method during the time budget. In terms of feasibility, the early-fixing strategy using the same SatGNN model reduced the expected time to find a feasible solution by 35%, a significant advantage over all other heuristic approaches. Surprisingly, even when optimized (through hyperparameter tuning) for reducing the time to find a feasible solution, the early-fix heuristic still improved the expected objective value of the best solution found by 41%, as the lower-bound progress illustrates (in Fig. 8(b), the plot on the right).

6.4.1 Partial Solution Size

To evaluate the impact of the value of N, the number of variables in the partial solution fed to the matheuristics, we further evaluate the SatGNN model on the test set. More specifically, we vary the value of N for all matheuristics and measure the impact on the test set from both perspectives (objective value and time to find a feasible solution). For completeness, we also vary Δ, the size of the trust region, to evaluate the full potential of the matheuristic. For conciseness, we consider only the SatGNN model trained with multiple solutions, as it was overall superior to the model trained solely with the best solution in our experiments. The results are illustrated in Figure 10.

Figure 10: Performance of the matheuristics for varying values of N, the number of binary variables in the partial solution. EF indicates the early-fixing matheuristic, TR the trust region, and WS the warm-start. The SatGNN model trained with multiple solutions is used. The evaluation is performed on the test set.

As expected, the number of variables composing the partial solution impacts all approaches, with early-fixing being the most sensitive to it and warm-start the least sensitive. Furthermore, we note a relation between N and Δ through the peak performance of the trust region method (including early-fixing, which is equivalent to trust region with Δ = 0): larger partial solutions seem to require larger trust regions. Intuitively, this result is expected since the partial solutions are based on confidence, and, thus, the larger the partial solution, the higher the probability that it includes wrongly predicted variables. Therefore, this result suggests that the SatGNN model is properly trained, i.e., its confidence correlates with its performance.

7 Discussion

In this section, we discuss the results of the experiments conducted to address the research questions posed in the introduction. The experiments aimed to evaluate the effectiveness of GNNs in solving the ONTS problem. More specifically, we use our proposed model, SatGNN, which is based on the models by Gasse et al. [28] and Hamilton et al. [32]. Our experiments evaluate whether SatGNN can learn feasibility and optimality on large instances of the ONTS problem and whether the trained model can be used in heuristic solutions.

The results of the first set of experiments (Sec. 6.2) indicate that SatGNN can provide accurate feasibility estimations even in challenging scenarios. Three scenarios of increasing difficulty were considered, and the results were quite promising, with over 89% accuracy in all experiments. In the most challenging scenario, SatGNN was trained on small instances (J ∈ {9, 13, 18}) but evaluated on large, unseen instances (J ∈ {20, 22, 24}), achieving 94.15% accuracy in feasibility classification. The remarkable accuracy indicates that SatGNN is highly effective at learning the constraints of the ONTS problem. This performance is expected, as the feasibility classification task requires the model to learn linear classification boundaries, which derive from the linear constraints of the problem. In other words, the model learns the indicator function of a convex set (considering the LP relaxation of the problem), a task for which neural networks are known to be suitable.

The second set of experiments (Sec. 6.3) aimed to predict optimal solutions to ONTS instances using SatGNN. More specifically, SatGNN was trained to predict the biases of the binary variables for multiple instances of the ONTS problem. Two training approaches were explored: one trained with a single target (the quasi-optimal solution) and another trained with multiple solutions (the best feasible solutions found). Both approaches showed promising results, with no signs of overfitting, as indicated by the small difference between validation and test set performance. Interestingly, the model trained with multiple solutions tended to be more confident in its predictions, particularly for the ϕ variables. This increased confidence can be valuable in guiding the optimization process, as it indicates the model’s certainty about its predictions.

Although the results from the second set of experiments suggest that SatGNN can effectively predict optimal or near-optimal solutions to ONTS instances, we put the models to the test in a third set of experiments (Sec. 6.4), in which we build matheuristics based on the output of the SatGNN trained for optimality prediction. Namely, as detailed in Sec. 5.4, three approaches were evaluated for applying the model’s output: warm-start, early-fix, and trust region. Each approach is tuned, in two separate experiments, for finding the best solution given the time budget (2 min.) and for reducing the time to find a feasible solution. First, we notice that in all direct comparisons, the SatGNN model trained with multiple solutions resulted in better heuristic solutions than the model trained with a single solution, as suggested by the results of the second set of experiments.
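For binary variables, the three approaches differ only in what is handed to the solver: warm-starting passes the partial solution as a solution hint, early-fixing clamps the predicted variables to their predicted values, and the trust region bounds the Hamming distance to the partial solution. The sketch below (our notation, solver-agnostic) illustrates the early-fix bounds and the linearized trust-region constraint.

```python
def early_fix_bounds(partial):
    """Early-fixing: clamp each predicted binary to its predicted value,
    i.e., lower bound = upper bound = prediction."""
    return {v: (val, val) for v, val in partial.items()}

def trust_region_constraint(partial, delta):
    """Trust region: allow at most `delta` of the predicted binaries to flip.
    The Hamming distance
        sum_{v: xhat_v=0} x_v + sum_{v: xhat_v=1} (1 - x_v) <= delta
    linearizes to  sum_v coef_v * x_v <= delta - |{v: xhat_v = 1}|."""
    coefs = {v: (1 if val == 0 else -1) for v, val in partial.items()}
    rhs = delta - sum(1 for val in partial.values() if val == 1)
    return coefs, rhs
```

Setting `delta = 0` recovers early-fixing, which is consistent with the equivalence noted in Sec. 6.4.1; in practice, the bounds and the linear constraint would be passed to SCIP when building the model.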

Early-fixing the partial solution provided by the SatGNN models yielded the best results overall, with a 43% gain in the expected objective value within the time budget and a 35% reduction in the expected time to find a feasible solution. The trust region approach, which can be seen as a relaxation of early-fixing, was comparable to early-fixing in terms of the best solution found within the time limit, but overall seems to provide a worse trade-off: neither early-fixing nor trust region provides optimality guarantees, yet trust region takes longer to find good solutions to the ONTS problem. Using the SatGNN models to warm-start the SCIP solver significantly reduced the time to find a feasible solution (in comparison to the baseline) but was overall worse than the other two approaches. However, warm-starting preserves optimality guarantees, which makes it more widely applicable.

Across all experiments, a highlight of the GNNs’ performance was the generalization to larger instances. This characteristic is essential for problems with varying instance sizes and is not trivially achievable with traditional deep learning models. It is also key for tasks in which data acquisition costs grow with problem size, as in the case of the ONTS problem. Large instances of optimization problems are the ones that drive the research on heuristic solutions, as they are usually more expensive to solve; however, since building a training dataset requires solving many instances, it often becomes infeasible to train deep learning models on large instances. In this context, the generalization results from our experiments suggest that GNNs can be effectively trained on easy (cheap to acquire) instances of the problem and then applied to the hard instances of interest.

Generating reliable instances to train and evaluate solvers is a key point in such a project, as the confidence in the results depends on how carefully the test instances are designed [39]. Data generation is particularly critical when historical data is not available [9], which motivates research on algorithms for generating reliable instances [39, 40]. We follow the problem definition of the FloripaSat-I mission [35] to define the parameter ranges used to sample new instances uniformly. Although the data generation process is reliable, it still depends on optimizing every randomly generated instance. Furthermore, as we restrict ourselves to feasible instances, we only add to the dataset instances for which the solver could find at least one feasible solution within the time budget, effectively restricting our data to an easier sub-problem [41]. Given the nature of the ONTS problem, a future research direction could be generating large instances by combining the job parameters of easier instances, thus generating instances that extrapolate the time budget restriction.

8 Conclusion

This work has proposed a novel approach to tackle the ONTS problem using graph neural networks. More specifically, we investigated whether a GNN can learn the structure of the problem and be used to build effective heuristic solutions. Our experiments showed that our proposed architecture, SatGNN, successfully classifies the feasibility of candidate solutions and is also able to provide reliable partial solutions to varied instances of the ONTS problem. Not only was the model able to generalize to unseen instances, but it also showed promising results on out-of-distribution instances, which were larger than the ones seen during training. This outcome shows how the inherent symmetries of graph neural networks make them suitable for dealing with the structures of the optimization problem.

By leveraging partial solutions provided by the SatGNN model, we built heuristics for the ONTS problem that outperformed the SCIP solver on hard instances. More specifically, by fixing the variable assignments from the partial solution provided by SatGNN, our heuristics were able to improve the expected objective value of the best solution found within the time budget by up to 43% and to reduce the expected time to find a feasible solution by up to 35%.

Overall, the experiments presented in this paper showcase the potential of GNNs for addressing the ONTS problem. The high accuracy in feasibility classification, the ability to predict optimal solutions, and the effectiveness of SatGNN-based heuristics suggest that this approach has the potential to revolutionize the way complex space mission scheduling problems are tackled, enabling more efficient and reliable scheduling for nanosatellite missions. Further research could explore the scalability of these methods to larger problem instances, problems with satellite constellations, and real-world applications in space mission planning and scheduling.

Acknowledgments

The authors acknowledge support from CNPq (Conselho Nacional de Desenvolvimento Científico e Tecnológico) under grant numbers 150281/2022-6 and 404576/2021-4, as well as FAPESC under grant number 2021TR001851.

References

  • [1] W. A. Shiroma, L. K. Martin, J. M. Akagi, J. T. Akagi, B. L. Wolfe, B. A. Fewell, and A. T. Ohta, “CubeSats: A bright future for nanosatellites,” Central European Journal of Engineering, vol. 1, pp. 9–15, Mar. 2011.
  • [2] B. Lucia, B. Denby, Z. Manchester, H. Desai, E. Ruppel, and A. Colin, “Computational nanosatellite constellations: Opportunities and challenges,” GetMobile: Mobile Comp. and Comm., vol. 25, p. 16–23, jun 2021.
  • [3] G. W. Nagel, E. M. L. de Moraes Novo, and M. Kampel, “Nanosatellites applied to optical earth observation: a review,” Ambiente e Agua - An Interdisciplinary Journal of Applied Science, vol. 15, p. 1, June 2020.
  • [4] N. Saeed, A. Elzanaty, H. Almorad, H. Dahrouj, T. Y. Al-Naffouri, and M.-S. Alouini, “Cubesat communications: Recent advances and future challenges,” IEEE Communications Surveys & Tutorials, vol. 22, no. 3, pp. 1839–1862, 2020.
  • [5] C. A. Rigo, L. O. Seman, E. Camponogara, E. M. Filho, and E. A. Bezerra, “Task scheduling for optimal power management and quality-of-service assurance in CubeSats,” Acta Astronautica, vol. 179, pp. 550–560, 2021.
  • [6] C. A. Rigo, L. O. Seman, E. Camponogara, E. M. Filho, and E. A. Bezerra, “A nanosatellite task scheduling framework to improve mission value using fuzzy constraints,” Expert Systems with Applications, vol. 175, p. 114784, 2021.
  • [7] L. O. Seman, B. F. Ribeiro, C. A. Rigo, E. M. Filho, E. Camponogara, R. Leonardi, and E. A. Bezerra, “An energy-aware task scheduling for quality-of-service assurance in constellations of nanosatellites,” Sensors, vol. 22, no. 10, 2022.
  • [8] E. Camponogara, L. O. Seman, C. A. Rigo, E. M. Filho, B. F. Ribeiro, and E. A. Bezerra, “A continuous-time formulation for optimal task scheduling and quality-of-service assurance in nanosatellites,” Computers & Operations Research, vol. 147, p. 105945, 2022.
  • [9] Y. Bengio, A. Lodi, and A. Prouvost, “Machine learning for combinatorial optimization: A methodological tour d’horizon,” European Journal of Operational Research, vol. 290, no. 2, pp. 405–421, 2021.
  • [10] M. Karimi-Mamaghan, M. Mohammadi, P. Meyer, A. M. Karimi-Mamaghan, and E.-G. Talbi, “Machine learning at the service of meta-heuristics for solving combinatorial optimization problems: A state-of-the-art,” European Journal of Operational Research, vol. 296, no. 2, pp. 393–422, 2022.
  • [11] B. M. Pacheco, L. O. Seman, and E. Camponogara, “Deep-learning-based early fixing for gas-lifted oil production optimization: Supervised and weakly-supervised approaches,” 2023.
  • [12] Q. Han, L. Yang, Q. Chen, X. Zhou, D. Zhang, A. Wang, R. Sun, and X. Luo, “A GNN-guided predict-and-search framework for mixed-integer linear programming,” Mar. 2023. arXiv:2302.05636 [math].
  • [13] A. Parmentier and V. T’Kindt, “Structured learning based heuristics to solve the single machine scheduling problem with release times and sum of completion times,” European Journal of Operational Research, vol. 305, no. 3, pp. 1032–1041, 2023.
  • [14] W. Guo, M. Vanhoucke, and J. Coelho, “A prediction model for ranking branch-and-bound procedures for the resource-constrained project scheduling problem,” European Journal of Operational Research, vol. 306, no. 2, pp. 579–595, 2023.
  • [15] Y. Yang, N. Boland, B. Dilkina, and M. Savelsbergh, “Learning generalized strong branching for set covering, set packing, and 0–1 knapsack problems,” European Journal of Operational Research, vol. 301, no. 3, pp. 828–840, 2022.
  • [16] J. Zhang, C. Liu, X. Li, H.-L. Zhen, M. Yuan, Y. Li, and J. Yan, “A survey for solving mixed integer programming via machine learning,” Neurocomputing, vol. 519, pp. 205–217, 2023.
  • [17] E. B. Khalil, C. Morris, and A. Lodi, “MIP-GNN: A data-driven framework for guiding combinatorial solvers,” Proceedings of the AAAI Conference on Artificial Intelligence, vol. 36, pp. 10219–10227, June 2022. Number: 9.
  • [18] M. A. Boschetti and V. Maniezzo, “Matheuristics: using mathematics for heuristic design,” 4OR, vol. 20, pp. 173–208, June 2022.
  • [19] J. Wang, X. Zhu, D. Qiu, and L. T. Yang, “Dynamic scheduling for emergency tasks on distributed imaging satellites with task merging,” IEEE Transactions on Parallel and Distributed Systems, vol. 25, no. 9, pp. 2275–2285, 2013.
  • [20] J. Wang, E. Demeulemeester, and D. Qiu, “A pure proactive scheduling algorithm for multiple earth observation satellites under uncertainties of clouds,” Computers & Operations Research, vol. 74, pp. 1–13, Oct. 2016.
  • [21] J. Cui and X. Zhang, “Application of a multi-satellite dynamic mission scheduling model based on mission priority in emergency response,” Sensors, vol. 19, p. 1430, Jan. 2019. Number: 6 Publisher: Multidisciplinary Digital Publishing Institute.
  • [22] C. A. Rigo, L. O. Seman, E. Camponogara, E. M. Filho, E. A. Bezerra, and P. Munari, “A branch-and-price algorithm for nanosatellite task scheduling to improve mission quality-of-service,” European Journal of Operational Research, vol. 303, no. 1, pp. 168–183, 2022.
  • [23] M. Kruber, M. E. Lübbecke, and A. Parmentier, “Learning when to use a decomposition,” in Integration of AI and OR Techniques in Constraint Programming (D. Salvagnin and M. Lombardi, eds.), Lecture Notes in Computer Science, (Cham), pp. 202–210, 2017.
  • [24] J. J. Hopfield and D. W. Tank, “‘Neural’ computation of decisions in optimization problems,” Biological Cybernetics, vol. 52, pp. 141–152, July 1985.
  • [25] K. A. Smith, “Neural networks for combinatorial optimization: A review of more than a decade of research,” INFORMS Journal on Computing, vol. 11, pp. 15–34, Feb. 1999.
  • [26] Q. Cappart, D. Chételat, E. Khalil, A. Lodi, C. Morris, and P. Veličković, “Combinatorial optimization and reasoning with graph neural networks,” Sept. 2022. arXiv:2102.09544 [cs, math, stat].
  • [27] E. Khalil, H. Dai, Y. Zhang, B. Dilkina, and L. Song, “Learning combinatorial optimization algorithms over graphs,” in Advances in Neural Information Processing Systems, vol. 30, Curran Associates, Inc., 2017.
  • [28] M. Gasse, D. Chételat, N. Ferroni, L. Charlin, and A. Lodi, “Exact combinatorial optimization with graph convolutional neural networks,” Advances in Neural Information Processing Systems, vol. 32, 2019.
  • [29] J.-Y. Ding, C. Zhang, L. Shen, S. Li, B. Wang, Y. Xu, and L. Song, “Accelerating primal solution findings for mixed integer programs based on solution prediction,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, pp. 1452–1459, 2020. Issue: 02.
  • [30] V. Nair, M. Alizadeh, and others, “Neural large neighborhood search,” in Learning Meets Combinatorial Algorithms at NeurIPS2020, 2020.
  • [31] T. N. Kipf and M. Welling, “Semi-supervised classification with graph convolutional networks,” Feb. 2017. arXiv:1609.02907.
  • [32] W. Hamilton, Z. Ying, and J. Leskovec, “Inductive representation learning on large graphs,” 2017. arXiv:1706.02216.
  • [33] C. A. Rigo, E. Morsch Filho, L. O. Seman, L. Loures, and V. R. Q. Leithardt, “Instance and data generation for the offline nanosatellite task scheduling problem,” Data, vol. 8, no. 3, 2023.
  • [34] K. Bestuzheva, M. Besançon, W.-K. Chen, A. Chmiela, T. Donkiewicz, J. v. Doornmalen, L. Eifler, O. Gaul, G. Gamrath, A. Gleixner, L. Gottwald, C. Graczyk, K. Halbig, A. Hoen, C. Hojny, R. v. d. Hulst, T. Koch, M. Lübbecke, S. J. Maher, F. Matter, E. Mühmer, B. Müller, M. E. Pfetsch, D. Rehfeldt, S. Schlein, F. Schlösser, F. Serrano, Y. Shinano, B. Sofranac, M. Turner, S. Vigerske, F. Wegscheider, P. Wellner, D. Weninger, and J. Witzig, “The SCIP Optimization Suite 8.0,” ZIB-Report 21-41, Zuse Institute Berlin, Dec. 2021.
  • [35] G. M. Marcelino, S. Vega-Martinez, L. O. Seman, L. Kessler Slongo, and E. A. Bezerra, “A critical embedded system challenge: The FloripaSat-1 mission,” IEEE Latin America Transactions, vol. 18, no. 2, pp. 249–256, 2020.
  • [36] B. M. Pacheco, L. O. Seman, C. A. Rigo, and E. Camponogara, “Floripasat milps,” Sept. 2023.
  • [37] D. P. Kingma and J. Ba, “Adam: A Method for Stochastic Optimization,” in 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (Y. Bengio and Y. LeCun, eds.), 2015.
  • [38] F. Wilcoxon, “Individual comparisons by ranking methods,” Biometrics Bulletin, vol. 1, no. 6, pp. 80–83, 1945.
  • [39] K. Smith-Miles and S. Bowly, “Generating new test instances by evolving in instance space,” Computers & Operations Research, vol. 63, pp. 102–113, Nov. 2015.
  • [40] Y. Malitsky, M. Merschformann, B. O’Sullivan, and K. Tierney, “Structure-preserving instance generation,” in Learning and Intelligent Optimization (P. Festa, M. Sellmann, and J. Vanschoren, eds.), Lecture Notes in Computer Science, (Cham), pp. 123–140, 2016.
  • [41] G. Yehuda, M. Gabel, and A. Schuster, “It’s not what machines can learn, it’s what we cannot teach,” in Proceedings of the 37th International Conference on Machine Learning, pp. 10831–10841, PMLR, Nov. 2020. ISSN: 2640-3498.
  • [42] E. M. Filho, V. d. P. Nicolau, K. V. d. Paiva, and T. S. Possamai, “A comprehensive attitude formulation with spin for numerical model of irradiance for CubeSats and Picosats,” Applied Thermal Engineering, vol. 168, p. 114859, 2020.

Appendix A Random instance generation

For any particular mission size and orbital length, the main objective is to generate a realistic ONTS case from random data. We take the FloripaSat-I mission as a reference for instance generation, with an altitude of 628 kilometers and an orbital period of 97.2 minutes [35]. The battery-related parameters are fixed for all instances as

e ← 0.9
Q ← 5
γ ← 5
V_b ← 3.6
ρ ← 0.0

The attitude considered here is nadir-pointing, in which the satellite rotates at its orbital rate so that one side (or axis) always faces the Earth’s surface. The analytical model then utilizes a rotation matrix to simulate the satellite’s dynamics and can be adapted to larger or different geometries by adjusting the normal vectors representing the body.

To generate realistic power input vectors 𝒓, we use 2 years of historical data of solar irradiance in orbit. More precisely, a window is randomly drawn from the historical data, and an analytical model is used to determine the power input vector. Since the orbit is stable and the solar flux approximately constant (1360 W/m²), this vector can be calculated from the spacecraft’s orbit, attitude (its kinematics), and size [42].
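We do not reproduce the full analytical model of [42] here; the toy sketch below only illustrates the shape of such a power-input vector (cosine incidence on the sunlit arc, zero in eclipse). The area, efficiency, and sunlit-fraction values are illustrative placeholders, not mission data.

```python
import math

def power_input(T, area=0.03, eff=0.28, flux=1360.0, sunlit_frac=0.64):
    """Toy stand-in for the analytical irradiance model: a Sun-facing
    panel with incidence angle varying over one orbit, and zero power
    harvested during the eclipse portion of the orbit."""
    r = []
    for t in range(T):
        phase = 2 * math.pi * t / T          # orbital phase angle
        in_sun = (t / T) < sunlit_frac       # crude eclipse mask
        r.append(flux * area * eff * max(0.0, math.cos(phase)) if in_sun else 0.0)
    return r
```

The real model replaces the cosine term with the projection of the Sun vector onto each face's normal, obtained from the rotation matrix of the nadir-pointing attitude.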

The remaining parameters are drawn uniformly, within handcrafted limits that increase the feasibility rate. Specifically, given the number of tasks J and the number of time steps T, the parameters are drawn as

u_j ← 𝒰(1, J)
q_j ← 𝒰(0.3, 2.5)
y_j^min ← 𝒰(1, ⌈T/45⌉)
y_j^max ← 𝒰(y_j^min, ⌈T/15⌉)
t_j^min ← 𝒰(1, ⌈T/10⌉)
t_j^max ← 𝒰(t_j^min, ⌈T/4⌉)
p_j^min ← 𝒰(t_j^min, ⌈T/4⌉)
p_j^max ← 𝒰(p_j^min, T)
w_j^min ← 𝒰(0, ⌈T/5⌉)
w_j^max ← 𝒰(T − ⌈T/5⌉, T).
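A sketch of this sampling procedure in code (ours): we read the count-like parameters as integers and q_j as continuous, which the ranges above do not fully specify, so the rounding convention is an assumption.

```python
import math
import random

def sample_job_params(J, T, rng=None):
    """Draw one job's QoS parameters with the uniform ranges above.
    Chained draws (e.g., y_max from [y_min, ceil(T/15)]) guarantee
    min <= max for every parameter pair by construction."""
    rng = rng or random.Random(0)
    u = rng.randint(1, J)                              # priority
    q = rng.uniform(0.3, 2.5)                          # power consumption
    y_min = rng.randint(1, math.ceil(T / 45))          # min. activations
    y_max = rng.randint(y_min, math.ceil(T / 15))      # max. activations
    t_min = rng.randint(1, math.ceil(T / 10))          # min. execution time
    t_max = rng.randint(t_min, math.ceil(T / 4))       # max. execution time
    p_min = rng.randint(t_min, math.ceil(T / 4))       # min. period
    p_max = rng.randint(p_min, T)                      # max. period
    w_min = rng.randint(0, math.ceil(T / 5))           # window start
    w_max = rng.randint(T - math.ceil(T / 5), T)       # window end
    return dict(u=u, q=q, y_min=y_min, y_max=y_max, t_min=t_min,
                t_max=t_max, p_min=p_min, p_max=p_max, w_min=w_min, w_max=w_max)
```

For the FloripaSat-I-like setting (T = 97, one time step per minute of the orbit), the chained ranges remain valid since ⌈T/45⌉ ≤ ⌈T/15⌉ and ⌈T/10⌉ ≤ ⌈T/4⌉ for any positive T.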