

ReEvo: Large Language Models as Hyper-Heuristics with Reflective Evolution

Haoran Ye1, Jiarui Wang2, Zhiguang Cao3, Federico Berto4,
Chuanbo Hua4, Haeyeon Kim4, Jinkyoo Park4, Guojie Song1 5
1National Key Laboratory of General Artificial Intelligence,
School of Intelligence Science and Technology, Peking University
2Southeast University  3Singapore Management University
4KAIST  5PKU-Wuhan Institute for Artificial Intelligence  AI4CO
[email protected], [email protected], [email protected]
{fberto, cbhua, haeyeonkim, jinkyoo.park}@kaist.ac.kr, [email protected]
Project Website: https://ai4co.github.io/reevo
Correspondence to: Guojie Song <[email protected]>. This work was made with contributions from the AI4CO open research community.
Abstract

The omnipresence of NP-hard combinatorial optimization problems (COPs) compels domain experts to engage in trial-and-error heuristic design. The long-standing endeavor of design automation has gained new momentum with the rise of large language models (LLMs). This paper introduces Language Hyper-Heuristics (LHHs), an emerging variant of Hyper-Heuristics that leverages LLMs for heuristic generation, featuring minimal manual intervention and open-ended heuristic spaces. To empower LHHs, we present Reflective Evolution (ReEvo), a novel integration of evolutionary search for efficiently exploring the heuristic space, and LLM reflections to provide verbal gradients within the space. Across five heterogeneous algorithmic types, six different COPs, and both white-box and black-box views of COPs, ReEvo yields state-of-the-art and competitive meta-heuristics, evolutionary algorithms, heuristics, and neural solvers, while being more sample-efficient than prior LHHs.

1 Introduction

NP-hard combinatorial optimization problems (COPs) pervade numerous real-world systems, each characterized by distinct constraints and objectives. The intrinsic complexity and heterogeneity of these problems compel domain experts to laboriously develop heuristics for their approximate solutions [23]. Automating heuristic design is a long-standing pursuit.

Classic Hyper-Heuristics (HHs) automate heuristic design by searching for the best heuristic (combination) from a set of heuristics or heuristic components [64]. Despite decades of development, HHs are limited by heuristic spaces predefined by human experts [64]. The rise of large language models (LLMs) opens up new possibilities for HHs. This paper introduces the general concept of Language Hyper-Heuristics (LHH) to advance beyond preliminary attempts in individual COP settings [68, 46]. LHH constitutes an emerging variant of HH that utilizes LLMs for heuristic generation. It features minimal human intervention and open-ended heuristic spaces, showing promise to comprehensively shift the HH research paradigm.

Pure LHH (e.g., LLM generations alone) is sample-inefficient and exhibits limited inference capability for black-box COPs. This work elicits the power of LHH with Reflective Evolution (ReEvo). ReEvo couples evolutionary search for efficiently exploring heuristic spaces, with self-reflections to boost the reasoning capabilities of LLMs. It emulates human experts by reflecting on the relative performance of two heuristics and gathering insights across iterations. This reflection approach is analogous to interpreting genetic cues and providing "verbal gradient" within search spaces. We introduce fitness landscape analysis and black-box prompting for reliable evaluation of LHHs. The dual-level reflections are shown to enhance heuristic search and induce verbal inference for black-box COPs, enabling ReEvo to outperform prior state-of-the-art (SOTA) LHH [47].

We introduce novel applications of LHHs and yield SOTA solvers with ReEvo: (1) We evolve penalty heuristics for Guided Local Search (GLS), which outperforms SOTA learning-based [52, 24, 75] and knowledge-based [1] (G)LS solvers. (2) We enhance Ant Colony Optimization (ACO) by evolving its heuristic measures, surpassing both neural-enhanced heuristics [94] and expert-designed heuristics [71, 6, 72, 17, 39]. (3) We refine the genetic algorithm (GA) for Electronic Design Automation (EDA) by evolving genetic operators, outperforming expert-designed GA [63] and the SOTA neural solver [31] for the Decap Placement Problem (DPP). (4) Compared to a classic HH [15], ReEvo generates superior constructive heuristics for the Traveling Salesman Problem (TSP). (5) We enhance the generalization of SOTA neural combinatorial optimization (NCO) solvers [37, 51] by evolving heuristics for attention reshaping. For example, we improve the optimality gap of POMO [37] from 52% to 29% and LEHD [51] from 3.2% to 3.0% on TSP1000, with negligible additional time overhead and no need for tuning neural models.

We summarize our contributions as follows. (1) We propose the concept of Language Hyper-Heuristics (LHHs), which bridges emerging attempts using LLMs for heuristic generation with a methodological group that enjoys decades of development. (2) We present Reflective Evolution (ReEvo), coupling evolutionary computation with humanoid reflections to elicit the power of LHHs. We introduce fitness landscape analysis and black-box prompting for reliable LHH evaluations, where ReEvo achieves SOTA sample efficiency. (3) We introduce novel applications of LHHs and present SOTA COP solvers with ReEvo, across five heterogeneous algorithmic types and six different COPs.

2 Related work

Traditional Hyper-Heuristics.

Traditional HHs select the best performing heuristic from a predefined set [13] or generate new heuristics through the combination of simpler heuristic components [15, 104]. HHs offer a higher level of generality in solving various optimization problems [109, 96, 19, 44, 103, 58], but are limited by the heuristic space predefined by human experts.

Neural Combinatorial Optimization.

Recent advances in NCO show promise in learning end-to-end solutions for COPs [2, 93, 3]. NCO can be regarded as a variant of HH, wherein neural architectures and solution pipelines define a heuristic space, and training algorithms search within it. A well-trained neural network (NN), under certain solution pipelines, represents a distinct heuristic. From this perspective, recent advancements in NCO HHs have led to better-aligned neural architectures [28, 51, 34, 73] and advanced solution pipelines [32, 52, 42, 89, 95, 12, 5] to define effective heuristic spaces, and improved training algorithms to efficiently explore heuristic spaces [33, 27, 14, 76, 18, 90, 79, 35], while targeting increasingly broader applications [9, 107, 54, 77]. In this work, we show that ReEvo-generated heuristics can outperform or enhance NCO methods.

LLMs for code generation and optimization.

The rise of LLMs introduces new prospects for diverse fields [88, 82, 105, 25, 50, 99]. Among others, code generation capabilities of LLMs are utilized for code debugging [10, 49], enhancing code performance [55], solving algorithmic competition challenges [41, 70], robotics [38, 43, 81], and general task solving [92, 102]. Interleaving LLM generations with evaluations [74] yields powerful methods for prompt optimization [108, 83, 20], reinforcement learning (RL) reward design [53], algorithmic (self-)improvement [98, 48, 45], neural architecture search [8], and general solution optimization [91, 4, 80], with many under evolutionary frameworks [57, 87, 21, 7, 40]. Most related to ReEvo, concurrent efforts by Liu et al. [47] and Romera-Paredes et al. [68] leverage LLMs to develop heuristics for COPs. We go beyond and propose generic LHH for COPs, along with better sample efficiency, broader applications, more reliable evaluations, and improved heuristics. In addition, ReEvo contributes to a smoother fitness landscape, showing the potential to enhance other tasks involving LLMs for optimization. We present further discussions in Appendix A.

Self-reflections of LLMs.

Shinn et al. [70] propose to reinforce language agents via linguistic feedback, which is subsequently harnessed for various tasks [56, 84]. While Shinn et al. [70] leverage binary rewards indicating passing or failing test cases in programming, ReEvo extends the scope of verbal RL feedback to comparative analysis of two heuristics, analogous to verbal gradient information [66] within heuristic spaces. Also, ReEvo incorporates reflection within an evolutionary framework, presenting a novel and powerful integration.

3 Language Hyper-Heuristics for Combinatorial Optimization

HHs explore a search space of heuristic configurations to select or generate effective heuristics, indirectly optimizing the underlying COP. This dual-level framework is formally defined as follows.

Definition 3.1 (Hyper-Heuristic).

For a COP with solution space $S$ and objective function $f: S \rightarrow \mathbb{R}$, a Hyper-Heuristic (HH) searches for the optimal heuristic $h^{*}$ in a heuristic space $H$ such that a meta-objective function $F: H \rightarrow \mathbb{R}$ is minimized, i.e., $h^{*} = \mathrm{argmin}_{h \in H} F(h)$.

Depending on how the heuristic space $H$ is defined, traditional HHs can be categorized into selection and generation HHs, both entailing manually defined heuristic primitives. Here, we introduce a novel variant of HHs, Language Hyper-Heuristics (LHHs), wherein heuristics in $H$ are generated by LLMs. LHHs dispense with the need for a predefined $H$, and instead leverage LLMs to explore an open-ended heuristic space. We recursively define LHHs as follows.

Definition 3.2 (Language Hyper-Heuristic).

A Language Hyper-Heuristic (LHH) is an HH variant where heuristics in $H$ are generated by LLMs.

In this work, we define the meta-objective function $F$ as the expected performance of a heuristic $h$ on a certain COP. It is estimated by the average performance on a dataset of problem instances.
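As a concrete illustration, the following is a minimal sketch of how $F(h)$ can be estimated in practice; the helper names (`solve`, `instances`) and the evaluation protocol are illustrative assumptions rather than the exact implementation used in this paper.

```python
import numpy as np

def estimate_meta_objective(heuristic, instances, solve):
    """Estimate F(h): the average objective value achieved when the solver
    is equipped with the candidate heuristic.

    heuristic : callable generated by the LLM (one LHH individual).
    instances : problem instances drawn from the target distribution.
    solve     : solver (e.g., GLS or ACO) that accepts an instance and a
                heuristic, and returns the objective of the solution found.
    """
    objectives = [solve(instance, heuristic) for instance in instances]
    return float(np.mean(objectives))  # lower is better, as in Definition 3.1
```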

4 Language Hyper-Heuristic with ReEvo

An LHH takes COP specifications as input and outputs the best inductive heuristic found for this COP. A vanilla LHH simply repeats LLM generations to randomly search the heuristic space, which is sample-inefficient and lacks reasoning capabilities for complex and black-box problems (see § 6). Therefore, we propose Reflective Evolution (ReEvo) to interpret the genetic cues of evolutionary search and unleash the power of LHHs.

ReEvo is schematically illustrated in Fig. 1. Under an evolutionary framework, LLMs assume two roles: a generator LLM for generating individuals and a reflector LLM for guiding the generation with reflections. ReEvo, as an LHH, features a distinct individual encoding, where each individual is the code snippet of a heuristic. Its evolution begins with population initialization, followed by five iterative steps: selection, short-term reflection, crossover, long-term reflection, and elitist mutation. We evaluate the meta-objective of all heuristics after both crossover and mutation. Our prompts are gathered in Appendix B.

Refer to caption
(a) ReEvo pipeline. Top: ReEvo evolves a population of heuristics. Insights and knowledge are verbalized as long-term reflections and accumulated throughout iterations. Bottom: A ReEvo iteration contains five sequential steps: selection, short-term reflection, crossover, long-term reflection, and elitist mutation.
Refer to caption
(b) Examples of reflections for black-box TSP. Heuristics are designed for Ant Colony Optimization (see § 5.2). Left: Given a pair of parent heuristics, ReEvo correctly infers the TSP objective and generates a better offspring accordingly. Right: Given the elite heuristic and accumulated long-term reflections, ReEvo incorporates the suggested statistics and yields a better mutated heuristic.
Figure 1: An illustration of ReEvo.

Individual encoding.

ReEvo optimizes toward the best-performing heuristics via an evolutionary process, specifically a form of Genetic Programming (GP). It diverges from traditional GPs in that (1) individuals are code snippets generated by LLMs, and (2) individuals are not constrained by any predefined encoding format, except for adhering to a specified function signature.
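For example, when evolving ACO heuristic measures for TSP (§ 5.2), an individual could look like the code below; the function signature and the trivial inverse-distance seed are illustrative assumptions, not the exact prompt or heuristic used in our experiments.

```python
import numpy as np

def heuristics(distance_matrix: np.ndarray) -> np.ndarray:
    """One ReEvo individual: a heuristic measure for ACO on TSP.

    Given the pairwise distance matrix, return a matrix of the same shape
    whose (i, j) entry scores how promising edge (i, j) is. This trivial
    seed simply favors short edges (inverse distance).
    """
    return 1.0 / (distance_matrix + 1e-10)  # small constant avoids division by zero
```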

Population initialization.

ReEvo initializes a heuristic population by prompting the generator LLM with a task specification. A task specification contains COP descriptions (if available), heuristic designation, and heuristic functionality. Optionally, seed heuristics to improve upon, either trivial or expertly crafted, can be included to provide in-context examples that encourage valid heuristic generation and bias the search toward more promising directions.

A ReEvo iteration contains the following five sequential steps.

Selection.

ReEvo selects parent pairs from successfully executed heuristics at random, while avoiding pairing heuristics with an identical meta-objective value $F$.

Short-term reflection.

For each pair of heuristic parents, the reflector LLM reflects upon their relative performance and gives hints accordingly for improved design. Unlike prior work [70], ReEvo integrates the reflections into evolutionary search and reflects by performing comparative analyses. Our proposed approach is analogous to interpreting genetic cues and providing verbal gradients within search spaces, which leads to smoother fitness landscapes and better search results (see § 6.1).

Crossover.

ReEvo prompts the generator LLM to generate an offspring heuristic, given task specifications, a pair of parent heuristics, explicit indications of their relative performance, short-term reflections over the pair, and generation instructions.

Long-term reflection.

ReEvo accumulates expertise in improving heuristics via long-term reflections. The reflector LLM, given previous long-term reflections and newly gained short-term ones, summarizes them and gives hints for improved heuristic design.

Elitist mutation.

ReEvo employs an elitist mutation approach. Based on long-term reflections, the generator LLM samples multiple heuristics to improve the current best one. A mutation prompt consists of task specifications, the elite heuristic, long-term reflections, and generation instructions.

Viewing ReEvo from the perspective of an LLM agentic architecture [88], short-term reflections interpret the environmental feedback from each round of interaction. Long-term reflections distill accumulated experiences and knowledge, enabling them to be loaded into the inference context without causing memory blowups.
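Putting the above steps together, the sketch below outlines one possible ReEvo loop. The helper objects (`generator_llm`, `reflector_llm`, `evaluate`) are placeholders for the prompting and evaluation routines described in this section, so this is a schematic sketch rather than the reference implementation.

```python
import random

def reevo(task_spec, pop_size, n_iters, generator_llm, reflector_llm, evaluate):
    """Schematic ReEvo loop (illustrative sketch with assumed helper interfaces)."""
    # Population initialization: each individual is a heuristic code snippet.
    population = [generator_llm.init(task_spec) for _ in range(pop_size)]
    fitness = {h: evaluate(h) for h in population}
    long_term = ""  # accumulated long-term reflections

    for _ in range(n_iters):
        offspring = []
        for _ in range(pop_size):
            # Selection: random parent pair with distinct meta-objective values.
            p1, p2 = random.sample(population, 2)
            if fitness[p1] == fitness[p2]:
                continue
            # Short-term reflection on the parents' relative performance.
            st = reflector_llm.short_term(p1, p2, fitness[p1], fitness[p2])
            # Crossover guided by the short-term reflection.
            offspring.append(generator_llm.crossover(task_spec, p1, p2, st))
            # Long-term reflection accumulates insights across iterations.
            long_term = reflector_llm.long_term(long_term, st)
        fitness.update({h: evaluate(h) for h in offspring})
        # Elitist mutation: sample heuristics that improve the current best one.
        elite = min(fitness, key=fitness.get)
        mutants = [generator_llm.mutate(task_spec, elite, long_term)
                   for _ in range(pop_size)]
        fitness.update({h: evaluate(h) for h in mutants})
        # Keep the best heuristics for the next iteration.
        population = sorted(fitness, key=fitness.get)[:pop_size]
    return min(fitness, key=fitness.get)
```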

5 Heuristic generation with ReEvo

This section presents novel applications of LHH across heterogeneous algorithmic types and diverse COPs. With ReEvo, we yield state-of-the-art and competitive meta-heuristics, evolutionary algorithms, heuristics, and neural solvers.

Hyperparameters of ReEvo and detailed experimental setup are given in Appendix C. We apply ReEvo to different algorithmic types across six diverse COPs representative of different areas: Traveling Salesman Problem (TSP), Capacitated Vehicle Routing Problem (CVRP), and Orienteering Problem (OP) for routing problems; Multiple Knapsack Problem (MKP) for subset problems; Bin Packing Problem (BPP) for grouping problems; and Decap Placement Problem (DPP) for electronic design automation (EDA) problems. Details of the benchmark COPs are given in Appendix D. The best ReEvo-generated heuristics are collected in Appendix E.

5.1 Penalty heuristics for Guided Local Search

We evolve penalty heuristics for Guided Local Search (GLS) [1]. GLS interleaves local search with solution perturbation. The perturbation is guided by the penalty heuristics to maximize its utility. ReEvo searches for the penalty heuristic that leads to the best GLS performance.
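For intuition, the sketch below shows one way a penalty heuristic can steer a GLS perturbation on TSP. The utility rule (penalize the tour edge with the highest score-to-penalty ratio) follows the classic GLS scheme, and the interfaces are illustrative assumptions rather than our KGLS implementation.

```python
import numpy as np

def perturb_with_penalties(tour, distance_matrix, penalty_heuristic, penalties):
    """One GLS perturbation step (illustrative sketch).

    tour              : sequence of node indices of the current solution.
    penalty_heuristic : ReEvo-evolved function scoring each edge's "badness".
    penalties         : n x n array counting how often each edge was penalized.
    """
    scores = penalty_heuristic(distance_matrix)       # edge "badness" scores
    utility = scores / (1.0 + penalties)              # classic GLS utility
    edges = list(zip(tour, np.roll(tour, -1)))        # edges of the current tour
    i, j = max(edges, key=lambda e: utility[e])       # edge with the highest utility
    penalties[i, j] += 1
    penalties[j, i] += 1
    # The subsequent local search minimizes distance plus weighted penalties,
    # which pushes the search away from the penalized edge.
    return penalties
```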

We implement the best heuristic generated by ReEvo within KGLS [1] and refer to such coupling as KGLS-ReEvo. In Table 1, we compare KGLS-ReEvo with the original KGLS, other GLS variants [24, 75, 47], and a SOTA NCO method that learns to improve a solution [52]. The results show that ReEvo can improve KGLS and outperform SOTA baselines. In addition, we use a single heuristic for TSP20 to TSP200, while NCO baselines require training models specific to each problem size.

Table 1: Evaluation results of different local search (LS) variants. We report optimality gaps and per-instance execution time.
Method Type TSP20 TSP50 TSP100 TSP200
Opt. gap (%) Time (s) Opt. gap (%) Time (s) Opt. gap (%) Time (s) Opt. gap (%) Time (s)
NeuOpt* [52] LS+RL 0.000 0.124 0.000 1.32 0.027 2.67 0.403 4.81
GNNGLS [24] GLS+SL 0.000 0.116 0.052 3.83 0.705 6.78 3.522 9.92
NeuralGLS† [75] GLS+SL 0.000 10.005 0.003 10.01 0.470 10.02 3.622 10.12
EoH [47] GLS+LHH 0.000 0.563 0.000 1.90 0.025 5.87 0.338 17.52
KGLS‡ [1] GLS 0.004 0.001 0.017 0.03 0.002 1.55 0.284 2.52
KGLS-ReEvo GLS+LHH 0.000 0.001 0.000 0.03 0.000 1.55 0.216 2.52
*: All instances are solved in one batch. D2A=1; T=500, 4000, 5000, and 5000 for 4 problem sizes, respectively.
†: The results are drawn from the original literature. ‡: Results are based on our own GLS implementation.

5.2 Heuristic measures for Ant Colony Optimization

Solutions to COPs can be stochastically sampled, with heuristic measures indicating the promise of solution components and biasing the sampling. Ant Colony Optimization (ACO), which interleaves stochastic solution sampling with pheromone update, builds on this idea. We generate such heuristic measures for five different COPs: TSP, CVRP, OP, MKP, and BPP.
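To make the role of the heuristic measure concrete, the sketch below shows the standard Ant System transition rule that the evolved measure biases; the α/β weighting and the interfaces are assumed illustrative context, not our ACO implementation.

```python
import numpy as np

def next_node_probabilities(current, visited, pheromone, heuristic_measure,
                            alpha=1.0, beta=2.0):
    """Standard ACO transition probabilities from `current` to unvisited nodes.

    pheromone         : pheromone matrix, updated from sampled solutions.
    heuristic_measure : static matrix produced by the ReEvo-evolved heuristic,
                        scoring the promise of each edge.
    """
    desirability = (pheromone[current] ** alpha) * (heuristic_measure[current] ** beta)
    desirability[list(visited)] = 0.0           # mask nodes already in the tour
    return desirability / desirability.sum()    # sampling distribution over nodes
```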

Under the ACO framework, we evaluate the best ReEvo-generated heuristics against the expert-designed ones and neural heuristics specifically learned for ACO [94]. The evolution curves displayed in Fig. 2 verify the consistent superiority of ReEvo across COPs and problem sizes. Notably, on 3 out of 5 COPs, ReEvo outperforms DeepACO [94] even when the latter overfits the test problem size (TSP50, OP50, and MKP100). We observe that most ReEvo-generated heuristics show consistent performance across problem sizes and distributions. Hence, their advantage over neural heuristics grows as the distributional shift increases.

Refer to caption
Refer to caption
Figure 2: Comparative evaluations of ACO using expert-designed heuristics [71, 6, 72, 17, 39], neural heuristics [94], and ReEvo heuristics. For each COP, the same neural heuristic or the ReEvo heuristic is applied across all problem sizes; both heuristics are trained exclusively on the smallest problem size among the five. Left: Relative performance improvement of DeepACO and ReEvo over human baselines w.r.t. problem sizes. Right: ACO evolution curves, plotting the all-time best objective value w.r.t. the number of solution evaluations. The curves are averaged over three runs in which only small variances are observed (e.g., ∼0.01 for TSP50).

5.3 Genetic operators for Electronic Design Automation

Expert-designed GAs are widely adopted in EDA [69, 97, 11, 26]. Besides directly solving EDA problems, GA-generated solutions can be used to train amortized neural solvers [31]. Here, we show that ReEvo can improve expert-designed GAs and outperform DevFormer [31], the SOTA solver for DPP. We use ReEvo to sequentially evolve the crossover and mutation operators for the GA expert-designed by Park et al. [63]. Fig. 3 compares online and offline learned methods, DevFormer, the original expert-designed GA, and the GA with ReEvo-generated operators, showing that the ReEvo-designed GA outperforms previous methods and, importantly, both the expert-designed GA and DevFormer.
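To illustrate where the evolved operators plug in, a schematic GA loop is sketched below; the operator signatures and selection routine are assumptions made for illustration and do not reproduce the implementation of Park et al. [63].

```python
def run_ga(population, n_generations, crossover, mutate, evaluate, select):
    """Schematic GA for DPP with ReEvo-evolved operators (illustrative sketch).

    crossover, mutate : ReEvo-evolved genetic operators.
    evaluate          : objective of a DPP placement (higher is better).
    select            : returns parent pairs based on fitness.
    """
    for _ in range(n_generations):
        parents = select(population, evaluate)                  # fitness-based selection
        children = [mutate(crossover(p1, p2)) for p1, p2 in parents]
        # Elitist survivor selection: keep the best individuals.
        population = sorted(population + children,
                            key=evaluate, reverse=True)[:len(population)]
    return max(population, key=evaluate)
```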

Refer to caption
Method # of shots Obj. ↑
Pointer-PG [30] 10,000 9.66 ± 0.206
AM-PG [60] 10,000 9.63 ± 0.587
CNN-DQN [61] 10,000 9.79 ± 0.267
CNN-DDQN [100] 10,000 9.63 ± 0.150
Pointer-CRL [30] Zero Shot 9.59 ± 0.232
AM-CRL [62] Zero Shot 9.56 ± 0.471
DevFormer-CSE [31] Zero Shot 12.88 ± 0.003
GA-expert [63] 400 12.41 ± 0.026
GA-ReEvo (ours) 400 12.98 ± 0.018
Figure 3: Left: Comparison of DevFormer [31], the expert-designed GA [63] and our ReEvo-designed GA on DPP. The evolution curves plot the best objective value over generations; the horizontal line indicates the reward of end-to-end solutions generated by DevFormer. Right: Evaluation results of DPP solvers. We report the number of solution generations and the average objective value of 100 test problems.

5.4 Constructive heuristics for the Traveling Salesman Problem

Heuristics can be used for deterministic solution construction by sequentially assigning values to each decision variable. We evaluate the constructive heuristic for TSP generated by ReEvo on real-world benchmark instances from TSPLIB [67] in Table 2. ReEvo can generate better heuristics than GHPP [15], a classic HH based on GP.
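For a concrete picture of what such a constructive heuristic looks like, the sketch below builds a tour by repeatedly scoring candidate next nodes; the scoring function is the evolved component, and the interface shown is an illustrative assumption.

```python
def construct_tour(distance_matrix, score, start=0):
    """Greedy constructive TSP heuristic (illustrative sketch).

    score : evolved function scoring a candidate next node; a trivial
            nearest-neighbour stand-in is `lambda cur, cand, D: -D[cur][cand]`.
    """
    n = len(distance_matrix)
    tour, visited = [start], {start}
    while len(tour) < n:
        current = tour[-1]
        candidates = [j for j in range(n) if j not in visited]
        # Pick the candidate with the highest evolved score.
        nxt = max(candidates, key=lambda j: score(current, j, distance_matrix))
        tour.append(nxt)
        visited.add(nxt)
    return tour
```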

Table 2: Comparisons of constructive heuristics designed by human, GHPP [15], and ReEvo. We report the average optimality gap of each instance, where the baseline results are drawn from [15] and the results of ReEvo are averaged over 3 runs with different starting nodes.
Instance Nearest Neighbour GHPP [15] ReEvo
ts225 16.8 7.7 6.6
rat99 21.8 14.1 12.4
rl1889 23.7 21.1 17.5
u1817 22.2 21.2 16.6
d1655 23.9 18.7 17.5
bier127 23.3 15.6 10.8
lin318 25.8 14.3 16.6
eil51 32.0 10.2 6.5
d493 24.0 15.6 13.4
kroB100 26.3 14.1 12.2
kroC100 25.8 16.2 15.9
ch130 25.7 14.8 9.4
pr299 31.4 18.2 20.6
fl417 32.4 22.7 19.2
d657 29.7 16.3 16.0
kroA150 26.1 15.6 11.6
fl1577 25.0 17.6 12.1
u724 28.5 15.5 16.9
pr264 17.9 24.0 16.8
pr226 24.6 15.5 18.0
pr439 27.4 21.4 19.3
Avg. opt. gap 25.4 16.7 14.6

5.5 Attention reshaping for Neural Combinatorial Optimization

Autoregressive NCO solvers suffer from limited scaling-up generalization [29], partially due to the dispersion of attention scores [85]. Wang et al. [85] design a distance-aware heuristic to reshape the attention scores, which improves the generalization of NCO solvers without additional training. However, the expert-designed attention-reshaping can be suboptimal and does not generalize across neural models or problem distributions.

Here we show that ReEvo can automatically and efficiently tailor attention reshaping for specific neural models and problem distributions of interest. We apply attention reshaping designed by experts [85] and ReEvo to two distinct model architectures: POMO with heavy encoder and light decoder [37], and LEHD with light encoder and heavy decoder [51]. On TSP and CVRP, Table 3 compares the original NCO solvers [37, 51], those with expert-designed attention reshaping [85], and those with ReEvo-designed attention reshaping. The results reveal that the ReEvo-generated heuristics can improve the original models and outperform their expert-designed counterparts. Note that implementing ReEvo-generated attention reshaping takes negligible additional time; e.g., solving a CVRP1000 with LEHD takes 50.0 seconds with reshaping, compared to 49.8 seconds without.
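For intuition, the sketch below reshapes decoder attention scores with a distance-based bias before normalization; the additive log-distance form is an assumption for illustration, not the exact heuristic of Wang et al. [85] nor the one ReEvo discovers.

```python
import numpy as np

def reshape_attention(logits, distance_to_candidates, scale=1.0):
    """Bias attention logits toward nearby nodes (illustrative sketch).

    logits                 : raw decoder attention scores over candidate nodes.
    distance_to_candidates : distances from the current node to each candidate.
    """
    reshaped = logits - scale * np.log(distance_to_candidates + 1e-10)
    reshaped -= reshaped.max()           # numerical stability before softmax
    probs = np.exp(reshaped)
    return probs / probs.sum()           # reshaped attention weights
```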

Table 3: Evaluation results for NCO solvers with and without different attention-reshaping heuristics.
Method n=200 n=500 n=1000
Obj. Opt. gap (%) Obj. Opt. gap (%) Obj. Opt. gap (%)
TSP POMO [37] 11.16 4.40 22.21 34.43 35.19 52.11
POMO + DAR [85] 11.12 3.98 21.63 30.95 33.32 44.05
POMO + ReEvo 11.12 4.02 20.54 24.32 29.86 29.08
LEHD [51] 10.79 0.87 16.78 1.55 23.87 3.17
LEHD + DAR [85] 10.79 0.89 16.79 1.62 23.87 3.19
LEHD + ReEvo 10.77 0.74 16.78 1.55 23.82 2.97
CVRP POMO [37] 22.39 10.93 50.12 33.76 145.40 289.48
POMO + DAR [85] 22.36 10.78 50.23 34.05 144.24 286.37
POMO + ReEvo 22.30 10.48 47.10 25.70 118.80 218.22
LEHD [51] 20.92 3.68 38.61 3.03 39.12 4.79
LEHD + DAR [85] 21.13 4.67 39.16 4.49 39.70 6.35
LEHD + ReEvo 20.85 3.30 38.57 2.94 39.11 4.76

6 Evaluating ReEvo

6.1 Fitness landscape analysis

The fitness landscape of a search algorithm depicts the structure and characteristics of its search space $F: H \rightarrow \mathbb{R}$ [59]. This understanding is essential for designing effective HHs. Here we introduce this technique to LHHs and evaluate the impact of reflections on the fitness landscape.

Traditionally, the neighborhood of a solution is defined as the set of solutions that can be reached after a single move of a certain heuristic. However, LHHs feature a probabilistic nature and an open-ended search space, so we redefine the neighborhood as follows.

Definition 6.1 (Neighborhood).

Let $LLM$ denote an LHH move, $x$ a specific prompt, and $h_{c}$ the current heuristic. Given $LLM$ and $x$, the neighborhood of $h_{c}$ is defined as a set $\mathcal{N}$, where each element $h \in \mathcal{N}$ represents a heuristic that $LLM$ can mutate $h_{c}$ into, in response to $x$:

\mathcal{N}(h_{c})=\{h\mid LLM(h|h_{c},x)>\xi\}. \quad (1)

Here, $LLM(h|h_{c},x)$ denotes the probability of generating $h$ after prompting with $h_{c}$ and $x$, and $\xi$ is a small threshold value. In practice, the neighborhood can be approximated by sampling from the distribution $LLM(\cdot|h_{c},x)$ a large number of times.

We extend the concept of autocorrelation to LHHs under our definition of neighborhood. Autocorrelation reflects the ruggedness of a landscape, indicating the difficulty of a COP [59, 22].

Definition 6.2 (Autocorrelation).

Autocorrelation measures the correlation structure of a fitness landscape. It is derived from the autocorrelation function $r$ of a time series of fitness values, which are generated by a random walk on the landscape via neighboring points:

r_{i}=\frac{\sum_{t=1}^{T-i}(f_{t}-\bar{f})(f_{t+i}-\bar{f})}{\sum_{t=1}^{T}(f_{t}-\bar{f})^{2}}, \quad (2)

where $\bar{f}$ is the mean fitness of the points visited, $T$ is the size of the random walk, and $i$ is the time lag between points in the walk.

Based on the autocorrelation function, correlation length is defined below [86].

Definition 6.3 (Correlation Length).

Given an autocorrelation function $r$, the correlation length $l$ is formulated as $l = -1/\ln(|r_{1}|)$ for $r_{1} \neq 0$. It reflects the ruggedness of a landscape, and smaller values indicate a more rugged landscape.
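The two quantities in Definitions 6.2 and 6.3 can be computed directly from the fitness series visited by a random walk; a minimal sketch follows (helper names are illustrative).

```python
import numpy as np

def correlation_length(fitness_walk, lag=1):
    """Autocorrelation at a given lag (Eq. (2)) and the correlation length
    l = -1 / ln(|r_1|) (Definition 6.3) of a random-walk fitness series."""
    f = np.asarray(fitness_walk, dtype=float)
    f_bar, T = f.mean(), len(f)
    r = np.sum((f[:T - lag] - f_bar) * (f[lag:] - f_bar)) / np.sum((f - f_bar) ** 2)
    return r, -1.0 / np.log(abs(r))
```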

To perform autocorrelation analysis for ReEvo, we conduct random walks based on the neighborhood established with our crossover prompt either with or without short-term reflections. In practice, we set the population size to 1 and skip invalid heuristics; the selection always picks the current and last heuristics for short-term reflection and crossover, and we do not implement mutation.

Table 4: Autocorrelation analysis of ReEvo.
Correlation length ↑ Objective ↓
w/o reflection 0.28 ± 0.07 12.08 ± 7.15
w/ reflection 1.28 ± 0.62 6.53 ± 0.60

Table 4 presents the correlation length and the average objective value of the random walks, where we generate ACO heuristics for TSP50. The correlation length is averaged over 3 runs each with 40 random walk steps, while the objective value is averaged over all 3 × 40 heuristics. The results verify that implementing reflection leads to a less rugged landscape and better search results. As discussed in § 4, reflections can function as verbal gradients that lead to better neighborhood structures.

6.2 Ablation studies

In this section, we investigate the effects of the proposed components of ReEvo with both white-box and black-box prompting.

Black-box prompting.

We do not reveal any information related to the COPs and prompt LHHs in general forms (e.g., edge_attr in place of distance_matrix). Black-box settings allow reliable evaluations of LHHs in designing effective heuristics for novel and complex problems, rather than merely retrieving code tailored for prominent COPs from their parameterized knowledge.

Table 5: Ablation study of ReEvo components with both white-box and black-box prompting.
Method White-box ↓ Black-box ↓
LLM 8.64 ± 0.13 9.74 ± 0.54
w/o long-term reflections 8.61 ± 0.21 9.32 ± 0.71
w/o short-term reflections 8.46 ± 0.01 9.05 ± 0.83
w/o crossover 8.45 ± 0.02 9.47 ± 1.40
w/o mutation 8.83 ± 0.09 9.34 ± 0.96
ReEvo 8.40 ± 0.02 8.96 ± 0.82

We evaluate sampling LLM generations without evolution (LLM) and ReEvo without long-term reflections, short-term reflections, crossover, or mutation on generating ACO heuristics for TSP100. Table 5 shows that ReEvo enhances sample efficiency, and all its components positively contribute to its performance, both in white-box and black-box prompting.

6.3 Comparative evaluations

This section compares ReEvo with EoH [47], a recent SOTA LHH that is more sample-efficient than FunSearch [68]. We adhere to the original code and (hyper)parameters of EoH. Our experiments apply both LHHs to generate ACO heuristics for TSP, CVRP, OP, MKP, and BPP, using black-box prompting and three LLMs: GPT-3.5 Turbo, GPT-4 Turbo, and Llama 3 (70B).

Fig. 4 compares EoH and ReEvo, and shows that ReEvo demonstrates superior sample efficiency. Besides the better neighborhood structure (§ 6.1), reflections facilitate explicit verbal inference of underlying black-box COP structures; we depict an example in Fig. 1 (b). The enhanced sample efficiency and inference capabilities of ReEvo are particularly useful for complex real-world problems, where the objective function is usually black-box and expensive to evaluate.

Refer to caption
(a) LHH evolution results using different LLMs.
Refer to caption
(b) LHH evolution curves using GPT-3.5 Turbo.
Figure 4: Comparisons between EoH [47] and ReEvo on five COPs with black-box prompting and using different LLMs. We perform three runs for each setting.

7 Discussions and limitations

When to use ReEvo as an LHH.

Our experiments limit the number of heuristic evaluations to 100 shots, and the results do not necessarily carry over to much larger evaluation budgets. ReEvo is designed for scenarios where sample efficiency is crucial, such as real-world applications where heuristic evaluation can be costly. Allowing a large number of heuristic evaluations could obscure the impact of reflection or other prompting techniques, as reported by Zhang et al. [101].

When to use ReEvo as an (alternative to) NCO/ML4CO method.

LHH can be categorized as an NCO/ML4CO method. However, to facilitate our discussion, we differentiate LHHs from "traditional" NCO methods that usually train NN-parameterized heuristics via parameter adjustment. In § 5, we demonstrate that ReEvo can either outperform or enhance NCO methods. Below, we explore the complementary nature of LHH and NCO methods.

  • Rule-based vs. NN-parameterized policies. LHHs generate interpretable and rule-based heuristics (code snippets), while NCO generates black-box NN-parameterized policies. Interpretable heuristics offer insights for human designers and can be more reliable in practice when faced with dynamic environments, limited data, distributional shifts, or adversarial attacks. However, they may not be as expressive as neural networks and may underfit in complex environments.

  • Evolution and training. LHHs require fewer than 100 heuristic evaluations and about 5 minutes to evolve a strong heuristic, while many NCO methods require millions of samples and days of training. LHHs are more practical when solution evaluation is expensive.

  • Inference. LHHs generate heuristics that are less demanding in terms of computational resources, as they do not require GPUs during deployment. NCO methods require GPUs for training and deployment, but they can also leverage GPU parallelism to potentially speed up inference.

  • Engineering efforts and inductive biases. LHHs only need some text-based (and even black-box) explanations to guide the search. NCO requires the development of NN architectures, hyperparameters, and training strategies, where informed inductive biases and manual tuning are crucial to guarantee performance.

The choice of LLMs for ReEvo.

Reflection is more effective when using capable LLMs, such as GPT-3.5 Turbo and its successors, as discussed by Shinn et al. [70]. Currently, many open-source LLMs are not capable enough to guarantee statistically significant improvement of reflections [101]. However, as LLM capabilities improve, we only expect this paradigm to get better over time [70]. One can refer to [101] for extended evaluations based on more LLMs and problem settings.

Benchmarking LHHs based on heuristic evaluations.

We argue that benchmarking LHHs should prioritize the number of heuristic evaluations rather than LLM query budgets [101] due to the following reasons.

  • Prioritizing scenarios where heuristic evaluations are costly leads to meaningful comparisons between LHHs. The performance of different LHH methods becomes nearly indistinguishable when a large number of heuristic evaluations are allowed [101].

  • The overhead of LLM queries is negligible compared to real-world heuristic evaluations. LLM inference, whether via local models or commercial APIs, is highly cost-effective nowadays, with expenses averaging around $0.0003 per call in ReEvo using GPT-3.5 Turbo, and response times of under one second on average for asynchronous API calls or batched inference. These costs are negligible compared to real-world heuristic evaluations, which, taking the toy EDA problem in this paper as an example, exceed 20 minutes per evaluation.

  • Benchmarking LHHs based on LLM inference costs presents additional challenges. Costs and processing time are driven by token usage rather than the number of queries, complicating the benchmarking process. For instance, EoH [47] requires heuristic descriptions before code generation, resulting in higher token usage. In contrast, although ReEvo involves more queries for reflections, it is more token-efficient when generating heuristics.

8 Conclusion

This paper presents Language Hyper-Heuristics (LHHs), a rising variant of HHs, alongside Reflective Evolution (ReEvo), an evolutionary framework to elicit the power of LHHs. Applying ReEvo across five heterogeneous algorithmic types, six different COPs, and both white-box and black-box views of COPs, we yield state-of-the-art and competitive meta-heuristics, evolutionary algorithms, heuristics, and neural solvers. Comparing against SOTA LHH [47], ReEvo demonstrates superior sample efficiency. The development of LHHs is still at its emerging stage. It is promising to explore their broader applications, better dual-level optimization architectures, and theoretical foundations. We also expect ReEvo to enrich the landscape of evolutionary computation, by showing that genetic cues can be interpreted and verbalized using LLMs.

Acknowledgments and disclosure of funding

We are very grateful to Yuan Jiang, Yining Ma, Yifan Yang, AI4CO community, anonymous reviewers, and the area chair for valuable discussions and feedback. This work was supported by the National Natural Science Foundation of China (Grant No. 62276006); Wuhan East Lake High-Tech Development Zone National Comprehensive Experimental Base for Governance of Intelligent Society; the National Research Foundation, Singapore under its AI Singapore Programme (AISG Award No: AISG3-RP-2022-031); the Institute of Information & Communications Technology Planning & Evaluation (IITP)-Innovative Human Resource Development for Local Intellectualization program grant funded by the Korea government (MSIT) (IITP-2024-RS-2024-00436765).

References

  • Arnold and Sörensen [2019] F. Arnold and K. Sörensen. Knowledge-guided local search for the vehicle routing problem. Computers & Operations Research, 105:32–46, 2019.
  • Bengio et al. [2021] Y. Bengio, A. Lodi, and A. Prouvost. Machine learning for combinatorial optimization: a methodological tour d’horizon. European Journal of Operational Research, 290(2):405–421, 2021.
  • Berto et al. [2023] F. Berto, C. Hua, J. Park, M. Kim, H. Kim, J. Son, H. Kim, J. Kim, and J. Park. RL4CO: a unified reinforcement learning for combinatorial optimization library. In NeurIPS 2023 Workshop: New Frontiers in Graph Learning, 2023.
  • Brooks et al. [2023] E. Brooks, L. A. Walls, R. Lewis, and S. Singh. Large language models can implement policy iteration. In Thirty-seventh Conference on Neural Information Processing Systems, 2023.
  • Bu et al. [2024] F. Bu, H. Jo, S. Y. Lee, S. Ahn, and K. Shin. Tackling prevalent conditions in unsupervised combinatorial optimization: Cardinality, minimum, covering, and more. arXiv preprint arXiv:2405.08424, 2024.
  • Cai et al. [2022] J. Cai, P. Wang, S. Sun, and H. Dong. A dynamic space reduction ant colony optimization for capacitated vehicle routing problem. Soft Computing, 26(17):8745–8756, 2022.
  • Chao et al. [2024] W. Chao, J. Zhao, L. Jiao, L. Li, F. Liu, and S. Yang. A match made in consistency heaven: when large language models meet evolutionary algorithms, 2024.
  • Chen et al. [2023a] A. Chen, D. M. Dohan, and D. R. So. Evoprompting: Language models for code-level neural architecture search. arXiv preprint arXiv:2302.14838, 2023a.
  • Chen et al. [2023b] J. Chen, J. Wang, Z. Zhang, Z. Cao, T. Ye, and C. Siyuan. Efficient meta neural heuristic for multi-objective combinatorial optimization. In Advances in Neural Information Processing Systems, 2023b.
  • Chen et al. [2023c] X. Chen, M. Lin, N. Schärli, and D. Zhou. Teaching large language models to self-debug. arXiv preprint arXiv:2304.05128, 2023c.
  • de Paulis et al. [2020] F. de Paulis, R. Cecchetti, C. Olivieri, and M. Buecker. Genetic algorithm pdn optimization based on minimum number of decoupling capacitors applied to arbitrary target impedance. In 2020 IEEE International Symposium on Electromagnetic Compatibility & Signal/Power Integrity (EMCSI), pages 428–433. IEEE, 2020.
  • Dernedde et al. [2024] T. Dernedde, D. Thyssens, S. Dittrich, M. Stubbemann, and L. Schmidt-Thieme. Moco: A learnable meta optimizer for combinatorial optimization. arXiv preprint arXiv:2402.04915, 2024.
  • Drake et al. [2020] J. H. Drake, A. Kheiri, E. Özcan, and E. K. Burke. Recent advances in selection hyper-heuristics. European Journal of Operational Research, 285(2):405–428, 2020.
  • Drakulic et al. [2023] D. Drakulic, S. Michel, F. Mai, A. Sors, and J.-M. Andreoli. Bq-nco: Bisimulation quotienting for efficient neural combinatorial optimization. In Thirty-seventh Conference on Neural Information Processing Systems, 2023.
  • Duflo et al. [2019] G. Duflo, E. Kieffer, M. R. Brust, G. Danoy, and P. Bouvry. A gp hyper-heuristic approach for generating tsp heuristics. In 2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pages 521–529. IEEE, 2019.
  • Erdin and Achar [2019] I. Erdin and R. Achar. Multi-objective optimization of decoupling capacitors for placement and component value. IEEE Transactions on Components, Packaging and Manufacturing Technology, 9(10):1976–1983, 2019.
  • Fidanova [2020] S. Fidanova. Hybrid ant colony optimization algorithm for multiple knapsack problem. In 2020 5th IEEE International Conference on Recent Advances and Innovations in Engineering (ICRAIE), pages 1–5. IEEE, 2020.
  • Gao et al. [2023] C. Gao, H. Shang, K. Xue, D. Li, and C. Qian. Towards generalizable neural solvers for vehicle routing problems via ensemble with transferrable local policy. arXiv preprint arXiv:2308.14104, 2023.
  • Guerriero and Saccomanno [2023] F. Guerriero and F. P. Saccomanno. A hierarchical hyper-heuristic for the bin packing problem. Soft Computing, 27(18):12997–13010, 2023.
  • Guo et al. [2023] Q. Guo, R. Wang, J. Guo, B. Li, K. Song, X. Tan, G. Liu, J. Bian, and Y. Yang. Connecting large language models with evolutionary algorithms yields powerful prompt optimizers. arXiv preprint arXiv:2309.08532, 2023.
  • Hemberg et al. [2024] E. Hemberg, S. Moskal, and U.-M. O’Reilly. Evolving code with a large language model, 2024.
  • Hordijk [1996] W. Hordijk. A measure of landscapes. Evolutionary computation, 4(4):335–360, 1996.
  • Hromkovič [2013] J. Hromkovič. Algorithmics for hard problems: introduction to combinatorial optimization, randomization, approximation, and heuristics. Springer Science & Business Media, 2013.
  • Hudson et al. [2022] B. Hudson, Q. Li, M. Malencia, and A. Prorok. Graph neural network guided local search for the traveling salesperson problem. In International Conference on Learning Representations, 2022.
  • Ji et al. [2023] J. Ji, T. Qiu, B. Chen, B. Zhang, H. Lou, K. Wang, Y. Duan, Z. He, J. Zhou, Z. Zhang, et al. Ai alignment: A comprehensive survey. arXiv preprint arXiv:2310.19852, 2023.
  • Jiang et al. [2024] L. Jiang, L. Zhang, S. Tan, D. Li, C. Hwang, J. Fan, and E.-P. Li. A novel physics-assisted genetic algorithm for decoupling capacitor optimization. IEEE Transactions on Microwave Theory and Techniques, 2024.
  • Jiang et al. [2023] Y. Jiang, Z. Cao, Y. Wu, W. Song, and J. Zhang. Ensemble-based deep reinforcement learning for vehicle routing problems under distribution shift. In Thirty-seventh Conference on Neural Information Processing Systems, 2023.
  • Jin et al. [2023] Y. Jin, Y. Ding, X. Pan, K. He, L. Zhao, T. Qin, L. Song, and J. Bian. Pointerformer: Deep reinforced multi-pointer transformer for the traveling salesman problem. arXiv preprint arXiv:2304.09407, 2023.
  • Joshi et al. [2020] C. K. Joshi, Q. Cappart, L.-M. Rousseau, and T. Laurent. Learning the travelling salesperson problem requires rethinking generalization. arXiv preprint arXiv:2006.07054, 2020.
  • Kim et al. [2021a] H. Kim, H. Park, M. Kim, S. Choi, J. Kim, J. Park, S. Kim, S. Kim, and J. Kim. Deep reinforcement learning framework for optimal decoupling capacitor placement on general pdn with an arbitrary probing port. In 2021 IEEE 30th Conference on Electrical Performance of Electronic Packaging and Systems (EPEPS), pages 1–3. IEEE, 2021a.
  • Kim et al. [2023a] H. Kim, M. Kim, F. Berto, J. Kim, and J. Park. Devformer: A symmetric transformer for context-aware device placement. In International Conference on Machine Learning, pages 16541–16566. PMLR, 2023a.
  • Kim et al. [2021b] M. Kim, J. Park, et al. Learning collaborative policies to solve np-hard routing problems. Advances in Neural Information Processing Systems, 34:10418–10430, 2021b.
  • Kim et al. [2022] M. Kim, J. Park, and J. Park. Sym-nco: Leveraging symmetricity for neural combinatorial optimization. Advances in Neural Information Processing Systems, 35:1936–1949, 2022.
  • Kim et al. [2023b] M. Kim, T. Yun, E. Bengio, D. Zhang, Y. Bengio, S. Ahn, and J. Park. Local search gflownets. arXiv preprint arXiv:2310.02710, 2023b.
  • Kim et al. [2024] M. Kim, S. Choi, J. Son, H. Kim, J. Park, and Y. Bengio. Ant colony sampling with gflownets for combinatorial optimization. arXiv preprint arXiv:2403.07041, 2024.
  • Kool et al. [2019] W. Kool, H. van Hoof, and M. Welling. Attention, learn to solve routing problems! In International Conference on Learning Representations, 2019.
  • Kwon et al. [2020] Y.-D. Kwon, J. Choo, B. Kim, I. Yoon, Y. Gwon, and S. Min. Pomo: Policy optimization with multiple optima for reinforcement learning. Advances in Neural Information Processing Systems, 33:21188–21198, 2020.
  • Lehman et al. [2023] J. Lehman, J. Gordon, S. Jain, K. Ndousse, C. Yeh, and K. O. Stanley. Evolution through large models. In Handbook of Evolutionary Machine Learning, pages 331–366. Springer, 2023.
  • Levine and Ducatelle [2004] J. Levine and F. Ducatelle. Ant colony optimization and local search for bin packing and cutting stock problems. Journal of the Operational Research society, 55(7):705–716, 2004.
  • Li et al. [2024] P. Li, J. Hao, H. Tang, X. Fu, Y. Zheng, and K. Tang. Bridging evolutionary algorithms and reinforcement learning: A comprehensive survey, 2024.
  • Li et al. [2022] Y. Li, D. Choi, J. Chung, N. Kushman, J. Schrittwieser, R. Leblond, T. Eccles, J. Keeling, F. Gimeno, A. Dal Lago, et al. Competition-level code generation with alphacode. Science, 378(6624):1092–1097, 2022.
  • Li et al. [2023] Y. Li, J. Guo, R. Wang, and J. Yan. T2t: From distribution learning in training to gradient search in testing for combinatorial optimization. In Advances in Neural Information Processing Systems, 2023.
  • Liang et al. [2023] J. Liang, W. Huang, F. Xia, P. Xu, K. Hausman, B. Ichter, P. Florence, and A. Zeng. Code as policies: Language model programs for embodied control. In 2023 IEEE International Conference on Robotics and Automation (ICRA), pages 9493–9500. IEEE, 2023.
  • Lim et al. [2023] K. C. W. Lim, L.-P. Wong, and J. F. Chin. Simulated-annealing-based hyper-heuristic for flexible job-shop scheduling. Engineering Optimization, 55(10):1635–1651, 2023.
  • Liu et al. [2023a] F. Liu, X. Lin, Z. Wang, S. Yao, X. Tong, M. Yuan, and Q. Zhang. Large language model for multi-objective evolutionary optimization. arXiv preprint arXiv:2310.12541, 2023a.
  • Liu et al. [2023b] F. Liu, X. Tong, M. Yuan, and Q. Zhang. Algorithm evolution using large language model. arXiv preprint arXiv:2311.15249, 2023b.
  • Liu et al. [2024] F. Liu, X. Tong, M. Yuan, X. Lin, F. Luo, Z. Wang, Z. Lu, and Q. Zhang. Evolution of heuristics: Towards efficient automatic algorithm design using large language mode. In ICML, 2024. URL https://arxiv.org/abs/2401.02051.
  • Liu et al. [2023c] S. Liu, C. Chen, X. Qu, K. Tang, and Y.-S. Ong. Large language models as evolutionary optimizers. arXiv preprint arXiv:2310.19046, 2023c.
  • Liventsev et al. [2023] V. Liventsev, A. Grishina, A. Härmä, and L. Moonen. Fully autonomous programming with large language models. arXiv preprint arXiv:2304.10423, 2023.
  • Lu et al. [2023] Y. Lu, X. Liu, Z. Du, Y. Gao, and G. Wang. Medkpl: a heterogeneous knowledge enhanced prompt learning framework for transferable diagnosis. Journal of Biomedical Informatics, page 104417, 2023.
  • Luo et al. [2023] F. Luo, X. Lin, F. Liu, Q. Zhang, and Z. Wang. Neural combinatorial optimization with heavy decoder: Toward large scale generalization. In Thirty-seventh Conference on Neural Information Processing Systems, 2023.
  • Ma et al. [2023a] Y. Ma, Z. Cao, and Y. M. Chee. Learning to search feasible and infeasible regions of routing problems with flexible neural k-opt. In Advances in Neural Information Processing Systems, 2023a.
  • Ma et al. [2023b] Y. J. Ma, W. Liang, G. Wang, D.-A. Huang, O. Bastani, D. Jayaraman, Y. Zhu, L. Fan, and A. Anandkumar. Eureka: Human-level reward design via coding large language models. arXiv preprint arXiv:2310.12931, 2023b.
  • Ma et al. [2023c] Z. Ma, H. Guo, J. Chen, Z. Li, G. Peng, Y.-J. Gong, Y. Ma, and Z. Cao. Metabox: A benchmark platform for meta-black-box optimization with reinforcement learning. In Advances in Neural Information Processing Systems, volume 36, 2023c.
  • Madaan et al. [2023a] A. Madaan, A. Shypula, U. Alon, M. Hashemi, P. Ranganathan, Y. Yang, G. Neubig, and A. Yazdanbakhsh. Learning performance-improving code edits. arXiv preprint arXiv:2302.07867, 2023a.
  • Madaan et al. [2023b] A. Madaan, N. Tandon, P. Gupta, S. Hallinan, L. Gao, S. Wiegreffe, U. Alon, N. Dziri, S. Prabhumoye, Y. Yang, et al. Self-refine: Iterative refinement with self-feedback. arXiv preprint arXiv:2303.17651, 2023b.
  • Meyerson et al. [2023] E. Meyerson, M. J. Nelson, H. Bradley, A. Gaier, A. Moradi, A. K. Hoover, and J. Lehman. Language model crossover: Variation through few-shot prompting. arXiv preprint arXiv:2302.12170, 2023.
  • Mohammad Hasani Zade et al. [2023] B. Mohammad Hasani Zade, N. Mansouri, and M. M. Javidi. A new hyper-heuristic based on ant lion optimizer and tabu search algorithm for replica management in cloud environment. Artificial Intelligence Review, 56(9):9837–9947, 2023.
  • Ochoa et al. [2009] G. Ochoa, R. Qu, and E. K. Burke. Analyzing the landscape of a graph based hyper-heuristic for timetabling problems. In Proceedings of the 11th Annual conference on Genetic and evolutionary computation, pages 341–348, 2009.
  • Park et al. [2020a] H. Park, M. Kim, S. Kim, S. Jeong, S. Kim, H. Kang, K. Kim, K. Son, G. Park, K. Son, et al. Policy gradient reinforcement learning-based optimal decoupling capacitor design method for 2.5-d/3-d ics using transformer network. In 2020 IEEE Electrical Design of Advanced Packaging and Systems (EDAPS), pages 1–3. IEEE, 2020a.
  • Park et al. [2020b] H. Park, J. Park, S. Kim, K. Cho, D. Lho, S. Jeong, S. Park, G. Park, B. Sim, S. Kim, et al. Deep reinforcement learning-based optimal decoupling capacitor design method for silicon interposer-based 2.5-d/3-d ics. IEEE Transactions on Components, Packaging and Manufacturing Technology, 10(3):467–478, 2020b.
  • Park et al. [2022] H. Park, M. Kim, S. Kim, K. Kim, H. Kim, T. Shin, K. Son, B. Sim, S. Kim, S. Jeong, et al. Transformer network-based reinforcement learning method for power distribution network (pdn) optimization of high bandwidth memory (hbm). IEEE Transactions on Microwave Theory and Techniques, 70(11):4772–4786, 2022.
  • Park et al. [2023] H. Park, H. Kim, H. Kim, J. Park, S. Choi, J. Kim, K. Son, H. Suh, T. Kim, J. Ahn, et al. Versatile genetic algorithm-bayesian optimization (ga-bo) bi-level optimization for decoupling capacitor placement. In 2023 IEEE 32nd Conference on Electrical Performance of Electronic Packaging and Systems (EPEPS), pages 1–3. IEEE, 2023.
  • Pillay and Qu [2018] N. Pillay and R. Qu. Hyper-heuristics: theory and applications. Springer, 2018.
  • Popovich et al. [2007] M. Popovich, A. Mezhiba, and E. G. Friedman. Power distribution networks with on-chip decoupling capacitors. Springer Science & Business Media, 2007.
  • Pryzant et al. [2023] R. Pryzant, D. Iter, J. Li, Y. T. Lee, C. Zhu, and M. Zeng. Automatic prompt optimization with" gradient descent" and beam search. arXiv preprint arXiv:2305.03495, 2023.
  • Reinelt [1991] G. Reinelt. Tsplib—a traveling salesman problem library. ORSA journal on computing, 3(4):376–384, 1991.
  • Romera-Paredes et al. [2023] B. Romera-Paredes, M. Barekatain, A. Novikov, M. Balog, M. P. Kumar, E. Dupont, F. J. Ruiz, J. S. Ellenberg, P. Wang, O. Fawzi, et al. Mathematical discoveries from program search with large language models. Nature, pages 1–3, 2023.
  • Shibasaka et al. [2013] K. Shibasaka, K. Kanazawa, and M. Yasunaga. Decoupling-capacitor allocation problem solved by genetic algorithm. In 2013 IEEE Electrical Design of Advanced Packaging Systems Symposium (EDAPS), pages 225–228. IEEE, 2013.
  • Shinn et al. [2023] N. Shinn, F. Cassano, A. Gopinath, K. R. Narasimhan, and S. Yao. Reflexion: Language agents with verbal reinforcement learning. In Thirty-seventh Conference on Neural Information Processing Systems, 2023.
  • Skinderowicz [2022] R. Skinderowicz. Improving ant colony optimization efficiency for solving large tsp instances. Applied Soft Computing, 120:108653, 2022.
  • Sohrabi et al. [2021] S. Sohrabi, K. Ziarati, and M. Keshtkaran. Acs-ophs: Ant colony system for the orienteering problem with hotel selection. EURO Journal on Transportation and Logistics, 10:100036, 2021.
  • Son et al. [2023] J. Son, M. Kim, S. Choi, and J. Park. Solving np-hard min-max routing problems as sequential generation with equity context. arXiv preprint arXiv:2306.02689, 2023.
  • Song et al. [2024] X. Song, Y. Tian, R. T. Lange, C. Lee, Y. Tang, and Y. Chen. Position paper: Leveraging foundational models for black-box optimization: Benefits, challenges, and future directions. arXiv preprint arXiv:2405.03547, 2024.
  • Sui et al. [2023] J. Sui, S. Ding, B. Xia, R. Liu, and D. Bu. Neuralgls: learning to guide local search with graph convolutional network for the traveling salesman problem. Neural Computing and Applications, pages 1–20, 2023.
  • Sun and Yang [2023] Z. Sun and Y. Yang. Difusco: Graph-based diffusion solvers for combinatorial optimization. In Thirty-seventh Conference on Neural Information Processing Systems, 2023.
  • Tang et al. [2024] H. Tang, F. Berto, Z. Ma, C. Hua, K. Ahn, and J. Park. Himap: Learning heuristics-informed policies for large-scale multi-agent pathfinding. arXiv preprint arXiv:2402.15546, 2024.
  • Voudouris and Tsang [1999] C. Voudouris and E. Tsang. Guided local search and its application to the traveling salesman problem. European Journal of Operational Research, 113(2):469–499, Mar. 1999. ISSN 03772217. doi: 10.1016/S0377-2217(98)00099-X.
  • Wang et al. [2024a] C. Wang, Z. Yu, S. McAleer, T. Yu, and Y. Yang. Asp: Learn a universal neural solver! IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024a.
  • Wang et al. [2023a] H. Wang, S. Feng, T. He, Z. Tan, X. Han, and Y. Tsvetkov. Can language models solve graph problems in natural language? arXiv preprint arXiv:2305.10037, 2023a.
  • Wang et al. [2023b] L. Wang, Y. Ling, Z. Yuan, M. Shridhar, C. Bao, Y. Qin, B. Wang, H. Xu, and X. Wang. Gensim: Generating robotic simulation tasks via large language models. arXiv preprint arXiv:2310.01361, 2023b.
  • Wang et al. [2023c] L. Wang, C. Ma, X. Feng, Z. Zhang, H. Yang, J. Zhang, Z. Chen, J. Tang, X. Chen, Y. Lin, et al. A survey on large language model based autonomous agents. arXiv preprint arXiv:2308.11432, 2023c.
  • Wang et al. [2023d] X. Wang, C. Li, Z. Wang, F. Bai, H. Luo, J. Zhang, N. Jojic, E. P. Xing, and Z. Hu. Promptagent: Strategic planning with language models enables expert-level prompt optimization. arXiv preprint arXiv:2310.16427, 2023d.
  • Wang et al. [2023e] Y. Wang, Z. Liu, J. Zhang, W. Yao, S. Heinecke, and P. S. Yu. Drdt: Dynamic reflection with divergent thinking for llm-based sequential recommendation. arXiv preprint arXiv:2312.11336, 2023e.
  • Wang et al. [2024b] Y. Wang, Y.-H. Jia, W.-N. Chen, and Y. Mei. Distance-aware attention reshaping: Enhance generalization of neural solver for large-scale vehicle routing problems. arXiv preprint arXiv:2401.06979, 2024b.
  • Weinberger [1990] E. Weinberger. Correlated and uncorrelated fitness landscapes and how to tell the difference. Biological cybernetics, 63(5):325–336, 1990.
  • Wu et al. [2024] X. Wu, S. hao Wu, J. Wu, L. Feng, and K. C. Tan. Evolutionary computation in the era of large language model: Survey and roadmap, 2024.
  • Xi et al. [2023] Z. Xi, W. Chen, X. Guo, W. He, Y. Ding, B. Hong, M. Zhang, J. Wang, S. Jin, E. Zhou, et al. The rise and potential of large language model based agents: A survey. arXiv preprint arXiv:2309.07864, 2023.
  • Xiao et al. [2023] Y. Xiao, D. Wang, H. Chen, B. Li, W. Pang, X. Wu, H. Li, D. Xu, Y. Liang, and Y. Zhou. Reinforcement learning-based non-autoregressive solver for traveling salesman problems. arXiv preprint arXiv:2308.00560, 2023.
  • Xiao et al. [2024] Y. Xiao, D. Wang, B. Li, M. Wang, X. Wu, C. Zhou, and Y. Zhou. Distilling autoregressive models to obtain high-performance non-autoregressive solvers for vehicle routing problems with faster inference speed. In Proceedings of the AAAI Conference on Artificial Intelligence, 2024.
  • Yang et al. [2023a] C. Yang, X. Wang, Y. Lu, H. Liu, Q. V. Le, D. Zhou, and X. Chen. Large language models as optimizers. arXiv preprint arXiv:2309.03409, 2023a.
  • Yang et al. [2023b] J. Yang, A. Prabhakar, K. Narasimhan, and S. Yao. Intercode: Standardizing and benchmarking interactive coding with execution feedback. arXiv preprint arXiv:2306.14898, 2023b.
  • Yang and Whinston [2023] Y. Yang and A. Whinston. A survey on reinforcement learning for combinatorial optimization. In 2023 IEEE World Conference on Applied Intelligence and Computing (AIC), pages 131–136. IEEE, 2023.
  • Ye et al. [2023] H. Ye, J. Wang, Z. Cao, H. Liang, and Y. Li. Deepaco: Neural-enhanced ant systems for combinatorial optimization. In Advances in Neural Information Processing Systems, 2023.
  • Ye et al. [2024] H. Ye, J. Wang, H. Liang, Z. Cao, Y. Li, and F. Li. Glop: Learning global partition and local construction for solving large-scale routing problems in real-time. In Proceedings of the AAAI Conference on Artificial Intelligence, 2024.
  • Zambrano-Gutierrez et al. [2023] D. Zambrano-Gutierrez, J. Cruz-Duarte, and H. Castañeda. Automatic hyper-heuristic to generate heuristic-based adaptive sliding mode controller tuners for buck-boost converters. In Proceedings of the Genetic and Evolutionary Computation Conference, pages 1482–1489, 2023.
  • Zebulum et al. [2018] R. S. Zebulum, M. A. Pacheco, and M. M. B. Vellasco. Evolutionary electronics: automatic design of electronic circuits and systems by genetic algorithms. CRC press, 2018.
  • Zelikman et al. [2023] E. Zelikman, E. Lorch, L. Mackey, and A. T. Kalai. Self-taught optimizer (stop): Recursively self-improving code generation. arXiv preprint arXiv:2310.02304, 2023.
  • Zhang et al. [2023a] C. Zhang, K. Yang, S. Hu, Z. Wang, G. Li, Y. Sun, C. Zhang, Z. Zhang, A. Liu, S.-C. Zhu, et al. Proagent: Building proactive cooperative ai with large language models. arXiv preprint arXiv:2308.11339, 2023a.
  • Zhang et al. [2020] L. Zhang, W. Huang, J. Juang, H. Lin, B.-C. Tseng, and C. Hwang. An enhanced deep reinforcement learning algorithm for decoupling capacitor selection in power distribution network design. In 2020 IEEE International Symposium on Electromagnetic Compatibility & Signal/Power Integrity (EMCSI), pages 245–250. IEEE, 2020.
  • Zhang et al. [2024] R. Zhang, F. Liu, X. Lin, Z. Wang, Z. Lu, and Q. Zhang. Understanding the importance of evolutionary search in automated heuristic design with large language models. In International Conference on Parallel Problem Solving from Nature, pages 185–202. Springer, 2024.
  • Zhang et al. [2022] S. Zhang, Z. Chen, Y. Shen, M. Ding, J. B. Tenenbaum, and C. Gan. Planning with large language models for code generation. In The Eleventh International Conference on Learning Representations, 2022.
  • Zhang et al. [2023b] Z.-Q. Zhang, F.-C. Wu, B. Qian, R. Hu, L. Wang, and H.-P. Jin. A q-learning-based hyper-heuristic evolutionary algorithm for the distributed flexible job-shop scheduling problem with crane transportation. Expert Systems with Applications, 234:121050, 2023b.
  • Zhao et al. [2023a] Q. Zhao, Q. Duan, B. Yan, S. Cheng, and Y. Shi. A survey on automated design of metaheuristic algorithms. arXiv preprint arXiv:2303.06532, 2023a.
  • Zhao et al. [2023b] W. X. Zhao, K. Zhou, J. Li, T. Tang, X. Wang, Y. Hou, Y. Min, B. Zhang, J. Zhang, Z. Dong, et al. A survey of large language models. arXiv preprint arXiv:2303.18223, 2023b.
  • Zhong et al. [2024] T. Zhong, Z. Liu, Y. Pan, Y. Zhang, Y. Zhou, S. Liang, Z. Wu, Y. Lyu, P. Shu, X. Yu, et al. Evaluation of openai o1: Opportunities and challenges of agi. arXiv preprint arXiv:2409.18486, 2024.
  • Zhou et al. [2023] J. Zhou, Y. Wu, Z. Cao, W. Song, J. Zhang, and Z. Chen. Learning large neighborhood search for vehicle routing in airport ground handling. IEEE Transactions on Knowledge and Data Engineering, 2023.
  • Zhou et al. [2022] Y. Zhou, A. I. Muresanu, Z. Han, K. Paster, S. Pitis, H. Chan, and J. Ba. Large language models are human-level prompt engineers. arXiv preprint arXiv:2211.01910, 2022.
  • Zhu et al. [2023] L. Zhu, Y. Zhou, S. Sun, and Q. Su. Surgical cases assignment problem using an efficient genetic programming hyper-heuristic. Computers & Industrial Engineering, 178:109102, 2023.

Appendix A Extended discussions

A.1 Comparisons with EoH

Our work is developed concurrently with Evolution of Heuristics (EoH) [47], which establishes the groundwork for this emerging field. Nonetheless, our work extends the boundaries of LHH through three primary lenses: (1) the search algorithm, (2) the downstream CO applications, and (3) the evaluation methodologies.

  • Search Algorithm: We introduce Reflective Evolution and demonstrate its superior sample efficiency.

  • Applications: Our work broadens the scope by applying LHH to five heterogeneous algorithmic types and six different COPs, advancing the state-of-the-art in GLS, EDA, ACO, and NCO.

  • Evaluation Methodologies: We employ fitness landscape analysis to explore the underlying mechanisms of our proposed method; we establish black-box experimental settings to ensure reliable comparisons and practical relevance to real-world applications.

A.2 Extended applications

ReEvo is generally applicable to other string-based optimization scenarios [57] as long as reflecting on the relative performance of strings is meaningful. Preliminary experiments on prompt tuning verify the advantage of ReEvo over random search and vanilla genetic programming. Furthermore, reasoning-capable LLMs released after ReEvo, such as OpenAI o1 [106], present an interesting avenue for future work and experimentation that could yield even better sample efficiency and performance.

Appendix B Prompts

We gather prompts used for ReEvo in this section. Our prompt structure is flexible and extensible. To adapt ReEvo to a new problem setting, one only needs to define its problem description, function description, and function signature.
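For illustration, below is a minimal sketch of what these three components could look like for a hypothetical new problem; the toy knapsack-style setting, the field names, and the wording are ours and not part of the released prompts.

# Hypothetical problem-specific components for adapting ReEvo to a new COP (illustrative only).
new_problem = {
    "problem_description": (
        "Solving a knapsack-style selection problem via stochastic solution "
        "sampling following 'heuristics'. The goal is to maximize the total value "
        "of selected items subject to a single capacity constraint."
    ),
    "function_description": (
        "The 'heuristics' function takes as input a 'value' array of shape (n,) and "
        "a 'weight' array of shape (n,), and returns a 'heuristics' array of shape "
        "(n,), where 'heuristics[i]' indicates how promising it is to select item i."
    ),
    "function_signature": "def heuristics(value: np.ndarray, weight: np.ndarray) -> np.ndarray:",
}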

B.1 Common prompts

The prompt formats are given below. They are used for all COP settings.

You are an expert in the domain of optimization heuristics. Your task is to design heuristics that can effectively solve optimization problems.
Your response outputs Python code and nothing else. Format your code as a Python code string: "```python ... ```".
Prompt 1: System prompt for generator LLM.
You are an expert in the domain of optimization heuristics. Your task is to give hints to design better heuristics.
Prompt 2: System prompt for reflector LLM.
Write a {function_name} function for {problem_description}
{function_description}
Prompt 3: Task description.
{task_description}
{seed_function}
Refer to the format of a trivial design above. Be very creative and give `{func_name}_v2`. Output code only and enclose your code with Python code block: ```python ... ```.
{initial_long-term_reflection}
Prompt 4: User prompt for population initialization.
Below are two {function_name} functions for {problem_description}
{function_description}
You are provided with two code versions below, where the second version performs better than the first one.
[Worse code]
{worse_code}
[Better code]
{better_code}
You respond with some hints for designing better heuristics, based on the two code versions and using less than 20 words.
Prompt 5: User prompt for short-term reflection.

The user prompt used for short-term reflection in black-box COPs is slightly different from the one used for white-box COPs. We explicitly ask the reflector LLM to infer the problem settings and to give hints about how the node and edge attributes correlate with the black-box objective value.

Below are two {function_name} functions for {problem_description}
{function_description}
You are provided with two code versions below, where the second version performs better than the first one.
[Worse code]
{worse_code}
[Better code]
{better_code}
Please infer the problem settings by comparing two code versions and give hints for designing better heuristics. You may give hints about how edge and node attributes correlate with the black-box objective value. Use less than 50 words.
Prompt 6: User prompt for short-term reflection on black-box COPs.
{task_description}
[Worse code]
{function_signature0}
{worse_code}
[Better code]
{function_signature1}
{better_code}
[Reflection]
{short_term_reflection}
[Improved code]
Please write an improved function `{function_name}_v2`, according to the reflection. Output code only and enclose your code with Python code block: ```python ... ```.
Prompt 7: User prompt for crossover.

The function signature variables here are used to adjust function names with their versions, similar to the design in [68]. For example, when designing "heuristics", the worse code is named "heuristics_v0" and the better code is named "heuristics_v1". In Prompt 9, the elitist code is named "heuristics_v1".
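As a sketch of this versioning convention, the helper below (our own illustration, not code from the ReEvo repository) renames a heuristic function inside a code string before it is placed into the prompt:

def version_signature(code: str, function_name: str, version: int) -> str:
    # Rename, e.g., "def heuristics(" to "def heuristics_v0(" so that worse,
    # better, and elitist code versions are distinguishable in the prompt.
    return code.replace(f"def {function_name}(", f"def {function_name}_v{version}(", 1)

worse_code = "def heuristics(distance_matrix):\n    return 1 / distance_matrix\n"
print(version_signature(worse_code, "heuristics", 0))  # -> "def heuristics_v0(..."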

Below is your prior long-term reflection on designing heuristics for {problem_description}
{prior_long-term_reflection}
Below are some newly gained insights.
{new_short-term_reflections}
Write constructive hints for designing better heuristics, based on prior reflections and new insights and using less than 50 words.
Prompt 8: User prompt for long-term reflection.
{task_description}
[Prior reflection]
{long-term_reflection}
[Code]
{function_signature1}
{elitist_code}
[Improved code]
Please write a mutated function `{function_name}_v2`, according to the reflection. Output code only and enclose your code with Python code block: ```python ... ```.
Prompt 9: User prompt for elitist mutation.

B.2 Problem-specific prompt components

Problem-specific prompt components are given below.

  • Problem descriptions of all COP settings are given in Table 6.

  • The function descriptions of all COP settings are presented in Table 7. The descriptions crafted for black-box settings avoid disclosing any information that could link to the original COP.

  • The function signatures are gathered in Prompt 10.

  • The seed functions are shown in Prompt 11. The seed function used for TSP_constructive is drawn from [46]. The seed functions used for white-box ACO settings are expert-designed heuristics [71, 6, 72, 17, 39], while those used for black-box ACO settings are trivial all-ones matrices.

  • The initial long-term reflections for some COP settings are presented in Prompt 12, while they are left empty for the others.

# TSP_NCO
def heuristics(distance_matrix: torch.Tensor) -> torch.Tensor:
# CVRP_NCO
def heuristics(distance_matrix: torch.Tensor, demands: torch.Tensor) -> torch.Tensor:
# DPP_GA_crossover
def crossover(parents: np.ndarray, n_pop: int) -> np.ndarray:
# DPP_GA_mutation
def mutation(population: np.ndarray, probe: int, prohibit: np.ndarray, size: int=100) -> np.ndarray:
# TSP_GLS
def heuristics(distance_matrix: np.ndarray) -> np.ndarray:
# TSP_ACO
def heuristics(distance_matrix: np.ndarray) -> np.ndarray:
# CVRP_ACO
def heuristics(distance_matrix: np.ndarray, coordinates: np.ndarray, demands: np.ndarray, capacity: int) -> np.ndarray:
# OP_ACO
def heuristics(prize: np.ndarray, distance: np.ndarray, maxlen: float) -> np.ndarray:
# MKP_ACO
def heuristics(prize: np.ndarray, weight: np.ndarray) -> np.ndarray:
# BPP_ACO
def heuristics(demand: np.ndarray, capacity: int) -> np.ndarray:
# TSP_ACO (black-box)
def heuristics(edge_attr: np.ndarray) -> np.ndarray:
# CVRP_ACO (black-box)
def heuristics(edge_attr: np.ndarray, node_attr: np.ndarray) -> np.ndarray: # For simplicity, we omit ‘coordinates‘ and ‘capacity‘ after using capacity to normalize demands, i.e. node_attr
# OP_ACO (black-box)
def heuristics(node_attr: np.ndarray, edge_attr: np.ndarray, node_constraint: float) -> np.ndarray:
# MKP_ACO (black-box)
def heuristics(item_attr1: np.ndarray, item_attr2: np.ndarray) -> np.ndarray:
# BPP_ACO (black-box)
def heuristics(node_attr: np.ndarray, node_constraint: int) -> np.ndarray:
# TSP_constructive
def select_next_node(current_node: int, destination_node: int, unvisited_nodes: set, distance_matrix: np.ndarray) -> int:
Prompt 10: Function signatures used in ReEvo.
# TSP_NCO
def heuristics(distance_matrix: torch.Tensor) -> torch.Tensor:
    distance_matrix[distance_matrix == 0] = 1e5
    K = 100
    # Compute top-k nearest neighbors (smallest distances)
    values, indices = torch.topk(distance_matrix, k=K, largest=False, dim=1)
    heu = -distance_matrix.clone()
    # Create a mask where topk indices are True and others are False
    topk_mask = torch.zeros_like(distance_matrix, dtype=torch.bool)
    topk_mask.scatter_(1, indices, True)
    # Apply -log(d_ij) only to the top-k elements
    heu[topk_mask] = -torch.log(distance_matrix[topk_mask])
    return heu

# CVRP_NCO
def heuristics(distance_matrix: torch.Tensor, demands: torch.Tensor) -> torch.Tensor:
    return torch.zeros_like(distance_matrix)

# DPP_GA_crossover
def crossover(parents: np.ndarray, n_pop: int) -> np.ndarray:
    n_parents, n_decap = parents.shape
    # Split genomes into two halves
    left_halves = parents[:, :n_decap // 2]
    right_halves = parents[:, n_decap // 2:]
    # Create parent pairs
    parents_idx = np.stack([np.random.choice(range(n_parents), 2, replace=False) for _ in range(n_pop)])
    parents_left = left_halves[parents_idx[:, 0]]
    parents_right = right_halves[parents_idx[:, 1]]
    # Create offspring
    offspring = np.concatenate([parents_left, parents_right], axis=1)
    return offspring

# DPP_GA_mutation
def mutation(population: np.ndarray, probe: int, prohibit: np.ndarray, size: int = 100) -> np.ndarray:
    n_pop, n_decap = population.shape
    for i in range(n_pop):
        ind = population[i]
        unique_actions = np.unique(population[i])
        if len(unique_actions) < n_decap:
            # Find the indices wherein the action is taken the second time
            dup_idx = []
            action_set = set()
            for j, action in enumerate(ind):
                if action in action_set:
                    dup_idx.append(j)
                action_set.add(action)
            # Mutate the duplicated actions
            infeasible_actions = np.concatenate([prohibit, [probe], unique_actions])
            feasible_actions = np.setdiff1d(np.arange(size), infeasible_actions)
            assert n_decap - len(unique_actions) == len(dup_idx)
            new_actions = np.random.choice(feasible_actions, len(dup_idx), replace=False)
            ind[dup_idx] = new_actions
    return population

# TSP_GLS
def heuristics(distance_matrix: np.ndarray) -> np.ndarray:
    return distance_matrix

# TSP_ACO
def heuristics(distance_matrix: np.ndarray) -> np.ndarray:
    return 1 / distance_matrix

# CVRP_ACO
def heuristics(distance_matrix: np.ndarray, coordinates: np.ndarray, demands: np.ndarray, capacity: int) -> np.ndarray:
    return 1 / distance_matrix

# OP_ACO
def heuristics(prize: np.ndarray, distance: np.ndarray, maxlen: float) -> np.ndarray:
    return prize[np.newaxis, :] / distance

# MKP_ACO
def heuristics(prize: np.ndarray, weight: np.ndarray) -> np.ndarray:
    return prize / np.sum(weight, axis=1)

# BPP_ACO
def heuristics(demand: np.ndarray, capacity: int) -> np.ndarray:
    return np.tile(demand / demand.max(), (demand.shape[0], 1))

# TSP_ACO (black-box)
def heuristics(edge_attr: np.ndarray) -> np.ndarray:
    return np.ones(edge_attr.shape[0])

# CVRP_ACO (black-box)
def heuristics(edge_attr: np.ndarray, node_attr: np.ndarray) -> np.ndarray:
    return np.ones_like(edge_attr)

# OP_ACO (black-box)
def heuristics(node_attr: np.ndarray, edge_attr: np.ndarray, edge_constraint: float) -> np.ndarray:
    return np.ones_like(edge_attr)

# MKP_ACO (black-box)
def heuristics(item_attr1: np.ndarray, item_attr2: np.ndarray) -> np.ndarray:
    n, m = item_attr2.shape
    return np.ones(n)

# BPP_ACO (black-box)
def heuristics(node_attr: np.ndarray, node_constraint: int) -> np.ndarray:
    n = node_attr.shape[0]
    return np.ones((n, n))

# TSP_constructive
def select_next_node(current_node: int, destination_node: int, unvisited_nodes: set, distance_matrix: np.ndarray) -> int:
    threshold = 0.7
    c1, c2, c3, c4 = 0.4, 0.3, 0.2, 0.1
    scores = {}
    for node in unvisited_nodes:
        all_distances = [distance_matrix[node][i] for i in unvisited_nodes if i != node]
        average_distance_to_unvisited = np.mean(all_distances)
        std_dev_distance_to_unvisited = np.std(all_distances)
        score = c1 * distance_matrix[current_node][node] - c2 * average_distance_to_unvisited + c3 * std_dev_distance_to_unvisited - c4 * distance_matrix[destination_node][node]
        scores[node] = score
    next_node = min(scores, key=scores.get)
    return next_node
Prompt 11: Seed heuristics used for ReEvo.
# White-box COP_ACO
- Try combining various factors to determine how promising it is to select a solution component.
- Try sparsifying the matrix by setting unpromising elements to zero.
# TSP_constructive
- Try look-ahead mechanisms.
Prompt 12: Initial long-term reflections
Table 6: Problem descriptions used in prompts.
Problem Problem description
TSP_NCO Assisting in solving the Traveling Salesman Problem (TSP) with some prior heuristics. TSP requires finding the shortest path that visits all given nodes and returns to the starting node.
CVRP_NCO Assisting in solving Capacitated Vehicle Routing Problem (CVRP) with some prior heuristics. CVRP requires finding the shortest path that visits all given nodes and returns to the starting node. Each node has a demand and each vehicle has a capacity. The total demand of the nodes visited by a vehicle cannot exceed the vehicle capacity. When the total demand exceeds the vehicle capacity, the vehicle must return to the starting node.
DPP_GA Assisting in solving black-box decap placement problem with genetic algorithm. The problem requires finding the optimal placement of decaps in a given power grid.
TSP_GLS Solving Traveling Salesman Problem (TSP) via guided local search. TSP requires finding the shortest path that visits all given nodes and returns to the starting node.
TSP_ACO Solving Traveling Salesman Problem (TSP) via stochastic solution sampling following "heuristics". TSP requires finding the shortest path that visits all given nodes and returns to the starting node.
TSP_ACO_black-box Solving a black-box graph combinatorial optimization problem via stochastic solution sampling following "heuristics".
CVRP_ACO Solving Capacitated Vehicle Routing Problem (CVRP) via stochastic solution sampling. CVRP requires finding the shortest path that visits all given nodes and returns to the starting node. Each node has a demand and each vehicle has a capacity. The total demand of the nodes visited by a vehicle cannot exceed the vehicle capacity. When the total demand exceeds the vehicle capacity, the vehicle must return to the starting node.
CVRP_ACO_black-box Solving a black-box graph combinatorial optimization problem via stochastic solution sampling following "heuristics".
OP_ACO Solving Orienteering Problem (OP) via stochastic solution sampling following "heuristics". OP is an optimization problem where the goal is to find the most rewarding route, starting from a depot, visiting a subset of nodes with associated prizes, and returning to the depot within a specified travel distance.
OP_ACO_black-box Solving a black-box graph combinatorial optimization problem via stochastic solution sampling following "heuristics".
MKP_ACO Solving Multiple Knapsack Problems (MKP) through stochastic solution sampling based on "heuristics". MKP involves selecting a subset of items to maximize the total prize collected, subject to multi-dimensional maximum weight constraints.
MKP_ACO_black-box Solving a black-box combinatorial optimization problem via stochastic solution sampling following "heuristics".
BPP_ACO Solving Bin Packing Problem (BPP). BPP requires packing a set of items of various sizes into the smallest number of fixed-sized bins.
BPP_ACO_black-box Solving a black-box combinatorial optimization problem via stochastic solution sampling following "heuristics".
TSP_constructive Solving Traveling Salesman Problem (TSP) with constructive heuristics. TSP requires finding the shortest path that visits all given nodes and returns to the starting node.
Table 7: Function descriptions used in prompts.
Problem Function description
TSP_NCO The ‘heuristics‘ function takes as input a distance matrix and returns prior indicators of how bad it is to include each edge in a solution. The return is of the same shape as the input. The heuristics should contain negative values for undesirable edges and positive values for promising ones. Use efficient vectorized implementations.
CVRP_NCO The ‘heuristics‘ function takes as input a distance matrix (shape: n by n) and a vector of customer demands (shape: n), where the depot node is indexed by 0 and the customer demands are normalized by the total vehicle capacity. It returns prior indicators of how promising it is to include each edge in a solution. The return is of the same shape as the distance matrix. The heuristics should contain negative values for undesirable edges and positive values for promising ones. Use efficient vectorized implementations.
DPP_GA_crossover The ‘crossover‘ function takes as input a 2D NumPy array parents and an integer n_pop. The function performs a genetic crossover operation on parents to generate n_pop offspring. Use vectorized implementation if possible.
DPP_GA_mutation The ‘mutation‘ function modifies a given 2D population array to ensure exploration of the genetic algorithm. You may also take into account the feasibility of each individual. An individual is considered feasible if all its elements are unique and none are listed in the prohibited array or match the probe value. Use a vectorized implementation if possible.
The function takes as input the below arguments:
- population (np.ndarray): Population of individuals; shape: (P, n_decap).
- probe (int): Probe value; each element in the population should not be equal to this value.
- prohibit (np.ndarray): Prohibit values; each element in the population should not be in this set.
- size (int): Size of the PDN; each element in the population should be in the range [0, size).
TSP_GLS The ‘heuristics‘ function takes as input a distance matrix, and returns prior indicators of how bad it is to include each edge in a solution. The return is of the same shape as the input.
TSP_ACO The ‘heuristics‘ function takes as input a distance matrix, and returns prior indicators of how promising it is to include each edge in a solution. The return is of the same shape as the input.
TSP_ACO_black-box The ‘heuristics‘ function takes as input a matrix of edge attributes with shape ‘(n_edges, n_attributes)‘, where ‘n_attributes=1‘ in this case. It computes prior indicators of how promising it is to include each edge in a solution. The return is of the shape of ‘(n_edges,)‘.
CVRP_ACO The ‘heuristics‘ function takes as input a distance matrix (shape: n by n), Euclidean coordinates of nodes (shape: n by 2), a vector of customer demands (shape: n), and the integer capacity of vehicle capacity. It returns prior indicators of how promising it is to include each edge in a solution. The return is of the same shape as the distance matrix. The depot node is indexed by 0.
CVRP_ACO_black-box The ‘heuristics‘ function takes as input a matrix of edge attributes (shape: n by n) and a vector of node attributes (shape: n). A special node is indexed by 0. ‘heuristics‘ returns prior indicators of how promising it is to include each edge in a solution. The return is of the same shape as the input matrix of edge attributes.
OP_ACO Suppose ‘n‘ represents the number of nodes in the problem, with the depot being the first node. The ‘heuristics‘ function takes as input a ‘prize‘ array of shape (n,), a ‘distance‘ matrix of shape (n,n), and a ‘max_len‘ float which is the constraint to total travel distance, and it returns ‘heuristics‘ of shape (n, n), where ‘heuristics[i][j]‘ indicates the promise of including the edge from node #i to node #j in the solution.
OP_ACO_black-box The ‘heuristics‘ function takes as input a vector of node attributes (shape: n), a matrix of edge attributes (shape: n by n), and a constraint imposed on the sum of edge attributes. A special node is indexed by 0. ‘heuristics‘ returns prior indicators of how promising it is to include each edge in a solution. The return is of the same shape as the input matrix of edge attributes.
MKP_ACO Suppose ‘n‘ indicates the scale of the problem, and ‘m‘ is the dimension of weights each item has. The constraint of each dimension is fixed to 1. The ‘heuristics‘ function takes as input a ‘prize‘ of shape (n,), a ‘weight‘ of shape (n, m), and returns ‘heuristics‘ of shape (n,). ‘heuristics[i]‘ indicates how promising it is to include item i in the solution.
MKP_ACO_black-box Suppose ‘n‘ indicates the scale of the problem, and ‘m‘ is the dimension of some attributes each involved item has. The ‘heuristics‘ function takes as input an ‘item_attr1‘ of shape (n,), an ‘item_attr2‘ of shape (n, m), and returns ‘heuristics‘ of shape (n,). ‘heuristics[i]‘ indicates how promising it is to include item i in the solution.
BPP_ACO Suppose ‘n‘ represents the number of items in the problem. The heuristics function takes as input a ‘demand‘ array of shape (n,) and an integer as the capacity of every bin, and it returns a ‘heuristics‘ array of shape (n,n). ‘heuristics[i][j]‘ indicates how promising it is to put item i and item j in the same bin.
BPP_ACO_black-box Suppose ‘n‘ represents the scale of the problem. The heuristics function takes as input an ‘item_attr‘ array of shape (n,) and an integer as a certain constraint imposed on the item attributes. The heuristics function returns a ‘heuristics‘ array of shape (n, n). ‘heuristics[i][j]‘ indicates how promising it is to group item i and item j.
TSP_constructive The select_next_node function takes as input the current node, the destination node, a set of unvisited nodes, and a distance matrix, and returns the next node to visit.

Appendix C Detailed experimental setup

Hyperparameters for ReEvo.

Unless otherwise stated, we adopt the parameters in Table 8 for ReEvo runs. During initialization, the LLM temperature is increased by 0.3 to diversify the initial population.
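As a sketch (our own helper, not the released implementation), the initialization-time temperature bump could be applied as follows:

BASE_TEMPERATURE = 1.0  # generator/reflector temperature from Table 8

def sampling_temperature(is_initialization: bool) -> float:
    # Raise the temperature by 0.3 only when generating the initial population,
    # which diversifies the starting heuristics; later generations use the base value.
    return BASE_TEMPERATURE + 0.3 if is_initialization else BASE_TEMPERATURE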

Heuristic generation pipeline.

We perform 3 ReEvo runs for each COP setting. Unless otherwise stated, the heuristic with the best validation performance is selected for final testing on 64 held-out instances.

Cost and hardware.

When the hardware permits, heuristics from the same generation are generated, reflected upon, and evaluated in parallel. The duration of a single ReEvo run ranges from approximately two minutes to several hours, depending on the evaluation runtime and the hardware used. Each run costs about $0.06 when using GPT-3.5 Turbo. For runtime comparisons, we employ a single core of an AMD EPYC 7742 CPU and an NVIDIA GeForce RTX 3090 GPU.
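A minimal sketch of the parallel evaluation step using Python's standard library; the `evaluate` callable and the worker count are placeholders rather than parts of the released codebase:

from concurrent.futures import ProcessPoolExecutor

def evaluate(heuristic_code: str) -> float:
    """Placeholder: run the heuristic on validation instances and return its objective value."""
    raise NotImplementedError

def evaluate_generation(heuristic_codes: list, max_workers: int = 10) -> list:
    # Evaluate all heuristics of the current generation concurrently;
    # results are returned in the same order as the inputs.
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(evaluate, heuristic_codes))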

Table 8: Parameters of ReEvo.
Parameter Value
LLM (generator and reflector) gpt-3.5-turbo
LLM temperature (generator and reflector) 1
Population size 10
Number of initial generation 30
Maximum number of evaluations 100
Crossover rate 1
Mutation rate 0.5

C.1 Penalty heuristics for Guided Local Search

Guided Local Search (GLS) explores the solution space through local search operations under the guidance of heuristics. We aim to use ReEvo to find the most effective heuristics to enhance GLS. In our experimental setup, we employ a variation of the classical GLS algorithm [78] that incorporates perturbation phases [1], wherein edges with higher heuristic values are prioritized for penalization. In the training phase, we evaluate each heuristic on TSP200 using 1200 GLS iterations. For generating the results in Table 1, we use the parameters in Table 9. The iterations stop when reaching the predefined threshold or when the optimality gap is reduced to zero.
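For illustration, a minimal sketch of the penalization step, assuming the classical GLS utility rule with the ReEvo-generated heuristic matrix acting as the edge "badness" measure; the variable names and the exact augmented-cost update are ours:

import numpy as np

def penalize_and_augment(tour: list, heuristics: np.ndarray, penalties: np.ndarray,
                         distance: np.ndarray, lam: float) -> np.ndarray:
    """Penalize the tour edge with maximal utility and return the augmented cost matrix.

    heuristics[i, j] indicates how bad it is to include edge (i, j) in a solution;
    penalties[i, j] counts how often edge (i, j) has already been penalized.
    """
    edges = list(zip(tour, tour[1:] + tour[:1]))  # close the tour
    # Classical GLS utility: badness discounted by prior penalization.
    utilities = [heuristics[i, j] / (1.0 + penalties[i, j]) for i, j in edges]
    i, j = edges[int(np.argmax(utilities))]
    penalties[i, j] += 1
    penalties[j, i] += 1
    # Local search subsequently operates on the augmented costs.
    return distance + lam * penalties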

Table 9: GLS parameters used for the evaluations in Table 1.
Problem Perturbation moves Number of iterations Scale parameter $\lambda$
TSP20 5 73 0.1
TSP50 30 175
TSP100 40 1800
TSP200 40 800

C.2 Heuristic measures for Ant Colony Optimization

Ant Colony Optimization (ACO) is an evolutionary algorithm that interleaves stochastic solution sampling with pheromone-trail updates. The sampling is biased toward more promising regions of the solution space by heuristic measures, and ReEvo searches for the best such heuristics. For more details, please refer to [94].
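For reference, a minimal sketch of how heuristic measures bias the transition probabilities in a standard Ant System; the alpha/beta weighting and the masking of visited nodes follow common practice rather than a specific implementation:

import numpy as np

def transition_probs(pheromone: np.ndarray, heuristic: np.ndarray, current: int,
                     visited: set, alpha: float = 1.0, beta: float = 1.0) -> np.ndarray:
    # An ant moves from `current` to node j with probability proportional to
    # pheromone[current, j]**alpha * heuristic[current, j]**beta over unvisited nodes.
    scores = pheromone[current] ** alpha * heuristic[current] ** beta
    scores[list(visited)] = 0.0
    return scores / scores.sum()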

Table 10 presents the ACO parameters used for heuristic evaluations during LHH evolution. They are adjusted to maximize ACO performance while ensuring efficient evaluations. Instance generations and ACO implementations follow Ye et al. [94]. To conduct tests in Fig. 2, we increase the number of iterations to ensure full convergence.

Table 10: ACO parameters used for heuristic evaluations during training.
Problem Population size Number of iterations
TSP 30 100
CVRP 30 100
OP 20 50
MKP 10 50
BPP 20 15

C.3 Genetic operators for Electronic Design Automation

Here we briefly introduce the expert-designed GA for DPP. Further details can be found in [31, Appendix B].

The GA designed by Kim et al. [31] is utilized as an expert policy to collect expert guiding labels for imitation learning. The GA is a widely used search heuristic method for the Decoupling Capacitor Placement Problem (DPP), which aims to find the optimal placement of a given number of decoupling capacitors (decaps) on a Power Distribution Network (PDN) with a probing port and 0-15 keep-out regions to best suppress the impedance of the probing port.

Key aspects of the designed GA include:

  • Encoding and initialization. The GA generates an initial population randomly, and each solution consists of a set of numbers representing decap locations on the PDN. The population size is fixed to 20, and each solution is evaluated and sorted based on its objective value.

  • Elitism. After the initial population is formulated, the top-performing solutions (elite population) are kept for the next generation. The size of the elite population is predefined as 4.

  • Selection. The better half of the population is selected for crossover.

  • Crossover. This process generates new population candidates by dividing each solution from the selected population in half and performing random crossover.

  • Mutation. After crossover, duplicated decap locations within a solution are replaced with random locations, avoiding the probing port and keep-out regions.

In this work, we sequentially optimize the crossover and mutation operators using ReEvo. When optimizing crossover, all other components of the GA pipeline remain identical to the expert-designed one. When optimizing mutation, we additionally set the crossover operator to the best one previously generated by ReEvo.
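A schematic sketch of how the two ReEvo-optimized operators slot into this GA loop; `evaluate` is a placeholder for the (maximized) PI objective, `n_generations` and `n_elite` mirror the settings described above, and `crossover`/`mutation` refer to operators with the signatures shown in Prompt 11 and Heuristic 5:

import numpy as np

def run_dpp_ga(population: np.ndarray, probe: int, prohibit: np.ndarray,
               evaluate, n_generations: int = 10, n_elite: int = 4) -> np.ndarray:
    for _ in range(n_generations):
        fitness = np.array([evaluate(ind) for ind in population])
        order = np.argsort(-fitness)                          # best solutions first
        elite = population[order[:n_elite]]                   # elitism: keep the top solutions
        parents = population[order[:len(population) // 2]]    # better half selected for crossover
        offspring = crossover(parents, n_pop=len(population) - n_elite)  # ReEvo-optimized operator
        offspring = mutation(offspring, probe, prohibit)                  # ReEvo-optimized operator
        population = np.concatenate([elite, offspring], axis=0)
    fitness = np.array([evaluate(ind) for ind in population])
    return population[int(np.argmax(fitness))]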

During training, we evaluate $F$ on three training instances randomly generated following [31, Appendix A.5]. The evaluation on each instance runs 10 GA iterations and returns the objective value of the best-performing solution. For the final test in Fig. 3, we utilize the same test dataset as in [31].

C.4 Attention reshaping for Neural Combinatorial Optimization

For autoregressive NCO solvers, e.g. POMO [37] and LEHD [51], the last decoder layer outputs the logits of the next node to visit. Then, the attention-reshaping heuristic values are added to the logits before masking, logit clipping, and softmax operation.

For the autoregressive NCO models with a heavy encoder and a light decoder, the last decoder layer computes logits using [36]

u_{(c)j}=\begin{cases}C\cdot\tanh\left(\frac{\mathbf{q}_{(c)}^{T}\mathbf{k}_{j}}{\sqrt{d_{\text{k}}}}\right)&\text{if }j\neq\pi_{t^{\prime}}\quad\forall t^{\prime}<t\\ -\infty&\text{otherwise.}\end{cases} \qquad (3)

Here, $u_{(c)j}$ is the compatibility between the current context and node $j$, $C$ a constant for logit clipping, $\mathbf{q}_{(c)}$ the query embedding of the current context, $\mathbf{k}_{j}$ the key embedding of node $j$, and $d_{\text{k}}$ the query/key dimensionality. For each node $j$ already visited, i.e. $j=\pi_{t^{\prime}}, \exists t^{\prime}<t$, $u_{(c)j}$ is masked.

We reshape the attention scores by using

u_{(c)j}=\begin{cases}C\cdot\tanh\left(\frac{\mathbf{q}_{(c)}^{T}\mathbf{k}_{j}+h_{(c)j}}{\sqrt{d_{\text{k}}}}\right)&\text{if }j\neq\pi_{t^{\prime}}\quad\forall t^{\prime}<t\\ -\infty&\text{otherwise.}\end{cases} \qquad (4)

$h_{(c)j}$ is computed via attention-reshaping heuristics. In practice, for TSP and CVRP, $h_{(c)j}=\mathbf{H}_{c,j}$, where $\mathbf{H}$ is the heuristic matrix and $c$ is simplified to the current node.

For the autoregressive NCO models with a light encoder and a heavy decoder [51], or only a decoder [14], the last decoder layer computes logits using:

u_{i}=W_{o}\mathbf{h}_{i}, \qquad (5)

where $W_{o}$ is a learnable matrix at the output layer and node $i$ is among the available nodes. We reshape the logits with

u_{i}=W_{o}\mathbf{h}_{i}+\mathbf{H}_{c,i}, \qquad (6)

where node $c$ is the current node.
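For concreteness, a minimal PyTorch sketch of the reshaping in Eqs. (4) and (6); the tensor names, the standalone-function form, and the clipping constant are ours and do not correspond to any particular model implementation:

import torch

def reshape_logits_heavy_encoder(q_c: torch.Tensor, keys: torch.Tensor, heu_row: torch.Tensor,
                                 visited_mask: torch.Tensor, clip: float = 10.0) -> torch.Tensor:
    # Eq. (4): u_{(c)j} = C * tanh((q_{(c)}^T k_j + h_{(c)j}) / sqrt(d_k)), visited nodes masked.
    d_k = keys.size(-1)
    compat = keys @ q_c + heu_row                  # q_{(c)}^T k_j + h_{(c)j} for every node j
    logits = clip * torch.tanh(compat / d_k ** 0.5)
    return logits.masked_fill(visited_mask, float("-inf"))

def reshape_logits_heavy_decoder(h_nodes: torch.Tensor, w_out: torch.Tensor, heu_row: torch.Tensor) -> torch.Tensor:
    # Eq. (6): u_i = W_o h_i + H_{c,i} for the available nodes, with w_out playing the role of W_o.
    return h_nodes @ w_out + heu_row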

For evaluations in Table 3, we generalize the models trained on TSP100 and CVRP100 to larger instances with 200, 500, and 1000 nodes. For TSP, we apply the same ReEvo-generated heuristic across all sizes, whereas for CVRP, we use a distinct heuristic for each size because the desirable heuristics vary with instance scale.

Appendix D Benchmark problems

D.1 Traveling Salesman Problem

Definition.

The Traveling Salesman Problem (TSP) is a classic optimization challenge that seeks the shortest possible route for a salesman to visit each city in a list exactly once and return to the origin city.

Instance generation.

Nodes are sampled uniformly from the $[0,1]^{2}$ unit square for the synthetic datasets.

D.2 Capacitated Vehicle Routing Problem

Definition.

The Capacitated Vehicle Routing Problem (CVRP) extends the TSP by adding constraints on vehicle capacity. Each vehicle can carry a limited load, and the objective is to minimize the total distance traveled while delivering goods to various locations.

Instance generation.

For § 5.2, we follow DeepACO [94]. Customer locations are sampled uniformly in the unit square; customer demands are sampled from the discrete set $\{1,2,\ldots,9\}$; the capacity of each vehicle is set to 50; the depot is located at the center of the unit square. For § 5.5, we use the test instances provided by LEHD [51].

D.3 Orienteering Problem

Definition.

In the Orienteering Problem (OP), the goal is to maximize the total score collected by visiting nodes, subject to a maximum tour length constraint.

Instance generation.

The generation of synthetic datasets aligns with DeepACO [94]. We uniformly sample the nodes, including the depot node, from the unit square $[0,1]^{2}$. We use a challenging prize distribution [36]: $p_{i}=\left(1+\left\lfloor 99\cdot\frac{d_{0i}}{\max_{j=1}^{n}d_{0j}}\right\rfloor\right)/100$, where $d_{0i}$ is the distance between the depot and node $i$. The maximum length constraint is also designed to be challenging. As suggested by Kool et al. [36], we set it to 3, 4, 5, 8, and 12 for OP50, OP100, OP200, OP500, and OP1000, respectively.
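A minimal NumPy sketch of this generator (our own wrapper around the stated formulas; the dataset pipeline in DeepACO may differ in details):

import numpy as np

def generate_op_instance(n: int, max_len: float, seed: int = 0):
    # Nodes (including the depot at index 0) are sampled uniformly from the unit square.
    rng = np.random.default_rng(seed)
    coords = rng.random((n, 2))
    dist = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)
    # Challenging prize distribution: p_i = (1 + floor(99 * d_{0i} / max_j d_{0j})) / 100.
    d0 = dist[0]
    prize = (1 + np.floor(99 * d0 / d0.max())) / 100
    return coords, dist, prize, max_len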

D.4 Multiple Knapsack Problem

Definition.

The Multiple Knapsack Problem (MKP) involves distributing a set of items, each with a given weight and value, among multiple knapsacks to maximize the total value without exceeding the capacity of any knapsack.

Instance generation.

Instance generation follows DeepACO [94]. The values and weights are uniformly sampled from $[0,1]$. To make all instances well-posed, we uniformly sample $c_{i}$ from $(\max_{j}w_{ij},\sum_{j}w_{ij})$.
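A minimal NumPy sketch of this sampling scheme (our own illustration; the exact DeepACO pipeline may differ):

import numpy as np

def generate_mkp_instance(n: int, m: int, seed: int = 0):
    # Values and m-dimensional weights are sampled uniformly from [0, 1].
    rng = np.random.default_rng(seed)
    prize = rng.random(n)
    weight = rng.random((n, m))
    # Each capacity c_i is drawn uniformly from (max_j w_ij, sum_j w_ij),
    # so no single item is infeasible in any dimension and not all items fit at once.
    low, high = weight.max(axis=0), weight.sum(axis=0)
    capacity = rng.uniform(low, high)
    return prize, weight, capacity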

D.5 Bin Packing Problem

Definition.

The Bin Packing Problem requires packing objects of different volumes into a finite number of bins or containers of a fixed volume in a way that minimizes the number of bins used. It is widely applicable in manufacturing, shipping, and storage optimization.

Instance generation.

Following Levine and Ducatelle [39], we set the bin capacity to 150, and item sizes are uniformly sampled between 20 and 100.

D.6 Decap Placement Problem

Definition.

The Decap Placement Problem (DPP) is a critical hardware design optimization issue that involves finding the optimal placement of decoupling capacitors (decap) within a power distribution network (PDN) to enhance power integrity (PI) [65, 69, 16, 63]. Decoupling capacitors are hardware components that help reduce power noise and ensure a stable supply of power to the operating integrated circuits within hardware devices such as CPUs, GPUs, and AI accelerators [65]. The DPP is formulated as a black-box contextual optimization problem, where the goal is to determine the best positions for a set of decaps to maximize the PI objective. This objective is contextualized by the target hardware’s feature vectors, with the constraint of a limited number of decaps. Interested readers can refer to [31] for more details.

Instance generation.

$10\times 10$ PDN instances are used. We generate training and validation instances following Kim et al. [31]. The test instances are directly drawn from [31].

Appendix E Generated heuristics

This section presents the best heuristics generated by ReEvo for all problem settings.

def heuristics(distance_matrix: torch.Tensor) -> torch.Tensor:
    distance_matrix[distance_matrix == 0] = 1e5
    beta = 0.3
    gamma = 0.5
    reciprocal = -1 / distance_matrix
    log_values = -torch.log(distance_matrix)
    local_heu = log_values + beta * (reciprocal.mean(dim=1, keepdim=True) - reciprocal)
    global_mean = distance_matrix.mean()
    global_heu = -gamma * torch.log(torch.abs(distance_matrix - global_mean))
    heu = local_heu + global_heu
    return heu
Heuristic 1: The best ReEvo-generated heuristic for TSP_NCO_POMO.
def heuristics(distance_matrix: torch.Tensor) -> torch.Tensor:
    n = distance_matrix.size(0)
    # Calculate the average distance for each node with symmetrical adjustments
    avg_distances = (torch.sum(distance_matrix, dim=1, keepdim=True) + torch.sum(distance_matrix, dim=0, keepdim=True) - 2 * torch.diag(distance_matrix).unsqueeze(1)) / (2 * (n - 1))
    # Calculate heuristics based on the difference between each distance and the average distances with emphasis on node-centric averages
    heuristics = 2 * (distance_matrix - avg_distances) + 0.5 * (distance_matrix - torch.mean(distance_matrix, dim=1, keepdim=True))
    # Normalize the heuristics to have a mean of 0 and standard deviation of 1
    heuristics = (heuristics - torch.mean(heuristics)) / torch.std(heuristics)
    return heuristics
Heuristic 2: The best ReEvo-generated heuristic for TSP_NCO_LEHD.
# For CVRP200
def heuristics(distance_matrix: torch.Tensor, demands: torch.Tensor) -> torch.Tensor:
    n = distance_matrix.size(0)
    # Calculate the normalized demand-density for each edge
    norm_demand_density = 2 * demands.view(n, 1) / (distance_matrix + 1e-6)  # Normalizing factor 2
    # Set penalties for edges exceeding capacity and scale the heuristics
    heuristics = norm_demand_density
    heuristics[torch.max(demands.view(n, 1), demands.view(1, n)) > 1] = -1
    return heuristics

# For CVRP500 and CVRP1000
def heuristics(distance_matrix: torch.Tensor, demands: torch.Tensor) -> torch.Tensor:
    excess_demand_penalty = torch.maximum(demands.sum() - demands, torch.tensor(0.))
    return 1 / (distance_matrix + 1e-6) - excess_demand_penalty
Heuristic 3: The best ReEvo-generated heuristics for CVRP_NCO_POMO.
# For CVRP200
def heuristics(distance_matrix: torch.Tensor, demands: torch.Tensor) -> torch.Tensor:
    total_demand = demands.sum()
    normalized_demand = demands / total_demand
    balanced_edge_weights = 1 / (distance_matrix + 1e-6)
    over_capacity_penalty = torch.clamp(demands.unsqueeze(1) + demands.unsqueeze(0) - 2, max=0)
    heuristics = balanced_edge_weights * normalized_demand.view(-1, 1) - normalized_demand - 2 * over_capacity_penalty
    return heuristics

# For CVRP500
def heuristics(distance_matrix: torch.Tensor, demands: torch.Tensor) -> torch.Tensor:
    total_demand = torch.cumsum(demands, dim=0)
    vehicle_capacity = total_demand[-1]
    exceed_capacity_penalty = (total_demand.unsqueeze(1) > vehicle_capacity).float()
    unmet_demand_penalty = (vehicle_capacity - total_demand).clamp(min=0) / vehicle_capacity
    promisiness = (1 / (distance_matrix + 1)) * (1 - 0.5 * exceed_capacity_penalty - 0.5 * unmet_demand_penalty)
    return promisiness

# For CVRP1000
def heuristics(distance_matrix: torch.Tensor, demands: torch.Tensor) -> torch.Tensor:
    total_demand = demands.sum().item()
    demand_norm = demands / total_demand
    edge_savings = distance_matrix - demand_norm[:, None] - demand_norm
    return edge_savings
Heuristic 4: The best ReEvo-generated heuristics for CVRP_NCO_LEHD.
# Crossover
def crossover(parents: np.ndarray, n_pop: int) -> np.ndarray:
    n_parents, n_decap = parents.shape
    parents_idx = np.random.choice(n_parents, (n_pop, 2))
    crossover_points = np.random.randint(1, n_decap, n_pop)
    mask = np.tile(np.arange(n_decap), (n_pop, 1))
    offspring = np.where(mask < crossover_points.reshape(-1, 1),
                         parents[parents_idx[:, 0], :],
                         parents[parents_idx[:, 1], :])
    return offspring

# Mutation (A repairing step follows this mutation to ensure the feasibility of the population)
def mutation(population: np.ndarray, probe: int, prohibit: np.ndarray, size: int = 100) -> np.ndarray:
    p, n_decap = population.shape
    is_not_probe = np.all(population != probe, axis=1)
    is_not_prohibited = np.all(np.isin(population, prohibit, invert=True), axis=1)
    is_feasible = is_not_probe & is_not_prohibited
    mutation_mask = np.random.rand(p, n_decap) < 0.1
    mutation_values = np.random.randint(0, size, size=(p, n_decap))
    mutated_population = np.where(mutation_mask & is_feasible[:, None], mutation_values, population)
    return mutated_population
Heuristic 5: The best ReEvo-generated heuristic for DPP_GA.
def heuristics(distance_matrix: np.ndarray) -> np.ndarray:
    # Calculate the average distance for each node
    average_distance = np.mean(distance_matrix, axis=1)
    # Calculate the distance ranking for each node
    distance_ranking = np.argsort(distance_matrix, axis=1)
    # Calculate the mean of the closest distances for each node
    closest_mean_distance = np.mean(distance_matrix[np.arange(distance_matrix.shape[0])[:, None], distance_ranking[:, 1:5]], axis=1)
    # Initialize the indicator matrix and calculate ratio of distance to average distance
    indicators = distance_matrix / average_distance[:, np.newaxis]
    # Set diagonal elements to np.inf
    np.fill_diagonal(indicators, np.inf)
    # Adjust the indicator matrix using the statistical measure
    indicators += closest_mean_distance[:, np.newaxis] / np.sum(distance_matrix, axis=1)[:, np.newaxis]
    return indicators
Heuristic 6: The best ReEvo-generated heuristic for TSP_GLS.

Heuristic 7 presents the best heuristic found for TSP_ACO, which is generated when viewing TSP as a black-box COP. ‘edge_attr’ represents the distance matrix.

import numpy as np
from sklearn.preprocessing import StandardScaler

def heuristics(edge_attr: np.ndarray) -> np.ndarray:
    num_edges = edge_attr.shape[0]
    num_attributes = edge_attr.shape[1]
    heuristic_values = np.zeros_like(edge_attr)
    # Apply feature engineering on edge attributes
    transformed_attr = np.log1p(np.abs(edge_attr))  # Taking logarithm of absolute value of attributes
    # Normalize edge attributes
    scaler = StandardScaler()
    edge_attr_norm = scaler.fit_transform(transformed_attr)
    # Calculate correlation coefficients
    correlation_matrix = np.corrcoef(edge_attr_norm.T)
    # Calculate heuristic value for each edge attribute
    for i in range(num_edges):
        for j in range(num_attributes):
            if edge_attr_norm[i][j] != 0:
                heuristic_values[i][j] = np.exp(-8 * edge_attr_norm[i][j] * correlation_matrix[j][j])
    return heuristic_values
Heuristic 7: The best ReEvo-generated heuristic for TSP_ACO.
def heuristics(distance_matrix: np.ndarray, coordinates: np.ndarray, demands: np.ndarray, capacity: int) -> np.ndarray:
    num_nodes = distance_matrix.shape[0]
    # Calculate the inverse of the distance matrix
    inverse_distance_matrix = np.divide(1, distance_matrix, where=(distance_matrix != 0))
    # Calculate total demand and average demand
    total_demand = np.sum(demands)
    average_demand = total_demand / num_nodes
    # Calculate the distance from each node to the starting depot
    depot_distances = distance_matrix[:, 0]
    # Calculate the remaining capacity of the vehicle for each node
    remaining_capacity = capacity - demands
    # Initialize the heuristic matrix
    heuristic_matrix = np.zeros_like(distance_matrix)
    # Calculate the demand factor and distance factor
    demand_factor = demands / total_demand
    normalized_distance = distance_matrix / np.max(distance_matrix)
    distance_factor = depot_distances / (normalized_distance + np.finfo(float).eps)
    # Iterate over each node
    for i in range(num_nodes):
        # Calculate the heuristic value based on distance and capacity constraints
        heuristic_values = inverse_distance_matrix[i] * (1 / (normalized_distance[i] ** 2))
        # Adjust the heuristic values based on the remaining capacity
        heuristic_values = np.where(remaining_capacity >= demands[i], heuristic_values, 0)
        # Adjust the heuristic values based on the demand factor
        heuristic_values *= demand_factor[i] / average_demand
        # Adjust the heuristic values based on the distance factor
        heuristic_values *= distance_factor[i]
        heuristic_values[0] = 0  # Exclude the depot node
        # Adjust the heuristic values based on the capacity utilization
        utilization_factor = np.where(remaining_capacity >= demands[i], capacity - demands[i], 0)
        heuristic_values *= utilization_factor
        # Set the heuristic values for the current node in the heuristic matrix
        heuristic_matrix[i] = heuristic_values
    return heuristic_matrix
Heuristic 8: The best ReEvo-generated heuristic for CVRP_ACO.
def heuristics(prize: np.ndarray, distance: np.ndarray, maxlen: float) -> np.ndarray:
    n = prize.shape[0]
    heuristics = np.zeros((n, n))
    # Calculate the prize-to-distance ratio with a power transformation
    prize_distance_ratio = np.power(prize / distance, 3)
    # Find the indices of valid edges based on the distance constraint
    valid_edges = np.where(distance <= maxlen)
    # Assign the prize-to-distance ratio to the valid edges
    heuristics[valid_edges] = prize_distance_ratio[valid_edges]
    return heuristics
Heuristic 9: The best ReEvo-generated heuristic for OP_ACO.

Heuristic 10 presents the best heuristic found for MKP_ACO, which is generated when viewing MKP as a black-box COP. ‘item_attr1’ and ‘item_attr2’ represent the prizes and multi-dimensional weights of items, respectively.

def heuristics(item_attr1: np.ndarray, item_attr2: np.ndarray) -> np.ndarray:
    n, m = item_attr2.shape
    # Normalize item_attr1 and item_attr2
    item_attr1_norm = (item_attr1 - np.min(item_attr1)) / (np.max(item_attr1) - np.min(item_attr1))
    item_attr2_norm = (item_attr2 - np.min(item_attr2)) / (np.max(item_attr2) - np.min(item_attr2))
    # Calculate the average value of normalized attribute 1
    avg_attr1 = np.mean(item_attr1_norm)
    # Calculate the maximum value of normalized attribute 2 for each item
    max_attr2 = np.max(item_attr2_norm, axis=1)
    # Calculate the sum of normalized attribute 2 for each item
    sum_attr2 = np.sum(item_attr2_norm, axis=1)
    # Calculate the standard deviation of normalized attribute 2 for each item
    std_attr2 = np.std(item_attr2_norm, axis=1)
    # Calculate the heuristics based on a combination of normalized attributes 1 and 2,
    # while considering the average, sum, and standard deviation of normalized attribute 2
    heuristics = (item_attr1_norm / max_attr2) * (item_attr1_norm / avg_attr1) * (item_attr1_norm / sum_attr2) * (1 / std_attr2)
    # Normalize the heuristics to a range of [0, 1]
    heuristics = (heuristics - np.min(heuristics)) / (np.max(heuristics) - np.min(heuristics))
    return heuristics
Heuristic 10: The best ReEvo-generated heuristic for MKP_ACO.
def heuristics(demand: np.ndarray, capacity: int) -> np.ndarray:
    n = demand.shape[0]
    demand_normalized = demand / demand.max()
    same_bin_penalty = np.abs((capacity - demand[:, None] - demand) / capacity)
    overlap_penalty = (demand[:, None] + demand) / capacity
    heuristics = demand_normalized[:, None] + demand_normalized - same_bin_penalty - overlap_penalty
    threshold = np.percentile(heuristics, 90)
    heuristics[heuristics < threshold] = 0
    return heuristics
Heuristic 11: The best ReEvo-generated heuristic for BPP_ACO.

Heuristic 12 gives the best ReEvo-generated constructive heuristic for TSP. We used the best heuristic found in AEL [46] as the seed for ReEvo. As a result, our heuristic closely mirrors the one in AEL, scoring each node with a weighted combination of the same four factors.

def select_next_node(current_node: int, destination_node: int, unvisited_nodes: set, distance_matrix: np.ndarray) -> int:
    weights = {'distance_to_current': 0.4,
               'average_distance_to_unvisited': 0.25,
               'std_dev_distance_to_unvisited': 0.25,
               'distance_to_destination': 0.1}
    scores = {}
    for node in unvisited_nodes:
        future_distances = [distance_matrix[node, i] for i in unvisited_nodes if i != node]
        if future_distances:
            average_distance_to_unvisited = sum(future_distances) / len(future_distances)
            std_dev_distance_to_unvisited = (sum((x - average_distance_to_unvisited) ** 2 for x in future_distances) / len(future_distances)) ** 0.5
        else:
            average_distance_to_unvisited = std_dev_distance_to_unvisited = 0
        score = (weights['distance_to_current'] * distance_matrix[current_node, node] -
                 weights['average_distance_to_unvisited'] * average_distance_to_unvisited +
                 weights['std_dev_distance_to_unvisited'] * std_dev_distance_to_unvisited -
                 weights['distance_to_destination'] * distance_matrix[destination_node, node])
        scores[node] = score
    next_node = min(scores, key=scores.get)
    return next_node
Heuristic 12: The best ReEvo-generated heuristic for TSP_constructive.

Appendix F Licenses for used assets

Table 11 lists the used assets and their licenses. Our code is licensed under the MIT License.

Table 11: Used assets and their licenses.
Type Asset License Usage
Code NeuOpt [52] MIT License Evaluation
GNNGLS [24] MIT License Evaluation
EoH [47] MIT License Evaluation
DeepACO [94] MIT License Evaluation
DevFormer [31] Apache-2.0 license Evaluation
POMO [37] MIT License Evaluation
LEHD [51] MIT License Evaluation
Dataset TSPLIB [67] Available for any non-commercial use Testing
DPP PDNs [63] Apache-2.0 license Testing