
RCsearcher: Reaction Center Identification in Retrosynthesis
via Deep Q-Learning

Zixun Lan    Zuo Zeng    Binjie Hong    Zhenfu Liu    Fei Ma
Abstract

The reaction center consists of atoms in the product whose local properties are not identical to those of the corresponding atoms in the reactants. Prior studies on reaction center identification mainly appear in semi-template-based retrosynthesis methods, and they are limited to single reaction center identification. In reality, however, many reaction centers are composed of multiple bonds or atoms; we refer to these as multiple reaction centers. This paper presents RCsearcher, a unified framework for single and multiple reaction center identification that combines the advantages of graph neural networks and deep reinforcement learning. The critical insight in this framework is that the reaction center, whether single or multiple, must be a node-induced subgraph of the molecular product graph. At each step, the framework chooses one node in the molecular product graph and adds it to the explored node-induced subgraph as an action. Comprehensive experiments demonstrate that RCsearcher consistently outperforms other baselines and can extrapolate reaction center patterns that have not appeared in the training set. Ablation experiments verify the effectiveness of individual components, including the beam search and the one-hop constraint on the action space.


1 Introduction

Reaction center identification plays a vital role in single-step retrosynthesis. The reaction center (RC) consists of atoms in the product whose local properties are not identical to those of the corresponding atoms in the reactants (Coley et al., 2019). Experienced chemists first disconnect the target molecule at the potential reaction center when performing retrosynthetic analysis (Corey, 1991). In computer-aided retrosynthesis, template-based single-step retrosynthesis methods (Coley et al., 2017; Dai et al., 2019; Chen & Jung, 2021) rank reaction templates, of which the reaction center is a part. The first step of semi-template-based retrosynthesis methods (Shi et al., 2020; Yan et al., 2020; Somnath et al., 2021) is to use a graph neural network to predict the reaction center of the target molecule. Moreover, some template-free retrosynthesis methods (Wan et al., 2022; Wang et al., 2021) have begun to explicitly or implicitly incorporate reaction center information to improve performance and interpretability.

Figure 1: (a) Examples of a single reaction center and a multiple reaction center. The red highlights mark the reaction center. (b) A case in which the same intermediate molecules are visited many times during multi-step retrosynthesis. The red highlights and arrows denote the predicted reaction centers and retrosynthetic predictions, respectively.
Figure 2: Left: An illustration of the prior GNN-based method. Right: An overview of our proposed RCsearcher.

Prior studies on reaction center identification mainly appear in semi-template-based retrosynthesis methods, and they are limited to single reaction center identification. Here, a single reaction center refers to a reaction center composed of one bond or one atom (Fig. 1(a)). G2G (Shi et al., 2020) and RetroXpert (Yan et al., 2020) regard the bond that appears in the product but not in the reactants as the reaction center; they use R-GCN (Schlichtkrull et al., 2018) and EGAT (Kamiński et al., 2022), respectively, to predict the probability of each bond being a single reaction center. GRAPHRETRO (Somnath et al., 2021) also considers an atom whose number of attached hydrogens changes in the product as a single reaction center, and uses MPN (Gilmer et al., 2017) to model the probability of each bond or atom being a single reaction center. Despite the excellent performance of semi-template-based retrosynthesis methods, they are still not widely used, mainly because current reaction center identification methods cannot effectively detect multiple reaction centers.

In reality, many reaction centers in reaction products are composed of multiple bonds or atoms, which is not fully represented in the USPTO-50k dataset. In this paper, we refer to these as multiple reaction centers (Fig. 1(a)). Dai et al. (2019) created the large dataset USPTO-full from the entire set of USPTO 1976-2016 (1,808,937 raw reactions). USPTO-full has roughly 1M unique reactions, and the proportion of samples with multiple reaction centers reaches 38.84%. For example, the reaction center of the product of most ring-opening reactions is a multiple reaction center (Coley et al., 2019). In retrosynthesis, prior models sometimes misidentify a multiple reaction center as a single reaction center, leading to predictions that deviate from the ground truth. This causes the same intermediate molecules to be visited many times during multi-step retrosynthesis (Xie et al., 2022). As shown in Fig. 1(b), due to misidentification of a multiple reaction center, the ring-opening operation is not performed, which leads to an infinite loop in multi-step retrosynthesis.

To the best of our knowledge, the only existing method that can simultaneously address single and multiple reaction center identification appears in RetroXpert (Yan et al., 2020). It uses one shared EGAT (Kamiński et al., 2022) to output two tensors (Fig. 3). One tensor represents the probability that each bond is the reaction center. The other is a multi-class prediction of the number of bonds forming the reaction center. During inference, the multi-class prediction determines how many of the highest-probability bonds are selected as the reaction center. However, there is a gap between these two jointly optimized objectives and the actual goal of finding the right single or multiple reaction center. Hence, designing a unified and principled framework for single and multiple reaction center identification remains a challenge.

In this paper, we present RCsearcher, a unified framework for single and multiple reaction center identification that combines the advantages of graph neural networks and deep reinforcement learning (Fig. 3). The critical insight in this framework is that the reaction center, whether single or multiple, must be a node-induced subgraph of the molecular product graph (Coley et al., 2019). RCsearcher represents states and actions as continuous embeddings via graph representation learning (Kamiński et al., 2022) and uses a Deep Q-Network (DQN) (Mnih et al., 2013) to predict action distributions. At each step, it considers choosing one node in the molecular product graph and adding it to the explored node-induced subgraph as an action. After the stop condition is satisfied, RCsearcher regards the explored node-induced subgraph as the predicted reaction center. For effective and fair evaluation, we use stratified sampling to draw 40k reactions from USPTO-full and build a general dataset, USPTO-40k, in which the proportion of samples with multiple reaction centers is likewise 38.84%. Comprehensive experiments demonstrate that RCsearcher consistently outperforms other baselines on the reaction center identification task and can extrapolate reaction center patterns that have not appeared in the training set. Ablation experiments verify the effectiveness of individual components, including the beam search and the one-hop constraint on the action space. In brief, we highlight our main contributions as follows:

  • We address the important yet challenging task of reaction center identification and propose RCsearcher, a unified framework for both single and multiple reaction center identification, as the solution.

  • The key novelty is the GNN-based DQN, which provides a strategy for selecting a node-induced subgraph as the predicted reaction center; the core insight in this framework is that a single or multiple reaction center must be a node-induced subgraph of the molecular product graph.

  • For fair and effective evaluation, we build a general dataset USPTO-40k. We conduct extensive experiments to demonstrate the effectiveness and generalization of the RCsearcher. Ablation experiments also verify the effectiveness of individual components.

2 Problem Definition

The reaction center (RC) consists of atoms in the product whose local properties are not identical to those of the corresponding atoms in the reactants (Coley et al., 2019). In this paper, we use RDChiral (Coley et al., 2019) to extract the super general reaction center. Hence, whether single or multiple, the reaction center must be a node-induced subgraph of the molecular product graph.

Notation  A molecular product graph $\mathcal{G}_{p}=(\mathcal{V}_{p},\mathcal{E}_{p})$ is represented as a set of $|\mathcal{V}_{p}|$ nodes (atoms) and a set of $|\mathcal{E}_{p}|$ edges (bonds). The reaction center graph $\mathcal{G}_{rc}=(\mathcal{V}_{rc},\mathcal{E}_{rc})$ is a node-induced subgraph of the molecular product graph $\mathcal{G}_{p}$ such that $\mathcal{V}_{rc}\subseteq\mathcal{V}_{p}$ and $\mathcal{E}_{rc}=\left\{(u,v)\mid u,v\in\mathcal{V}_{rc},(u,v)\in\mathcal{E}_{p}\right\}$.

Reaction Center Identification  Given the molecular product graph $\mathcal{G}_{p}$, reaction center identification aims to detect the corresponding reaction center graph $\mathcal{G}_{rc}$, i.e., its node set $\mathcal{V}_{rc}$.

Evaluation  There may be several isomorphic node-induced subgraphs in a graph. Therefore, the condition for correct identification is that the node set of the predicted reaction center graph is identical to the ground-truth $\mathcal{V}_{rc}$.
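To make the definitions concrete, the following is a minimal Python sketch of the node-induced edge set and the exact-match criterion; the node sets and `product_edges` are hypothetical inputs (atom indices and bond pairs), not part of any released code.

```python
def induced_edges(nodes: set, product_edges: list) -> set:
    # Edge set of the node-induced subgraph G_p[nodes]: a product bond
    # (u, v) is kept iff both endpoints belong to the selected node set.
    return {(u, v) for (u, v) in product_edges if u in nodes and v in nodes}

def is_correct(pred_nodes: set, true_nodes: set) -> bool:
    # Isomorphic but differently located subgraphs do NOT count as correct,
    # so we compare atom indices directly rather than graph structure.
    return pred_nodes == true_nodes
```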

3 Related work

Efforts on Reaction Center Identification in Retrosynthesis  G2G (Shi et al., 2020) and RetroXpert (Yan et al., 2020) regard the bond that appears in the product but not in the reactants as the reaction center. G2G uses R-GCN (Schlichtkrull et al., 2018) to predict the probability of each bond being a single reaction center. In contrast, RetroXpert uses EGAT (Kamiński et al., 2022) to model the probability of each bond being the reaction center, and it adds a graph-level auxiliary task to predict the total number of disconnection bonds (Fig. 3); hence, RetroXpert is not limited to single reaction center identification. GRAPHRETRO (Somnath et al., 2021) additionally considers an atom whose number of attached hydrogens changes in the product as a single reaction center, and uses MPN (Gilmer et al., 2017) to model the probability of each bond or atom being a single reaction center. It is worth noting that GRAPHRETRO briefly proposes an autoregressive model for multiple reaction center identification in its appendix. However, it does not elaborate on the specific details of this model, and the corresponding code is not available in the official GitHub repository of GRAPHRETRO (https://github.com/vsomnath/graphretro).

Efforts on Single-step Retrosynthesis Prediction  Single-step prediction models fall into three main classes: template-based, template-free, and semi-template-based methods. Template-based methods rely on templates that encode the core of chemical reactions, converting product molecules into reactants. The key is to rank the templates and select the appropriate one to apply, and recent attempts (Coley et al., 2017; Dai et al., 2019) have addressed template selection via similarity measures and neural networks. Despite their superior interpretability, template-based approaches generalize poorly to structures beyond the templates. Template-free methods (Liu et al., 2017; Schwaller et al., 2019), on the other hand, regard single-step retrosynthesis prediction as a translation task, translating a product molecule represented as a SMILES string (Weininger, 1988) into reactant SMILES strings. To combine the benefits of both, recent works (Shi et al., 2020; Yan et al., 2020; Somnath et al., 2021) pursue semi-template-based methods that first predict the single reaction center via graph neural networks and then translate the resulting intermediate synthons into reactants via seq2seq or graph translation models.

Efforts on Multi-step Retrosynthetic Planning  HgSearch (Schwaller et al., 2020) and Proof Number Search (Kishimoto et al., 2019) are traditional heuristic search algorithms that consider neither chemical feasibility nor the value of failed pathways. Segler et al. (2018) adopt Monte Carlo tree search to generate search trees and dynamically explore multiple synthetic paths. Chen et al. (2020) devise a neural A*-like algorithm that learns an additional value network, trained on automatically constructed and exclusively successful paths, to bias the search prior. Han et al. (2022) and Xie et al. (2022) use GNN-based value networks to capture intermolecular and intra-pathway information, further improving A*-like retrosynthesis planning algorithms. Notably, Xie et al. (2022) observe that the same intermediate molecules are visited many times during the search. Sometimes the ring-opening operation is not performed because a multiple reaction center is misidentified as a single one, which leads to an infinite loop in multi-step retrosynthesis (Fig. 1(b)).

4 Preliminaries and Proposed Method

In this section, we present RCsearcher, our RL-based method for reaction center identification in retrosynthesis. The rest of Section 4 is organized as follows. Section 4.1 introduces the preliminaries. Section 4.2 presents a high-level overview of how we leverage a Deep Q-Network (DQN) (Mnih et al., 2013) to tackle reaction center identification in retrosynthesis (Fig. 3). Section 4.3 describes the details of the state, action, and reward (Fig. 3 Left). Section 4.4 explains how to train the DQN efficiently. Section 4.5 shows how RCsearcher performs inference with the beam search mechanism (Fig. 3 Right). Section 4.6 analyzes the time complexity.

4.1 Preliminaries

In this paper, we adopt EGAT (Kamiński et al., 2022) to compute both the node embeddings and the edge embeddings. Let $\mathbf{H}^{(t)}=[\boldsymbol{h}_{1}^{(t)};\boldsymbol{h}_{2}^{(t)};\cdots;\boldsymbol{h}_{n}^{(t)}]\in R^{n\times d}$ and $\boldsymbol{f}_{ij}^{(t)}\in R^{1\times d}$ denote the node embedding matrix and the embedding of edge $(i,j)$ at the $t$-th layer, respectively, where $\boldsymbol{h}_{i}^{(t)}\in R^{1\times d}$ is the node-level embedding of node $i$ (the $i$-th row of $\mathbf{H}^{(t)}$), $d$ is the dimension of the node-level embedding, and $n$ is the number of nodes. $\boldsymbol{h}_{i}^{(0)}$ and $\boldsymbol{f}_{ij}^{(0)}$ are the initial node and edge features of the molecular product graph $\mathcal{G}_{p}$; their details can be found in the appendix. EGAT injects the graph structure into the attention mechanism by performing masked attention, i.e., it only computes $\alpha_{ij}$ for nodes $j\in\mathcal{N}_{i}$, where $\mathcal{N}_{i}$ is the set of first-order neighbors of node $i$ in the graph:

$$\boldsymbol{f}_{ij}^{(t+1)}=\mathrm{LeakyReLU}\left(\left[\boldsymbol{h}_{i}^{(t)}\mathbf{W}\,\|\,\boldsymbol{f}_{ij}^{(t)}\,\|\,\boldsymbol{h}_{j}^{(t)}\mathbf{W}\right]A\right),\qquad e_{ij}=\mathbf{a}\cdot{\boldsymbol{f}_{ij}^{(t+1)}}^{T},\qquad \alpha_{ij}=\frac{\exp\left(e_{ij}\right)}{\sum_{k\in\mathcal{N}_{i}}\exp\left(e_{ik}\right)}, \tag{1}$$

where $e_{ij}\in R$ is a non-normalized attention coefficient and $\alpha_{ij}\in R$ is the normalized attention coefficient representing the weight of the message aggregated from node $j$ to node $i$ in the $t$-th layer of EGAT, and $\|$ is the concatenation operation. Besides, $\mathbf{W}\in R^{d\times d}$, $A\in R^{3d\times d}$ and $\mathbf{a}\in R^{1\times d}$ are learnable parameters in the $t$-th layer.

EGAT employs multi-head attention to stabilize the learning process of self-attention, similar to the Transformer (Vaswani et al., 2017). If there are $K$ heads, $K$ independent attention mechanisms execute Eq. 1, and their output features are concatenated:

$$\boldsymbol{h}_{i}^{(t+1)}=\mathrm{MLP}\left(\Big\|_{k=1}^{K}\,\sigma\Big(\sum_{j\in\mathcal{N}_{i}}\alpha_{ij}^{k}\,\boldsymbol{h}_{j}^{(t)}\mathbf{W}^{k}\Big)\right) \tag{2}$$

where $\|$ represents concatenation, and $\alpha_{ij}^{k}$ are the normalized attention coefficients computed with the $k$-th set of learnable parameters $\mathbf{W}^{k}\in R^{d\times d}$, $A^{k}\in R^{3d\times d}$ and $\mathbf{a}^{k}\in R^{1\times d}$ following Eq. 1. Besides, $\mathrm{MLP}$ denotes a multi-layer perceptron. For simplicity, we denote the encoding process by $\mathrm{EGAT}(\cdot)$ in this study.
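For illustration, below is a simplified single-head PyTorch sketch of Eqs. 1-2 under the notation above; the actual multi-head EGAT implementation (e.g., the one shipped with DGL) differs in details such as head concatenation and batching, so this is a sketch under stated assumptions rather than the exact layer we use.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EGATLayer(nn.Module):
    # A simplified single-head sketch of Eqs. 1-2.
    def __init__(self, d: int):
        super().__init__()
        self.W = nn.Linear(d, d, bias=False)       # node projection W
        self.A = nn.Linear(3 * d, d, bias=False)   # edge-update matrix A
        self.a = nn.Linear(d, 1, bias=False)       # attention vector a
        self.mlp = nn.Sequential(nn.Linear(d, d), nn.ReLU(), nn.Linear(d, d))

    def forward(self, h, f, edges):
        # h: (n, d) node embeddings; f: (m, d) edge embeddings;
        # edges: (m, 2) tensor listing each directed pair (i, j), j in N_i.
        src, dst = edges[:, 0], edges[:, 1]
        hw = self.W(h)
        # Eq. 1: new edge embedding f_ij and raw attention score e_ij.
        f_new = F.leaky_relu(self.A(torch.cat([hw[src], f, hw[dst]], dim=-1)))
        e = self.a(f_new).squeeze(-1)
        # Masked attention: softmax of e_ij over each node i's neighbors.
        alpha = torch.zeros_like(e)
        for i in src.unique():
            mask = src == i
            alpha[mask] = torch.softmax(e[mask], dim=0)
        # Eq. 2 (single head): aggregate neighbor messages, then apply MLP.
        out = torch.zeros_like(hw)
        out.index_add_(0, src, alpha.unsqueeze(-1) * hw[dst])
        return self.mlp(torch.relu(out)), f_new
```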

4.2 Overview of RCsearcher

RCsearcher enables graph representation learning techniques to tackle reaction center identification in retrosynthesis and uses deep Q-learning to select, at each search state, one node to add to the currently explored subgraph. RCsearcher represents states and actions as continuous embeddings and maps $(s_{t},a_{t})$ to a score $Q(s_{t},a_{t})$ via a DQN, which consists of a graph neural network encoder and learnable components that project the representations into the final score. Here, $s_{t}$ and $a_{t}$ denote the state and action at step $t$, respectively. Once RCsearcher is trained, it can be applied to any new graphs unseen during training.

Figure 3: Left: The details of the Q-NET and the representation of state and action. The dashed red line indicates the one-hop constraint. Right: An illustration of beam search during inference.

State $s_{t}$ consists of (1) the molecular product graph $\mathcal{G}_{p}$ and (2) the currently explored reaction center graph $\hat{\mathcal{G}}_{rc}^{t}=\mathcal{G}_{p}[\hat{\mathcal{V}}_{rc}^{t}]$. Action $a_{t}$ is defined as selecting one node from the first-order neighborhood of the currently explored subgraph $\hat{\mathcal{V}}_{rc}^{t}$, or a stop action that ends the exploration process. Since most reaction center graphs have only one connected branch, we impose a one-hop constraint on the action space to narrow the search space and improve performance. Given our goal, the intermediate reward is defined as $r_{t}=0$ at any intermediate step $t$; at the final step $T$, $r_{T}=1$ if the predicted node set $\hat{\mathcal{V}}_{rc}^{T}$ is consistent with the ground truth $\mathcal{V}_{rc}$, and $r_{T}=0$ otherwise. Therefore, our optimization goal of maximizing the expected remaining future reward amounts to finding a reaction center that exactly matches the ground truth.

Since the action space can be large, we leverage the representation learning capacity of continuous representations in the DQN design. At state $s_{t}$, for each action $a_{t}$, our DQN predicts $Q(s_{t},a_{t})$, the remaining future reward after selecting action $a_{t}$. Based on the above insights, we can design a simple DQN that leverages the graph encoder EGAT (Kamiński et al., 2022) to obtain one embedding per node, $\left\{\boldsymbol{h}_{i}\mid\forall i\in\mathcal{V}_{p}\right\}$. Denote by $\|$ concatenation and by READOUT a readout operation that aggregates node-level embeddings into the subgraph embedding $\boldsymbol{h}_{\hat{\mathcal{G}}_{rc}^{t}}$ and the whole-graph embedding $\boldsymbol{h}_{\mathcal{G}_{p}}$. A state can then be represented as $\boldsymbol{h}_{s_{t}}=\boldsymbol{h}_{\mathcal{G}_{p}}\|\boldsymbol{h}_{\hat{\mathcal{G}}_{rc}^{t}}$. An action can be represented as $\boldsymbol{h}_{a_{t}}\in\left\{\boldsymbol{h}_{i}\mid\forall i\in\mathcal{V}_{p}\right\}\cup\{\boldsymbol{h}_{stop}\}$, where $\boldsymbol{h}_{stop}$ denotes the stop action. The $Q$ function can then be designed as:

$$Q\left(s_{t},a_{t}\right)=\mathrm{MLP}\left(\boldsymbol{h}_{s_{t}}\|\boldsymbol{h}_{a_{t}}\right)=\mathrm{MLP}\left(\boldsymbol{h}_{\mathcal{G}_{p}}\|\boldsymbol{h}_{\hat{\mathcal{G}}_{rc}^{t}}\|\boldsymbol{h}_{i}\right)\ \text{or}\ \mathrm{MLP}\left(\boldsymbol{h}_{\mathcal{G}_{p}}\|\boldsymbol{h}_{\hat{\mathcal{G}}_{rc}^{t}}\|\boldsymbol{h}_{stop}\right), \tag{3}$$

where $\mathrm{MLP}$ denotes a multi-layer perceptron.
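A minimal PyTorch sketch of Eq. 3 is given below; `h_gp`, `h_rc`, and `h_actions` are assumed to be embeddings already produced by the encoder and readout described in Section 4.3, and the module names are illustrative rather than those of our released code.

```python
import torch
import torch.nn as nn

class QNet(nn.Module):
    # A sketch of Eq. 3: Q(s_t, a_t) = MLP(h_{G_p} || h_rc || h_action),
    # with a learnable embedding for the stop action.
    def __init__(self, d: int):
        super().__init__()
        self.h_stop = nn.Parameter(torch.randn(d))
        self.mlp = nn.Sequential(nn.Linear(3 * d, d), nn.ReLU(), nn.Linear(d, 1))

    def forward(self, h_gp, h_rc, h_actions):
        # h_gp, h_rc: (d,) state embeddings; h_actions: (num_actions, d)
        # embeddings of the candidate nodes in the current action space.
        # The stop entry is appended last (at t = 0 it would be masked out,
        # per Eq. 8, since stopping on an empty subgraph is not allowed).
        cand = torch.cat([h_actions, self.h_stop.unsqueeze(0)], dim=0)
        state = torch.cat([h_gp, h_rc]).expand(cand.size(0), -1)
        return self.mlp(torch.cat([state, cand], dim=-1)).squeeze(-1)
```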

4.3 Details of State, Action and Reward

State  State $s_{t}$ consists of (1) the molecular product graph $\mathcal{G}_{p}$ and (2) the currently explored reaction center graph $\hat{\mathcal{G}}_{rc}^{t}=\mathcal{G}_{p}[\hat{\mathcal{V}}_{rc}^{t}]$. Firstly, we use EGAT (Kamiński et al., 2022) to obtain the graph-level embedding $\boldsymbol{h}_{\mathcal{G}_{p}}\in R^{1\times d}$ representing the molecular product graph $\mathcal{G}_{p}$:

$$\left\{\boldsymbol{h}_{i}\mid\forall i\in\mathcal{V}_{p}\right\},\ \left\{\boldsymbol{f}_{ij}\mid\forall(i,j)\in\mathcal{E}_{p}\right\}=\mathrm{EGAT}\left(\mathcal{G}_{p}\right), \tag{4}$$
$$\boldsymbol{h}_{nodes}=\mathrm{MEAN}\left(\left\{\boldsymbol{h}_{i}\mid\forall i\in\mathcal{V}_{p}\right\}\right),\qquad \boldsymbol{h}_{edges}=\mathrm{MEAN}\left(\left\{\boldsymbol{f}_{ij}\mid\forall(i,j)\in\mathcal{E}_{p}\right\}\right), \tag{5}$$
$$\boldsymbol{h}_{\mathcal{G}_{p}}^{\prime}=\mathrm{ABS}\left(\boldsymbol{h}_{nodes},\boldsymbol{h}_{edges}\right)\|\left(\boldsymbol{h}_{nodes}+\boldsymbol{h}_{edges}\right),\qquad \boldsymbol{h}_{\mathcal{G}_{p}}=\mathrm{MLP}\left(\boldsymbol{h}_{\mathcal{G}_{p}}^{\prime}\right), \tag{6}$$

where $\boldsymbol{h}_{nodes}\in R^{1\times d}$ and $\boldsymbol{h}_{edges}\in R^{1\times d}$ summarize the whole graph from the node and edge perspectives, respectively. $\mathrm{ABS}$ denotes the element-wise absolute difference and $\|$ refers to concatenation; this construction keeps our representations permutation invariant. Besides, $\mathrm{MLP}$ denotes a multi-layer perceptron.

Secondly, we use the above node-level embeddings $\left\{\boldsymbol{h}_{i}\mid\forall i\in\mathcal{V}_{p}\right\}$ to derive the subgraph embedding $\boldsymbol{h}_{\hat{\mathcal{G}}_{rc}^{t}}\in R^{1\times d}$ representing the currently explored reaction center graph $\hat{\mathcal{G}}_{rc}^{t}$:

$$\boldsymbol{h}_{\hat{\mathcal{G}}_{rc}^{t}}=\begin{cases}\vec{0}, & t=0\\ \mathrm{MEAN}\left(\left\{\boldsymbol{h}_{i}\mid\forall i\in\hat{\mathcal{V}}_{rc}^{t}\right\}\right), & t\neq 0\end{cases} \tag{7}$$

Since the exploration process has not yet started at step $t=0$ ($\hat{\mathcal{V}}_{rc}^{0}=\emptyset$), we use $\vec{0}\in R^{1\times d}$ to represent $\hat{\mathcal{V}}_{rc}^{0}$.
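Putting Eqs. 4-7 together, the state embedding can be sketched as follows; `mlp` stands for the MLP of Eq. 6, and all inputs are assumed to come from the EGAT encoder above (a sketch, not our exact implementation).

```python
import torch

def state_embedding(h_nodes_all, f_edges_all, rc_nodes, mlp):
    # h_nodes_all: (n, d) node embeddings and f_edges_all: (m, d) edge
    # embeddings from EGAT; rc_nodes: atom indices of the currently
    # explored subgraph; mlp: the (2d -> d) MLP of Eq. 6.
    h_nodes = h_nodes_all.mean(dim=0)            # Eq. 5, node perspective
    h_edges = f_edges_all.mean(dim=0)            # Eq. 5, edge perspective
    # Eq. 6: |x - y| || (x + y) keeps the readout permutation invariant.
    h_gp = mlp(torch.cat([(h_nodes - h_edges).abs(), h_nodes + h_edges]))
    if len(rc_nodes) == 0:                       # Eq. 7: empty subgraph at t = 0
        h_rc = torch.zeros_like(h_gp)
    else:
        h_rc = h_nodes_all[list(rc_nodes)].mean(dim=0)
    return torch.cat([h_gp, h_rc])               # h_{s_t} = h_{G_p} || h_rc
```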

Action  Since most reaction center graphs have only one connected branch, we impose a one-hop constraint on the action space to narrow the search space and improve performance. In other words, the action $a_{t}$ selects one node from the first-order neighborhood of the currently explored subgraph $\hat{\mathcal{V}}_{rc}^{t}$, or is a stop action that ends the exploration process. We use node $i$'s embedding $\boldsymbol{h}_{i}\in R^{1\times d}$ and one learnable embedding $\boldsymbol{h}_{stop}\in R^{1\times d}$ to represent the node-selection action and the stop action, respectively:

$$\boldsymbol{h}_{a_{t}}=\begin{cases}\boldsymbol{h}_{i}, & i\in\mathcal{V}_{p},\ t=0\\ \boldsymbol{h}_{i}\ \text{or}\ \boldsymbol{h}_{stop}, & i\in\mathrm{ONE\text{-}HOP}\left(\hat{\mathcal{V}}_{rc}^{t}\right),\ t\neq 0\end{cases} \tag{8}$$

where $\mathrm{ONE\text{-}HOP}$ is the one-hop constraint operation that returns the first-order neighborhood of the currently explored subgraph.
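A straightforward sketch of the ONE-HOP operation and the resulting action space of Eq. 8 is shown below, treating bonds as undirected pairs of atom indices; the 'STOP' token is a hypothetical placeholder for the stop action.

```python
def one_hop(rc_nodes: set, product_edges: list) -> set:
    # First-order neighbors of the explored subgraph: atoms bonded to a
    # selected atom but not yet selected themselves.
    neighbors = set()
    for u, v in product_edges:            # bonds treated as undirected
        if u in rc_nodes and v not in rc_nodes:
            neighbors.add(v)
        if v in rc_nodes and u not in rc_nodes:
            neighbors.add(u)
    return neighbors

def action_space(rc_nodes, all_nodes, product_edges):
    # Eq. 8: every atom is a legal action at t = 0 (no stop allowed);
    # afterwards, only one-hop neighbors plus the stop action.
    if not rc_nodes:
        return set(all_nodes)
    return one_hop(rc_nodes, product_edges) | {"STOP"}
```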

Reward  When the action $a_{t}$ is the stop action, we refer to this step $t$ as the final step $T$. We define the reward $r_{t}$ as:

$$r_{t}=\begin{cases}0, & t=0,1,2,\cdots,T-1\\ 0, & \hat{\mathcal{V}}_{rc}^{T}\neq\mathcal{V}_{rc},\ t=T\\ 1, & \hat{\mathcal{V}}_{rc}^{T}=\mathcal{V}_{rc},\ t=T\end{cases} \tag{9}$$

Therefore, our optimization goal, maximizing the expected remaining future reward, means finding a reaction center that is exactly the same as the ground truth.
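Eq. 9 amounts to a sparse terminal reward, as in the minimal sketch below (node sets are hypothetical atom-index collections):

```python
def reward(pred_nodes, true_nodes, is_final_step):
    # Eq. 9: zero at every intermediate step; 1 at the final step iff the
    # explored node set exactly matches the ground-truth reaction center.
    return 1.0 if is_final_step and set(pred_nodes) == set(true_nodes) else 0.0
```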

4.4 Training

We adopt the standard deep Q-learning framework (Mnih et al., 2013). Since imitation learning is known to help with training stability and performance (Levine & Koltun, 2013), we allow the agent to follow ground-truth trajectories during the early stage of training. For each experience $(s_{t},a_{t},r_{t},s_{t+1})$ in the experience pool, our loss function is:

$$\mathrm{loss}=\left(y_{t}-Q(s_{t},a_{t})\right)^{2},\qquad y_{t}=r_{t}+\gamma\,\max_{a_{t+1}}Q\left(s_{t+1},a_{t+1}\right), \tag{10}$$

where $\gamma\in(0,1]$ is the discount factor, and $y_{t}$ is treated as a constant (no gradient is propagated through it).
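Below is a one-sample sketch of Eq. 10; the `q_net(s)` interface returning a tensor of Q-values over a state's legal actions is a hypothetical simplification, and we mark a terminal transition (stop action taken) with `s_next = None` so that it bootstraps with zero.

```python
import torch

def dqn_loss(q_net, experience, gamma=0.99):
    # experience = (s_t, a_t, r_t, s_{t+1}); a_t indexes the Q-value of
    # the chosen action within q_net(s_t)'s output.
    s_t, a_t, r_t, s_next = experience
    q_sa = q_net(s_t)[a_t]
    with torch.no_grad():                 # y_t is a constant: no gradient
        boot = q_net(s_next).max() if s_next is not None else torch.tensor(0.0)
        y_t = r_t + gamma * boot
    return (y_t - q_sa) ** 2
```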

4.5 Inference

We use the beam search mechanism (Meister et al., 2020) to derive the $k$ best predictions at inference time. With a beam-size budget $k$, the agent can transition to at most the $k$ best new states from any given state, so there are up to $k^{2}$ candidate states before the next level of the beam search. From these $k^{2}$ candidates, we select the $k$ best states for the next level according to the predicted value of the $Q$ function. For example, in Fig. 3 the beam size is 3, so each level of the beam search can hold up to 3 state nodes (red dashed box). The pseudocode is shown in Alg. 1.

Algorithm 1 Inference with Beam Search
  Input: $Q$ function $Q(s,a)$; initial state $s_{0}$; beam size $k$
  Output: $k$ best results $\textsc{K-results}$
  $\textsc{beam}=\{s_{0}\}$
  $\textsc{candidate}=\emptyset$
  repeat
     $\textsc{K-results}=\emptyset$
     for $s$ in $\textsc{beam}$ do
        $\textsc{Top-K-actions}=\operatorname{argtopK}_{a}\,Q(s,a)$
        for $a$ in $\textsc{Top-K-actions}$ do
           add $(s,a)$ to $\textsc{candidate}$
        end for
     end for
     $\textsc{beam}^{\prime}=\operatorname{argtopK}_{(s,a)\in\textsc{candidate}}\,Q(s,a)$
     $\textsc{candidate}=\emptyset$
     $\textsc{beam}=\emptyset$
     for $(s,a)$ in $\textsc{beam}^{\prime}$ do
        $s^{\prime}\leftarrow$ execute $a$ on $s$
        if $a$ is the stop action then
           add $(s,a)$ to $\textsc{candidate}$
           add $s$ to $\textsc{K-results}$
        else
           add $s^{\prime}$ to $\textsc{beam}$
        end if
     end for
  until $\textsc{beam}=\emptyset$
  Return: $\textsc{K-results}$
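For reference, the following Python sketch mirrors Algorithm 1; `q_fn(s)` returning a dictionary from legal actions to Q-values and `step_fn(s, a)` returning the successor state are hypothetical interfaces standing in for the Q-NET and the environment. Note that a stopped state is re-added to the candidate pool so it keeps competing with expanding states at later levels, exactly as in the pseudocode.

```python
def beam_search(q_fn, step_fn, s0, k):
    beam, candidate, k_results = [s0], [], []
    while beam:
        k_results = []                    # reset, as in Algorithm 1
        for s in beam:
            top_k = sorted(q_fn(s).items(), key=lambda kv: -kv[1])[:k]
            candidate += [(s, a, q) for a, q in top_k]
        beam_prime = sorted(candidate, key=lambda x: -x[2])[:k]
        candidate, beam = [], []
        for s, a, q in beam_prime:
            if a == "STOP":
                candidate.append((s, a, q))  # stopped states keep competing
                k_results.append(s)          # reported if they still win
            else:
                beam.append(step_fn(s, a))
    return k_results
```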

4.6 Complexity Analysis

At each state, the agent computes Q-values over the action space, which has worst-case size $\mathcal{O}\left(|\mathcal{V}_{p}|\right)$ since at step 0 the action space consists of all atoms in the product. The beam search depth is bounded by $|\mathcal{V}_{p}|$, and each level of the beam search holds at most $k$ states (the beam size). Thus, the overall time complexity is $\mathcal{O}\left(k\times|\mathcal{V}_{p}|^{2}\right)$.

5 Evaluation

In this section, we evaluate the performance of our RCsearcher with comparisons to several baseline approaches for reaction center identification in retrosynthesis, with the goal of addressing the following questions: Q1: How effective and robust is RCsearcher compared to the baseline approaches? Q2: How well does RCsearcher perform on data with multiple reaction centers? Q3: How do the proposed one-hop constraint and beam search at inference improve performance? Q4: Does RCsearcher generalize more robustly than the baselines?

Figure 4: The distribution of the USPTO-40k.

Data  Since data with multiple reaction centers accounts for only about 5% of USPTO-50k, we created a new dataset, USPTO-40k, for effective evaluation. We apply stratified random sampling to USPTO-full and draw 40k samples, ensuring that the data distribution stays close to that of USPTO-full. In USPTO-40k, data with multiple reaction centers accounts for 38.84% of the samples, and 12.42% of the samples have a reaction center consisting of more than one connected branch. We randomly select 80% of the samples as the training set and split the rest evenly into validation and test sets. Fig. 4 shows the distribution of the dataset; further details of USPTO-40k can be found in the appendix.

Table 1: Top-k accuracy for Reaction Center Identification on USPTO-40k. The suffixes '-20' and '-30' denote 20 and 30 sampling experiments, respectively.
Methods | Top-1 (%) | Top-2 (%) | Top-3 (%) | Top-4 (%)
GNN-based: MPN-based | 9.30 ± 0.56 | 10.28 ± 0.38 | 11.58 ± 0.31 | 14.07 ± 0.14
GNN-based: EGAT-based | 18.04 ± 0.79 | 23.49 ± 0.66 | 25.33 ± 0.43 | 25.78 ± 0.25
GNN-based: RGCN-based | 9.27 ± 0.46 | 14.25 ± 0.47 | 17.53 ± 0.35 | 20.78 ± 0.17
Seq2Seq: Product2RC-20 | 3.25 ± 0.51 | 4.95 ± 0.34 | 6.45 ± 0.28 | 8.68 ± 0.19
Seq2Seq: Product2RC-30 | 3.15 ± 0.42 | 4.61 ± 0.33 | 6.42 ± 0.20 | 8.48 ± 0.13
Seq2Seq: Product2RC-Aug-20 | 7.78 ± 0.37 | 10.22 ± 0.20 | 12.70 ± 0.16 | 15.00 ± 0.06
Seq2Seq: Product2RC-Aug-30 | 8.51 ± 0.40 | 10.36 ± 0.46 | 13.33 ± 0.29 | 15.85 ± 0.22
Similarity-based: Sim-based-20 | 19.51 ± 0.41 | 23.33 ± 0.20 | 26.54 ± 0.17 | 27.04 ± 0.06
Similarity-based: Sim-based-30 | 19.58 ± 0.39 | 23.21 ± 0.20 | 26.81 ± 0.19 | 27.13 ± 0.07
Ours: RCsearcher | 30.75 ± 0.41 | 31.45 ± 0.35 | 31.92 ± 0.28 | 32.38 ± 0.18

Baselines  We compare RCsearcher against three classes of baselines to evaluate overall performance: GNN-based, Seq2Seq, and Similarity-based methods. These include:

(1) GNN-based: We use the method from RetroXpert. It uses one graph encoder to model the probability of each bond being the reaction center and adds a graph-level auxiliary task to predict the total number of disconnection bonds. Here, we use three graph encoders: R-GCN (Schlichtkrull et al., 2018), EGAT (Kamiński et al., 2022), and MPN (Gilmer et al., 2017). We refer to these three baselines as RGCN-based, EGAT-based, and MPN-based.

(2) Seq2Seq: Here, the SMILES string of the product is the input sequence, and the SMARTS string of the reaction center is the output sequence. We use the Transformer (Vaswani et al., 2017) as the neural sequence-to-sequence model. We follow the same data augmentation tricks used by Wan et al. (2022) for the SMILES generative models; instead of expanding the training dataset off-the-shelf, we perform the augmentation on-the-fly, permuting the SMILES with a probability of 50% at each iteration. We refer to these baselines as Product2RC (without augmentation) and Product2RC-Aug (with augmentation).

(3) Similarity-based: We calculate the similarity between the input product and each product in the training set and then sort the training products by similarity. We regard the reaction center of the most similar training product as the prediction. Here, we use molecular Morgan fingerprints to calculate the similarity. We refer to this baseline as Sim-based.

It is worth noting that Product2RC and Sim-based can only predict the reaction center pattern and cannot pinpoint the specific position of the reaction center in the product. We therefore regard the reaction center pattern as a query graph and perform subgraph matching on the product. When the matching yields only one solution, that solution is regarded as the predicted result; otherwise, we randomly select one solution as the predicted result. In this way, we can calculate the top-k results over the whole test set once. We repeat this experiment N times and take the average top-k accuracy over the N runs as the final result.
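For illustration, a sketch of the Sim-based ranking step with RDKit is shown below; the fingerprint radius and bit size are illustrative defaults rather than the exact configuration used in our experiments.

```python
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem

def rank_training_products(query_smiles, train_smiles):
    # Rank training products by Morgan-fingerprint Tanimoto similarity to
    # the query product; the most similar product's reaction center pattern
    # then serves as the prediction for the query.
    def fp(smi):
        return AllChem.GetMorganFingerprintAsBitVect(
            Chem.MolFromSmiles(smi), radius=2, nBits=2048)
    query_fp = fp(query_smiles)
    scores = [(DataStructs.TanimotoSimilarity(query_fp, fp(smi)), i)
              for i, smi in enumerate(train_smiles)]
    return sorted(scores, reverse=True)  # (similarity, index), best first
```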

Evaluation Metric  For each method, we transform the predicted results into the node set $\hat{\mathcal{V}}_{rc}^{T}$ of the predicted reaction center graph. The condition for correct identification is that this node set is identical to the ground-truth $\mathcal{V}_{rc}$. We use top-k exact match accuracy as our evaluation metric: a sample is counted as correct if the ground truth appears among its k predictions.
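A minimal sketch of this metric, where each element of `predictions` is a list of predicted node sets ordered best first:

```python
def topk_accuracy(predictions, ground_truths, k):
    # Correct if the ground-truth node set appears among the first k
    # predicted node sets for a sample.
    hits = sum(any(pred == truth for pred in preds[:k])
               for preds, truth in zip(predictions, ground_truths))
    return hits / len(ground_truths)
```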

Implementation Settings  Our proposed RCsearcher is implemented with the Deep Graph Library (DGL) (Wang et al., 2019) and PyTorch (Paszke et al., 2019). For the EGAT, we stack four identical four-head attentive layers with a hidden dimension of 256. All embedding sizes in the model are set to 256. We use $\mathrm{ELU}(x)=\alpha(\exp(x)-1)$ for $x\leq 0$ and $x$ for $x>0$ as our activation function, with $\alpha=1$. We conduct all experiments on a machine with an Intel Xeon 4114 CPU and one Nvidia Titan GPU. For training, we set the learning rate to 0.001 and the number of training iterations to 100,000, and we use the Adam optimizer (Kingma & Ba, 2015). The first 10,000 iterations are imitation learning. Checkpoints are saved every 1,000 iterations, and the best checkpoint is selected on the evaluation set. The source code can be found in the supplementary materials.

5.1 Overall Performance

We run the training process five times and report the mean and standard deviation of accuracy. The overall performance is illustrated in Table 1, from which we make two observations. First, our method achieves state-of-the-art performance on the dataset, indicating that it better handles both single and multiple reaction center identification. Second, the average performance of Product2RC and Sim-based with 20 and 30 sampling runs is close, implying that averaging over 30 runs is enough to fairly reflect the capabilities of Product2RC and Sim-based.

5.2 Performance on Multiple Reaction Center

To address Q2, we train RCsearcher on the training data with multiple reaction centers to explore its performance on the corresponding test data. The multiple reaction center identification results are shown in Fig. 5. The overall top-1 accuracy of RCsearcher is about eight times higher than that of the prior GNN-based methods. Moreover, the prior GNN-based methods completely fail on data whose reaction center consists of more than two edges, whereas RCsearcher still works. This shows that RCsearcher's performance on multiple reaction center identification is significantly better than the baselines'. We attribute this to the consistency between our optimization objective and the goal of finding the right single or multiple reaction center.

5.3 Performance of Generalization

To evaluate the generalization ability of RCsearcher, we count the number of reaction center patterns that are successfully extrapolated despite not appearing in the training set. The results are shown in Figure 6. The number of extrapolations of RCsearcher is about twenty times higher than that of the prior GNN-based methods, demonstrating that our method can predict novel reaction center patterns that have not appeared in the training set.

Figure 5: The performance of multiple reaction center identification.
Figure 6: The performance of extrapolation.

5.4 Ablation Study

The results of the ablation study are illustrated in Table 2, from which we make two observations: 1) The setting with beam search is 0.52 percentage points higher in top-1 accuracy than the setting without it, indicating that beam search improves performance. 2) The setting with the one-hop constraint is 3.62 percentage points higher than the setting without it. This demonstrates that although the one-hop constraint fails on some samples whose reaction center has multiple connected branches, it improves the model's overall performance.

Table 2: The Results of Ablation Study.
Ablation Setting Top-1 Accuracy
w/ beam search 30.75%
w/o beam search 30.23%
w/ one-hop constraint 30.75%
w/o one-hop constraint 27.13%

5.5 Hyperparameter Sensitivity Analysis

To explore the impact of several vital hyperparameters, we fix the number of attention heads to 4 when varying the hidden dimension, and fix the hidden dimension $d$ to 256 when varying the number of heads (Fig. 7). The performance under different hyperparameters is consistent, revealing the robustness of our method. We observe that the performance of RCsearcher improves as the dimension of the hidden layers increases. We hypothesize that a higher dimension allows the model to contain more parameters, thus improving the experimental results.

Figure 7: Hyperparameter Sensitivity Analysis.

6 Conclusion and Future Work

We present RCsearcher, a unified framework for single and multiple reaction center identification that combines the advantages of graph neural networks and deep reinforcement learning. The critical insight in this framework is that the reaction center, whether single or multiple, must be a node-induced subgraph of the molecular product graph. Comprehensive experiments demonstrate that RCsearcher consistently outperforms other baselines on the reaction center identification task and can extrapolate reaction center patterns that have not appeared in the training set. Ablation experiments verify the effectiveness of individual components. However, RCsearcher is still limited to reaction centers with a single connected branch. In the future, we will not only retain the advantages of the one-hop constraint but also improve the current mechanism so that the model can generalize to reaction centers with multiple connected branches.

Appendix A Dataset information

We adopt stratified random sampling for USPTO-full and take out 40k samples, ensuring that the data distribution is close enough to USPTO-full. More detailed information about the distribution of the dataset can be found in Figure 8.

Figure 8: The detailed distribution of the USPTO-40k.

Appendix B Atom and bond features

In this paper, initial features of atom and bond can be found in Table 3 and Table 4.

Table 3: Atom Features used in EGAT. All features are one-hot encoding.
Feature Description Size
Atom type Type of an atom by atomic number. 100
Total degree Degree of an atom including Hs. 6
Explicit valence Explicit valence of an atom. 6
Implicit valence Implicit valence of an atom. 6
Hybridization sp, sp2, sp3, sp3d, or sp3d2. 5
# Hs Number of bonded hydrogen atoms. 5
Formal charge Integer electronic charge assigned to atom. 5
Aromaticity Whether an atom is part of an aromatic system. 1
In ring Whether an atom is in a ring. 1
Table 4: Bond features used in EGAT. All features are one-hot encoding.
Feature Description Size
Bond type Single, double, triple, or aromatic. 4
Conjugation Whether the bond is conjugated. 1
In ring Whether the bond is part of a ring. 1
Stereo None, any, E/Z or cis/trans. 6
Direction The direction of the bond. 3

Appendix C Hyperparameters for baselines

For a fair comparison, we set the number of layers, the size of the hidden dimensions, and the number of attention heads in the three GNN-based baselines to 4, 156, and 4, respectively. In the Seq2Seq baseline, we set the number of layers, the size of the hidden dimensions, and the number of attention heads in the Transformer (Lan et al., 2021, 2022, 2023; Ma et al., 2021) to 4, 2048, and 8, respectively.

References

  • Chen et al. (2020) Chen, B., Li, C., Dai, H., and Song, L. Retro*: learning retrosynthetic planning with neural guided a* search. In International Conference on Machine Learning, pp. 1608–1616. PMLR, 2020.
  • Chen & Jung (2021) Chen, S. and Jung, Y. Deep retrosynthetic reaction prediction using local reactivity and global attention. JACS Au, 1(10):1612–1620, 2021.
  • Coley et al. (2017) Coley, C. W., Rogers, L., Green, W. H., and Jensen, K. F. Computer-assisted retrosynthesis based on molecular similarity. ACS central science, 3(12):1237–1245, 2017.
  • Coley et al. (2019) Coley, C. W., Green, W. H., and Jensen, K. F. Rdchiral: An rdkit wrapper for handling stereochemistry in retrosynthetic template extraction and application. Journal of chemical information and modeling, 59(6):2529–2537, 2019.
  • Corey (1991) Corey, E. J. The logic of chemical synthesis: multistep synthesis of complex carbogenic molecules (nobel lecture). Angewandte Chemie International Edition in English, 30(5):455–465, 1991.
  • Dai et al. (2019) Dai, H., Li, C., Coley, C., Dai, B., and Song, L. Retrosynthesis prediction with conditional graph logic network. Advances in Neural Information Processing Systems, 32, 2019.
  • Gilmer et al. (2017) Gilmer, J., Schoenholz, S. S., Riley, P. F., Vinyals, O., and Dahl, G. E. Neural message passing for quantum chemistry. In International conference on machine learning, pp. 1263–1272. PMLR, 2017.
  • Han et al. (2022) Han, P., Zhao, P., Lu, C., Huang, J., Wu, J., Shang, S., Yao, B., and Zhang, X. Gnn-retro: Retrosynthetic planning with graph neural networks. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 36, pp.  4014–4021, 2022.
  • Kamiński et al. (2022) Kamiński, K., Ludwiczak, J., Jasiński, M., Bukala, A., Madaj, R., Szczepaniak, K., and Dunin-Horkawicz, S. Rossmann-toolbox: a deep learning-based protocol for the prediction and design of cofactor specificity in rossmann fold proteins. Briefings in bioinformatics, 23(1):bbab371, 2022.
  • Kingma & Ba (2015) Kingma, D. P. and Ba, J. Adam: A method for stochastic optimization. In International Conference on Learning Representations, 2015.
  • Kishimoto et al. (2019) Kishimoto, A., Buesser, B., Chen, B., and Botea, A. Depth-first proof-number search with heuristic edge cost and application to chemical synthesis planning. Advances in Neural Information Processing Systems, 32, 2019.
  • Lan et al. (2021) Lan, Z., Yu, L., Yuan, L., Wu, Z., Niu, Q., and Ma, F. Sub-gmn: The subgraph matching network model. arXiv preprint arXiv:2104.00186, 2021.
  • Lan et al. (2022) Lan, Z., Hong, B., Ma, Y., and Ma, F. More interpretable graph similarity computation via maximum common subgraph inference. arXiv preprint arXiv:2208.04580, 2022.
  • Lan et al. (2023) Lan, Z., Ma, Y., Yu, L., Yuan, L., and Ma, F. Aednet: Adaptive edge-deleting network for subgraph matching. Pattern Recognition, 133:109033, 2023.
  • Levine & Koltun (2013) Levine, S. and Koltun, V. Guided policy search. In International conference on machine learning, pp.  1–9. PMLR, 2013.
  • Liu et al. (2017) Liu, B., Ramsundar, B., Kawthekar, P., Shi, J., Gomes, J., Luu Nguyen, Q., Ho, S., Sloane, J., Wender, P., and Pande, V. Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS central science, 3(10):1103–1113, 2017.
  • Ma et al. (2021) Ma, Y., Lan, Z., Zong, L., and Huang, K. Global-aware beam search for neural abstractive summarization. Advances in Neural Information Processing Systems, 34:16545–16557, 2021.
  • Meister et al. (2020) Meister, C., Cotterell, R., and Vieira, T. If beam search is the answer, what was the question? In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp.  2173–2185, 2020.
  • Mnih et al. (2013) Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., and Riedmiller, M. Playing atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602, 2013.
  • Paszke et al. (2019) Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., et al. Pytorch: An imperative style, high-performance deep learning library. Advances in neural information processing systems, 32, 2019.
  • Schlichtkrull et al. (2018) Schlichtkrull, M., Kipf, T. N., Bloem, P., Berg, R. v. d., Titov, I., and Welling, M. Modeling relational data with graph convolutional networks. In European semantic web conference, pp.  593–607. Springer, 2018.
  • Schwaller et al. (2019) Schwaller, P., Laino, T., Gaudin, T., Bolgar, P., Hunter, C. A., Bekas, C., and Lee, A. A. Molecular transformer: a model for uncertainty-calibrated chemical reaction prediction. ACS central science, 5(9):1572–1583, 2019.
  • Schwaller et al. (2020) Schwaller, P., Petraglia, R., Zullo, V., Nair, V. H., Haeuselmann, R. A., Pisoni, R., Bekas, C., Iuliano, A., and Laino, T. Predicting retrosynthetic pathways using transformer-based models and a hyper-graph exploration strategy. Chemical science, 11(12):3316–3325, 2020.
  • Segler et al. (2018) Segler, M. H., Preuss, M., and Waller, M. P. Planning chemical syntheses with deep neural networks and symbolic ai. Nature, 555(7698):604–610, 2018.
  • Shi et al. (2020) Shi, C., Xu, M., Guo, H., Zhang, M., and Tang, J. A graph to graphs framework for retrosynthesis prediction. In International conference on machine learning, pp. 8818–8827. PMLR, 2020.
  • Somnath et al. (2021) Somnath, V. R., Bunne, C., Coley, C., Krause, A., and Barzilay, R. Learning graph models for retrosynthesis prediction. Advances in Neural Information Processing Systems, 34:9405–9415, 2021.
  • Vaswani et al. (2017) Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., and Polosukhin, I. Attention is all you need. Advances in neural information processing systems, 30, 2017.
  • Wan et al. (2022) Wan, Y., Hsieh, C.-Y., Liao, B., and Zhang, S. Retroformer: Pushing the limits of end-to-end retrosynthesis transformer. In International Conference on Machine Learning, pp. 22475–22490. PMLR, 2022.
  • Wang et al. (2019) Wang, M., Zheng, D., Ye, Z., Gan, Q., Li, M., Song, X., Zhou, J., Ma, C., Yu, L., Gai, Y., Xiao, T., He, T., Karypis, G., Li, J., and Zhang, Z. Deep graph library: A graph-centric, highly-performant package for graph neural networks. arXiv preprint arXiv:1909.01315, 2019.
  • Wang et al. (2021) Wang, X., Li, Y., Qiu, J., Chen, G., Liu, H., Liao, B., Hsieh, C.-Y., and Yao, X. Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions. Chemical Engineering Journal, 420:129845, 2021.
  • Weininger (1988) Weininger, D. Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences, 28(1):31–36, 1988.
  • Xie et al. (2022) Xie, S., Yan, R., Han, P., Xia, Y., Wu, L., Guo, C., Yang, B., and Qin, T. Retrograph: Retrosynthetic planning with graph search. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp.  2120–2129, 2022.
  • Yan et al. (2020) Yan, C., Ding, Q., Zhao, P., Zheng, S., Yang, J., Yu, Y., and Huang, J. Retroxpert: Decompose retrosynthesis prediction like a chemist. Advances in Neural Information Processing Systems, 33:11248–11258, 2020.