A Neural Edge-Editing Approach for Document-Level Relation Graph Extraction
Abstract
In this paper, we propose a novel edge-editing approach to extract relation information from a document. In this approach, we treat the relations in a document as a relation graph among entities. The relation graph is iteratively constructed by editing the edges of an initial graph, which might be a graph extracted by another system or an empty graph. Edges are edited by classifying them in a close-first manner using the document and the graph constructed so far; each edge is represented with document context information by a pretrained transformer model and graph context information by a graph convolutional network model. We evaluate our approach on the task of extracting material synthesis procedures from materials science texts. The experimental results show the effectiveness of our approach in editing both graphs initialized by our in-house rule-based system and empty graphs. The source code is available at https://github.com/tti-coin/edge-editing.
1 Introduction
Relation extraction (RE), the task of predicting relations between pairs of given entities in literature, is an important task in natural language processing. While most existing work has focused on sentence-level RE Zeng et al. (2014), recent studies have extended extraction to the document level, since many relations are expressed across sentences Christopoulou et al. (2019); Nan et al. (2020).
In document-level RE, models need to deal with relations among multiple entities across a document. Several document-level RE methods construct a document-level graph, built on nodes of words or other linguistic units, to capture document-level interactions between entities Christopoulou et al. (2019); Nan et al. (2020). However, such methods do not directly consider interactions among the relations in a document, even though such relations often depend on each other, and other relations can serve as important context for a given relation.

We propose a novel, iterative, edge-editing approach to document-level RE. An overview of our approach and an example of the extraction results are illustrated in Figure 1. Our approach treats the relations as a relation graph composed of entities as nodes and their relations as edges. The relation graph is first initialized using the edges predicted by an existing RE model, if one is provided. Edges are then edited by a neural edge classifier that represents each edge using the document information, the prebuilt graph information, and the current edge information. The document information is represented with a pretrained Longformer model Beltagy et al. (2020), while the graph information is represented with graph convolutional networks Kipf and Welling (2017). Edges are edited iteratively in a close-first manner so that the approach can utilize the information of edges between close entity pairs when editing edges of distant entity pairs, which are often difficult to predict. We evaluate our approach on the task of extracting synthesis procedures from text Mysore et al. (2019) and show its effectiveness.
The contribution of this paper is three-fold. First, we propose a novel edge-editing approach for document-level RE that utilizes contexts in both relation graphs and documents. Second, we build a strong rule-based model and show that our approach can effectively utilize and enhance the output of the rule-based model. Third, we build and evaluate a neural model for extracting synthesis procedures from text for the first time.
2 Approach
Our approach extracts a relation graph over given entities from a document. We formulate the extraction task as an edge-editing task, where the approach iteratively edits edges with a neural edge classifier in a close-first manner Miwa and Sasaki (2014).
2.1 Iterative Edge Editing
We build a relation graph by editing the edges iteratively using the edge classifier described in Section 2.2. The building finishes when all edges have been edited. The edges are edited in a close-first manner Miwa and Sasaki (2014); Ma et al. (2019), which edits close edges first and far edges later. The distance of an entity pair is defined based on the order in which entities appear in the document: if the two entities in a pair appear $i$-th and $j$-th, the distance becomes $|i - j|$. Note that each edge is edited only once throughout the entire editing process.
Algorithm 1 shows how the graph is built by iterative edge editing. To reduce the computational cost, the pairs with the same distance are edited simultaneously, and the pairs with distances greater than or equal to the maximum distance $d_{\max}$ are edited simultaneously. This reduces the number of editing steps from the number of entity pairs, $O(|V|^2)$, to $O(d_{\max})$.
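As a concrete illustration, the following Python sketch mimics the batched close-first schedule of Algorithm 1; the `classifier` callable and the edge-dictionary layout are hypothetical stand-ins, not the paper's actual interface.

```python
from itertools import permutations

def edit_graph(num_entities, edges, classifier, max_dist):
    """Close-first editing sketch. `edges` maps a directed entity pair (i, j)
    to a relation label ('none' for no edge); `classifier` stands in for the
    neural edge classifier of Section 2.2 and relabels a batch of pairs
    conditioned on the current, partially edited graph."""
    for d in range(1, max_dist + 1):
        # All pairs at distance d are edited simultaneously; at the maximum
        # distance, all farther pairs are included, so each directed pair is
        # edited exactly once over the whole process.
        batch = [(i, j) for i, j in permutations(range(num_entities), 2)
                 if abs(i - j) == d or (d == max_dist and abs(i - j) > d)]
        for pair, label in zip(batch, classifier(edges, batch)):
            edges[pair] = label
    return edges
```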
2.2 Edge Classifier
An edge classifier predicts the class of the target edge from inputs composed of the document information $D$, a graph $G = (V, E)$ with nodes $V$ and edges $E$, and the node pair $(v_h, v_t)$ of the target edge. The classifier is composed of three modules:
EncodeNode that produces document-based node representations $\boldsymbol{n}$ using the document $D$ and the entity information of the nodes $V$.
EncodeEdge that obtains the representations of edges by applying a GCN to the prebuilt graph, using the node representations $\boldsymbol{n}$ and the edges $E$.
ClassifyEdge that predicts the class of the edge using the edge representation of the node pair $(v_h, v_t)$.
We explain the details of these modules in the remaining part of this section.
EncodeNode employs Longformer Beltagy et al. (2020) to obtain the document-level representation. It aggregates the subword representations within each entity by max-pooling and concatenates the aggregated representation with the embedding of the entity's class label:

$\boldsymbol{n}_i = \left[\operatorname{maxpool}_{t \in S_i}(\boldsymbol{h}_t);\ \boldsymbol{l}_i\right]$ (1)

where $S_i$ is the set of subwords in the $i$-th entity, $\boldsymbol{h}_t$ is the Longformer representation of subword $t$, $\boldsymbol{l}_i$ is the class label embedding, and $[\cdot\,;\cdot]$ denotes concatenation.
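A minimal sketch of how EncodeNode could be realized with the transformers library follows; the span indices, label ids, and embedding sizes are assumptions for illustration, and the encoder is kept frozen since the Longformer is not finetuned (Section 3.1).

```python
import torch
from transformers import LongformerModel

class EncodeNode(torch.nn.Module):
    def __init__(self, num_entity_labels, label_dim=16):
        super().__init__()
        self.encoder = LongformerModel.from_pretrained(
            "allenai/longformer-base-4096")
        self.label_emb = torch.nn.Embedding(num_entity_labels, label_dim)

    def forward(self, input_ids, attention_mask, entity_spans, entity_labels):
        # Document encoding; frozen, as finetuning is not performed.
        with torch.no_grad():
            hidden = self.encoder(
                input_ids, attention_mask=attention_mask
            ).last_hidden_state.squeeze(0)                  # (seq_len, 768)
        nodes = []
        for (start, end), label in zip(entity_spans, entity_labels):
            pooled = hidden[start:end].max(dim=0).values    # max-pool subwords
            lab = self.label_emb(torch.as_tensor(label))    # class label l_i
            nodes.append(torch.cat([pooled, lab]))          # Eq. (1)
        return torch.stack(nodes)                           # (num_entities, 784)
```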
To prepare the input to EncodeEdge, the document-based node representation is enriched by a GCN that introduces the context of each node in the prebuilt graph: $\tilde{\boldsymbol{n}}_i = \operatorname{GCN}(\boldsymbol{n}_i, G)$. Following Schlichtkrull et al. (2018), we add inverse directions to the graph and assign different weights to different edge classes in the GCN. The produced node representation $\tilde{\boldsymbol{n}}_i$ includes both the document context and the prebuilt graph context.
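A simplified sketch of one relation-aware GCN layer in this spirit, with class-specific weights and added inverse directions following Schlichtkrull et al. (2018); the per-edge loop and the shapes are illustrative rather than the paper's implementation.

```python
import torch

class RelGCNLayer(torch.nn.Module):
    def __init__(self, dim, num_edge_classes):
        super().__init__()
        # One weight matrix per edge class and direction (forward / inverse).
        self.w = torch.nn.Parameter(torch.empty(2 * num_edge_classes, dim, dim))
        self.w_self = torch.nn.Linear(dim, dim)   # self-loop transformation
        torch.nn.init.xavier_uniform_(self.w)

    def forward(self, nodes, edges):
        # nodes: (num_nodes, dim); edges: list of (head, tail, class) triples.
        agg = torch.zeros_like(nodes)
        for h, t, c in edges:
            agg[t] += nodes[h] @ self.w[2 * c]        # message along the edge
            agg[h] += nodes[t] @ self.w[2 * c + 1]    # message along inverse
        return torch.relu(self.w_self(nodes) + agg)
```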
EncodeEdge produces the edge representation $\boldsymbol{e}_{ht}$ from $\tilde{\boldsymbol{n}}$. It individually calculates the representation of the edge for each pair of nodes by combining the node representations, similarly to Zhou et al. (2021), with the embedding $\boldsymbol{d}_{ht}$ of the distance of the entity pair and the embedding $\boldsymbol{c}_{ht}$ of the edge class before editing. The distance between the entities of a pair is calculated in the same way as in Section 2.1; if it exceeds a predefined maximum distance, it is treated as the maximum distance. We prepare fully connected (FC) layers $\mathrm{FC}_{\mathrm{head}}$ and $\mathrm{FC}_{\mathrm{tail}}$ for the start point (head) and end point (tail) nodes and calculate the edge representation as follows:
$\boldsymbol{e}_{ht} = \boldsymbol{W}\left[\tanh\left(\mathrm{FC}_{\mathrm{head}}(\tilde{\boldsymbol{n}}_h)\right);\ \tanh\left(\mathrm{FC}_{\mathrm{tail}}(\tilde{\boldsymbol{n}}_t)\right);\ \boldsymbol{d}_{ht};\ \boldsymbol{c}_{ht}\right]$ (2)

where $\boldsymbol{W}$ denotes a trainable weight parameter.
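Under the reconstruction of Eq. (2) above, EncodeEdge can be sketched as below; the exact way the pieces are combined is an assumption based on the description, with illustrative dimensions.

```python
import torch

class EncodeEdge(torch.nn.Module):
    def __init__(self, node_dim, hidden_dim, dist_dim, class_dim,
                 num_edge_classes, max_dist):
        super().__init__()
        self.fc_head = torch.nn.Linear(node_dim, hidden_dim)
        self.fc_tail = torch.nn.Linear(node_dim, hidden_dim)
        self.dist_emb = torch.nn.Embedding(max_dist + 1, dist_dim)
        self.class_emb = torch.nn.Embedding(num_edge_classes + 1, class_dim)
        self.w = torch.nn.Linear(2 * hidden_dim + dist_dim + class_dim,
                                 hidden_dim)       # the trainable weight W
        self.max_dist = max_dist

    def forward(self, n_head, n_tail, dist, prev_class):
        # Distances beyond the maximum are clipped to the maximum (Sec. 2.2).
        d = self.dist_emb(torch.as_tensor(min(dist, self.max_dist)))
        c = self.class_emb(torch.as_tensor(prev_class))  # class before editing
        z = torch.cat([torch.tanh(self.fc_head(n_head)),
                       torch.tanh(self.fc_tail(n_tail)), d, c])
        return self.w(z)                                  # Eq. (2)
```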
ClassifyEdge classifies the target edge into a relation class or no relation. It applies a dropout layer Srivastava et al. (2014), an output FC layer $\mathrm{FC}_{\mathrm{out}}$, and softmax to the edge representation, and predicts the class with the highest probability:

$\hat{y}_{ht} = \arg\max_{y} \operatorname{softmax}\left(\mathrm{FC}_{\mathrm{out}}\left(\operatorname{dropout}(\boldsymbol{e}_{ht})\right)\right)_y$ (3)
We maximize the log-likelihood of the gold edge classes when training the edge classifier.
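A small sketch of ClassifyEdge and the objective; the dropout rate is the tuned value from Appendix C, and minimizing cross-entropy over the logits is equivalent to maximizing the log-likelihood.

```python
import torch

class ClassifyEdge(torch.nn.Module):
    def __init__(self, edge_dim, num_classes, dropout=0.46):
        super().__init__()
        self.drop = torch.nn.Dropout(dropout)
        # Output FC layer over relation classes plus the no-relation class.
        self.fc_out = torch.nn.Linear(edge_dim, num_classes)

    def forward(self, edge_repr):
        return self.fc_out(self.drop(edge_repr))  # logits; softmax in the loss

# Training step (sketch): cross-entropy = negative log-likelihood.
# loss = torch.nn.functional.cross_entropy(logits, gold_classes)
```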
3 Experiments
3.1 Experimental Settings
We evaluate our approach on the materials science procedural text corpus Mysore et al. (2019). In this corpus, synthesis procedures are annotated as a graph per document, with 19 node types, such as materials, operations, and conditions, and 15 directed relation types. The corpus consists of 200 documents for training, 15 for development, and 15 for testing. The statistics of the corpus are shown in Appendix A. We chose this corpus because it is publicly available, manually annotated, and contains dense document-level relation graphs.
We prepared a rule-based model (Rule) as a baseline and as an existing model to initialize the edges, which was adapted from the rule-based system in Kuniyoshi et al. (2020). The rules are summarized in Appendix B.
We employ the micro F-score for each relation class as the evaluation metric. We tune hyper-parameters such as the number and dimensions of layers and the dropout rate on the development set using the hyper-parameter optimization framework Optuna Akiba et al. (2019); the details are shown in Appendix C. We employ the Adam optimizer Kingma and Ba (2015) with the default parameters in PyTorch Paszke et al. (2019), except for the learning rate. Training was performed without finetuning the Longformer because the corpus is too small to finetune a large transformer model.
We compare the following models on graphs initialized by the rule-based model (with Rule) and empty graphs (without Rule).
Edit: Proposed model
Edit-IE: Edit without iterative edge editing, i.e., all edges are edited simultaneously in a single step
Edit-GCN: Edit without the GCN, obtained by replacing $\tilde{\boldsymbol{n}}$ with $\boldsymbol{n}$ in Equation (2)
Random Edit: Edit with random-order editing
Additionally, we evaluate the following model with randomly initialized graphs.
Random Init: Edit applied to graphs with randomly connected edges of random classes, where the number of edges is the same as in the extraction results of Rule
Note that although we do not provide a direct comparison with existing models, our Edit-GCN without Rule is similar to BRAN Verga et al. (2018); the only differences are that we use Longformer Beltagy et al. (2020) instead of vanilla transformers and that NER training is not included. Moreover, most models for document-level RE require datasets that annotate both entities and their mentions, so existing models such as ATLOP Zhou et al. (2021) cannot be directly applied to the current task.
3.2 Results without Rule
Table 1: F-scores on the development and test sets with empty initial graphs (without Rule).

| Model | Dev | Test |
|---|---|---|
| Edit | 0.788 | 0.729 |
| Edit-IE | 0.732 | 0.685 |
| Edit-GCN | 0.744 | 0.703 |
| Random Edit | 0.751 | 0.690 |
| Random Init | 0.756 | 0.720 |
We show the results with empty initial graphs in Table 1. Edit shows the highest scores, which indicates the effectiveness of our approach when the initial graphs are empty. Comparing Edit, Edit-IE, and Random Edit, we find that both iterative edge editing and the close-first strategy are effective. Since Edit-GCN extracts relations from the document context alone, without graph structure information, the better performance of Edit over Edit-GCN shows the effectiveness of the information in the graph structure. The lower performance of Random Init shows that the initial edge information needs to be reliable.
3.3 Results with Rule
Table 2: F-scores on the development and test sets with graphs initialized by Rule (with Rule).

| Model | Dev | Test |
|---|---|---|
| Rule | 0.797 | 0.807 |
| Edit | 0.878 | 0.851 |
| Edit-IE | 0.863 | 0.863 |
| Edit-GCN | 0.857 | 0.834 |
| Random Edit | 0.791 | 0.744 |
We summarize the results with Rule in Table 2. We show the detailed results for Edit without Rule, Rule, and Edit-IE with Rule in Appendix D.
When we compare the results with Table 1, the performance with Rule is better than the counterpart without Rule for all settings. Furthermore, all the scores in Table 2 are better than those in Table 1, which shows the strength of Rule.
Surprisingly, the results of our approach with Rule are better than those of Rule itself, even though Rule alone outperforms our approach without Rule. This indicates that our Edit approach can refine the predictions. We conclude that our Edit approach can utilize the information from the rule-based model and that initializing the edges with Rule is useful.
As for the performance of the individual models, most results are consistent with Table 1, except that Edit-IE shows the highest score on the test set. This may be partly because the initial graph produced by Rule is already reliable, so iterative editing does not add useful context. The results with Random Edit support this: its performance degradation is large compared to Table 1, so random-order editing is harmful in this setting. Moreover, the different behaviors on the development and test sets indicate an imbalance in the corpus split.
4 Case Study
A series of polycrystalline samples of SrMo1-xNixO4 (0.02 ≤ x ≤ 0.08) were prepared through the conventional solid-state reaction method in air. Appropriate proportions of high-purity SrCO3, MoO3, and Ni powders were thoroughly mixed according to the desired stoichiometry, and then prefired at 900 °C for 24 h. The obtained powders were ground, pelletized, and calcined at 1000, 1100 and 1200 °C for 24 h with intermediate grinding twice. White compounds, SrMo1-xNixO4, were obtained. The compounds were ground and pressed into small pellets about 10 mm diameter and 2 mm thickness. These pellets were reduced in a H2/Ar (5%: 95%) flow at 920 °C for 12 h, and then the deep red colored products of SrMo1-xNixO3 were obtained.
[Figure 6: the gold-standard relation graph and the extraction results of Edit without Rule, Rule, and Edit with Rule for the example document above; graph images omitted.]
We illustrate the graphs for an example document Zhang et al. (2007) from the development set in Figure 6 and Figure 1: the right side of Figure 1 shows our best extraction result, obtained by Edit-IE with Rule, while Figure 6 shows the gold-standard graph and the extraction results of Edit without Rule, Rule, and Edit with Rule. The gold-standard graph shows that the material synthesis starts from mixed with the materials SrCO3, MoO3, and Ni, proceeds to prefired and so on, and finally the material SrMo1-xNixO4 is synthesized. When we compare the result of Edit with Rule with that of Rule, the two extractions are similar. Although its overall performance is lower, Edit without Rule, which does not depend on the rules, extracts relations that are not extracted by the other systems; this shows that the models with and without Rule capture different relations.
5 Related Work
RE has been widely studied to identify the relation between two entities in a sentence. In addition to traditional feature- and kernel-based methods Zelenko et al. (2003); Miwa and Sasaki (2014), many neural RE methods have been proposed based on convolutional neural networks (CNNs) Zeng et al. (2014), recurrent neural networks (RNNs) Xu et al. (2015); Miwa and Bansal (2016), graph convolutional networks (GCNs) Zhang et al. (2018); Schlichtkrull et al. (2018), and transformers Wang et al. (2019). However, sentence-level RE cannot cover all the relations in a document, and document-level RE has received increasing research attention in recent years.
Major approaches for document-level RE are graph-based and transformer-based methods. Among graph-based methods, Quirk and Poon (2017) first proposed a document graph for document-level RE. Christopoulou et al. (2019) constructed a graph with heterogeneous nodes, such as entity mentions, entities, and sentences, and represented edges between entities from the graph. Nan et al. (2020) proposed the automatic induction of a latent graph for relational reasoning across sentences. The document graphs in these methods are defined over nodes of linguistic units such as words and sentences; unlike our method, they do not directly deal with relation graphs among entities.
For transformer-based methods, Verga et al. (2018) introduced a method that encodes a document with transformers to obtain entity embeddings and classifies the relations between entities using the embeddings. Tang et al. (2020) proposed a Hierarchical Inference Network (HIN) for document-level RE, which aggregates information from the entity level to the document level. Zhou et al. (2021) tackled document-level RE with an Adaptive Thresholding and Localized cOntext Pooling (ATLOP) model that introduces a learnable entity-dependent threshold for classification and aggregates local mention-level contexts relevant to both entities.
Several studies focus on procedural texts such as cooking recipes Bosselut et al. (2018), scientific processes Dalvi et al. (2018) and open domain procedures Tandon et al. (2020). They, however, do not directly treat relation graphs. Several efforts have been made to annotate procedural or action graphs in procedural text Mori et al. (2014); Mysore et al. (2019); Kuniyoshi et al. (2020). Kuniyoshi et al. (2020) and Mehr et al. (2020) individually proposed rule-based systems to extract procedures from a document, but no neural methods have been proposed for the extraction.
6 Conclusions
We proposed a novel edge-editing approach for document-level relation extraction. The approach treats the task as the edge editing of relation graphs, given the nodes, and edits edges considering contexts in both the document and the relation graph. We evaluated the approach on the material synthesis procedure corpus, and the results showed the usefulness of initializing the edges with the rule-based model, utilizing prebuilt graph information for editing, and editing in a close-first manner. Our best model achieved an F-score of 86.3% for edge prediction.
References
- Akiba et al. (2019) Takuya Akiba, Shotaro Sano, Toshihiko Yanase, Takeru Ohta, and Masanori Koyama. 2019. Optuna: A next-generation hyperparameter optimization framework. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
- Beltagy et al. (2020) Iz Beltagy, Matthew E. Peters, and Arman Cohan. 2020. Longformer: The long-document transformer. arXiv:2004.05150.
- Bergstra et al. (2011) James Bergstra, Rémi Bardenet, Yoshua Bengio, and Balázs Kégl. 2011. Algorithms for hyper-parameter optimization. In Advances in Neural Information Processing Systems, volume 24, pages 2546–2554. Curran Associates, Inc.
- Bosselut et al. (2018) Antoine Bosselut, Omer Levy, Ari Holtzman, Corin Ennis, Dieter Fox, and Yejin Choi. 2018. Simulating action dynamics with neural process networks. In International Conference on Learning Representations.
- Cassidy et al. (2014) Taylor Cassidy, Bill McDowell, Nathanael Chambers, and Steven Bethard. 2014. An annotation framework for dense event ordering. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 501–506, Baltimore, Maryland. Association for Computational Linguistics.
- Christopoulou et al. (2019) Fenia Christopoulou, Makoto Miwa, and Sophia Ananiadou. 2019. Connecting the dots: Document-level neural relation extraction with edge-oriented graphs. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 4925–4936, Hong Kong, China. Association for Computational Linguistics.
- Dalvi et al. (2018) Bhavana Dalvi, Lifu Huang, Niket Tandon, Wen-tau Yih, and Peter Clark. 2018. Tracking state changes in procedural text: a challenge dataset and models for process paragraph comprehension. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 1595–1604, New Orleans, Louisiana. Association for Computational Linguistics.
- Kingma and Ba (2015) Diederik P. Kingma and Jimmy Ba. 2015. Adam: A method for stochastic optimization. In 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings.
- Kipf and Welling (2017) Thomas N. Kipf and Max Welling. 2017. Semi-supervised classification with graph convolutional networks. In International Conference on Learning Representations (ICLR).
- Kuniyoshi et al. (2020) Fusataka Kuniyoshi, Kohei Makino, Jun Ozawa, and Makoto Miwa. 2020. Annotating and extracting synthesis process of all-solid-state batteries from scientific literature. In Proceedings of the 12th Language Resources and Evaluation Conference, pages 1941–1950, Marseille, France. European Language Resources Association.
- Li et al. (2020) Liam Li, Kevin Jamieson, Afshin Rostamizadeh, Ekaterina Gonina, Jonathan Ben-tzur, Moritz Hardt, Benjamin Recht, and Ameet Talwalkar. 2020. A system for massively parallel hyperparameter tuning. In Proceedings of Machine Learning and Systems, volume 2, pages 230–246.
- Ma et al. (2019) Shuai Ma, Gang Wang, Yansong Feng, and Jinpeng Huai. 2019. Easy first relation extraction with information redundancy. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 3851–3861, Hong Kong, China. Association for Computational Linguistics.
- Mehr et al. (2020) S Hessam M Mehr, Matthew Craven, Artem I Leonov, Graham Keenan, and Leroy Cronin. 2020. A universal system for digitization and automatic execution of the chemical synthesis literature. Science, 370(6512):101–108.
- Miwa and Bansal (2016) Makoto Miwa and Mohit Bansal. 2016. End-to-end relation extraction using LSTMs on sequences and tree structures. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1105–1116, Berlin, Germany. Association for Computational Linguistics.
- Miwa and Sasaki (2014) Makoto Miwa and Yutaka Sasaki. 2014. Modeling joint entity and relation extraction with table representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1858–1869, Doha, Qatar. Association for Computational Linguistics.
- Mori et al. (2014) Shinsuke Mori, Hirokuni Maeta, Yoko Yamakata, and Tetsuro Sasada. 2014. Flow graph corpus from recipe texts. In LREC, pages 2370–2377.
- Mysore et al. (2019) Sheshera Mysore, Zachary Jensen, Edward Kim, Kevin Huang, Haw-Shiuan Chang, Emma Strubell, Jeffrey Flanigan, Andrew McCallum, and Elsa Olivetti. 2019. The materials science procedural text corpus: Annotating materials synthesis procedures with shallow semantic structures. In Proceedings of the 13th Linguistic Annotation Workshop, pages 56–64, Florence, Italy. Association for Computational Linguistics.
- Nan et al. (2020) Guoshun Nan, Zhijiang Guo, Ivan Sekulic, and Wei Lu. 2020. Reasoning with latent structure refinement for document-level relation extraction. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 1546–1557, Online. Association for Computational Linguistics.
- Paszke et al. (2019) Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Kopf, Edward Yang, Zachary DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala. 2019. PyTorch: An imperative style, high-performance deep learning library. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett, editors, Advances in Neural Information Processing Systems 32, pages 8024–8035. Curran Associates, Inc.
- Pustejovsky et al. (2003) James Pustejovsky, Patrick Hanks, Roser Sauri, Andrew See, Robert Gaizauskas, Andrea Setzer, Dragomir Radev, Beth Sundheim, David Day, Lisa Ferro, et al. 2003. The TimeBank corpus. In Proceedings of Corpus Linguistics, pages 647–656, Lancaster, UK.
- Quirk and Poon (2017) Chris Quirk and Hoifung Poon. 2017. Distant supervision for relation extraction beyond the sentence boundary. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers, pages 1171–1182, Valencia, Spain. Association for Computational Linguistics.
- Schlichtkrull et al. (2018) Michael Schlichtkrull, Thomas N. Kipf, Peter Bloem, Rianne van den Berg, Ivan Titov, and Max Welling. 2018. Modeling relational data with graph convolutional networks. In The Semantic Web, pages 593–607, Cham. Springer International Publishing.
- Srivastava et al. (2014) Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. 2014. Dropout: a simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research, 15(1):1929–1958.
- Tandon et al. (2020) Niket Tandon, Keisuke Sakaguchi, Bhavana Dalvi, Dheeraj Rajagopal, Peter Clark, Michal Guerquin, Kyle Richardson, and Eduard Hovy. 2020. A dataset for tracking entities in open domain procedural text. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 6408–6417, Online. Association for Computational Linguistics.
- Tang et al. (2020) Hengzhu Tang, Yanan Cao, Zhenyu Zhang, Jiangxia Cao, Fang Fang, Shi Wang, and Pengfei Yin. 2020. HIN: Hierarchical inference network for document-level relation extraction. In Advances in Knowledge Discovery and Data Mining, pages 197–209, Cham. Springer International Publishing.
- Verga et al. (2018) Patrick Verga, Emma Strubell, and Andrew McCallum. 2018. Simultaneously self-attending to all mentions for full-abstract biological relation extraction. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 872–884, New Orleans, Louisiana. Association for Computational Linguistics.
- Wang et al. (2019) Haoyu Wang, Ming Tan, Mo Yu, Shiyu Chang, Dakuo Wang, Kun Xu, Xiaoxiao Guo, and Saloni Potdar. 2019. Extracting multiple-relations in one-pass with pre-trained transformers. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 1371–1377, Florence, Italy. Association for Computational Linguistics.
- Xu et al. (2015) Yan Xu, Lili Mou, Ge Li, Yunchuan Chen, Hao Peng, and Zhi Jin. 2015. Classifying relations via long short term memory networks along shortest dependency paths. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 1785–1794, Lisbon, Portugal. Association for Computational Linguistics.
- Zelenko et al. (2003) Dmitry Zelenko, Chinatsu Aone, and Anthony Richardella. 2003. Kernel methods for relation extraction. Journal of machine learning research, 3(Feb):1083–1106.
- Zeng et al. (2014) Daojian Zeng, Kang Liu, Siwei Lai, Guangyou Zhou, and Jun Zhao. 2014. Relation classification via convolutional deep neural network. In Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, pages 2335–2344, Dublin, Ireland. Dublin City University and Association for Computational Linguistics.
- Zhang et al. (2007) S.B. Zhang, Y.P. Sun, B.C. Zhao, X.B. Zhu, and W.H. Song. 2007. Influence of Ni doping on the properties of perovskite molybdates SrMo1-xNixO3 (0.02 ≤ x ≤ 0.08). Solid State Communications, 142(12):671–675.
- Zhang et al. (2018) Yuhao Zhang, Peng Qi, and Christopher D. Manning. 2018. Graph convolution over pruned dependency trees improves relation extraction. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 2205–2215, Brussels, Belgium. Association for Computational Linguistics.
- Zhou et al. (2021) Wenxuan Zhou, Kevin Huang, Tengyu Ma, and Jing Huang. 2021. Document-level relation extraction with adaptive thresholding and localized context pooling. In Proceedings of the AAAI Conference on Artificial Intelligence.
Appendix A Statistics of the Materials Science Procedural Text Corpus
We present the statistics of the materials science procedural text corpus proposed by Mysore et al. (2019), available at https://github.com/olivettigroup/annotated-materials-syntheses. Table 3 and Table 4 summarize the numbers of entities and relations, respectively.
Appendix B Rule-based Relation Extraction Model
We built a rule-based model for the materials science procedural text corpus Mysore et al. (2019) by defining rules to extract relations between entity pairs. The rules were adapted from the rule-based model of Kuniyoshi et al. (2020) to the target corpus. They depend on the labels of the entities in a pair, their distance, and their order of occurrence. According to the combination of entity labels, our rules are divided into three types: Operation–Operation, Operation–Material, and other relations. In the following, the starting point of a relation is called the head, the ending point is called the tail, and an edge is denoted as Head–Tail.
B.1 Operation–Operation
The Operation–Operation edges take only the Next_Operation label, which represents the progression of operations.
Next_Operation: Consecutive Operation entities are linked with this relation, from the beginning to the end of the document, in the order in which the Operation entities appear.
B.2 Operation–Material
For the edges of Operation–Material, there are five relation labels: Recipe_Precursor indicates an input material; Recipe_Target indicates the generation of a target product; Participant_Material indicates the generation of an intermediate product; Solvent_Material indicates the solvent material of an operation; and Atmospheric_Material indicates the atmosphere of an operation.
Table 3: Numbers of entities per class in the corpus.

| Entity class | Train | Dev | Test |
|---|---|---|---|
Material | 4,271 | 277 | 316 |
Operation | 3,249 | 212 | 242 |
Number | 2,872 | 224 | 219 |
Condition-Unit | 1,363 | 101 | 87 |
Material-Descriptor | 1,214 | 67 | 89 |
Amount-Unit | 1,193 | 96 | 98 |
Property-Misc | 481 | 25 | 16 |
Condition-Misc | 468 | 32 | 20 |
Synthesis-Apparatus | 433 | 20 | 34 |
Nonrecipe-Material | 329 | 33 | 25 |
Brand | 291 | 30 | 27 |
Apparatus-Descriptor | 165 | 10 | 9 |
Amount-Misc | 149 | 14 | 7 |
Meta | 128 | 12 | 13 |
Property-Type | 124 | 10 | 4 |
Condition-Type | 119 | 2 | 1 |
Reference | 106 | 10 | 11 |
Property-Unit | 92 | 7 | 8 |
Apparatus-Unit | 89 | 6 | 16 |
Character.-Apparatus | 54 | 2 | 11 |
Apparatus-Property-Type | 26 | 0 | 6 |
Table 4: Numbers of relations per class in the corpus.

| Relation class | Train | Dev | Test |
|---|---|---|---|
Next_Operation | 2,898 | 184 | 202 |
Recipe_Precursor | 876 | 67 | 89 |
Recipe_Target | 363 | 31 | 22 |
Participant_Material | 1,723 | 113 | 124 |
Solvent_Material | 463 | 28 | 33 |
Atmospheric_Material | 183 | 11 | 14 |
Property_Of | 586 | 35 | 21 |
Condition_Of | 1,810 | 132 | 107 |
Number_Of | 2,805 | 219 | 209 |
Amount_Of | 1,512 | 130 | 121 |
Descriptor_Of | 1,495 | 91 | 102 |
Brand_Of | 423 | 42 | 41 |
Type_Of | 164 | 7 | 13 |
Apparatus_Of | 455 | 20 | 36 |
Apparatus_Attr_Of | 90 | 6 | 11 |
Coref_Of | 267 | 12 | 14 |
For the Solvent_Material, Atmospheric_Material, and Participant_Material labels, a dictionary is manually prepared for each label. Since these relations take specific Material entities, a relation is linked from the nearest Operation to a Material in the sentence if the Material matches an entry in the corresponding dictionary. The dictionaries are included in the source code.
Recipe_Precursor is linked from every Material that does not match the dictionaries of Solvent_Material, Atmospheric_Material, and Participant_Material to the nearest Operation. The rule-based model does not produce the Recipe_Target relation. These decisions were made because the relations are difficult to classify with simple rules.
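A simplified sketch of these Operation–Material rules; the dictionary contents and the (position, label, text) entity layout are illustrative stand-ins for the resources released with the source code.

```python
# Illustrative stand-ins for the released dictionaries.
SOLVENTS = {"water", "ethanol"}
ATMOSPHERES = {"air", "argon", "nitrogen"}
PARTICIPANTS = {"mixture", "solution"}

def link_materials(sentence_entities):
    """sentence_entities: (position, label, text) tuples within one sentence."""
    edges = []
    operations = [e for e in sentence_entities if e[1] == "Operation"]
    for pos, label, text in sentence_entities:
        if label != "Material" or not operations:
            continue
        op_pos = min(operations, key=lambda o: abs(o[0] - pos))[0]
        if text.lower() in SOLVENTS:
            edges.append((op_pos, pos, "Solvent_Material"))      # Op -> Mat
        elif text.lower() in ATMOSPHERES:
            edges.append((op_pos, pos, "Atmospheric_Material"))  # Op -> Mat
        elif text.lower() in PARTICIPANTS:
            edges.append((op_pos, pos, "Participant_Material"))  # Op -> Mat
        else:
            # Everything else becomes a precursor of the nearest Operation.
            edges.append((pos, op_pos, "Recipe_Precursor"))      # Mat -> Op
    return edges
```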
B.3 Remaining Relations
The remaining nine relation labels are defined between the other pairs of entity labels: Property_Of, which indicates a property of a material; Condition_Of, which indicates a condition of an operation; Number_Of, which indicates the relationship between a number and a unit; Amount_Of, which indicates the amount of a material; Type_Of, which indicates the type of a numerical condition; Brand_Of, which indicates the brand of a material or equipment; Apparatus_Of, which indicates equipment used in an operation; Apparatus_Attr_Of, which indicates a numerical condition of equipment; and Descriptor_Of, which indicates other conditions. For these labels, the rules are defined based only on the labels of the head and tail entities and the distance between them. We explain the detailed rules in the remainder of this section; a sketch of the shared nearest-entity search follows the list.
Property_Of: The relation can take Property-Unit or Property-Misc as the head and Material or Nonrecipe-Material as the tail. When Property-Unit is the head, it is linked to the nearest Material in the sentence. When Property-Misc is the head, it is linked to the nearest Material or Nonrecipe-Material in the sentence.
Condition_Of: Condition-Unit and Condition-Misc are linked with this relation to the nearest Operation in the sentence.
Number_Of: Number is linked to the nearest Property-Unit, Condition-Unit, or Apparatus-Unit that appears after the Number in the sentence.
Amount_Of: The relation is linked from Amount-Unit and Amount-Misc to the nearest Material or Nonrecipe-Material in the sentence.
Descriptor_Of: When Material-Descriptor is a head, it is linked to the nearest Material or Nonrecipe-Material in the sentence. When Apparatus-Descriptor is a head, it is linked to the nearest Synthesis-Apparatus in the sentence.
Apparatus_Of: The relation is linked from Synthesis-Apparatus and Characterization-Apparatus to the nearest Operation, with priority given to Operations that appear before the apparatus in the sentence.
Type_Of: Property-Type and Apparatus-Property-Type are linked with this relation to the nearest Property-Unit and Apparatus-Unit in the sentence, respectively. When Condition-Type is the head, it is linked to the nearest Condition-Unit that appears before the Condition-Type in the sentence.
Brand_Of: The relation is linked from Brand to the nearest entity that may have a brand (i.e., Material, Nonrecipe-Material, Synthesis-Apparatus, or Characterization-Apparatus) in the sentence.
Apparatus_Attr_Of: Apparatus-Unit is linked to the nearest Synthesis-Apparatus or Characterization-Apparatus.
Coref_Of: This relation is not produced by the rules because coreference is difficult to capture with simple rules.
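Most of the rules above reduce to a nearest-entity search within the sentence, optionally preferring candidates on one side; the helper below sketches this shared mechanism with an illustrative data layout.

```python
def nearest(entities, pos, target_labels, prefer_before=False):
    """entities: (position, label) tuples within one sentence. Returns the
    entity with a target label nearest to `pos`, or None. With
    `prefer_before`, candidates appearing before `pos` take priority,
    as in the Apparatus_Of rule."""
    cands = [e for e in entities if e[1] in target_labels]
    if prefer_before:
        cands = [e for e in cands if e[0] < pos] or cands
    return min(cands, key=lambda e: abs(e[0] - pos), default=None)

# Condition_Of:  nearest(sentence, pos, {"Operation"})
# Apparatus_Of:  nearest(sentence, pos, {"Operation"}, prefer_before=True)
```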
Table 5: Hyper-parameter search space and the selected values.

| Parameter | Range | Value |
|---|---|---|
| Learning rate | [1e-5, 1e-2) | 0.001 |
| No. of GCN layers | [0, 4] | 3 |
| Maximum edit distance $d_{\max}$ | [1, 10] | 4 |
| Dimension of hidden layers | [32, 128] | 85 |
| No. of output FC layers | [1, 5] | 4 |
| No. of $\mathrm{FC}_{\mathrm{head}}$ and $\mathrm{FC}_{\mathrm{tail}}$ layers | [1, 5] | 1 |
| Dropout rate | [0.0, 1.0) | 0.46 |
| Dimension of distance embedding | [1, 32] | 3 |
| Maximum distance for distance embedding | [1, 32] | 3 |
| Dimension of class label embedding | [1, 100] | 1 |
| Use bidirectional GCN | True or False | True |
Appendix C Tuning Details
Table 6: Detailed test-set results of Edit without Rule.

| Relation | Prec. | Recall | F-score |
|---|---|---|---|
Next_Operation | 0.622 | 0.693 | 0.656 |
Recipe_Precursor | 0.632 | 0.539 | 0.582 |
Recipe_Target | 0.640 | 0.727 | 0.681 |
Participant_Material | 0.641 | 0.476 | 0.546 |
Solvent_Material | 0.491 | 0.818 | 0.614 |
Atmospheric_Material | 0.733 | 0.786 | 0.759 |
Property_Of | 0.773 | 0.810 | 0.791 |
Condition_Of | 0.798 | 0.850 | 0.824 |
Number_Of | 0.874 | 0.962 | 0.916 |
Amount_Of | 0.722 | 0.645 | 0.681 |
Descriptor_Of | 0.761 | 0.814 | 0.787 |
Brand_Of | 0.567 | 0.415 | 0.479 |
Type_Of | 0.900 | 0.692 | 0.783 |
Apparatus_Of | 0.657 | 0.639 | 0.648 |
Apparatus_Attr_Of | 0.769 | 0.909 | 0.833 |
Coref_Of | 0.875 | 0.500 | 0.636 |
Overall | 0.717 | 0.722 | 0.720 |
Table 7: Detailed test-set results of Rule.

| Relation | Prec. | Recall | F-score |
|---|---|---|---|
Next_Operation | 0.990 | 0.881 | 0.932 |
Recipe_Precursor | 0.730 | 0.414 | 0.528 |
Recipe_Target | 0.000 | 0.000 | 0.000 |
Participant_Material | 0.419 | 0.800 | 0.550 |
Solvent_Material | 0.697 | 0.418 | 0.522 |
Atmospheric_Material | 1.000 | 0.378 | 0.549 |
Property_Of | 0.905 | 1.000 | 0.950 |
Condition_Of | 0.963 | 0.981 | 0.972 |
Number_Of | 0.943 | 0.961 | 0.952 |
Amount_Of | 0.744 | 0.865 | 0.800 |
Descriptor_Of | 0.931 | 0.979 | 0.955 |
Brand_Of | 0.561 | 0.920 | 0.697 |
Type_Of | 0.769 | 1.000 | 0.870 |
Apparatus_Of | 0.972 | 0.854 | 0.909 |
Apparatus_Attr_Of | 0.909 | 0.769 | 0.833 |
Coref_Of | 0.000 | 0.000 | 0.000 |
Overall | 0.807 | 0.808 | 0.807 |
Table 8: Detailed test-set results of Edit-IE with Rule.

| Relation | Prec. | Recall | F-score |
|---|---|---|---|
Next_Operation | 0.905 | 0.990 | 0.946 |
Recipe_Precursor | 0.810 | 0.573 | 0.671 |
Recipe_Target | 0.560 | 0.636 | 0.596 |
Solvent_Material | 0.733 | 0.667 | 0.698 |
Participant_Material | 0.624 | 0.790 | 0.698 |
Atmospheric_Material | 0.778 | 1.000 | 0.875 |
Property_Of | 0.905 | 0.905 | 0.905 |
Condition_Of | 0.953 | 0.944 | 0.948 |
Number_Of | 0.958 | 0.990 | 0.974 |
Amount_Of | 0.854 | 0.868 | 0.861 |
Descriptor_Of | 0.941 | 0.931 | 0.936 |
Brand_Of | 0.880 | 0.537 | 0.667 |
Type_Of | 1.000 | 0.692 | 0.818 |
Apparatus_Of | 0.833 | 0.972 | 0.897 |
Apparatus_Attr_Of | 0.769 | 0.909 | 0.833 |
Coref_Of | 0.750 | 0.429 | 0.545 |
Overall | 0.856 | 0.870 | 0.863 |
We tuned our model using the hyper-parameter optimization framework Optuna Akiba et al. (2019). We searched for the hyper-parameters that maximize the micro F-score on the development set within 600 trials. We employed the tree-structured Parzen estimator algorithm Bergstra et al. (2011) as the sampler and the successive halving algorithm Li et al. (2020) as the pruner, with the default options in Optuna. In each trial, we trained our model for 100 epochs, which preliminary experiments confirmed to be sufficient for convergence. We ran the search on 20 NVIDIA GPUs, including Tesla V100, TITAN V, RTX 3090, and GTX TITAN Xp GPUs.
We defined the search space as shown in Table 5; the searched hyper-parameters comprise the learning rate for Adam, the number of GCN layers, the maximum edit distance $d_{\max}$, the dimensions of all hidden layers, the number of output FC layers, the number of $\mathrm{FC}_{\mathrm{head}}$ and $\mathrm{FC}_{\mathrm{tail}}$ layers, the dropout rate, the dimension and the maximum distance of the distance embedding, the dimension of the class label embedding, and whether to use bidirectional or uni-directional GCNs. In the table, the Range column shows the range of searched values, and the Value column shows the rounded values selected by the optimization.
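The search described above maps onto Optuna roughly as follows; `train_and_score` is a hypothetical stand-in for the training loop, only a subset of the parameters in Table 5 is shown, and intermediate-value reporting for the pruner is omitted.

```python
import optuna

def train_and_score(params, epochs=100):
    # Hypothetical stand-in: train the edge classifier with `params` and
    # return the development-set micro F-score.
    return 0.0

def objective(trial):
    params = {
        "lr": trial.suggest_float("lr", 1e-5, 1e-2, log=True),
        "gcn_layers": trial.suggest_int("gcn_layers", 0, 4),
        "max_edit_dist": trial.suggest_int("max_edit_dist", 1, 10),
        "hidden_dim": trial.suggest_int("hidden_dim", 32, 128),
        "dropout": trial.suggest_float("dropout", 0.0, 1.0),
    }
    return train_and_score(params)

study = optuna.create_study(
    direction="maximize",                             # maximize micro F-score
    sampler=optuna.samplers.TPESampler(),             # Bergstra et al. (2011)
    pruner=optuna.pruners.SuccessiveHalvingPruner(),  # Li et al. (2020)
)
study.optimize(objective, n_trials=600)
```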
Appendix D Detailed Evaluation Results
Our editing models for the evaluation were trained on a TITAN V GPU for Edit with Rule and on a Tesla V100 GPU for the other models. Training takes about 6 hours 30 minutes for Edit-IE with Rule and about 21 hours for Edit without Rule.
We show the detailed evaluation results, with precision (Prec.), recall, and F-score on the test set, in Table 6 for Edit without Rule, Table 7 for Rule, and Table 8 for Edit-IE with Rule. The results show that the relations not covered by Rule, i.e., Recipe_Target and Coref_Of, are extracted by our approach, and for these classes Edit without Rule shows better performance than the models with Rule. Relations extracted by Rule with high performance, including Next_Operation, Condition_Of, and Descriptor_Of, are also extracted with high performance by Edit-IE with Rule. This shows that our approach can effectively utilize the outputs of Rule.