Better Query Graph Selection for Knowledge Base Question Answering
Abstract
This paper presents a novel approach based on semantic parsing to improve the performance of Knowledge Base Question Answering (KBQA). Specifically, we focus on how to select an optimal query graph from a candidate set so as to retrieve the answer from the knowledge base (KB). In our approach, we first propose to linearize the query graph into a sequence, which forms a sequence pair with the question. This allows us to use mature sequence models, such as BERT, to encode the sequence pair. We then use a ranking method to sort the candidate query graphs. In contrast to previous studies, our approach efficiently models semantic interactions between the graph and the question and ranks the candidate graphs from a global view. Experimental results show that our system achieves the top performance on ComplexQuestions and the second best performance on WebQuestions.
Introduction
Knowledge Base Question Answering (KBQA) is a popular task that takes natural language questions as input and returns corresponding entities or attributes from knowledge bases, such as DBpedia (Auer et al. 2007) and Freebase (Bollacker et al. 2008). One representative line of approaches to KBQA builds on semantic parsing (SP), which converts input questions into formal meaning representations and then transforms them into query languages such as SPARQL (Berant et al. 2013; Yih et al. 2015; Sun et al. 2020). There are two types of SP-based solutions. The first uses generic meaning representations, such as λ-DCS (Liang 2013). However, this type of solution tends to suffer from the gap between the ontology and relations used in the meaning representations and those in the knowledge base (Kwiatkowski et al. 2013).

The second type of solution uses query graphs to represent the semantics of questions, which is expected to overcome the issue mentioned above (Yih et al. 2015; Bao et al. 2016; Hu et al. 2017). Figure 1 presents an illustrative query graph whose nodes and edges correspond to entities and relationships in a knowledge base. With the query graph as the representation, the process of KBQA can be divided into two steps: query graph generation and query graph selection. The former constructs a set of candidate query graphs from the input question, while the latter decides the optimal query graph that is used to retrieve the final answer. Thus, the query graph selection component is critical to the overall performance of KBQA systems.

Query graph selection is essentially a matching task between the question and the candidate query graphs. Existing systems focus on encoding the query graphs with hand-crafted features (Yih et al. 2015; Luo et al. 2018). These works first calculate the semantic similarity between the query graph and the question using the cosine similarity function. The similarity score is then used as a feature, together with other local features, to represent the query graph and the question. Finally, the feature representation is fed into a one-layer neural network to obtain a score. These approaches have achieved a certain degree of success. However, we argue that 1) simply using the cosine similarity function to measure semantic similarity loses the interaction information between the graph and the question, and 2) the hand-crafted features are usually not robust and are often unnecessary for deep neural networks.
To address the above problems, in this paper we propose to translate the matching between the question and the query graph into the matching between two sequences, so as to naturally model the interaction between them. To this end, we linearize the query graphs into sequences. This strategy puts the question and the linearized query graphs both in sequence format, which makes it convenient to use mature sequence models such as BERT (Devlin et al. 2019) and GPT-3 (Brown et al. 2020). In addition, we select the optimal query graph with a ranking strategy, hoping to take the relationship between candidate query graphs into consideration. Inspired by learning-to-rank methods (Li 2011; Pîrtoacă, Rebedea, and Ruseti 2019; Han et al. 2020), we utilize the listwise strategy, instead of the pairwise strategy used in the previous study (Luo et al. 2018), to sort candidate query graphs from a global view. Experimental results on two widely-used KBQA datasets demonstrate the effectiveness of our proposed approach: the best performance on ComplexQuestions and the second best on WebQuestions.
Overall, we make the following contributions.
- We propose a novel approach for better query graph selection in KBQA. In our approach, we convert the query graph into the corresponding sequence format, and thus the problem of matching the question and the query graph is translated into the matching between two sequences. This allows us to use BERT to efficiently model interactions between the graph and the question. Moreover, our approach does not require any hand-crafted features.
- We use the listwise strategy to sort the candidate query graphs, which takes the relationship among the candidate graphs into consideration from a global view. Compared with the Pairwise Ranking used in Luo et al. (2018), Listwise Ranking achieves better performance on both datasets.

Our Approach
In this section, we describe our approach in detail. We divide the KBQA process into two subtasks: query graph generation and query graph selection. Formally, given a question q and a knowledge base (KB), the semantics of q is analyzed through query graph generation, and a set of candidate query graphs G is obtained. Then, an optimal query graph g* is selected from the candidate set through query graph selection. Finally, we convert g* into the SPARQL format to retrieve the final answer to question q.
Compared with the previous studies using query graphs (Yih et al. 2015; Hu et al. 2017; Luo et al. 2018), we improve our system with a different solution for query graph selection. The basic idea is that we linearize each candidate query graph into a sequence, so the problem of matching the question and the candidate query graph becomes the matching between two sequences. To select the optimal query graph g*, we propose a ranking method with the listwise strategy, hoping to take the relationship between the candidate query graphs into consideration from a global view.
Query Graph Generation
The goal of the query graph generation is to map the question into a semantic representation in the form of graphs. In this step, we follow the procedure of the previous studies (Yih et al. 2015; Luo et al. 2018) to generate the candidate query graphs.
Given question q, we first conduct focus nodes linking to identify four types of constraints in the question: entity, implicit type, time interval, and ordinal. For entity linking, we utilize the tool SMART (Yang and Chang 2015) to obtain (mention, entity) pairs. For type linking, we use word embeddings to calculate the similarity between consecutive sub-sequences in the question (up to three words) and all the type words in the knowledge base, and select the top-10 (mention, type) pairs according to the similarity scores. For time word linking, we use regular expression matching to extract time information. For ordinal number linking, we use a predefined ordinal vocabulary (about 20 superlative words, such as largest, highest, and latest) and the "ordinal number + superlative" pattern to extract the integer expressions. We also use the entity enrichment method of Luo et al. (2018) to improve focus nodes linking. Figure 2(a) shows an example after focus nodes linking.
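The time and ordinal linking steps are rule-based. Below is a minimal illustrative sketch in Python; the regular expression, the superlative vocabulary, and the function names are our own simplifications, not the exact rules used in the system.

```python
import re

# Illustrative patterns only; the actual rules in the system are more elaborate.
YEAR_PATTERN = re.compile(r"\b(1[0-9]{3}|20[0-9]{2})\b")          # four-digit years
SUPERLATIVES = {"largest", "highest", "latest", "first", "last"}  # part of the ~20-word ordinal vocabulary

def link_time_words(question: str):
    """Extract time constraints (here: bare years) via regular expression matching."""
    return YEAR_PATTERN.findall(question)

def link_ordinals(question: str):
    """Extract 'ordinal number + superlative' expressions, defaulting the number to 1."""
    tokens = question.lower().split()
    hits = []
    for i, tok in enumerate(tokens):
        if tok in SUPERLATIVES:
            number = tokens[i - 1] if i > 0 and tokens[i - 1].isdigit() else "1"
            hits.append((number, tok))
    return hits

print(link_time_words("who governed spain after 1980"))  # ['1980']
print(link_ordinals("what is the 2 largest city"))       # [('2', 'largest')]
```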
After focus nodes linking, we obtain the main path by performing a one-hop and two-hop search based on the linked entity words, as shown in Figure 2(b). Next, entity constraints are added to the nodes in the main path; Figure 2(c) shows the state after this step. Then we add type constraints, time constraints, and ordinal constraints in turn, and finally obtain the query graph shown in Figure 2(d).
Through the above procedure, we obtain the candidate query graph set G for query graph selection.
Query Graph Selection
Due to ambiguity, query graph generation may produce more than one, often hundreds of, candidate query graphs. Thus it is necessary to apply a matching operation to select the optimal query graph from the candidates. In this section, we first describe how to convert each candidate query graph g in G into a sequence s. Then we show how to encode the pair of question q and sequence s. Finally, we describe the selection process. Inspired by research in learning to rank (Li 2011; Pîrtoacă, Rebedea, and Ruseti 2019; Han et al. 2020), we select the query graph with different ranking strategies, i.e., Pointwise Ranking, Pairwise Ranking, and Listwise Ranking.
Transforming Query Graph into Sequence
The process of converting a query graph to a sequence can be regarded as the reverse of constructing the query graph. When constructing the query graph, we first search for the main path and then add the four constraints of type, entity, time, and ordinal to it. Therefore, the whole query graph structure contains at most five components, which is much simpler than a general graph structure. More importantly, each component has a fixed semantic meaning.
Considering the fixed structure of the query graph, we transform the query graph into the corresponding sequence according to a predefined sub-path order. Specifically, we divide the query graph into different sub-paths according to its components. Through this graph decomposition, we get five sub-path sequences: TypePath, EntityPath, TimePath, OrdinalPath, and MainPath. For example, the EntityPath corresponding to the entity constraint "Prime minister" in Figure 3 is "basic title prime minister.". Finally, the five sub-path sequences are concatenated to form the query graph sequence. It is worth noting that we use the additional tokens [unused0-3] to separate the different sub-path sequences. As shown in Figure 3, the query graph sequence is "people person. [unused0] basic title prime minister. [unused1] from after 1980. [unused2] height max 1. [unused3] spain governing officials – office holder [A]", where '[A]' is the real answer string rather than a uniform placeholder.
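To make the linearization concrete, the following sketch reproduces the sequence of Figure 3 from a dictionary of sub-path strings; the dictionary keys and the function are illustrative, assuming the sub-paths have already been extracted.

```python
def linearize_query_graph(graph: dict) -> str:
    """Concatenate the sub-paths of a query graph in the fixed order
    TypePath, EntityPath, TimePath, OrdinalPath, MainPath,
    separated by the reserved BERT tokens [unused0]-[unused3]."""
    order = ["type_path", "entity_path", "time_path", "ordinal_path", "main_path"]
    separators = ["[unused0]", "[unused1]", "[unused2]", "[unused3]"]
    parts = [graph.get(key, "") for key in order]
    sequence = parts[0]
    for sep, part in zip(separators, parts[1:]):
        sequence = f"{sequence} {sep} {part}"
    return " ".join(sequence.split())

# The query graph of Figure 3, with illustrative field names.
graph = {
    "type_path": "people person.",
    "entity_path": "basic title prime minister.",
    "time_path": "from after 1980.",
    "ordinal_path": "height max 1.",
    "main_path": "spain governing officials – office holder [A]",
}
print(linearize_query_graph(graph))
```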

Encoding Query Graph Sequence and Question
After a query graph g is converted into a sequence s, the task of matching the question q and the query graph g becomes a task of matching the question q and the query graph sequence s. This allows us to naturally use mature sequence encoding models, such as BERT (Devlin et al. 2019) and GPT-3 (Brown et al. 2020), which can capture the interactive information between two sequences.
We choose the BERT architecture as our encoder, which has been widely used in natural language processing in recent years. BERT is a pre-trained language model based on the bidirectional Transformer architecture (Vaswani et al. 2017), which can encode a single sentence or a sentence pair. To capture the interactive information between the question and the query graph sequence, we use BERT's sentence-pair encoding. The encoding framework is shown in Figure 4(a). Given the question q and the query graph sequence s, we connect q and s with the special tokens to form the sentence pair ([CLS] q [SEP] s [SEP]). Each candidate query graph in G forms a sentence pair with the corresponding question q, and the pairs are fed to BERT for encoding one by one. We use the output of the [CLS] node of BERT as the semantic representation of the question and query graph sequence, denoted as f.
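A minimal sketch of this encoder is given below, assuming the HuggingFace Transformers implementation of BERT-base (the library choice, variable names, and example strings are ours; the paper only specifies BERT-base with sentence-pair encoding).

```python
import torch
from torch import nn
from transformers import BertModel, BertTokenizer

class GraphScorer(nn.Module):
    """Encode a (question, query-graph sequence) pair with BERT and map the
    [CLS] representation f to a scalar score with a linear layer."""
    def __init__(self, model_name: str = "bert-base-uncased"):
        super().__init__()
        self.bert = BertModel.from_pretrained(model_name)
        self.scorer = nn.Linear(self.bert.config.hidden_size, 1)

    def forward(self, input_ids, attention_mask, token_type_ids):
        outputs = self.bert(input_ids=input_ids,
                            attention_mask=attention_mask,
                            token_type_ids=token_type_ids)
        f = outputs.last_hidden_state[:, 0]   # representation of the [CLS] node
        return self.scorer(f).squeeze(-1)     # one score per (q, s) pair

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
question = "who was the prime minister of spain after 1980"   # illustrative question
graph_seq = "people person. [unused0] basic title prime minister. [unused1] from after 1980."
batch = tokenizer(question, graph_seq, return_tensors="pt", truncation=True)
score = GraphScorer()(**batch)
```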
Ranking Query Graphs
In this section, we rank the candidates with three different strategies: Pointwise Ranking, Pairwise Ranking, and Listwise Ranking. These correspond to the three typical ranking strategies in information retrieval (Li 2011). Luo et al. (2018) use the pairwise strategy in their system and achieve a certain degree of success.
Before performing ranking, we preprocess the training data. According to whether the correct answer can be retrieved, the candidates are grouped into two sets G+ and G-, where G+ includes the positive graphs and G- includes the negative ones. We use g+ and g- to denote a positive graph and a negative graph, respectively. Each graph in the two sets is encoded as a representation f. Then f is fed into the linear layer to get a score s that indicates how likely the graph is to be the optimal query graph.
Pointwise Ranking.
The Pointwise Ranking processes the graphs one by one. When ranking candidate query graphs, the graphs to be sorted do not need a complete ordering; only the distinction between positive and negative graphs matters. That is, we treat the query graph ordering problem as a simple binary classification task. As shown in Figure 4(b), each query graph in Pointwise Ranking is optimized independently.
Each candidate query graph g_i corresponds to a label y_i, where '1' is the label of a positive graph and '0' is the label of a negative graph. In the optimization process, we use the cross-entropy loss, and we select the query graph with the highest score as the optimal query graph g*. The loss function is as follows,
$$p_i = \mathrm{sigmoid}(s_i) = \frac{1}{1 + e^{-s_i}} \quad (1)$$

$$\mathcal{L}_{\mathrm{point}} = -\sum_{i}\Big[y_i \log p_i + (1 - y_i)\log(1 - p_i)\Big] \quad (2)$$
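A short sketch of this objective, assuming the scores come from the linear layer described above (the helper name and example values are ours):

```python
import torch
import torch.nn.functional as F

def pointwise_loss(scores: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """Binary cross-entropy over independently normalized scores (Eqs. 1-2):
    each candidate graph is treated as a separate binary classification example."""
    probs = torch.sigmoid(scores)                 # Eq. (1): p_i from s_i
    return F.binary_cross_entropy(probs, labels)  # Eq. (2): cross-entropy with y_i

scores = torch.tensor([2.1, -0.3, 0.7])  # scores of three candidate graphs (illustrative)
labels = torch.tensor([1.0, 0.0, 0.0])   # 1 = positive graph, 0 = negative graph
loss = pointwise_loss(scores, labels)
```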
Pairwise Ranking.
The Pairwise Ranking models the mutual relation between two candidate elements and realizes the ranking by continuously optimizing the relative order of the two. When using the pairwise strategy to rank candidate query graphs, we regard the ranking problem as the problem of distinguishing positive query graphs from negative query graphs. That is, we first construct pairs of positive and negative graphs, and then the order within each pair is optimized with the pairwise strategy, as shown in Figure 4(b).
For each pair of positive and negative query graphs (g+, g-), we obtain the scores s+ and s- through BERT and the linear layer, respectively. Then s+ and s- are normalized to p+ and p- by Equation (1). We use the hinge loss to optimize the scores so that the score of the positive query graph exceeds that of the negative one by a fixed margin λ. The hinge loss is defined as follows,
$$\mathcal{L}_{\mathrm{pair}} = \max\big(0,\ \lambda - p^{+} + p^{-}\big) \quad (3)$$
where the margin λ is set to 0.5.
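A corresponding sketch of the pairwise objective (the function name and example scores are illustrative):

```python
import torch

def pairwise_hinge_loss(score_pos: torch.Tensor, score_neg: torch.Tensor,
                        margin: float = 0.5) -> torch.Tensor:
    """Hinge loss over normalized scores of (positive, negative) pairs (Eq. 3):
    push p+ above p- by at least the margin λ = 0.5."""
    p_pos = torch.sigmoid(score_pos)  # normalization as in Eq. (1)
    p_neg = torch.sigmoid(score_neg)
    return torch.clamp(margin - p_pos + p_neg, min=0).mean()

# Scores of two sampled (positive, negative) pairs (illustrative values).
loss = pairwise_hinge_loss(torch.tensor([1.8, 0.9]), torch.tensor([0.4, 1.1]))
```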
Listwise Ranking.
The Listwise Ranking also models the interconnections among candidates, and it can directly optimize the order of the entire candidate set. In this strategy, we construct a list of positive and negative graphs for global optimization. In query graph selection, we do not care much about the ranking among positive graphs or among negative graphs; the goal of global optimization is simply to rank a positive graph in the first place.
The application of the listwise strategy to query graph ranking differs slightly from its use in traditional information retrieval. When constructing the training data, we pair each positive graph with a fixed number n of negative graphs to form a list L = (g+, g1-, ..., gn-), whose label is Y = (1, 0, ..., 0). The scores of the list after BERT encoding and the linear-layer mapping are recorded as S = (s0, s1, ..., sn). During training, we use the following optimization objective,
$$p_j = \frac{\exp(s_j)}{\sum_{k=0}^{n}\exp(s_k)} \quad (4)$$

$$\mathcal{L}_{\mathrm{list}} = -\sum_{j=0}^{n} y_j \log p_j \quad (5)$$
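A sketch of the listwise objective over one list, with the positive graph at index 0 as constructed above (the function name and scores are illustrative):

```python
import torch
import torch.nn.functional as F

def listwise_loss(scores: torch.Tensor) -> torch.Tensor:
    """Listwise objective over one positive and n negative graphs (Eqs. 4-5):
    softmax-normalize the scores of the whole list and maximize the probability
    of the positive graph, which sits at index 0 by construction."""
    log_probs = F.log_softmax(scores, dim=-1)  # Eq. (4): softmax over the list
    return -log_probs[..., 0].mean()           # Eq. (5): cross-entropy with label (1, 0, ..., 0)

# One list: the positive graph followed by n = 3 negatives (illustrative scores).
scores = torch.tensor([[1.7, 0.2, -0.5, 0.9]])
loss = listwise_loss(scores)
```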
Experiments
Experimental Setup
Datasets.
Dataset | train | validation | test |
---|---|---|---|
WebQ | 3,023 | 755 | 2,032 |
CompQ | 1,000 | 300 | 800 |
We conduct experiments on two widely-used datasets: WebQuestions (WebQ) (Berant et al. 2013; https://nlp.stanford.edu/software/sempre/) and ComplexQuestions (CompQ) (Bao et al. 2016; https://github.com/JunweiBao/MulCQA/tree/ComplexQuestions). The WebQ dataset contains both simple questions (84%) and complex reasoning questions (16%), which is close to the natural language questions people use in daily life; it contains 5,810 question-answer pairs. The CompQ dataset is designed for complex question answering and contains 2,100 question-answer pairs in total. Both WebQ and CompQ are divided into train, validation, and test sets, as shown in Table 1. Both datasets use Freebase (https://developers.google.com/freebase/) as the knowledge base, which has been widely used in KBQA systems.
Implementation Details.
For encoding questions and query graphs, we utilize the BERT-base model. We choose the hyper-parameter settings according to performance on the validation sets. For the BERT-base model, we set the dropout ratio to 0.1 and the hidden size to 768. We use Adam as the optimizer and the learning rate is set to . The maximum number of training epochs is set to 5. At the end of each epoch, we evaluate the model on the validation set, and the model with the best validation performance is selected as the final testing model. For performance evaluation, we report the average F1 score, following Berant et al. (2013).
One remaining question is how to construct the training data for query graph selection. Given a question and its corresponding query graph candidates, we take the query graphs whose answers have an F1 value greater than 0.1 as positive graphs and randomly sample negative graphs from the remaining candidates.
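A small sketch of this data construction, assuming a helper that returns the answer F1 of a candidate graph (the helper and the number of sampled negatives are illustrative; the effect of the latter is analyzed in Figure 5):

```python
import random

def build_training_lists(candidates, answer_f1, num_negatives=20):
    """Split candidates into positive graphs (answer F1 > 0.1) and negatives,
    then pair each positive with randomly sampled negatives for listwise training."""
    positives = [g for g in candidates if answer_f1(g) > 0.1]
    negatives = [g for g in candidates if answer_f1(g) <= 0.1]
    lists = []
    for pos in positives:
        sampled = random.sample(negatives, min(num_negatives, len(negatives)))
        lists.append([pos] + sampled)  # the corresponding label is (1, 0, ..., 0)
    return lists
```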
Main Results
Method | WebQ (F1%) | CompQ (F1%) |
---|---|---|
Pointwise | 52.4 | 38.4 |
Pairwise | 53.7 | 42.7 |
Listwise | 55.3 | 44.4 |
Category | Method | WebQ (F1%) | CompQ (F1%)
---|---|---|---
Using Query Graph | Yih et al. (2015) | 52.5 | -
Using Query Graph | Bao et al. (2016) | 52.4 | 40.9
Using Query Graph | Hu, Zou, and Zhang (2018) | 53.6 | -
Using Query Graph | Luo et al. (2018) | 52.7 | 42.8
Using Query Graph | Lan and Jiang (2020) | - | 43.3
Others | Berant et al. (2013) | 36.4 | -
Others | Jain (2016) | 55.6 | -
Others | Chen, Wu, and Zaki (2019) | 51.8 | -
Others | Xu et al. (2019) | 54.6 | -
Ours | Listwise | 55.3 | 44.4
Table 2 shows the comparison of the three ranking strategies. From the table, we can see that Listwise Ranking and Pairwise Ranking outperform Pointwise Ranking, which indicates the necessity of modeling the inter-relations between query graph candidates. In addition, the superiority of Listwise Ranking and Pairwise Ranking is more significant on CompQ than on WebQ, in line with our intuition that complex questions require more information to disambiguate query graph candidates. Listwise Ranking yields the best result on both datasets. The reason may be that Listwise considers more than two graphs at once and thus has a global view during optimization, compared with Pairwise.
Table 3 shows the comparison between our system with Listwise Ranking and previous works on the test sets of WebQ and CompQ, where the category "Using Query Graph" includes previous systems that use query graphs and "Others" includes those that do not. From the table, we can see that our system yields the best result on CompQ and the second best on WebQ among all systems. In particular, among the approaches using query graphs, our system achieves the best performance.
Discussion and Analysis
Effect of Different Components in Query Graph Selection
Sequence Info | WebQ (F1%) | CompQ (F1%) |
---|---|---|
All Path | 55.3 | 44.4 |
w/o constraints | 53.7 | 42.3
w/o answer | 54.3 | 43.6 |
When representing query graphs, we propose to transform a query graph into a sequence composed of sub-paths of different types. To explore the effect of the different components on the final performance, we conduct experiments that remove some components from our Listwise system. The results are presented in Table 4, where "All Path" refers to our final system that includes the main path plus the four constraint paths, "w/o constraints" refers to the system with the four constraint paths removed, and "w/o answer" refers to the system with the answer string removed. The results show that adding the components steadily improves the system performance, which indicates that all the components of the query graph sequence are useful for our system.
Error Analysis
In this section, we perform an error analysis to understand why our system gives wrong answers. If the candidate set produced by query graph generation does not include the correct answer, our system has no chance of finding it. We therefore examine the cases where the candidate set includes the correct answer but our system (Listwise) still fails to find it. We randomly select 100 such cases and check them manually. The errors are summarized as follows:
Incorrect Query Graph Generation. Some query graphs can retrieve the correct answer but are not correct parses of the question. For example, for the question "where was david berkowitz arrested?", the query graph generation provides "david berkowitz places lived - location brooklyn, new york city" as a candidate. This candidate graph retrieves the correct answer, but it does not actually reflect the meaning of the question. This type of error accounts for 45 of the 100 cases. Addressing it requires improving the performance of query graph generation and the coverage of the KB.
Incorrect Query Graph Selection. In the remaining cases, the candidate set includes the correct query graph, which can perfectly retrieve the answer, yet our system still fails to select it. These errors fall into two categories. In the first (40%), the system selects a graph with an incorrect relationship (main path) between the topic word and the answer. In the second (15%), it selects incorrect constraints. To reduce these errors, we may perform a deeper analysis of the question to provide additional information for query graph selection.
Effect of Negative Examples on Ranking

To further explore the characteristics of the three ranking strategies of Pointwise Ranking, Pairwise Ranking, and Listwise Ranking, we build systems with different numbers of negative graphs. The performance is shown in Figure 5. From the figures, we find that as the number of negative graphs increases, the performance of all three systems first improves and then stabilizes. We also find that Listwise Ranking can yield good performance with only a few negative samples. These facts indicate that we do not need many negative graphs when training our systems.
Case Study
Question 1: what type of breast cancer did sheryl crow have?
True: sheryl crow condition meningioma.
False: sheryl crow films – film breast cancer: the path of wellness & healing.
Question 2: what role did paul mccartney play in the beatles?
True: member paul mccartney. the beatles member – role backing vocalist, lead vocalist, bass.
False: the beatles (tv series) regular cast – actor george harrison, john lennon, lance percival, paul frees, paul mccartney, ringo starr.
We analyze some specific examples on which Listwise performs better than Pointwise. Two typical examples are listed in Table 5. For the question "what type of breast cancer did sheryl crow have?", the true answer should be a type of cancer. Listwise Ranking determines that 'condition' is the correct relation, but Pointwise Ranking chooses the wrong path that contains "breast cancer". We argue that Listwise Ranking can better model the overall semantics of the sequence because it considers the relationship between the candidate query graphs during optimization, while Pointwise Ranking tends to focus on the semantics of local words. For the second example, "what role did paul mccartney play in the beatles?", the correct query graph contains the true entity constraint, but Pointwise Ranking chooses a path without the entity constraint. This indicates, to some extent, that Pointwise Ranking is not effective enough at identifying constraint paths.
Related Work
Information retrieval (IR) and semantic parsing (SP) based approaches are the two mainstream lines of work for knowledge base question answering. IR-based methods (Yu et al. 2017; Gupta, Chinnakotla, and Shrivastava 2018; Chen, Wu, and Zaki 2019; Petrochuk and Zettlemoyer 2018; Zhao et al. 2019; Saxena, Tripathi, and Talukdar 2020) retrieve relevant candidate answers according to the topic entity and then rank the answers to obtain the final result. The core of IR-based approaches is to identify the KB relation paths that the question refers to (Wu et al. 2019). For example, Dong et al. (2015) use multi-column Convolutional Neural Networks (CNNs) to encode questions and paths into the same vector space and compute their similarity. Hao et al. (2017) use Long Short-Term Memory (LSTM) networks instead of CNNs for the same purpose.
Different from IR-based methods, SP-based approaches pay more attention to the semantic analysis of the question (Bao et al. 2016). The basic process of SP-based approaches is to parse the question into a meaning representation and then map the meaning representation to the KB (Hu et al. 2017). For example, Berant et al. (2013) parse the question into a logical form, and then map it to the knowledge base through alignment and bridging operations to obtain answers. Sun et al. (2020) design a novel skeleton grammar to represent complex questions and improve the ability to parse them. The query graph is also a widely-used meaning representation in SP-based systems. Yih et al. (2015) pioneered query graph research for KBQA, proposing a staged query graph generation method for this task. Following this line, Luo et al. (2018) propose a complex query graph matching approach that simultaneously encodes multiple sub-paths to achieve a better query graph representation. More recently, Lan and Jiang (2020) propose a method that expands multiple relations so as to handle more complex questions. In contrast to previous works, which mostly focus on the representation of query graphs, we focus on the phase of selecting the optimal query graph.
Conclusions
We present a novel semantic matching approach based on semantic parsing to improve the performance of Knowledge Base Question Answering (KBQA). We divide the process of KBQA into two steps, query graph generation and query graph selection, and focus on the second step. In our approach, we linearize the query graphs into sequences and use BERT to encode each pair of query graph sequence and question to obtain its semantic representation. We then select the optimal query graph with different ranking strategies, which take the relationship between candidate query graphs into consideration. Experimental results on two benchmark datasets demonstrate the effectiveness of our proposed approach: our best system achieves the top performance on ComplexQuestions and the second best performance on WebQuestions.
References
- Auer et al. (2007) Auer, S.; Bizer, C.; Kobilarov, G.; Lehmann, J.; Cyganiak, R.; and Ives, Z. 2007. DBpedia: A nucleus for a web of open data. In Proceedings of ISWC, 722–735.
- Bao et al. (2016) Bao, J.; Duan, N.; Yan, Z.; Zhou, M.; and Zhao, T. 2016. Constraint-based question answering with knowledge graph. In Proceedings of COLING, 2503–2514.
- Berant et al. (2013) Berant, J.; Chou, A.; Frostig, R.; and Liang, P. 2013. Semantic parsing on Freebase from question-answer pairs. In Proceedings of EMNLP, 1533–1544.
- Bollacker et al. (2008) Bollacker, K.; Evans, C.; Paritosh, P.; Sturge, T.; and Taylor, J. 2008. Freebase: A collaboratively created graph database for structuring human knowledge. In Proceedings of SIGMOD, 1247–1250.
- Brown et al. (2020) Brown, T. B.; Mann, B.; Ryder, N.; Subbiah, M.; Kaplan, J.; Dhariwal, P.; Neelakantan, A.; Shyam, P.; Sastry, G.; Askell, A.; Agarwal, S.; Herbert-Voss, A.; Krueger, G.; Henighan, T.; Child, R.; Ramesh, A.; Ziegler, D. M.; Wu, J.; Winter, C.; Hesse, C.; Chen, M.; Sigler, E.; Litwin, M.; Gray, S.; Chess, B.; Clark, J.; Berner, C.; McCandlish, S.; Radford, A.; Sutskever, I.; and Amodei, D. 2020. Language Models are Few-Shot Learners. arXiv:2005.14165.
- Chen, Wu, and Zaki (2019) Chen, Y.; Wu, L.; and Zaki, M. J. 2019. Bidirectional attentive memory networks for question answering over knowledge bases. In Proceedings of NAACL-HLT, 2913–2923.
- Devlin et al. (2019) Devlin, J.; Chang, M.-W.; Lee, K.; and Toutanova, K. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of NAACL-HLT, 4171–4186.
- Dong et al. (2015) Dong, L.; Wei, F.; Zhou, M.; and Xu, K. 2015. Question answering over Freebase with multi-column convolutional neural networks. In Proceedings of ACL, 260–269.
- Gupta, Chinnakotla, and Shrivastava (2018) Gupta, V.; Chinnakotla, M.; and Shrivastava, M. 2018. Retrieve and re-rank: A simple and effective IR approach to simple question answering over knowledge graphs. In Proceedings of FEVER, 22–27.
- Han et al. (2020) Han, S.; Wang, X.; Bendersky, M.; and Najork, M. 2020. Learning-to-Rank with BERT in TF-Ranking. arXiv:2004.08476.
- Hao et al. (2017) Hao, Y.; Zhang, Y.; Liu, K.; He, S.; Liu, Z.; Wu, H.; and Zhao, J. 2017. An end-to-end model for question answering over knowledge base with cross-attention combining global knowledge. In Proceedings of ACL, 221–231.
- Hu et al. (2017) Hu, S.; Zou, L.; Yu, J. X.; Wang, H.; and Zhao, D. 2017. Answering natural language questions by subgraph matching over knowledge graphs. IEEE Transactions on Knowledge and Data Engineering, 30(5): 824–837.
- Hu, Zou, and Zhang (2018) Hu, S.; Zou, L.; and Zhang, X. 2018. A state-transition framework to answer complex questions over knowledge base. In Proceedings of EMNLP, 2098–2108.
- Jain (2016) Jain, S. 2016. Question answering over knowledge base using factual memory networks. In Proceedings of the NAACL Student Research Workshop, 109–115.
- Kwiatkowski et al. (2013) Kwiatkowski, T.; Choi, E.; Artzi, Y.; and Zettlemoyer, L. 2013. Scaling semantic parsers with on-the-fly ontology matching. In Proceedings of EMNLP, 1545–1556.
- Lan and Jiang (2020) Lan, Y.; and Jiang, J. 2020. Query graph generation for answering multi-hop complex questions from knowledge bases. In Proceedings of ACL, 969–974.
- Li (2011) Li, H. 2011. Learning to rank for information retrieval and natural language processing. Synthesis Lectures on Human Language Technologies, 4(1): 1–113.
- Liang (2013) Liang, P. 2013. Lambda dependency-based compositional semantics. arXiv:1309.4408.
- Luo et al. (2018) Luo, K.; Lin, F.; Luo, X.; and Zhu, K. 2018. Knowledge base question answering via encoding of complex query graphs. In Proceedings of EMNLP, 2185–2194.
- Petrochuk and Zettlemoyer (2018) Petrochuk, M.; and Zettlemoyer, L. 2018. SimpleQuestions nearly solved: A new upperbound and baseline approach. In Proceedings of EMNLP, 554–558.
- Pîrtoacă, Rebedea, and Ruseti (2019) Pîrtoacă, G.-S.; Rebedea, T.; and Ruseti, S. 2019. Answering questions by learning to rank–Learning to rank by answering questions. In Proceedings of EMNLP-IJCNLP, 2531–2540.
- Saxena, Tripathi, and Talukdar (2020) Saxena, A.; Tripathi, A.; and Talukdar, P. 2020. Improving multi-hop question answering over knowledge graphs using knowledge base embeddings. In Proceedings of ACL, 4498–4507.
- Sun et al. (2020) Sun, Y.; Zhang, L.; Cheng, G.; and Qu, Y. 2020. SPARQA: Skeleton-Based Semantic Parsing for Complex Questions over Knowledge Bases. In Proceedings of AAAI, 8952–8959.
- Vaswani et al. (2017) Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A. N.; Kaiser, Ł.; and Polosukhin, I. 2017. Attention is all you need. In Proceedings of NeurIPS, 5998–6008.
- Wu et al. (2019) Wu, P.; Huang, S.; Weng, R.; Zheng, Z.; Zhang, J.; Yan, X.; and Chen, J. 2019. Learning representation mapping for relation detection in knowledge base question answering. In Proceedings of ACL, 6130–6139.
- Xu et al. (2019) Xu, K.; Lai, Y.; Feng, Y.; and Wang, Z. 2019. Enhancing key-value memory neural networks for knowledge based question answering. In Proceedings of NAACL, 2937–2947.
- Yang and Chang (2015) Yang, Y.; and Chang, M. 2015. S-MART: Novel tree-based structured learning algorithms applied to tweet entity linking. In Proceedings of ACL, 504–513.
- Yih et al. (2015) Yih, S. W.-t.; Chang, M.-W.; He, X.; and Gao, J. 2015. Semantic parsing via staged query graph generation: Question answering with knowledge base. In Proceedings of ACL, 1321–1331.
- Yu et al. (2017) Yu, M.; Yin, W.; Hasan, K. S.; Santos, C. d.; Xiang, B.; and Zhou, B. 2017. Improved neural relation detection for knowledge base question answering. In Proceedings of ACL, 571–581.
- Zhao et al. (2019) Zhao, W.; Chung, T.; Goyal, A.; and Metallinou, A. 2019. Simple Question Answering with Subgraph Ranking and Joint-Scoring. In Proceedings of NAACL-HLT, 324–334.