
Training Entire-Space Models for Target-oriented Opinion Words Extraction

Yuncong Li (Tencent Inc., Shenzhen, China, [email protected]), Fang Wang (Shenzhen University, Shenzhen, China, [email protected]), and Sheng-Hua Zhong (Shenzhen University, Shenzhen, China, [email protected])
(2022)
Abstract.

Target-oriented opinion words extraction (TOWE) is a subtask of aspect-based sentiment analysis (ABSA). Given a sentence and an aspect term occurring in the sentence, TOWE extracts the corresponding opinion words for the aspect term. TOWE has two types of instances. In the first type, aspect terms are associated with at least one opinion word, while in the second type, aspect terms do not have corresponding opinion words. However, previous studies trained and evaluated their models with only the first type of instance, resulting in a sample selection bias problem. Specifically, TOWE models were trained with only the first type of instance, while at inference time they are applied to the entire space, which contains both types of instance. Thus, their generalization performance is hurt. Moreover, the performance of these models on the first type of instance cannot reflect their performance on the entire space. To validate the sample selection bias problem, we extend four popular TOWE datasets, which contain only aspect terms associated with at least one opinion word, to additionally include aspect terms without corresponding opinion words. Experimental results on these datasets show that training TOWE models on the entire space significantly improves model performance and that evaluating TOWE models only on the first type of instance overestimates model performance. Data and code are available at https://github.com/l294265421/SIGIR22-TOWE.

target-oriented opinion words extraction, aspect-based sentiment analysis, sample selection bias
journalyear: 2022
copyright: acmcopyright
conference: Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval; July 11–15, 2022; Madrid, Spain
booktitle: Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '22), July 11–15, 2022, Madrid, Spain
price: 15.00
isbn: 978-1-4503-8732-3/22/07
doi: 10.1145/XXXXXX.XXXXXX
ccs: Information systems → Sentiment analysis

1. Introduction

Aspect-based sentiment analysis (ABSA) (Hu and Liu, 2004; Pontiki et al., 2014, 2015, 2016) is a branch of sentiment analysis (Nasukawa and Yi, 2003; Liu, 2012). Target-oriented opinion words extraction (TOWE) (Fan et al., 2019) is a subtask of ABSA. Given a sentence and an aspect term occurring in the sentence, TOWE extracts the corresponding opinion words for the aspect term. For example, as shown in Figure 1, given the sentence “Try the rose roll (not on menu).” and the aspect term “rose roll” appearing in the sentence, TOWE extracts the opinion word “Try”.

Figure 1. Two examples of the TOWE task. The words highlighted in blue are the given aspect terms, and the words in orange are the corresponding opinion words.
Figure 2. Illustration of the sample selection bias problem in TOWE modeling. The training and evaluation spaces are composed of aspect terms associated with at least one opinion word (i.e., Type I instances); they are only part of the inference space, which is composed of all aspect terms. Note that inference happens in real-world scenarios, where TOWE models cannot restrict extraction to Type I instances, since they cannot know in advance whether an aspect term has opinion words.

TOWE has two types of instances. In the first type, called Type I instances, aspect terms are associated with at least one opinion word; an example is shown in Figure 1 (a). In the second type, called Type II instances, aspect terms do not have corresponding opinion words; an example is shown in Figure 1 (b). However, previous studies (Fan et al., 2019; Wu et al., 2020; Pouran Ben Veyseh et al., 2020; Feng et al., 2021; Mensah et al., 2021; Jiang et al., 2021; Kang et al., 2021) trained and evaluated their models only on Type I instances and ignored Type II instances. In the four SemEval challenge datasets (Pontiki et al., 2014, 2015, 2016), on which Fan et al. (2019) built the four popular TOWE datasets that include only Type I instances, the percentages of Type II instances range from 9.05% to 32.33%. This indicates that there is a considerable amount of Type II instances and that they should not be ignored.
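To make the distinction concrete, a TOWE instance can be viewed as a sentence, an aspect-term span, and a gold tag sequence over the sentence's words, where opinion words receive B/I tags and all other words receive O (the tag scheme used by the models in Section 3.2). Below is a minimal sketch in Python of the two instance types for the sentence in Figure 1; the record format is our own illustration, not the schema of any released dataset.

```python
# A minimal sketch of the two TOWE instance types as sequence-labeling
# examples. The dict format is illustrative, not any dataset's actual schema.

# Type I instance: the aspect term "rose roll" has the opinion word "Try".
type_i = {
    "sentence": ["Try", "the", "rose", "roll", "(", "not", "on", "menu", ")", "."],
    "aspect_term": (2, 4),  # token span of "rose roll"
    "opinion_tags": ["B", "O", "O", "O", "O", "O", "O", "O", "O", "O"],
}

# Type II instance: the aspect term "menu" has no corresponding opinion
# words, so the gold tag sequence is all O.
type_ii = {
    "sentence": ["Try", "the", "rose", "roll", "(", "not", "on", "menu", ")", "."],
    "aspect_term": (7, 8),  # token span of "menu"
    "opinion_tags": ["O"] * 10,
}
```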

Furthermore, as illustrated in Figure 2, ignoring Type II instances leads to a sample selection bias problem (Zadrozny, 2004; Ma et al., 2018). Specifically, TOWE models are trained only on Type I instances, while these models will be used to make inferences on the entire space, which contains both Type I and Type II instances. Thus, the generalization performance of the trained models is hurt. Moreover, the performance of these models on Type I instances cannot reflect their performance on the entire space, i.e., in real-world scenarios.

To validate the sample selection bias problem, we extend four popular TOWE datasets containing only Type I instances to additionally include Type II instances. Experimental results on these datasets show that training TOWE models on the entire space significantly improves model performance and that evaluating TOWE models only on Type I instances overestimates model performance.

2. Related Work

Target-oriented opinion words extraction (TOWE), which extracts the corresponding opinion words from a sentence for a given aspect term, was proposed by Fan et al. (2019). Fan et al. (2019) also built four TOWE datasets (i.e., Rest14, Lapt14, Rest15, and Rest16) based on four SemEval challenge datasets: three restaurant datasets (Rest14, Rest15, and Rest16) from SemEval-2014 Task 4 (Pontiki et al., 2014), SemEval-2015 Task 12 (Pontiki et al., 2015), and SemEval-2016 Task 5 (Pontiki et al., 2016), and a laptop dataset (Lapt14) from SemEval-2014 Task 4. In the original SemEval challenge datasets, the aspect terms are annotated, but the opinion words and their correspondence with aspect terms are not provided; thus, Fan et al. (2019) annotated the corresponding opinion words for the annotated aspect terms. Note that, in the four TOWE datasets built by Fan et al. (2019), only the sentences that contain pairs of aspect terms and opinion words are kept, and only the aspect terms associated with at least one opinion word are used as instances. Fan et al. (2019) also proposed an Inward-Outward LSTM with Global context (IOG) for TOWE.

Later, several other models were proposed for TOWE. Pouran Ben Veyseh et al. (2020) proposed ONG, which combines an Ordered-Neuron LSTM with a graph convolutional network (GCN), and Jiang et al. (2021) proposed an attention-based relational graph convolutional network (ARGCN); both exploit syntactic information over dependency graphs to improve model performance. Feng et al. (2021) proposed target-specified sequence labeling with multi-head self-attention for TOWE. Wu et al. (2020) transferred latent opinion knowledge from resource-rich review sentiment classification datasets to improve the TOWE task. Kang et al. (2021) concentrated on incorporating aspect term information into BERT. Mensah et al. (2021) conducted an empirical study to examine the actual contribution of position embeddings. These models obtained better performance. However, all of these studies, following Fan et al. (2019), used only Type I instances to train and evaluate their models.

Table 1. Statistics of our new TOWE datasets. Ratio is the proportion of Type II instances among all instances in each split.

| Dataset  | Split      | #Type I aspect terms | #Type II aspect terms | Ratio (%) |
|----------|------------|----------------------|-----------------------|-----------|
| Rest14-e | training   | 2138                 | 846                   | 28.35     |
|          | validation | 500                  | 210                   | 29.58     |
|          | test       | 865                  | 269                   | 23.72     |
| Lapt14-e | training   | 1304                 | 623                   | 32.33     |
|          | validation | 305                  | 132                   | 30.21     |
|          | test       | 480                  | 175                   | 26.72     |
| Rest15-e | training   | 864                  | 86                    | 9.05      |
|          | validation | 212                  | 37                    | 14.86     |
|          | test       | 436                  | 106                   | 19.56     |
| Rest16-e | training   | 1218                 | 181                   | 12.94     |
|          | validation | 289                  | 55                    | 15.99     |
|          | test       | 456                  | 156                   | 25.49     |
Table 2. Performance of the models evaluated on the entire space and on Type I instances. All models are trained on Type I instances. Gains indicate how much higher (in relative %) the performance evaluated on Type I instances is than the performance evaluated on the entire space. The best F1 score on the entire space is marked in bold; the best F1 score on Type I instances is marked with an asterisk.

| Method    | Test instance type | Rest14-e P / R / F1       | Lapt14-e P / R / F1      | Rest15-e P / R / F1      | Rest16-e P / R / F1       |
|-----------|--------------------|---------------------------|--------------------------|--------------------------|---------------------------|
| ARGCN     | Entire space       | 71.70 / 81.30 / 76.17     | 60.99 / 69.48 / 64.90    | 68.10 / 75.90 / 71.72    | 69.85 / 81.91 / 75.37     |
|           | Type I instances   | 85.25 / 82.29 / 83.73     | 74.00 / 70.41 / 72.14    | 76.56 / 75.66 / 76.05    | 86.00 / 82.33 / 84.10     |
|           | (Gains (%))        | 18.89 / 1.22 / 9.93       | 21.32 / 1.33 / 11.15     | 12.43 / -0.32 / 6.05     | 23.11 / 0.51 / 11.58      |
| ARGCNbert | Entire space       | 73.19 / 83.46 / 77.98     | 63.19 / 73.22 / 67.80    | 73.88 / 74.73 / 74.27    | 74.26 / 83.78 / **78.72** |
|           | Type I instances   | 86.00 / 83.01 / 84.46     | 75.79 / 76.02 / 75.88    | 78.21 / 75.74 / 76.92    | 86.91 / 84.05 / 85.45     |
|           | (Gains (%))        | 17.50 / -0.54 / 8.31      | 19.95 / 3.83 / 11.92     | 5.87 / 1.35 / 3.58       | 17.03 / 0.32 / 8.54       |
| IOG       | Entire space       | 73.02 / 76.62 / 74.77     | 61.60 / 67.25 / 64.19    | 70.78 / 69.98 / 70.32    | 68.37 / 81.41 / 74.29     |
|           | Type I instances   | 82.80 / 78.66 / 80.64     | 72.43 / 69.63 / 70.96    | 77.19 / 69.86 / 73.29    | 85.67 / 79.81 / 82.58     |
|           | (Gains (%))        | 13.39 / 2.66 / 7.85       | 17.57 / 3.54 / 10.55     | 9.05 / -0.17 / 4.23      | 25.30 / -1.97 / 11.15     |
| IOGbert   | Entire space       | 73.38 / 85.92 / **79.14** | 60.41 / 80.32 / **68.92**| 71.03 / 81.70 / **75.96**| 69.69 / 90.31 / 78.62     |
|           | Type I instances   | 86.50 / 85.81 / 86.13*    | 77.62 / 80.92 / 79.22*   | 78.88 / 81.70 / 80.24*   | 88.68 / 89.66 / 89.16*    |
|           | (Gains (%))        | 17.88 / -0.14 / 8.84      | 28.50 / 0.75 / 14.95     | 11.05 / 0.00 / 5.64      | 27.25 / -0.72 / 13.40     |
Table 3. Performance of the models trained on Type I instances and on the entire space. All models are evaluated on the entire space. Gains indicate how much better (in relative %) the model trained on the entire space is than the model trained on Type I instances. The best F1 score is marked in bold.

| Method    | Training-validation instance type | Rest14-e P / R / F1       | Lapt14-e P / R / F1      | Rest15-e P / R / F1      | Rest16-e P / R / F1       |
|-----------|-----------------------------------|---------------------------|--------------------------|--------------------------|---------------------------|
| ARGCN     | Type I instances                  | 71.70 / 81.30 / 76.17     | 60.99 / 69.48 / 64.90    | 68.10 / 75.90 / 71.72    | 69.85 / 81.91 / 75.37     |
|           | Entire space                      | 81.02 / 78.29 / 79.58     | 73.29 / 64.94 / 68.82    | 75.12 / 73.14 / 74.11    | 75.61 / 81.91 / 78.61     |
|           | (Gains (%))                       | 13.00 / -3.70 / 4.48      | 20.16 / -6.54 / 6.04     | 10.31 / -3.63 / 3.33     | 8.23 / 0.00 / 4.29        |
| ARGCNbert | Type I instances                  | 73.19 / 83.46 / 77.98     | 63.19 / 73.22 / 67.80    | 73.88 / 74.73 / 74.27    | 74.26 / 83.78 / 78.72     |
|           | Entire space                      | 81.37 / 78.39 / 79.84     | 72.46 / 66.43 / 69.27    | 76.63 / 72.50 / 74.48    | 78.68 / 81.03 / 79.82     |
|           | (Gains (%))                       | 11.17 / -6.08 / 2.39      | 14.68 / -9.27 / 2.17     | 3.73 / -2.98 / 0.29      | 5.96 / -3.28 / 1.39       |
| IOG       | Type I instances                  | 73.02 / 76.62 / 74.77     | 61.60 / 67.25 / 64.19    | 70.78 / 69.98 / 70.32    | 68.37 / 81.41 / 74.29     |
|           | Entire space                      | 75.78 / 74.68 / 75.19     | 71.08 / 62.91 / 66.71    | 75.81 / 67.38 / 71.29    | 75.66 / 77.75 / 76.68     |
|           | (Gains (%))                       | 3.78 / -2.53 / 0.56       | 15.39 / -6.44 / 3.93     | 7.10 / -3.71 / 1.38      | 10.66 / -4.50 / 3.21      |
| IOGbert   | Type I instances                  | 73.38 / 85.92 / 79.14     | 60.41 / 80.32 / 68.92    | 71.03 / 81.70 / 75.96    | 69.69 / 90.31 / 78.62     |
|           | Entire space                      | 81.64 / 80.87 / **81.24** | 71.64 / 75.24 / **73.35**| 77.82 / 76.35 / **76.99**| 75.19 / 87.75 / **80.97** |
|           | (Gains (%))                       | 11.26 / -5.88 / 2.66      | 18.60 / -6.32 / 6.43     | 9.56 / -6.55 / 1.37      | 7.89 / -2.83 / 3.00       |

3. Experimental Setup

3.1. Datasets and Metrics

To validate the sample selection bias problem in TOWE modeling, we built four new TOWE datasets (Rest14-e, Lapt14-e, Rest15-e, and Rest16-e) containing both Type I and Type II instances, based on the four popular TOWE datasets (Rest14, Lapt14, Rest15, and Rest16) (Fan et al., 2019) that include only Type I instances. In our dataset names, the letter e stands for entire space. Fan et al. (2019) built the four TOWE datasets by annotating the corresponding opinion words for the annotated aspect terms in four SemEval challenge datasets (Pontiki et al., 2014, 2015, 2016); however, only the aspect terms associated with at least one opinion word were kept and used as TOWE instances. To build our new datasets, we first include all instances from the TOWE datasets built by Fan et al. (2019). We then add the aspect terms in the original SemEval challenge datasets that do not have corresponding opinion words and hence were excluded by Fan et al. (2019). The statistics of our new TOWE datasets are shown in Table 1.
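This construction amounts to a simple merge: keep every Type I instance from the original TOWE datasets, and add every annotated aspect term that has no corresponding opinion words as a Type II instance whose gold tag sequence is all O. The following is a hedged sketch of the procedure; the function, field names, and record format are our own illustration, not the released data format.

```python
def build_entire_space_dataset(towe_instances, semeval_aspect_terms):
    """Merge the Type I instances of Fan et al. (2019) with the aspect terms
    that were excluded because they have no opinion words (Type II instances).

    Illustrative reconstruction of the procedure described in Section 3.1;
    identifiers and record formats are assumptions, not the actual schema.
    """
    # Aspect terms already covered by Type I instances.
    covered = {(inst["sentence_id"], inst["aspect_span"]) for inst in towe_instances}
    dataset = list(towe_instances)  # all Type I instances are kept
    for term in semeval_aspect_terms:
        key = (term["sentence_id"], term["aspect_span"])
        if key not in covered:  # aspect term without corresponding opinion words
            dataset.append({
                "sentence_id": term["sentence_id"],
                "tokens": term["tokens"],
                "aspect_span": term["aspect_span"],
                # Type II instance: no opinion words, so an all-O tag sequence.
                "opinion_tags": ["O"] * len(term["tokens"]),
            })
    return dataset
```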

Following previous works (Fan et al., 2019), we adopt precision (P), recall (R), and F1-score (F1) as evaluation metrics. An extraction is considered correct only when the predicted opinion words, from the beginning to the end of the span, exactly match the ground truth.
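Under this exact-match criterion, a predicted opinion span counts as correct only if it coincides with a gold span. The snippet below is a minimal sketch of how such span-level P/R/F1 can be computed; it is our own helper for illustration, not the authors' evaluation script.

```python
def exact_match_prf1(pred_spans, gold_spans):
    """Span-level P/R/F1 where a prediction is correct only if its
    (instance_id, start, end) triple matches a gold span exactly.

    pred_spans, gold_spans: sets of (instance_id, start, end) tuples.
    Illustrative re-implementation, not the official evaluation script.
    """
    correct = len(pred_spans & gold_spans)
    precision = correct / len(pred_spans) if pred_spans else 0.0
    recall = correct / len(gold_spans) if gold_spans else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall > 0 else 0.0)
    return precision, recall, f1

# Example: the gold opinion span "Try" for "rose roll" is found, but a
# spurious span is also predicted for the Type II instance "menu".
pred = {("s1-roseroll", 0, 1), ("s1-menu", 0, 1)}
gold = {("s1-roseroll", 0, 1)}
print(exact_match_prf1(pred, gold))  # (0.5, 1.0, 0.666...)
```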

Table 4. Performance of IOGbert trained on Type I instances and on the entire space, evaluated on Type I instances. The best F1 score is marked in bold.

| Training-validation instance type | Rest14-e P / R / F1       | Lapt14-e P / R / F1      | Rest15-e P / R / F1       | Rest16-e P / R / F1       |
|-----------------------------------|---------------------------|--------------------------|---------------------------|---------------------------|
| Type I instances                  | 86.50 / 85.81 / **86.13** | 77.62 / 80.92 / **79.22**| 78.88 / 81.70 / **80.24** | 88.68 / 89.66 / **89.16** |
| Entire space                      | 88.37 / 80.79 / 84.40     | 80.23 / 75.73 / 77.90    | 82.295 / 76.06 / 79.01    | 89.84 / 88.32 / 89.06     |
Table 5. Case study. Incorrect predictions are marked with an asterisk.

| Id | Sentence                                                                           | Aspect term  | Training-validation instance type | Prediction | Ground truth |
|----|------------------------------------------------------------------------------------|--------------|-----------------------------------|------------|--------------|
| 1  | Even when the chef is not in the house, the food and service are right on target.  | chef         | Type I instances                  | ["not"]*   | []           |
|    |                                                                                    | chef         | Entire space                      | []         | []           |
| 2  | I never had an orange donut before so I gave it a shot.                            | orange donut | Type I instances                  | ["never"]* | []           |
|    |                                                                                    | orange donut | Entire space                      | []         | []           |
| 3  | Entrees include classics like lasagna, fettuccine Alfredo and chicken parmigiana.  | Entrees      | Type I instances                  | ["classics"]* | []        |
|    |                                                                                    | Entrees      | Entire space                      | []         | []           |
| 4  | Try the rose roll (not on menu).                                                   | rose roll    | Type I instances                  | ["Try"]    | ["Try"]      |
|    |                                                                                    | menu         | Type I instances                  | ["Try"]*   | []           |
|    |                                                                                    | rose roll    | Entire space                      | []*        | ["Try"]      |
|    |                                                                                    | menu         | Entire space                      | []         | []           |

3.2. TOWE Models

We run experiments based on four TOWE models:

  • ARGCN (Jiang et al., 2021) first incorporates aspect term information by combining word representations with category embeddings that depend on each word's target tag. An attention-based relational graph convolutional network then learns semantic and syntactic relevance between words simultaneously, and a BiLSTM captures sequential information. The resulting word representations are used to predict the word tags {O, B, I}.

  • ARGCNbert (Jiang et al., 2021) is the BERT (Devlin et al., 2018) version of ARGCN. The last hidden states of pre-trained BERT are adopted as word representations, and BERT is fine-tuned jointly.

  • IOG (Fan et al., 2019) uses an Inward-Outward LSTM to pass aspect term information to the left and right contexts of the aspect term, obtaining aspect-term-specific word representations. A BiLSTM then takes these representations as input and outputs global contextualized word representations. Finally, the combination of the aspect-term-specific and global contextualized word representations is used to predict the word tags {O, B, I}.

  • IOGbert is the BERT version of IOG. Specifically, the word embedding layer and the Inward-Outward LSTM in IOG are replaced with BERT, which takes “[CLS] sentence [SEP] aspect term [SEP]” as input (a sketch of this input construction is shown below).
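As a concrete illustration of IOGbert's input packing, the following hedged sketch uses the Hugging Face transformers tokenizer to build the “[CLS] sentence [SEP] aspect term [SEP]” sequence; it shows only the input format described above and is not the authors' implementation.

```python
# Hedged sketch of IOGbert's input construction, assuming the Hugging Face
# transformers library and the bert-base-uncased checkpoint.
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

sentence = "Try the rose roll (not on menu)."
aspect_term = "rose roll"

# Passing the aspect term as the second segment yields the
# [CLS] sentence [SEP] aspect term [SEP] packing.
encoding = tokenizer(sentence, aspect_term, return_tensors="pt")
print(tokenizer.decode(encoding["input_ids"][0]))
# Prints something like:
# [CLS] try the rose roll ( not on menu ). [SEP] rose roll [SEP]
```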

We run all models five times and report the average results on the test sets.

4. Results

4.1. Evaluation on Entire Space

To observe the difference in TOWE model performance on Type I instances versus the entire space, the four models are trained on Type I instances (both the training and validation sets include only Type I instances), as in previous studies (Fan et al., 2019), and then evaluated on both Type I instances and the entire space. Experimental results are shown in Table 2. All models across all four datasets obtain much better F1 scores on Type I instances than on the entire space. For example, the gains of IOGbert on Rest14-e, Lapt14-e, Rest15-e, and Rest16-e are 8.84%, 14.95%, 5.64%, and 13.40%, respectively. The gain on Rest15-e is the smallest because the ratio of Type II instances in Rest15-e is the smallest (Table 1). In any case, evaluating TOWE models only on Type I instances overestimates model performance.
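For clarity, the Gains rows in Tables 2 and 3 are consistent with relative differences between the two scores; for instance, ARGCN's Rest14-e F1 gain of 9.93% equals (83.73 - 76.17) / 76.17. A one-line helper, ours for illustration:

```python
def relative_gain(score_a, score_b):
    """Relative gain (%) of score_a over score_b, matching our reading of
    the Gains rows in Tables 2 and 3 (an assumption, for illustration)."""
    return (score_a - score_b) / score_b * 100

# ARGCN on Rest14-e: F1 on Type I instances vs. F1 on the entire space.
print(round(relative_gain(83.73, 76.17), 2))  # 9.93
```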

4.2. Training on Entire Space

In this section, the four models are trained in two settings: (1) both the training and validation sets include only Type I instances, and (2) both include Type I and Type II instances (i.e., the entire space). The trained models are then evaluated on the entire space. Experimental results are shown in Table 3, from which we draw two conclusions. First, models trained on the entire space outperform those trained on Type I instances in terms of F1 score across all four datasets, indicating that training on the entire space improves the generalization performance of trained models. Second, while models trained on the entire space obtain better precision, models trained on Type I instances obtain better recall. The reason is that the additional instances in the entire space, i.e., Type II instances, contain only aspect terms without corresponding opinion words; they help TOWE models exclude incorrect opinion words for aspect terms, but also lead the models to exclude some correct opinion words.

IOGbert trained on Type I instances and on the entire space is also evaluated on Type I instances. The results are shown in Table 4: IOGbert trained on Type I instances obtains better performance than IOGbert trained on the entire space. This indicates that it is still necessary to design models that work well on the entire space, which we leave for future research.

4.3. Case Study

To further understand the impact of Type II instances on TOWE models, we show the predictions of IOGbert (the best TOWE model in our experiments) trained on Type I instances and on the entire space for four sentences from the test set of the Rest14-e dataset. Each of the first three sentences contains only one aspect term, and all three aspect terms are Type II instances. While IOGbert trained on the entire space makes correct inferences on these three instances, IOGbert trained on Type I instances erroneously extracts words for the three aspect terms; in fact, the extracted words are not opinion words at all. The reason is that the entire-space training set of Rest14-e additionally includes Type II instances, some of which are similar to the instances in the first three sentences. For example, the sentence “food was delivered by a busboy, not waiter” together with the aspect term “waiter” is a Type II instance in the training set of Rest14-e; this instance may teach IOGbert to extract nothing for the aspect term “chef” in the first sentence. Thus, it is essential to train TOWE models on the entire space. From the predictions on the fourth sentence, we can see that IOGbert trained on Type I instances tends to extract more opinion words, while IOGbert trained on the entire space tends to extract fewer. This case shows why TOWE models trained on the entire space obtain lower recall than those trained on Type I instances.

5. Conclusion

In this paper, we explore the sample selection bias problem in target-oriented opinion words extraction (TOWE) modeling. Specifically, we divide TOWE instances into two types: Type I instances, where the aspect terms are associated with at least one opinion word, and Type II instances, where the aspect terms do not have opinion words. Previous studies used only Type I instances to train and evaluate their models, resulting in a sample selection bias problem: training TOWE models only on Type I instances may hurt their generalization performance, and evaluating them only on Type I instances cannot reflect their performance in real-world scenarios. To validate these hypotheses, we added Type II instances to previous TOWE datasets that include only Type I instances. Experimental results on these datasets demonstrate that training TOWE models on the entire space, which includes both Type I and Type II instances, significantly improves model performance, and that evaluating TOWE models only on Type I instances overestimates model performance.

Since the datasets used by aspect sentiment triplet extraction (ASTE) (Peng et al., 2020) are constructed based on the TOWE datasets built by Fan et al. (2019), they may also suffer from the sample selection bias problem. In the future, we will explore the sample selection bias problem in ASTE.

References

  • Devlin et al. (2018) Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018).
  • Fan et al. (2019) Zhifang Fan, Zhen Wu, Xinyu Dai, Shujian Huang, and Jiajun Chen. 2019. Target-oriented opinion words extraction with target-fused neural sequence labeling. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). 2509–2518.
  • Feng et al. (2021) Yuhao Feng, Yanghui Rao, Yuyao Tang, Ninghua Wang, and He Liu. 2021. Target-specified Sequence Labeling with Multi-head Self-attention for Target-oriented Opinion Words Extraction. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 1805–1815.
  • Hu and Liu (2004) Minqing Hu and Bing Liu. 2004. Mining and Summarizing Customer Reviews. In Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (Seattle, WA, USA) (KDD ’04). Association for Computing Machinery, New York, NY, USA, 168–177. https://doi.org/10.1145/1014052.1014073
  • Jiang et al. (2021) Junfeng Jiang, An Wang, and Akiko Aizawa. 2021. Attention-based Relational Graph Convolutional Network for Target-Oriented Opinion Words Extraction. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume. Association for Computational Linguistics, Online, 1986–1997. https://www.aclweb.org/anthology/2021.eacl-main.170
  • Kang et al. (2021) Taegwan Kang, Minwoo Lee, Nakyeong Yang, and Kyomin Jung. 2021. RABERT: Relation-Aware BERT for Target-Oriented Opinion Words Extraction. In Proceedings of the 30th ACM International Conference on Information & Knowledge Management. 3127–3131.
  • Liu (2012) Bing Liu. 2012. Sentiment analysis and opinion mining. Synthesis lectures on human language technologies 5, 1 (2012), 1–167.
  • Ma et al. (2018) Xiao Ma, Liqin Zhao, Guan Huang, Zhi Wang, Zelin Hu, Xiaoqiang Zhu, and Kun Gai. 2018. Entire space multi-task model: An effective approach for estimating post-click conversion rate. In The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval. 1137–1140.
  • Mensah et al. (2021) Samuel Mensah, Kai Sun, and Nikolaos Aletras. 2021. An Empirical Study on Leveraging Position Embeddings for Target-oriented Opinion Words Extraction. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Online and Punta Cana, Dominican Republic, 9174–9179. https://doi.org/10.18653/v1/2021.emnlp-main.722
  • Nasukawa and Yi (2003) Tetsuya Nasukawa and Jeonghee Yi. 2003. Sentiment Analysis: Capturing Favorability Using Natural Language Processing. In Proceedings of the 2nd International Conference on Knowledge Capture (Sanibel Island, FL, USA) (K-CAP ’03). Association for Computing Machinery, New York, NY, USA, 70–77. https://doi.org/10.1145/945645.945658
  • Peng et al. (2020) Haiyun Peng, Lu Xu, Lidong Bing, Fei Huang, Wei Lu, and Luo Si. 2020. Knowing what, how and why: A near complete solution for aspect-based sentiment analysis. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 34. 8600–8607.
  • Pontiki et al. (2016) Maria Pontiki, Dimitris Galanis, Haris Papageorgiou, Ion Androutsopoulos, Suresh Manandhar, Mohammad AL-Smadi, Mahmoud Al-Ayyoub, Yanyan Zhao, Bing Qin, Orphée De Clercq, Véronique Hoste, Marianna Apidianaki, Xavier Tannier, Natalia Loukachevitch, Evgeniy Kotelnikov, Nuria Bel, Salud María Jiménez-Zafra, and Gülşen Eryiğit. 2016. SemEval-2016 Task 5: Aspect Based Sentiment Analysis. In Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016). Association for Computational Linguistics, San Diego, California, 19–30. https://doi.org/10.18653/v1/S16-1002
  • Pontiki et al. (2015) Maria Pontiki, Dimitris Galanis, Haris Papageorgiou, Suresh Manandhar, and Ion Androutsopoulos. 2015. SemEval-2015 Task 12: Aspect Based Sentiment Analysis. In Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015). Association for Computational Linguistics, Denver, Colorado, 486–495. https://doi.org/10.18653/v1/S15-2082
  • Pontiki et al. (2014) Maria Pontiki, Dimitris Galanis, John Pavlopoulos, Harris Papageorgiou, Ion Androutsopoulos, and Suresh Manandhar. 2014. SemEval-2014 Task 4: Aspect Based Sentiment Analysis. In Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014). Association for Computational Linguistics, Dublin, Ireland, 27–35. https://doi.org/10.3115/v1/S14-2004
  • Pouran Ben Veyseh et al. (2020) Amir Pouran Ben Veyseh, Nasim Nouri, Franck Dernoncourt, Dejing Dou, and Thien Huu Nguyen. 2020. Introducing Syntactic Structures into Target Opinion Word Extraction with Deep Learning. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics, Online, 8947–8956. https://doi.org/10.18653/v1/2020.emnlp-main.719
  • Wu et al. (2020) Zhen Wu, Fei Zhao, Xin-Yu Dai, Shujian Huang, and Jiajun Chen. 2020. Latent Opinions Transfer Network for Target-Oriented Opinion Words Extraction. arXiv preprint arXiv:2001.01989 (2020).
  • Zadrozny (2004) Bianca Zadrozny. 2004. Learning and evaluating classifiers under sample selection bias. In Proceedings of the twenty-first international conference on Machine learning. 114.