Email: {yingcc,wuz}@smail.nju.edu.cn, {daixinyu,huangsj,chenjj}@nju.edu.cn
Opinion Transmission Network for Jointly Improving Aspect-oriented Opinion Words Extraction and Sentiment Classification
Abstract
Aspect-level sentiment classification (ALSC) and aspect-oriented opinion words extraction (AOWE) are two highly relevant subtasks of aspect-based sentiment analysis (ABSA). They respectively aim to detect the sentiment polarity and extract the corresponding opinion words toward a given aspect in a sentence. Previous works treat them separately, training neural models for a single task on small-scale labeled data and neglecting the connections between them. In this paper, we propose a novel joint model, Opinion Transmission Network (OTN), to exploit the potential bridge between ALSC and AOWE and thereby facilitate both tasks simultaneously. Specifically, we design two tailor-made opinion transmission mechanisms that let opinion clues flow bidirectionally, from ALSC to AOWE and from AOWE to ALSC. Experiment results on two benchmark datasets show that our joint model outperforms strong baselines on the two tasks. Further analysis also validates the effectiveness of the opinion transmission mechanisms.
Keywords: Aspect-level sentiment classification · Aspect-oriented opinion words extraction · Opinion transmission network

1 Introduction
Aspect-based sentiment analysis (ABSA) is a fine-grained sentiment analysis task [11] that analyzes the sentiment or opinions toward a given aspect in a sentence. The task consists of a set of subtasks, including aspect category detection, aspect term extraction, aspect-level sentiment classification (ALSC), and aspect-oriented opinion words extraction (AOWE). Most existing research performs a single ABSA subtask by training machine learning models on labeled data [15, 1, 17]. However, the public corpora for ABSA are all small-scale because manual annotation is expensive and labor-intensive. Scarce training data limits the performance of data-driven approaches to ABSA. Therefore, an interesting and valuable research question is how to mine and exploit the internal connections between ABSA subtasks to facilitate them simultaneously. In this work, we focus on the two subtasks ALSC and AOWE because they are highly mutually indicative. We first introduce them briefly before presenting our motivations.

Aspect-level sentiment classification (ALSC) aims to predict the sentiment polarity towards a given aspect in a sentence. As Figure 1 shows, two aspects are mentioned in the sentence "waiters are unfriendly but the pasta is out of this world.", namely "waiters" and "pasta". The sentiments expressed towards them are negative and positive, respectively. Different from ALSC, aspect-oriented opinion words extraction (AOWE) is a recently proposed ABSA subtask [3]. Its objective is to extract the corresponding opinion words towards a given aspect from the sentence. Opinion words refer to the words or phrases in a sentence that express attitudes or opinions explicitly. In the example above, "unfriendly" is the opinion word towards the aspect "waiters", and "out of this world" is the opinion phrase towards the aspect "pasta".
It is common sense that positive opinion words imply positive sentiment polarity, while negative opinion words correspond to negative sentiment polarity. Based on this observation, the corresponding opinion words toward a given aspect (which AOWE aims at) help infer the corresponding sentiment (which ALSC aims at). Conversely, the sentiment determined in ALSC can also provide clues that help extract polarity-related opinion words for the AOWE task. Therefore, the goals of AOWE and ALSC are mutually indicative, and the two tasks can benefit each other.
To exploit this relation of mutual indication, we propose a novel model, Opinion Transmission Network (OTN), to jointly model ALSC and AOWE and improve them simultaneously. Overall, OTN contains two base modules, namely the attention-based ALSC module and the CNN-based AOWE module, and two tailor-made opinion transmission mechanisms, respectively from AOWE to ALSC and from ALSC to AOWE. Specifically, we utilize the extraction results of the AOWE module as complementary opinion information and inject them into the ALSC module in the form of additional attention. To transmit implicit opinions from ALSC to AOWE, we observe that the features in the attention layer of the ALSC module retain abundant aspect-related opinion information, which can be utilized to facilitate AOWE. It is worth noting that our proposed model does not require simultaneous annotations of AOWE and ALSC on the same data, so it can be applied in more practical scenarios.
The main contributions of this work can be summarized as follows:
1. To make full use of high-cost labeled data, we are the first to propose exploiting the mutual indication between ALSC and AOWE to improve both tasks.
2. To exploit this connection effectively, we propose a joint neural model, Opinion Transmission Network (OTN), with two novel opinion transmission mechanisms. During network training, opinion clues can flow bidirectionally between the two modules through these interactions.
3. We conduct experiments and analysis on benchmark datasets. The results confirm that the performance of both ALSC and AOWE can be improved through our designed opinion transmission mechanisms, and that our model outperforms strong baselines on the two tasks.
2 Preliminary
In this section, we introduce some necessary notations and the task formalizations of ALSC and AOWE.
2.1 ALSC Formalization
ALSC aims to classify the sentiment of a given aspect in a sentence into one of a set of pre-defined sentiment categories. Specifically, given a sentence $s = \{w_1, w_2, \dots, w_n\}$ containing $n$ words and an aspect $w_a$ in $s$ (we notate an aspect as one word for simplicity, where $a$ is the index of the aspect in the sentence), the task is to assign a label $y \in C$ to the input pair $(s, w_a)$, where $C$ is the set of pre-defined sentiment categories (i.e., positive, negative, and neutral).
2.2 AOWE Formalization
AOWE aims at extracting the corresponding opinion words towards a given aspect from a sentence. Different from ALSC, it is formalized as an aspect-oriented sequence labeling task [3]. Given an input pair $(s, w_a)$, the task is to assign a label $y_i \in \{B, I, O\}$ to each word $w_i$ in the sentence $s$. Following the standard BIO notation used in sequence labeling, the three labels refer to the beginning, inside, and outside of an opinion word span, respectively. The spans composed of B and I tags represent the corresponding opinion words of the aspect $w_a$. Obviously, a sentence may have different labeling results for different aspects. An example is shown in Table 1.
| Aspect | Sentence with BIO labels |
|---|---|
| 1. waiters | Waiters/O are/O very/O friendly/B and/O the/O pasta/O is/O out/O of/O this/O world/O ./O |
| 2. pasta | Waiters/O are/O very/O friendly/O and/O the/O pasta/O is/O out/B of/I this/I world/I ./O |
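To make the BIO convention concrete, the following minimal Python sketch (with illustrative names of our own, not taken from the paper's code) decodes a tag sequence into the opinion spans it encodes:

```python
def bio_to_spans(words, tags):
    """Decode a BIO tag sequence into the opinion word spans it encodes."""
    spans, current = [], []
    for word, tag in zip(words, tags):
        if tag == "B":                    # a new opinion span starts here
            if current:
                spans.append(" ".join(current))
            current = [word]
        elif tag == "I" and current:      # continue the open span
            current.append(word)
        else:                             # "O" (or a stray "I") closes it
            if current:
                spans.append(" ".join(current))
            current = []
    if current:
        spans.append(" ".join(current))
    return spans

sent = "Waiters are very friendly and the pasta is out of this world .".split()
tags = ["O"] * 8 + ["B", "I", "I", "I", "O"]   # labeling for the aspect "pasta"
print(bio_to_spans(sent, tags))                # -> ['out of this world']
```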
3 Opinion Transmission Network
Opinion Transmission Network (OTN) aims to exploit the connections between ALSC and AOWE to facilitate both tasks. In this section, we first give an overall description of OTN. Then we introduce the base ALSC module and the base AOWE module in the OTN model. Finally, we present our tailor-made opinion transmission mechanisms, which achieve opinion interaction between the two modules.
3.1 Overall Description

Figure 2 shows the overall architecture of the Opinion Transmission Network (OTN). It consists of a base ALSC module and a base AOWE module, as well as bidirectional opinion transmission mechanisms, respectively from AOWE to ALSC (AOWE2ALSC) and from ALSC to AOWE (ALSC2AOWE). Following most state-of-the-art works [15, 8, 4], we employ a typical attention-based BiLSTM network as our base ALSC module. For AOWE, we adopt a CNN as the base module for two reasons. First, CNNs are widely used in various sequence labeling tasks and achieve state-of-the-art results, such as named entity recognition [20], Chinese word segmentation [14], and aspect extraction [12]. Second, a CNN works in parallel and is computationally more efficient. Additionally, we enhance both the ALSC module and the AOWE module with position embeddings [4] to incorporate aspect information.
For the ALSC task, distinguishing the aspect-related opinion words helps predict the sentiment polarity of the aspect. Thus, we design the opinion transmission mechanism AOWE2ALSC to integrate opinion word information from AOWE into ALSC. Specifically, we transform the prediction results of the AOWE module into the form of auxiliary attention, which serves as additional sentiment evidence for the ALSC module.
The ALSC2AOWE mechanism aims to exploit implicit opinion clues of ALSC to improve the AOWE task. In the ALSC module, the attention weights over words can indicate the aspect-related opinion words; however, each weight is a low-dimensional scalar and is easily ignored when incorporated into the AOWE module. Therefore, we step one layer back and leverage the intermediate features in the attention layer of the ALSC module as latent opinions to improve the AOWE task.
3.2 Base ALSC Module
The base ALSC module is an attention-based BiLSTM network enhanced with the position embedding technique. Given a sentence $s = \{w_1, w_2, \dots, w_n\}$ and an aspect $w_a$ in $s$, we first concatenate the word embedding $e_i$ and position embedding $p_i$ of each word $w_i$ as the word representation $x_i$, i.e., $x_i = [e_i; p_i]$. The position index of $w_i$ indicates its relative distance to the aspect $w_a$ and is calculated as $|i - a|$. The embeddings $e_i$ and $p_i$ are respectively looked up from the word embedding table and the position embedding table.
With the enhanced word representations $\{x_1, \dots, x_n\}$, a BiLSTM network is applied to encode them and generate the corresponding hidden states $\{h_1, \dots, h_n\}$. Then we use the aspect representation $h_a$ as the query and employ the attention mechanism to capture potential opinion clues for the ALSC task. The attention weight $\alpha_i$ of the word $w_i$ is defined as:

$m_i = \tanh(W [h_i; h_a] + b)$  (1)

$u_i = v^\top m_i$  (2)

$\alpha_i = \dfrac{\exp(u_i)}{\sum_{j=1}^{n} \exp(u_j)}$  (3)

where $W$ denotes the weight matrix, $v$ represents the weight vector, and $b$ is the bias.
Finally, the aspect-related sentence representation $r$ is a weighted sum of the context representations, i.e., $r = \sum_{i=1}^{n} \alpha_i h_i$. In the base ALSC module, the representation $r$ is fed into a linear layer and a softmax layer to predict the sentiment polarity of the aspect $w_a$ in the sentence $s$.
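As an illustration, here is a minimal PyTorch sketch of the base ALSC module described above. All class and variable names are ours, the dimensions follow the experiment settings in Section 4.2, and the actual implementation may differ in details.

```python
import torch
import torch.nn as nn

class BaseALSC(nn.Module):
    """Attention-based BiLSTM with position embeddings (Eqs. (1)-(3))."""
    def __init__(self, vocab_size, max_dist, d_word=300, d_pos=100, d_hid=400, n_cls=3):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, d_word)
        self.pos_emb = nn.Embedding(max_dist, d_pos)
        self.bilstm = nn.LSTM(d_word + d_pos, d_hid // 2,
                              bidirectional=True, batch_first=True)
        self.att_proj = nn.Linear(2 * d_hid, d_hid)    # W and b in Eq. (1)
        self.att_v = nn.Linear(d_hid, 1, bias=False)   # v in Eq. (2)
        self.cls = nn.Linear(d_hid, n_cls)

    def forward(self, words, dists, aspect_idx):
        # words, dists: (B, n) token ids and |i - a| distances; aspect_idx: (B,)
        x = torch.cat([self.word_emb(words), self.pos_emb(dists)], dim=-1)
        h, _ = self.bilstm(x)                                    # (B, n, d_hid)
        h_a = h[torch.arange(h.size(0)), aspect_idx]             # aspect states
        m = torch.tanh(self.att_proj(
            torch.cat([h, h_a.unsqueeze(1).expand_as(h)], dim=-1)))  # Eq. (1)
        alpha = torch.softmax(self.att_v(m).squeeze(-1), dim=-1)     # Eqs. (2)-(3)
        r = (alpha.unsqueeze(-1) * h).sum(dim=1)                 # weighted sum
        return self.cls(r), alpha, m   # logits, attention, attention features
```

Returning the attention features `m` alongside the logits anticipates the ALSC2AOWE transmission in Section 3.5, which reuses them.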
3.3 Base AOWE Module
Similarly, the word representation $x_i$ in the base AOWE module is obtained by concatenating the word embedding and position embedding of each word $w_i$. We then employ a CNN encoder to capture context information in the sequence and obtain the corresponding feature vector $c_i$ of each word $w_i$:

$\{c_1, \dots, c_n\} = \mathrm{CNN}(x_1, \dots, x_n; \theta)$  (4)

where $\theta$ represents the parameters of the CNN encoder.
The CNN encoder consists of 5 CNN layers. Each layer has a set of convolution filters, and each filter maps the representations of $k$ consecutive words to a single feature scalar, where $k$ is the kernel size. A ReLU activation is applied to each feature vector. We present the details of the hyperparameters of the base AOWE module in the experiment settings.
To further incorporate the aspect information, we concatenate the CNN feature vector $c_i$ with the word embedding $e_a$ of the given aspect as the final representation $g_i$ of each word $w_i$:

$g_i = [c_i; e_a]$  (5)

Finally, the sequence representations $\{g_1, \dots, g_n\}$ are fed into a two-layer perceptron and a softmax layer to predict the tag probability distribution $\hat{y}_i$ for each word $w_i$ in the sentence $s$:

$\hat{y}_i = \mathrm{softmax}\big(W_2\, \sigma(W_1 g_i + b)\big)$  (6)

where $W_1$ and $W_2$ are the weight matrices, $b$ denotes the bias, and $\sigma$ is the activation function.
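A corresponding PyTorch sketch of the base AOWE module follows. For simplicity we give every CNN layer a single odd kernel size so that padding preserves the sequence length; the exact multi-width wiring listed in Table 3 is not reproduced, and all names are our own.

```python
import torch
import torch.nn as nn

class BaseAOWE(nn.Module):
    """CNN encoder with aspect-embedding concatenation (Eqs. (4)-(6))."""
    def __init__(self, vocab_size, max_dist, d_word=300, d_pos=100,
                 d_conv=600, n_tags=3):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, d_word)
        self.pos_emb = nn.Embedding(max_dist, d_pos)
        layers, d_in = [], d_word + d_pos
        for k in (1, 3, 5, 5, 5):          # five layers; odd widths keep length
            layers.append(nn.Conv1d(d_in, d_conv, k, padding=k // 2))
            d_in = d_conv
        self.convs = nn.ModuleList(layers)
        self.mlp = nn.Sequential(          # two-layer perceptron of Eq. (6)
            nn.Linear(d_conv + d_word, d_conv), nn.ReLU(),
            nn.Linear(d_conv, n_tags))

    def forward(self, words, dists, aspect_idx):
        x = torch.cat([self.word_emb(words), self.pos_emb(dists)], dim=-1)
        c = x.transpose(1, 2)              # (B, d, n) layout for Conv1d
        for conv in self.convs:
            c = torch.relu(conv(c))        # Eq. (4), layer by layer
        c = c.transpose(1, 2)              # back to (B, n, d_conv)
        e_a = self.word_emb(words[torch.arange(words.size(0)), aspect_idx])
        g = torch.cat([c, e_a.unsqueeze(1).expand(-1, c.size(1), -1)],
                      dim=-1)              # Eq. (5)
        return self.mlp(g)                 # per-word B/I/O logits
```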
3.4 Opinion Transmission: AOWE2ALSC
As mentioned above, the aspect-oriented opinion words in a sentence provide powerful evidence for inferring the corresponding sentiment of the aspect. Therefore, we propose the opinion transmission mechanism AOWE2ALSC, which leverages predictions from the AOWE module to help the ALSC module focus on aspect-oriented opinion words and thereby make more comprehensive sentiment predictions.
Specifically, we map the predicted tag probabilities $\hat{y}_i$ of each word in the AOWE module to a probability distribution $\beta$ over the words being aspect-related opinion words as follows:

$\beta_i = \dfrac{\exp(w_o^\top \hat{y}_i)}{\sum_{j=1}^{n} \exp(w_o^\top \hat{y}_j)}$  (7)

where $w_o$ is a weight vector that maps the probabilities of each word being tagged with $\{B, I, O\}$ to a single score.
Since the probability distribution $\beta$ can be regarded as additional attention knowledge from the AOWE module, we also merge the context representations $h_i$ of the ALSC module with $\beta$:

$r_o = \sum_{i=1}^{n} \beta_i h_i$  (8)
Finally, we concatenate the opinion representation $r_o$ with the original representation $r$ of the ALSC module to predict the aspect-level sentiment:

$\hat{y}^s = \mathrm{softmax}\big(W_s [r; r_o] + b_s\big)$  (9)

where $W_s$ is the weight matrix and $b_s$ denotes the bias.
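The following sketch shows one plausible implementation of the AOWE2ALSC mechanism (Eqs. (7)-(9)); the module and parameter names are ours.

```python
import torch
import torch.nn as nn

class AOWE2ALSC(nn.Module):
    """Turn AOWE tag probabilities into auxiliary attention for ALSC."""
    def __init__(self, d_hid=400, n_tags=3, n_cls=3):
        super().__init__()
        self.score = nn.Linear(n_tags, 1, bias=False)  # w_o in Eq. (7)
        self.cls = nn.Linear(2 * d_hid, n_cls)         # W_s, b_s in Eq. (9)

    def forward(self, tag_probs, h, r):
        # tag_probs: (B, n, 3) AOWE outputs; h: (B, n, d) ALSC contexts; r: (B, d)
        beta = torch.softmax(self.score(tag_probs).squeeze(-1), dim=-1)  # Eq. (7)
        r_o = (beta.unsqueeze(-1) * h).sum(dim=1)                        # Eq. (8)
        return self.cls(torch.cat([r, r_o], dim=-1))                     # Eq. (9)
```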
3.5 Opinion Transmission: ALSC2AOWE
The base ALSC module can capture some latent aspect-related opinion words through its attention mechanism. However, each attention weight in the ALSC module is a one-dimensional scalar and is easily neglected when used to enhance the AOWE module. Therefore, we instead exploit the attention features $m_i$ in Equation 1 of the ALSC module to enrich the context representations of the AOWE module as follows:

$\tilde{g}_i = [g_i; m_i]$  (10)
Finally, the enriched context representation $\tilde{g}_i$ is used to predict the tag of the word $w_i$ for the AOWE task:

$\hat{y}_i = \mathrm{softmax}\big(W_2\, \sigma(W_1 \tilde{g}_i + b)\big)$  (11)
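The ALSC2AOWE direction thus reduces to a feature concatenation before tagging. A self-contained sketch, with hypothetical dimensions of our choosing:

```python
import torch
import torch.nn as nn

# Hypothetical sizes: d_aowe for the AOWE features g_i of Eq. (5),
# d_att for the ALSC attention features m_i of Eq. (1).
d_aowe, d_att, n_tags = 900, 400, 3
tagger = nn.Sequential(nn.Linear(d_aowe + d_att, 400), nn.ReLU(),
                       nn.Linear(400, n_tags))   # Eq. (11) perceptron, widened

g = torch.randn(2, 13, d_aowe)        # AOWE word representations, Eq. (5)
m = torch.randn(2, 13, d_att)         # ALSC attention features, Eq. (1)
g_tilde = torch.cat([g, m], dim=-1)   # Eq. (10): enrich the AOWE inputs
tag_logits = tagger(g_tilde)          # Eq. (11): (2, 13, 3) B/I/O scores
```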
3.6 Training
For the ALSC task, we use the cross-entropy between the predicted sentiment distribution $\hat{y}^s$ and the gold sentiment label $y^s$ as the task loss, which is defined as follows:

$\mathcal{L}_{ALSC} = -\sum_{d \in D} \sum_{c \in C} \mathbb{1}[y^s_d = c] \log \hat{y}^s_{d,c}$  (12)

where $D$ indicates all data samples, $C$ denotes the sentiment label set, and $\hat{y}^s_{d,c}$ is the predicted probability of input sample $d$ belonging to the $c$-th sentiment.
In terms of the AOWE task, we define the cross-entropy loss as follows:
$\mathcal{L}_{AOWE} = -\sum_{d \in D'} \sum_{i=1}^{n} \sum_{t \in T} \mathbb{1}[y_i = t] \log \hat{y}_{i,t}$  (13)

where the tags $\{B, I, O\}$ are correspondingly converted into labels in $T$, and $\hat{y}_{i,t}$ denotes the probability that the $i$-th word is predicted as the label $t$.
Because OTN is a joint model for both ALSC and AOWE, we minimize the losses $\mathcal{L}_{ALSC}$ and $\mathcal{L}_{AOWE}$ iteratively to optimize the OTN model.
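A sketch of this alternating scheme is shown below. Since the two tasks are trained on different datasets (14res vs. 16res, Section 4.1), each iteration draws one batch per task; the loader and model interfaces are placeholders of our own.

```python
import torch.nn as nn

ce = nn.CrossEntropyLoss()

def train_epoch(otn, alsc_loader, aowe_loader, optimizer):
    for (alsc_x, alsc_y), (aowe_x, aowe_y) in zip(alsc_loader, aowe_loader):
        # Step 1: minimize the ALSC loss of Eq. (12) on an ALSC batch.
        optimizer.zero_grad()
        sent_logits = otn.alsc_forward(*alsc_x)        # (B, 3) sentiment scores
        ce(sent_logits, alsc_y).backward()
        optimizer.step()
        # Step 2: minimize the AOWE loss of Eq. (13) on an AOWE batch.
        optimizer.zero_grad()
        tag_logits = otn.aowe_forward(*aowe_x)         # (B, n, 3) tag scores
        ce(tag_logits.reshape(-1, 3), aowe_y.reshape(-1)).backward()
        optimizer.step()
```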
4 Experiments
4.1 Datasets and Metrics
As mentioned above, OTN is a joint model that does not require annotations for both the ALSC task and the AOWE task on the same data. To verify this, we use the dataset 14res for ALSC and 16res for AOWE. They are derived from SemEval-2014 Task 4 [11] and SemEval-2016 Task 5 [10], respectively. The original SemEval datasets do not provide annotations of the corresponding opinion words for each aspect. Therefore, [3] annotated aspect-related opinion words for each sample and removed the samples that contain no opinion words. Table 2 shows the statistics of the two datasets; in the table, "Opinion" and "Pair" denote the number of opinion words and the number of aspect-opinion pairs, respectively.
| ALSC | Pos. | Neu. | Neg. | Total | AOWE | Sentence | Aspect | Opinion | Pair |
|---|---|---|---|---|---|---|---|---|---|
| 14res Train | 2,164 | 633 | 805 | 3,602 | 16res Train | 1,079 | 1,512 | 1,661 | 1,770 |
| 14res Test | 728 | 196 | 196 | 1,120 | 16res Test | 329 | 457 | 485 | 525 |
We adopt widely used evaluation metrics for the two tasks. For ALSC, we use accuracy and macro-F1 score [1, 5]. For AOWE, we follow previous work [3] and use precision, recall, and F1-score to measure the performance of different methods. An extracted opinion word/phrase is deemed correct only if the starting and ending positions of the prediction both match those of the gold word/phrase.
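For concreteness, here is a minimal sketch of this exact-match evaluation; the input format (lists of (start, end) spans per sample) is our assumption.

```python
def opinion_prf(gold_spans, pred_spans):
    """Exact-match P/R/F1 over (start, end) opinion spans, per sample."""
    tp = sum(len(set(g) & set(p)) for g, p in zip(gold_spans, pred_spans))
    n_pred = sum(len(p) for p in pred_spans)
    n_gold = sum(len(g) for g in gold_spans)
    p = tp / n_pred if n_pred else 0.0
    r = tp / n_gold if n_gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

# e.g., gold span (8, 11) = "out of this world"; an off-by-one prediction fails
print(opinion_prf([[(8, 11)]], [[(8, 11)]]))  # (1.0, 1.0, 1.0)
print(opinion_prf([[(8, 11)]], [[(8, 10)]]))  # (0.0, 0.0, 0.0)
```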
4.2 Experiment Settings
We use 300-dimensional GloVe [9] word embeddings pre-trained on 840B tokens to initialize word vectors, which are fixed during training. The position embeddings are 100-dimensional vectors randomly initialized from a uniform distribution. The dimension of the LSTM cells is 400. Table 3 shows the hyperparameters of the CNNs in the AOWE module. We apply dropout with probability 0.5 on the embedding layer and the output layer. The Adam optimizer [7] is used to update model parameters. The initial learning rate is 1e-3 and the mini-batch size is 16. We randomly select 20% of the samples from the training sets as validation sets for tuning hyperparameters and early stopping. We report the average results of 5 repeated runs for each model.
| Layer No. | Filter length | Filter number |
|---|---|---|
| 1 | 1 | 600 |
| 2 | 2 | 200 |
| 2 | 3 | 200 |
| 2 | 4 | 200 |
| 3 | 5 | 600 |
| 4 | 5 | 600 |
| 5 | 5 | 600 |
4.3 Compared Methods
We compare our OTN model with the following methods for ALSC and AOWE.
4.3.1 ALSC
We divide the compared ALSC methods into three groups for brevity.
- Attention-based methods: ATAE-LSTM [15], IAN [8], and PBAN [4]. They employ attention mechanisms over recurrent networks to capture aspect-related sentiment clues.
- Memory-based methods: MemNN [13], RAM [1], CEA [18], and DAuM [19]. They perform multi-hop attention over memories to obtain more powerful sentiment evidence.
- GCAE, a CNN-based model. It proposes novel Gated Tanh-ReLU Units to selectively output sentiment features according to the given aspect [17].
Besides, we also report the performance of our base ALSC module. It is an attention-based BiLSTM network enhanced with position embedding.
4.3.2 AOWE
We compare our method with five baselines for AOWE.

- Dependency-rule uses the POS tags of the dependency paths between aspects and opinion words in the training set as rule templates to detect the corresponding opinion words for the given aspects [21].
- BiLSTM uses word embeddings to represent words and employs a BiLSTM network to capture context information; the context representations are then used to predict the tag of each word.
- PE-BiLSTM adds position embeddings on top of BiLSTM to represent the relative positions of words to the given aspect [16].
- IOG employs six different positional and directional LSTM networks to extract aspect-related opinion words and achieves state-of-the-art results [3].
- IOG+CRF augments IOG with a CRF decoding layer.
4.4 Main Results
| Model | ALSC Acc. | ALSC F1 | AOWE P | AOWE R | AOWE F1 |
|---|---|---|---|---|---|
| ATAE-LSTM | 78.38 | 66.36 | - | - | - |
| IAN | 78.71 | 67.71 | - | - | - |
| PBAN | 78.62 | 67.45 | - | - | - |
| MemNN | 77.69 | 67.53 | - | - | - |
| RAM | 78.41 | 68.52 | - | - | - |
| CEA | 78.44 | 66.78 | - | - | - |
| DAuM | 77.91 | 66.47 | - | - | - |
| GCAE | 76.09 | 63.29 | - | - | - |
| Dependency-rule | - | - | 76.03 | 56.19 | 64.62 |
| BiLSTM | - | - | 68.68 | 70.51 | 69.57 |
| PE-BiLSTM | - | - | 82.27 | 74.95 | 78.43 |
| IOG | - | - | 84.36 | 79.08 | 81.60 |
| IOG+CRF | - | - | 84.41 | 79.43 | 81.84 |
| Base ALSC module | 78.85 | 67.69 | - | - | - |
| Base AOWE module | - | - | 82.83 | 83.25 | 83.03 |
| OTN | 79.50 | 69.08 | 86.78 | 81.11 | 83.83 |
The main experiment results of ALSC and AOWE are shown in Table 4.
In terms of the ALSC task, the performance of the attention-based and memory-based methods is comparable, while the CNN-based method GCAE performs worst. This shows the importance of modeling long-term dependencies for ALSC, which is why we adopt an attention-based BiLSTM as our base ALSC module. By employing position embeddings to incorporate aspect information, the base module achieves very competitive performance. Compared to the base module, our joint model OTN achieves further improvements by leveraging the additional opinion information from the AOWE task.
As for the AOWE task, the methods Dependency-rule and BiLSTM both perform poorly. The former uses coarse-grained POS patterns and lacks robustness; the latter fails to consider aspect information and outputs the same results for different aspects in a sentence. In contrast, the aspect-dependent methods PE-BiLSTM and IOG obtain clear improvements. With the help of a CRF, IOG+CRF achieves minor improvements over IOG. Different from these LSTM-based methods, our base module, which uses a CNN and incorporates aspect information, produces very competitive results on the AOWE dataset. Nevertheless, our joint model OTN still outperforms it by 0.8% in F1-score.
Benefiting from the two tailor-made opinion transmission mechanisms, OTN performs better than the two base modules, which proves the existence of mutual indication between the ALSC and AOWE tasks. Besides, OTN consistently outperforms other compared methods in both tasks. The comparison validates the effectiveness of our model.
4.5 Ablation Study
| Model | ALSC Acc. | ALSC F1 | AOWE P | AOWE R | AOWE F1 |
|---|---|---|---|---|---|
| OTN | 79.50 | 69.08 | 86.78 | 81.11 | 83.83 |
| -ALSC task | - | - | 86.56 | 80.98 | 83.64 |
| -AOWE task | 78.62 | 67.36 | - | - | - |
| -AOWE2ALSC | 79.31 | 68.89 | 85.86 | 81.59 | 83.65 |
| -ALSC2AOWE | 78.94 | 67.74 | 81.67 | 82.47 | 82.05 |
To investigate the effects of the two opinion transmission mechanisms in OTN, we conduct an ablation study. In the "-ALSC task" and "-AOWE task" settings, we keep the model architecture unchanged but remove the ALSC training data or the AOWE training data, respectively. The "-AOWE2ALSC" and "-ALSC2AOWE" settings indicate that we remove the AOWE2ALSC mechanism or the ALSC2AOWE mechanism from OTN.
Table 5 shows the results of the ablation study. We observe that the performance of the model drops when it is trained on a single task or without the opinion transmission mechanisms. This observation proves that OTN successfully exploits the connection between the ALSC and AOWE tasks and achieves improvements through our proposed opinion transmission mechanisms.
5 Related Work
5.1 ALSC
Most recent ALSC research utilizes attention-based networks to capture latent sentiment clues from the sentence for the given aspect, such as ATAE-LSTM [15], IAN [8], and PBAN [4]. On this basis, [13] employs a memory network to conduct multi-hop attention and obtain more powerful sentiment clues for detecting the sentiment polarity of the aspect. Following this idea, memory-based methods achieve competitive performance on ALSC [13, 1, 18, 19]. In addition, CNNs [17], capsule networks [2], and additional document-level sentiment data [5] have also been applied to this task.
5.2 AOWE
AOWE is a relatively new ABSA subtask. [3] formalizes it as an aspect-oriented sequence labeling task and designs a state-of-the-art sequence labeling model based on LSTMs. Before that, a few works focused on pairs of aspects and opinion words. [6] proposes a rule-mining method to extract aspect words and regards the nearest adjective to an aspect as its corresponding opinion word. [21] uses dependency-tree templates to extract valid aspect-opinion pairs.
6 Conclusion
In ABSA research, aspect-level sentiment classification (ALSC) and aspect-oriented opinion words extraction (AOWE) are two highly relevant tasks. Previous works usually focus on one of the two tasks and neglect the mutual indication between them. In this paper, we propose a novel joint model, Opinion Transmission Network (OTN), to exploit the potential connection between ALSC and AOWE and benefit both tasks simultaneously. In OTN, two tailor-made opinion transmission mechanisms are designed to control opinion clues flowing respectively from ALSC to AOWE and from AOWE to ALSC. Experiment results on the two tasks validate the effectiveness of our method.
Acknowledgements.
This work was supported by the NSFC (No. 61976114, 61936012) and National Key R&D Program of China (No. 2018YFB1005102).
References
- [1] Chen, P., Sun, Z., Bing, L., Yang, W.: Recurrent attention network on memory for aspect sentiment analysis. In: EMNLP. pp. 452–461 (2017)
- [2] Chen, Z., Qian, T.: Transfer capsule network for aspect level sentiment classification. In: ACL. pp. 547–556 (2019)
- [3] Fan, Z., Wu, Z., Dai, X., Huang, S., Chen, J.: Target-oriented opinion words extraction with target-fused neural sequence labeling. In: NAACL. pp. 2509–2518 (2019)
- [4] Gu, S., Zhang, L., Hou, Y., Song, Y.: A position-aware bidirectional attention network for aspect-level sentiment analysis. In: COLING. pp. 774–784 (2018)
- [5] He, R., Lee, W.S., Ng, H.T., Dahlmeier, D.: Exploiting document knowledge for aspect-level sentiment classification. In: ACL. pp. 579–585 (2018)
- [6] Hu, M., Liu, B.: Mining and summarizing customer reviews. In: ACM SIGKDD. pp. 168–177 (2004)
- [7] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. In: ICLR (2015)
- [8] Ma, D., Li, S., Zhang, X., Wang, H.: Interactive attention networks for aspect-level sentiment classification. In: IJCAI. pp. 4068–4074 (2017)
- [9] Pennington, J., Socher, R., Manning, C.D.: Glove: Global vectors for word representation. In: EMNLP. pp. 1532–1543 (2014)
- [10] Pontiki, M., Galanis, D., Papageorgiou, H., Androutsopoulos, I., Manandhar, S., Al-Smadi, M., Al-Ayyoub, M., Zhao, Y., Qin, B., De Clercq, O., et al.: SemEval-2016 task 5: Aspect based sentiment analysis. In: SemEval 2016 (2016)
- [11] Pontiki, M., Galanis, D., Pavlopoulos, J., Papageorgiou, H., Androutsopoulos, I., Manandhar, S.: SemEval-2014 task 4: Aspect based sentiment analysis. In: SemEval 2014. pp. 27–35 (2014)
- [12] Poria, S., Cambria, E., Gelbukh, A.: Aspect extraction for opinion mining with a deep convolutional neural network. Knowledge-Based Systems 108, 42–49 (2016)
- [13] Tang, D., Qin, B., Liu, T.: Aspect level sentiment classification with deep memory network. In: EMNLP. pp. 214–224 (2016)
- [14] Wang, C., Xu, B.: Convolutional neural network with word embeddings for Chinese word segmentation. In: IJCNLP. pp. 163–172 (2017)
- [15] Wang, Y., Huang, M., Zhu, X., Zhao, L.: Attention-based LSTM for aspect-level sentiment classification. In: EMNLP. pp. 606–615 (2016)
- [16] Wu, Z., Zhao, F., Dai, X., Huang, S., Chen, J.: Latent opinions transfer network for target-oriented opinion words extraction. In: AAAI. pp. 9298–9305 (2020)
- [17] Xue, W., Li, T.: Aspect based sentiment analysis with gated convolutional networks. In: ACL. pp. 2514–2523 (2018)
- [18] Yang, J., Yang, R., Wang, C., Xie, J.: Multi-entity aspect-based sentiment analysis with context, entity and aspect memory. In: AAAI (2018)
- [19] Zhu, P., Qian, T.: Enhanced aspect level sentiment classification with auxiliary memory. In: COLING. pp. 1077–1087 (2018)
- [20] Zhu, Y., Wang, G.: CAN-NER: Convolutional attention network for Chinese named entity recognition. In: NAACL. pp. 3384–3393 (2019)
- [21] Zhuang, L., Jing, F., Zhu, X.Y.: Movie review mining and summarization. In: CIKM. pp. 43–50 (2006)