
National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, 210023, China
Email: {yingcc,wuz}@smail.nju.edu.cn, {daixinyu,huangsj,chenjj}@nju.edu.cn

Opinion Transmission Network for Jointly Improving Aspect-oriented Opinion Words Extraction and Sentiment Classification

Chengcan Ying*, Zhen Wu*, Xinyu Dai†, Shujian Huang, Jiajun Chen

*These authors contributed equally. †Corresponding author (ORCID: 0000-0002-4139-7337).
Abstract

Aspect-level sentiment classification (ALSC) and aspect-oriented opinion words extraction (AOWE) are two highly relevant subtasks of aspect-based sentiment analysis (ABSA). They respectively aim to detect the sentiment polarity and extract the corresponding opinion words toward a given aspect in a sentence. Previous works treat them separately, training neural models for a single task on small-scale labeled data while neglecting the connections between the two. In this paper, we propose a novel joint model, Opinion Transmission Network (OTN), which exploits the potential bridge between ALSC and AOWE to facilitate both tasks simultaneously. Specifically, we design two tailor-made opinion transmission mechanisms that control the bidirectional flow of opinion clues, from ALSC to AOWE and from AOWE to ALSC. Experimental results on two benchmark datasets show that our joint model outperforms strong baselines on both tasks. Further analysis also validates the effectiveness of the opinion transmission mechanisms.

Keywords:
Aspect-level sentiment classification · Aspect-oriented opinion words extraction · Opinion transmission network

1 Introduction

Aspect-based sentiment analysis (ABSA) is a fine-grained sentiment analysis task [11] that analyzes the sentiment or opinions toward a given aspect in a sentence. The task consists of a set of subtasks, including aspect category detection, aspect term extraction, aspect-level sentiment classification (ALSC), and aspect-oriented opinion words extraction (AOWE). Most existing research performs a single subtask of ABSA by training machine learning models on labeled data [15, 1, 17]. However, public ABSA corpora are all small-scale because manual annotation is expensive and labor-intensive, and scarce training data limits the performance of data-driven approaches. Therefore, an interesting and valuable research question is how to mine and exploit the internal connections between ABSA subtasks to facilitate them simultaneously. In this work, we focus on the two subtasks ALSC and AOWE because they are highly mutually indicative. We first introduce them briefly before presenting our motivations.

Figure 1: An example of the ALSC task and the AOWE task. The words in red are two given aspects; the spans in blue are opinion words. The arrows indicate the correspondence between aspects and opinion words.

Aspect-level sentiment classification (ALSC) aims to predict the sentiment polarity towards a given aspect in a sentence. As Figure 1 shows, two aspects are mentioned in the sentence “waiters are unfriendly but the pasta is out of this world.”, namely “waiters” and “pasta”; the sentiments expressed towards them are negative and positive, respectively. Different from ALSC, aspect-oriented opinion words extraction (AOWE) is a recently proposed ABSA subtask [3]. Its objective is to extract the corresponding opinion words towards a given aspect from the sentence. Opinion words refer to the words or phrases in a sentence that explicitly express attitudes or opinions. In the example above, “unfriendly” is the opinion word towards the aspect “waiters”, and “out of this world” is the opinion phrase towards the aspect “pasta”.

It is common sense that positive opinion words imply positive sentiment polarity, while negative opinion words correspond to negative sentiment polarity. It follows that the corresponding opinion words toward a given aspect (which AOWE extracts) help infer the corresponding sentiment (which ALSC predicts). Conversely, the sentiment determined in ALSC can also provide clues that help extract polarity-related opinion words for the AOWE task. Therefore, the goals of AOWE and ALSC are mutually indicative, and the two tasks can benefit each other.

To exploit this mutual indication, we propose a novel model, Opinion Transmission Network (OTN), which jointly models ALSC and AOWE and improves them simultaneously. Overall, OTN contains two base modules, an attention-based ALSC module and a CNN-based AOWE module, together with two tailor-made opinion transmission mechanisms, from AOWE to ALSC and from ALSC to AOWE. Specifically, we utilize the extraction results of AOWE as complementary opinion information and inject them into the ALSC module in the form of additional attention. To transmit implicit opinions from ALSC to AOWE, we find that the features in the attention layer of the ALSC module retain abundant aspect-related opinion information, which can be exploited to facilitate AOWE. It is worth noting that our proposed model works without requiring simultaneous annotations of AOWE and ALSC on the same data, so it can be applied in more practical scenarios.

The main contributions of this work can be summarized as follows:

  1. To make full use of high-cost labeled data, we are the first to propose exploiting the mutual indication between ALSC and AOWE to improve both tasks.

  2. To exploit this connection effectively, we propose a joint neural model, Opinion Transmission Network (OTN), with two novel opinion transmission mechanisms. During network training, opinion clues flow bidirectionally between the two modules through these interactions.

  3. We conduct experiments and analysis on benchmark datasets. Experimental results confirm that the performance of both ALSC and AOWE can be improved through the designed opinion transmission mechanisms and that our model outperforms strong baselines on both tasks.

2 Preliminary

In this section, we introduce some necessary notations and the task formalizations of ALSC and AOWE.

2.1 ALSC Formalization

ALSC aims to classify the sentiment of a given aspect in a sentence into a set of pre-defined sentiment categories. Specifically, given a sentence containing $n$ words $s=\{w_1,w_2,\ldots,w_n\}$ and an aspect $w_a$ in $s$ (we denote an aspect as a single word $w_a$ for simplicity, where $a$ is the index of the aspect in the sentence), the task is to assign a label $y^{ALSC}\in C$ to the input pair $\langle s,w_a\rangle$, where $C$ is the set of pre-defined sentiment categories (i.e., positive, negative, and neutral).

2.2 AOWE Formalization

AOWE aims at extracting the corresponding opinion words towards a given aspect from a sentence. Different from ALSC, it is formalized as an aspect-oriented sequence labeling task [3]. Given an input pair $\langle s,w_a\rangle$, the task is to assign a label $y_i^{AOWE}\in\{B,I,O\}$ to each word $w_i$ in the sentence $s$. Following the standard BIO notation for sequence labeling, the three labels B, I, and O refer to the beginning of, inside of, and outside of an opinion span, respectively. The spans composed of the tags $B$ and $I$ represent the corresponding opinion words of the aspect $w_a$. Obviously, a sentence may have different labeling results for different aspects. An example is shown in Table 1.

Table 1: Different labeling results of a sentence given different aspects (row 1: the aspect “Waiters”; row 2: the aspect “pasta”). The B/I tags mark the corresponding opinion words.
1. Waiters/O are/O very/O friendly/B and/O the/O pasta/O is/O out/O of/O this/O world/O ./O
2. Waiters/O are/O very/O friendly/O and/O the/O pasta/O is/O out/B of/I this/I world/I ./O
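For concreteness, the following minimal sketch illustrates the input/output formats of the two tasks in Python. The data structures (dictionaries carrying an aspect index) are our own illustrative assumptions, not a format prescribed by the formalizations above.

```python
# Hypothetical data layout for the two subtasks, matching the example in Table 1.
sentence = "Waiters are very friendly and the pasta is out of this world .".split()

# ALSC: input pair <s, w_a> -> one sentiment label from C.
alsc_sample = {"sentence": sentence,
               "aspect_index": 0,          # "Waiters"
               "label": "positive"}

# AOWE: input pair <s, w_a> -> one BIO tag per word, depending on the aspect.
aowe_sample = {"sentence": sentence,
               "aspect_index": 6,          # "pasta"
               "tags": ["O"] * 8 + ["B", "I", "I", "I", "O"]}

assert len(aowe_sample["tags"]) == len(sentence)  # one tag per word
```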

3 Opinion Transmission Network

Opinion Transmission Network (OTN) aims to exploit the connections between ALSC and AOWE to facilitate both tasks. In this section, we first give an overall description of OTN, then introduce its base ALSC and AOWE modules, and finally present the tailor-made opinion transmission mechanisms that enable opinion interaction between the two modules.

3.1 Overall Description

Figure 2: Architecture of the Opinion Transmission Network.

Figure 2 shows the overall architecture of the Opinion Transmission Network (OTN). It consists of a base ALSC module and a base AOWE module, together with bidirectional opinion transmission mechanisms, from AOWE to ALSC (AOWE2ALSC) and from ALSC to AOWE (ALSC2AOWE). Following most state-of-the-art works [15, 8, 4], we employ a typical attention-based BiLSTM network as our base ALSC module. For AOWE, we adopt a CNN as the base module for two reasons. First, CNNs are widely used in various sequence labeling tasks and achieve state-of-the-art results, such as named entity recognition [20], Chinese word segmentation [14], and aspect extraction [12]. Second, a CNN computes in parallel and is therefore faster. Additionally, we enhance both the ALSC module and the AOWE module with position embeddings [4] to incorporate aspect information.

For the ALSC task, distinguishing the aspect-related opinion words helps predict the sentiment polarity of the aspect. Thus, we design the opinion transmission mechanism AOWE2ALSC to integrate opinion word information from AOWE into ALSC. Specifically, we transform the prediction results of the AOWE module into the form of auxiliary attention, which serves as additional sentiment evidence for the ALSC module.

The ALSC2AOWE mechanism aims to exploit the implicit opinion clues of ALSC to improve the AOWE task. In the ALSC module, the attention weights over words can indicate aspect-related opinion words, but each weight is a low-dimensional scalar and is easily ignored when incorporated into the AOWE module. Therefore, we step back and leverage the intermediate features in the attention layer of the ALSC module as latent opinions to improve the AOWE task.

3.2 Base ALSC Module

The base ALSC module is an attention-based BiLSTM network enhanced with the position embedding technique. Given a sentence $s=\{w_1,w_2,\cdots,w_n\}$ and an aspect $w_a$ in $s$, we first concatenate the word embedding and position embedding of each word $w_i$ as the word representation $\mathbf{e}_i$, i.e., $\mathbf{e}_i=[\mathbf{E}_{word}(w_i);\mathbf{E}_{pos}(l_i)]$, where $l_i=|i-a|$ is the relative distance of the word $w_i$ to the aspect $w_a$, and $\mathbf{E}_{word}$ and $\mathbf{E}_{pos}$ denote the word embedding table and position embedding table, respectively.
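As a minimal sketch (PyTorch; vocabulary size, embedding dimensions, and the clamped maximum distance are illustrative assumptions), the enhanced word representation can be built as follows:

```python
import torch
import torch.nn as nn

class AspectAwareEmbedding(nn.Module):
    """Builds e_i = [E_word(w_i); E_pos(l_i)] with l_i = |i - a|."""

    def __init__(self, vocab_size=10000, word_dim=300, pos_dim=100, max_dist=100):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, word_dim)  # E_word (GloVe in the paper)
        self.pos_emb = nn.Embedding(max_dist, pos_dim)      # E_pos, randomly initialized

    def forward(self, word_ids, aspect_index):
        # word_ids: (batch, n); aspect_index: (batch,)
        n = word_ids.size(1)
        positions = torch.arange(n, device=word_ids.device).unsqueeze(0)  # (1, n)
        rel_dist = (positions - aspect_index.unsqueeze(1)).abs()          # l_i = |i - a|
        rel_dist = rel_dist.clamp(max=self.pos_emb.num_embeddings - 1)
        return torch.cat([self.word_emb(word_ids), self.pos_emb(rel_dist)], dim=-1)
```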

With the enhanced word representations $\{\mathbf{e}_1,\mathbf{e}_2,\cdots,\mathbf{e}_n\}$, a BiLSTM network is applied to encode them and generate the corresponding hidden states $\{\mathbf{h}_1,\mathbf{h}_2,\cdots,\mathbf{h}_n\}$. Then we use the aspect representation $\mathbf{e}_a$ as the query and employ the attention mechanism to capture potential opinion clues for the ALSC task. The attention weight $\alpha_i$ of the word $w_i$ is defined as:

$\mathbf{h}_i^a = \mathbf{W}_e[\mathbf{e}_i;\mathbf{e}_a]$, (1)
$u_i = \mathbf{v}_u\tanh(\mathbf{h}_i^a+\mathbf{b}_u)$, (2)
$\alpha_i = \frac{\exp(u_i)}{\sum_{j=1}^{n}\exp(u_j)}$, (3)

where $\mathbf{W}_e$ denotes the weight matrix, $\mathbf{v}_u$ the weight vector, and $\mathbf{b}_u$ the bias.

Finally, the aspect-related sentence representation $\mathbf{r}_a$ is a weighted sum of the context representations $\mathbf{H}=\{\mathbf{h}_1,\mathbf{h}_2,\cdots,\mathbf{h}_n\}$, i.e., $\mathbf{r}_a=\mathbf{H}\boldsymbol{\alpha}$. In the base ALSC module, the representation $\mathbf{r}_a$ is fed into a linear layer and a softmax layer to predict the sentiment polarity of the aspect $w_a$ in the sentence $s$.
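A minimal PyTorch sketch of this module is given below. All dimensions and the returned auxiliary tensors (the attention features $\mathbf{h}_i^a$ and context $\mathbf{H}$, which the transmission mechanisms in Sections 3.4-3.5 reuse) reflect our reading of the equations, not released code.

```python
import torch
import torch.nn as nn

class BaseALSCModule(nn.Module):
    """Attention-based BiLSTM for ALSC (Eqs. 1-3 and r_a = H * alpha)."""

    def __init__(self, emb_dim=400, hidden_dim=400, att_dim=400, num_classes=3):
        super().__init__()
        self.bilstm = nn.LSTM(emb_dim, hidden_dim // 2,
                              bidirectional=True, batch_first=True)
        self.W_e = nn.Linear(2 * emb_dim, att_dim, bias=False)  # Eq. 1
        self.b_u = nn.Parameter(torch.zeros(att_dim))
        self.v_u = nn.Linear(att_dim, 1, bias=False)            # Eq. 2
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, e, aspect_index):
        # e: (batch, n, emb_dim); aspect_index: (batch,)
        H, _ = self.bilstm(e)                                   # (batch, n, hidden_dim)
        e_a = e[torch.arange(e.size(0)), aspect_index]          # aspect representation
        e_a = e_a.unsqueeze(1).expand(-1, e.size(1), -1)
        h_a = self.W_e(torch.cat([e, e_a], dim=-1))             # Eq. 1: h_i^a
        u = self.v_u(torch.tanh(h_a + self.b_u)).squeeze(-1)    # Eq. 2: u_i
        alpha = torch.softmax(u, dim=-1)                        # Eq. 3: attention weights
        r_a = torch.bmm(alpha.unsqueeze(1), H).squeeze(1)       # r_a = H * alpha
        return self.classifier(r_a), h_a, H                     # logits; softmax in the loss
```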

3.3 Base AOWE Module

Similarly, the word representation $\mathbf{e}_i$ in the base AOWE module is obtained by concatenating the word embedding and position embedding of each word $w_i$. We then employ a CNN encoder to capture context information in the sequence $\{\mathbf{e}_1,\mathbf{e}_2,\cdots,\mathbf{e}_n\}$ and obtain the corresponding feature vector $\mathbf{c}_i$ of each word $w_i$:

$[\mathbf{c}_1,\mathbf{c}_2,\cdots,\mathbf{c}_n]=\mathrm{CNN}([\mathbf{e}_1,\mathbf{e}_2,\cdots,\mathbf{e}_n];\theta_{\mathrm{CNN}})$, (4)

where $\theta_{\mathrm{CNN}}$ represents the parameters of the CNN encoder.

The CNN encoder consists of five CNN layers. Each layer has a set of convolution filters, and each filter maps the representations of $k$ consecutive words to a single feature scalar, where $k$ is the kernel size. A ReLU activation is applied to each feature vector. We detail the hyperparameters of the base AOWE module in the experiment settings.

To further incorporate the aspect information, we concatenate the CNN feature vector $\mathbf{c}_i$ with the word embedding of the given aspect $w_a$ to form the final representation of each word $w_i$:

$\mathbf{r}_i^o=[\mathbf{c}_i;\mathbf{E}_{word}(w_a)]$. (5)

Finally, the sequence representations $\{\mathbf{r}_1^o,\mathbf{r}_2^o,\cdots,\mathbf{r}_n^o\}$ are fed into a two-layer perceptron and a softmax layer to predict the tag probability distribution of each word in the sentence $s$:

$\mathbf{\hat{y}}_i^{AOWE}=\mathrm{softmax}(\mathbf{W}_{o1}\,\mathrm{ReLU}(\mathbf{W}_{o2}\mathbf{r}_i^o)+\mathbf{b}_o)$, (6)

where $\mathbf{W}_{o1}$ and $\mathbf{W}_{o2}$ are weight matrices and $\mathbf{b}_o$ denotes the bias.
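The following is a minimal sketch of the base AOWE module under our assumptions: a single convolution layer stands in for the five-layer stack detailed later in Table 3, and all dimensions are illustrative.

```python
import torch
import torch.nn as nn

class BaseAOWEModule(nn.Module):
    """CNN encoder (Eq. 4) + aspect concatenation (Eq. 5) + two-layer tagger (Eq. 6)."""

    def __init__(self, emb_dim=400, word_dim=300, conv_dim=600,
                 hidden_dim=300, num_tags=3):
        super().__init__()
        # One illustrative convolution layer in place of the stack in Table 3.
        self.conv = nn.Conv1d(emb_dim, conv_dim, kernel_size=5, padding=2)
        self.W_o2 = nn.Linear(conv_dim + word_dim, hidden_dim)
        self.W_o1 = nn.Linear(hidden_dim, num_tags)

    def forward(self, e, aspect_word_emb):
        # e: (batch, n, emb_dim); aspect_word_emb: (batch, word_dim) = E_word(w_a)
        c = torch.relu(self.conv(e.transpose(1, 2))).transpose(1, 2)   # c_i, Eq. 4
        asp = aspect_word_emb.unsqueeze(1).expand(-1, c.size(1), -1)
        r_o = torch.cat([c, asp], dim=-1)                              # Eq. 5
        logits = self.W_o1(torch.relu(self.W_o2(r_o)))                 # Eq. 6 (pre-softmax)
        return logits, r_o                                             # r_o reused by ALSC2AOWE
```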

3.4 Opinion Transmission: AOWE2ALSC

As mentioned above, aspect-oriented opinion words in a sentence provide strong evidence for inferring the sentiment of the corresponding aspect. Therefore, we propose the opinion transmission mechanism AOWE2ALSC, which leverages predictions from the AOWE module to help the ALSC module focus on aspect-oriented opinion words and thereby make more comprehensive sentiment predictions.

Specifically, we map the predicted BIO tag probabilities over each word in the AOWE module to a probability distribution of each word being an aspect-related opinion word as follows:

$\mathbf{p}=\mathrm{softmax}([\mathbf{\hat{y}}_1^{AOWE},\mathbf{\hat{y}}_2^{AOWE},\cdots,\mathbf{\hat{y}}_n^{AOWE}]^{T}\mathbf{W}_{trans})$, (7)

where $\mathbf{W}_{trans}\in\mathbb{R}^{3\times 1}$ is a weight matrix that maps the probabilities of each word being tagged with $B$, $I$, or $O$ to a single score.

Since the probability distribution $\mathbf{p}$ can be regarded as additional attention knowledge from the AOWE module, we merge the context representations $\mathbf{H}$ of the ALSC module with $\mathbf{p}$:

$\mathbf{r}_{opinion}=\mathbf{H}\mathbf{p}$. (8)

Finally, we concatenate the opinion representation $\mathbf{r}_{opinion}$ with the original representation $\mathbf{r}_a$ in the ALSC module to predict the aspect-level sentiment:

$\mathbf{\hat{y}}^{ALSC}=\mathrm{softmax}(\mathbf{W}_a[\mathbf{r}_a;\mathbf{r}_{opinion}]+\mathbf{b}_a)$, (9)

where $\mathbf{W}_a$ is the weight matrix and $\mathbf{b}_a$ denotes the bias.
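A minimal sketch of AOWE2ALSC under our assumptions (dimensions and tensor layouts are illustrative):

```python
import torch
import torch.nn as nn

class AOWE2ALSC(nn.Module):
    """Maps BIO tag probabilities to an auxiliary attention over words (Eqs. 7-9)."""

    def __init__(self, hidden_dim=400, num_classes=3):
        super().__init__()
        self.W_trans = nn.Linear(3, 1, bias=False)         # Eq. 7: B/I/O probs -> one score
        self.W_a = nn.Linear(2 * hidden_dim, num_classes)  # Eq. 9

    def forward(self, aowe_probs, H, r_a):
        # aowe_probs: (batch, n, 3) from the AOWE module; H: (batch, n, hidden_dim)
        # r_a: (batch, hidden_dim), the original ALSC sentence representation
        scores = self.W_trans(aowe_probs).squeeze(-1)         # (batch, n)
        p = torch.softmax(scores, dim=-1)                     # Eq. 7
        r_opinion = torch.bmm(p.unsqueeze(1), H).squeeze(1)   # Eq. 8: r_opinion = H p
        return self.W_a(torch.cat([r_a, r_opinion], dim=-1))  # Eq. 9 (pre-softmax)
```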

3.5 Opinion Transmission: ALSC2AOWE

The base ALSC module can capture latent aspect-related opinion words through its attention mechanism. However, the attention weight $\alpha_i$ is a one-dimensional scalar and is easily neglected when used to enhance the AOWE module. Therefore, we exploit the attention feature $\mathbf{h}_i^a$ in Equation 1 to enrich the context representations of the AOWE module as follows:

$\mathbf{r}_i^{o^{\prime}}=[\mathbf{r}_i^o;\mathbf{h}_i^a]$. (10)

Finally, the enriched context representation $\mathbf{r}_i^{o^{\prime}}$ is used to predict the tag of the word $w_i$ for the AOWE task:

$\mathbf{\hat{y}}_i^{AOWE}=\mathrm{softmax}(\mathbf{W}_{o1}\,\mathrm{ReLU}(\mathbf{W}_{o2}\mathbf{r}_i^{o^{\prime}})+\mathbf{b}_o)$. (11)
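Since this mechanism is a single concatenation before the tagger, a short sketch suffices; the tagger argument is a hypothetical callable standing in for the two-layer perceptron of Equation 6.

```python
import torch

def alsc2aowe(r_o, h_a, tagger):
    """Eqs. 10-11: enrich AOWE representations with ALSC attention features.

    r_o:    (batch, n, d_o) AOWE word representations from Eq. 5
    h_a:    (batch, n, d_a) ALSC attention features from Eq. 1
    tagger: the AOWE two-layer perceptron, applied per word
    """
    r_o_prime = torch.cat([r_o, h_a], dim=-1)  # Eq. 10
    return tagger(r_o_prime)                   # Eq. 11 (softmax applied in the loss)
```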

3.6 Training

For the ALSC task, we use the cross-entropy between the predicted sentiment distribution and the gold sentiment label as the task loss, defined as follows:

$L^{ALSC}=-\sum_{d\in D}\sum_{i=1}^{|C|}\mathbb{I}(y^{ALSC}=i)\log\hat{y}_i^{ALSC}$, (12)

where $D$ denotes all data samples, $C$ denotes the sentiment label set, and $\hat{y}_i^{ALSC}$ is the predicted probability that the input sample belongs to the $i$-th sentiment.

In terms of the AOWE task, we define the cross-entropy loss as follows:

$L^{AOWE}=-\sum_{d\in D}\sum_{i=1}^{n}\sum_{j=0}^{2}\mathbb{I}(y_i^{AOWE}=j)\log\hat{y}_{i,j}^{AOWE}$, (13)

where the tags $\{O,B,I\}$ are converted into the labels $\{0,1,2\}$ correspondingly, and $\hat{y}_{i,j}^{AOWE}$ denotes the predicted probability that the $i$-th word takes the label $j$.

Because OTN is a joint model for both ALSC and AOWE, we minimize the losses $L^{ALSC}$ and $L^{AOWE}$ iteratively to optimize the model.
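A minimal sketch of this alternating optimization, assuming separate data loaders for the two datasets and hypothetical forward_alsc/forward_aowe entry points on the joint model:

```python
import torch

def train_epoch(model, alsc_loader, aowe_loader, optimizer):
    """Alternately minimizes L_ALSC (Eq. 12) and L_AOWE (Eq. 13)."""
    ce = torch.nn.CrossEntropyLoss()
    for alsc_batch, aowe_batch in zip(alsc_loader, aowe_loader):
        # One ALSC step: sentence-level sentiment cross-entropy (Eq. 12).
        inputs, labels = alsc_batch
        optimizer.zero_grad()
        ce(model.forward_alsc(*inputs), labels).backward()
        optimizer.step()
        # One AOWE step: word-level tag cross-entropy (Eq. 13).
        inputs, tags = aowe_batch
        optimizer.zero_grad()
        tag_logits = model.forward_aowe(*inputs)        # (batch, n, 3)
        ce(tag_logits.reshape(-1, 3), tags.reshape(-1)).backward()
        optimizer.step()
```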

4 Experiments

4.1 Datasets and Metrics

As aforementioned, OTN is a joint model that does not require annotations for both the ALSC task and the AOWE task on the same data. To verify this, we use the 14res dataset for ALSC and the 16res dataset for AOWE, derived from SemEval-2014 Task 4 [11] and SemEval-2016 Task 5 [10], respectively. The original SemEval datasets do not provide annotations of the corresponding opinion words for each aspect; therefore, [3] annotated aspect-related opinion words for each sample and removed the samples containing no opinion words. Table 2 shows the statistics of the two datasets, where “Opinion” and “Pair” denote the numbers of opinion words and of aspect-opinion pairs, respectively.

Table 2: Statistics of the ALSC dataset 14res and the AOWE dataset 16res.

14res (ALSC)   Pos.    Neu.   Neg.   Total
Train          2,164   633    805    3,602
Test           728     196    196    1,120

16res (AOWE)   Sentence   Aspect   Opinion   Pair
Train          1,079      1,512    1,661     1,770
Test           329        457      485       525

We adopt widely-used evaluation metrics for the two tasks. For ALSC, we use accuracy and macro-F1 score as evaluation metrics [1, 5]. For AOWE, we follow previous work [3] and use precision, recall, and F1-score to measure the performance of different methods. A predicted opinion word/phrase is deemed correct only if both its starting and ending positions match those of the gold word/phrase.
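The exact-match criterion can be implemented by decoding BIO sequences into spans and counting boundary-exact matches; the helper below is our own sketch, not the official evaluation script.

```python
def bio_to_spans(tags):
    """Decode a BIO tag sequence into a set of (start, end) opinion spans."""
    spans, start = [], None
    for i, tag in enumerate(tags):
        if tag == "B":                      # a new span begins; close any open one
            if start is not None:
                spans.append((start, i - 1))
            start = i
        elif tag == "O" and start is not None:
            spans.append((start, i - 1))
            start = None
    if start is not None:
        spans.append((start, len(tags) - 1))
    return set(spans)

def span_prf(pred_seqs, gold_seqs):
    """Precision/recall/F1 where a span counts only on exact boundary match."""
    tp = n_pred = n_gold = 0
    for pred, gold in zip(pred_seqs, gold_seqs):
        p, g = bio_to_spans(pred), bio_to_spans(gold)
        tp += len(p & g)
        n_pred += len(p)
        n_gold += len(g)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```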

4.2 Experiment Settings

We use 300-dimensional GloVe [9] word embeddings pre-trained on 840B tokens to initialize word vectors, which are fixed during training. The position embeddings are 100-dimensional vectors randomly initialized from a uniform distribution $U(-0.01,0.01)$. The dimension of the LSTM cells is 400. Table 3 shows the hyperparameters of the CNNs in the AOWE module. We apply dropout with probability 0.5 on the embedding layer and the output layer. The Adam optimizer [7] is used to update model parameters, with an initial learning rate of 1e-3 and a mini-batch size of 16. We randomly select 20% of the samples from the training sets as validation sets for tuning hyperparameters and early stopping. We report the average results of 5 repeated runs for each model.

Table 3: Hyperparameters of the CNNs in the AOWE module.
Layer No.   Filter length k   Number of filters
1           1                 600
2           2, 3, 4           200 each
3           5                 600
4           5                 600
5           5                 600
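Under our reading of Table 3 (layer 2 applies filters of widths 2, 3, and 4 with 200 filters each and concatenates their outputs; how the widths are combined is our assumption), the encoder could be assembled as follows:

```python
import torch
import torch.nn as nn

class AOWECNNEncoder(nn.Module):
    """Five-layer CNN encoder following the filter configuration of Table 3."""

    def __init__(self, in_dim=400):
        super().__init__()
        self.layer1 = nn.Conv1d(in_dim, 600, kernel_size=1)
        # Layer 2: widths 2/3/4 with 200 filters each, concatenated to 600 channels.
        self.layer2 = nn.ModuleList(
            [nn.Conv1d(600, 200, kernel_size=k, padding=k // 2) for k in (2, 3, 4)])
        self.layers3to5 = nn.ModuleList(
            [nn.Conv1d(600, 600, kernel_size=5, padding=2) for _ in range(3)])

    def forward(self, e):
        # e: (batch, n, in_dim) -> channels-first for Conv1d
        x = torch.relu(self.layer1(e.transpose(1, 2)))
        n = x.size(-1)
        # Even widths with padding k // 2 produce length n + 1; trim back to n.
        x = torch.relu(torch.cat([conv(x)[..., :n] for conv in self.layer2], dim=1))
        for conv in self.layers3to5:
            x = torch.relu(conv(x))
        return x.transpose(1, 2)  # (batch, n, 600), one c_i per word
```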

4.3 Compared Methods

We compare our OTN model with the following methods for ALSC and AOWE.

4.3.1 ALSC

We divide the compared ALSC methods into three groups for brevity.

  • ATAE-LSTM [15], IAN [8] and PBAN [4] are attention-based methods.

  • MemNN [13], RAM [1], CEA [18], and DAuM [19] are all memory-based methods.

  • GCAE is a CNN-based model that proposes novel Gated Tanh-ReLU units to selectively output sentiment features according to the given aspect [17].

Besides, we also report the performance of our base ALSC module, an attention-based BiLSTM network enhanced with position embeddings.

4.3.2 AOWE

We compare our method with five baselines for AOWE.

  • Dependency-rule uses the POS tags of the dependency paths between aspects and opinion words in the training set as rule templates to detect the corresponding opinion words for given aspects [21].

  • BiLSTM uses word embeddings to represent words and employs a BiLSTM network to capture context information; the context representations are then used to predict the tags of words.

  • PE-BiLSTM employs additional position embeddings based on BiLSTM to represent relative positions of words to the given aspect [16].

  • IOG employs six different positional and directional LSTM networks to extract aspect-related opinion words and achieves state-of-the-art results [3].

  • IOG+CRF additionally stacks a CRF layer on top of IOG for tag decoding.

4.4 Main Results

Table 4: Main experiment results (%). Best results are in bold.
Model ALSC AOWE
Acc. F1 P R F1
ATAE-LSTM 78.38 66.36 - - -
IAN 78.71 67.71 - - -
PBAN 78.62 67.45 - - -
MemNN 77.69 67.53 - - -
RAM 78.41 68.52 - - -
CEA 78.44 66.78 - - -
DAuM 77.91 66.47 - - -
GCAE 76.09 63.29 - - -
Dependency-rule - - 76.03 56.19 64.62
BiLSTM - - 68.68 70.51 69.57
PE-BiLSTM - - 82.27 74.95 78.43
IOG - - 84.36 79.08 81.60
IOG+CRF - - 84.41 79.43 81.84
Base ALSC module 78.85 67.69 - - -
Base AOWE module - - 82.83 83.25 83.03
OTN 79.50 69.08 86.78 81.11 83.83

The main experiment results of ALSC and AOWE are shown in Table 4.

In terms of the ALSC task, the attention-based and memory-based methods perform comparably, while the CNN-based method GCAE performs worst. This shows the importance of modeling long-term dependencies for ALSC; thus we adopt an attention-based BiLSTM as our base ALSC module. By employing position embeddings to incorporate aspect information, the base module achieves very competitive performance. Compared to the base module, our joint model OTN achieves further improvements by leveraging the additional opinion information from the AOWE task.

As for the AOWE task, the methods Dependency-rule and BiLSTM both perform poorly. The former uses coarse-grained POS patterns and lacks robustness; the latter fails to consider aspect information and outputs the same results for different aspects in a sentence. In contrast, the aspect-dependent methods PE-BiLSTM and IOG obtain clear improvements. With the help of the CRF, IOG+CRF achieves minor improvements over IOG. Different from these LSTM-based methods, our base module, which uses a CNN and incorporates aspect information, produces very competitive results on the AOWE dataset. Nevertheless, our joint model OTN still outperforms it by 0.8% in F1-score.

Benefiting from the two tailor-made opinion transmission mechanisms, OTN performs better than the two base modules, which proves the existence of mutual indication between the ALSC and AOWE tasks. Moreover, OTN consistently outperforms the other compared methods on both tasks. This comparison validates the effectiveness of our model.

4.5 Ablation Study

Table 5: The experiment results of ablation study.
Model ALSC AOWE
Acc. F1 P R F1
OTN 79.50 69.08 86.78 81.11 83.83
-ALSC task - - 86.56 80.98 83.64
-AOWE task 78.62 67.36 - - -
-AOWE2ALSC 79.31 68.89 85.86 81.59 83.65
-ALSC2AOWE 78.94 67.74 81.67 82.47 82.05

To investigate the effects of the two opinion transmission mechanisms on OTN, we also conduct an ablation study. In the experiments “-ALSC task” and “-AOWE task”, we keep the model architecture unchanged but remove the ALSC data for “-ALSC task” and the AOWE data for “-AOWE task”. The rows “-AOWE2ALSC” and “-ALSC2AOWE” indicate that we remove the AOWE2ALSC or ALSC2AOWE mechanism from OTN, respectively.

Table 5 shows the results of the ablation study. We observe that the performance of the model drops when it is trained on a single task or without the opinion transmission mechanisms. This observation proves that OTN successfully exploits the connection between the ALSC and AOWE tasks and achieves improvements through the proposed opinion transmission mechanisms.

5 Related Work

5.1 ALSC

Most recent ALSC research utilizes attention-based networks to capture latent sentiment clues in the sentence for the given aspect, such as ATAE-LSTM [15], IAN [8], and PBAN [4]. On this basis, [13] employs a memory network to conduct multi-hop attention and obtain more powerful sentiment clues for detecting the sentiment polarity of the aspect. Following this idea, memory-based methods achieve competitive performance on ALSC [13, 1, 18, 19]. In addition, CNNs [17], capsule networks [2], and additional document-level sentiment data [5] have also been applied to this task.

5.2 AOWE

AOWE is a relatively new ABSA subtask. [3] formalizes it as an aspect-oriented sequence labeling task and designs a state-of-the-art sequence labeling model based on LSTMs. Before that, a few works focused on pairing aspects and opinion words. [6] proposes a rule-mining method to extract aspect words and regards the nearest adjective to an aspect as its corresponding opinion word. [21] uses dependency-tree templates to extract valid aspect-opinion pairs.

6 Conclusion

In ABSA research, aspect-level sentiment classification (ALSC) and aspect-oriented opinion words extraction (AOWE) are two highly relevant tasks. Previous works usually focus on one of the two tasks and neglect the mutual indication between them. In this paper, we propose a novel joint model, Opinion Transmission Network (OTN), which exploits the potential connection between ALSC and AOWE to benefit both simultaneously. In OTN, two tailor-made opinion transmission mechanisms are designed to control the flow of opinion clues from ALSC to AOWE and from AOWE to ALSC. Experimental results on the two tasks validate the effectiveness of our method.

Acknowledgements.

This work was supported by the NSFC (No. 61976114, 61936012) and National Key R&D Program of China (No. 2018YFB1005102).

References

  • [1] Chen, P., Sun, Z., Bing, L., Yang, W.: Recurrent attention network on memory for aspect sentiment analysis. In: EMNLP. pp. 452–461 (2017)
  • [2] Chen, Z., Qian, T.: Transfer capsule network for aspect level sentiment classification. In: ACL. pp. 547–556 (2019)
  • [3] Fan, Z., Wu, Z., Dai, X., Huang, S., Chen, J.: Target-oriented opinion words extraction with target-fused neural sequence labeling. In: NAACL. pp. 2509–2518 (2019)
  • [4] Gu, S., Zhang, L., Hou, Y., Song, Y.: A position-aware bidirectional attention network for aspect-level sentiment analysis. In: COLING. pp. 774–784 (2018)
  • [5] He, R., Lee, W.S., Ng, H.T., Dahlmeier, D.: Exploiting document knowledge for aspect-level sentiment classification. In: ACL. pp. 579–585 (Jul 2018)
  • [6] Hu, M., Liu, B.: Mining and summarizing customer reviews. In: ACM SIGKDD. pp. 168–177 (2004)
  • [7] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. In: ICLR (2015)
  • [8] Ma, D., Li, S., Zhang, X., Wang, H.: Interactive attention networks for aspect-level sentiment classification. In: IJCAI. pp. 4068–4074 (2017)
  • [9] Pennington, J., Socher, R., Manning, C.D.: Glove: Global vectors for word representation. In: EMNLP. pp. 1532–1543 (2014)
  • [10] Pontiki, M., Galanis, D., Papageorgiou, H., Androutsopoulos, I., Manandhar, S., Al-Smadi, M., Al-Ayyoub, M., Zhao, Y., Qin, B., De Clercq, O., et al.: Semeval-2016 task 5: Aspect based sentiment analysis. In: (SemEval 2016) (2016)
  • [11] Pontiki, M., Galanis, D., Pavlopoulos, J., Papageorgiou, H., Androutsopoulos, I., Manandhar, S.: SemEval-2014 task 4: Aspect based sentiment analysis. In: (SemEval 2014). pp. 27–35 (Aug 2014)
  • [12] Poria, S., Cambria, E., Gelbukh, A.: Aspect extraction for opinion mining with a deep convolutional neural network. Knowledge-Based Systems 108, 42–49 (2016)
  • [13] Tang, D., Qin, B., Liu, T.: Aspect level sentiment classification with deep memory network. In: EMNLP. pp. 214–224. Association for Computational Linguistics, Austin, Texas (Nov 2016)
  • [14] Wang, C., Xu, B.: Convolutional neural network with word embeddings for Chinese word segmentation. In: IJCNLP. pp. 163–172 (Nov 2017)
  • [15] Wang, Y., Huang, M., Zhu, X., Zhao, L.: Attention-based lstm for aspect-level sentiment classification. In: EMNLP. pp. 606–615 (2016)
  • [16] Wu, Z., Zhao, F., Dai, X., Huang, S., Chen, J.: Latent opinions transfer network for target-oriented opinion words extraction. In: AAAI. pp. 9298–9305 (2020)
  • [17] Xue, W., Li, T.: Aspect based sentiment analysis with gated convolutional networks. In: ACL. pp. 2514–2523 (2018)
  • [18] Yang, J., Yang, R., Wang, C., Xie, J.: Multi-entity aspect-based sentiment analysis with context, entity and aspect memory. In: AAAI (2018)
  • [19] Zhu, P., Qian, T.: Enhanced aspect level sentiment classification with auxiliary memory. In: COLING. pp. 1077–1087 (2018)
  • [20] Zhu, Y., Wang, G.: Can-ner: Convolutional attention network for chinese named entity recognition. In: NAACL. pp. 3384–3393 (2019)
  • [21] Zhuang, L., Jing, F., Zhu, X.Y.: Movie review mining and summarization. In: CIKM. pp. 43–50 (2006)