
You talk what you read: Understanding News Comment Behavior
by Dispositional and Situational Attribution

Yuhang Wang, Yuxiang Zhang, Dongyuan Lu, Jitao Sang
Abstract

Many news comment mining studies are based on the assumption that a comment is explicitly linked to its corresponding news article. In this paper, we observe that users' comments are also heavily influenced by their individual characteristics embodied in their interaction histories. Therefore, we position to understand news comment behavior by considering both the dispositional factors from the news interaction history and the situational factors from the corresponding news. A three-part encoder-decoder framework is proposed to model the generative process of news comments. The resultant dispositional and situational attribution contributes to understanding user focus and opinions, which is validated in the applications of reader-aware news summarization and news aspect-opinion forecasting.

1 Introduction

Increasingly, people express their opinions on news articles through online services such as news portals and microblogs. The resulting vast number of comments clearly reflects the thoughts and feelings of individuals. Mining these comments thus has important applications with practical socio-political and economic benefits (Hou et al., 2017; Boltužić and Šnajder, 2014).

Existing related work has explored comment data in different scenarios. One typical line of studies (Pontiki et al., 2016; Peng et al., 2020; Yan et al., 2021) concentrated on sentiment analysis tasks, especially Aspect-Based Sentiment Analysis (ABSA), which aims to identify the aspect term, its corresponding sentiment polarity, and the opinion term. Another research line was based on the interaction between news and comments. A fundamental assumption of these studies is that comments clearly correspond to certain aspects of the corresponding news. Based on this news-comment correspondence, Hou et al. (2017) aligned comments to news topics to improve readers' news browsing experience; Yang et al. (2020) introduced a new task that leverages reading and commenting history to predict a user's future opinions on unseen news; and researchers have also developed automatic news commenting algorithms to encourage user engagement and interaction (Qin et al., 2018). For example, Wang et al. (2021) incorporated reader-aware factors to generate diversified comments, and Li et al. (2019) modeled the news as a topic interaction graph to capture the main point of the article, which enhances the correspondence between generated comments and news.

However, comments do not always link explicitly to the corresponding news. Table 1 (top) illustrates an example: a news article reporting that the last hospitalized COVID-19 patients in Wuhan have fully recovered, together with its associated comments.

Title: Live screen! The last batch of COVID-19 patients in Wuhan have been discharged from hospital.
Body: “Finally!” On April 26, a COVID-19 patient surnamed Ding was discharged from Wuhan Pulmonary Hospital, and #all hospitalized COVID-19 patients in Wuhan were cleared#. Netizen: The day of wuhan’s recovery is the Chinese New Year.
Comment:
User-1: Congratulations! This is a memorable day!
User-2: The last batch of Jiangsu Medical teams to aid Hubei went home. Thank you.
User-3: Great China!
Partial news-comment history of User-2:
news: Jiangsu launched a level 1 public health emergency response to prevent the spread of the virus.
comment: Each student has been screened in my daughter’s school today.
news: The second group of medical workers from Jiangsu province relay to Wuhan.
comment: Salute to the most beautiful people.
news: Four cases of pneumonia caused by the COVID-19 were confirmed in Jiangsu province, all with recent travel history to Wuhan.
comment: Suzhou is the first city in Jiangsu province to find (confirmed cases).
Table 1: Comment from NetEase: (top) an example news with comments; (bottom) user2’s reading and commenting history. News and comments are originally in Chinese and translated to English.

The comments about "memorable day" and "China" link entities to the corresponding news; however, the comment about "Jiangsu" from user-2 has no explicit correspondence. By retrieving user-2's news interaction (i.e., reading and commenting) history shown in Table 1 (bottom), we find that he/she is heavily concerned with topics related to "Jiangsu", which gives rise to the above comment combining the news topic of COVID-19 and user-2's individual focus on "Jiangsu".

To further investigate whether the above phenomenon is common, we study how comment entities are distributed between the corresponding news and the users' interaction histories on 85,179 comments from 1,275 users on NetEase News (https://news.163.com/). Table 2 shows the distribution of comments with respect to entity linking. We observe that 34% of comments have no key entities clearly linked to the corresponding news, among which nearly two thirds (20% of all comments) contain entities appearing only in the users' interaction history. This result demonstrates that a user's comment is related not only to the corresponding news but also to the user's individual characteristics embodied in the interaction history.
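As a concrete illustration of this analysis (a hypothetical sketch, not the actual crawling pipeline), the categorization behind Table 2 can be computed as follows; the capitalization-based extract_entities is a placeholder for a real entity tagger, and the toy triples stand in for the crawled (comment, news, history) data:

from collections import Counter

def extract_entities(text):
    # Placeholder NER: treat capitalized tokens as entities.
    return {tok.strip("!?.,") for tok in text.split() if tok[:1].isupper()}

def categorize(comment, news, history):
    ents = extract_entities(comment)
    news_ents = extract_entities(news)
    hist_ents = set().union(*[extract_entities(h) for h in history]) if history else set()
    in_news, in_hist = bool(ents & news_ents), bool(ents & hist_ents)
    if in_news and in_hist:
        return "corresponding news & history"
    if in_news:
        return "only corresponding news"
    if in_hist:
        return "only history"
    return "neither"

triples = [  # toy (comment, news, history) examples
    ("Great China!", "COVID-19 patients in Wuhan discharged", []),
    ("The Jiangsu teams went home", "Wuhan patients discharged",
     ["Jiangsu launched a level 1 response"]),
]
print(Counter(categorize(c, n, h) for c, n, h in triples))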

Inspired by this, we position to understand news comment behavior by modeling both the user's interaction history and the corresponding news. According to attribution theory (Heider, 2013; Heider and Simmel, 1944), the attribution of human behavior can be divided into dispositional attribution (e.g., emotions, attitudes, abilities) and situational attribution (e.g., events or external pressure). Intuitively, in the news comment scenario, mining the interaction history and the corresponding news contributes to dispositional and situational attribution, respectively. Based on the above analysis, we develop a three-part generative framework named DS-Attributor to understand news comment behavior by Dispositional and Situational Attribution. The first part is the Dispositional Factor Encoder, which models individual characteristics as both aspect and opinion topic preferences. The second part is the Situational Factor Encoder, which exploits the user-derived aspect topics from the dispositional factor to detect the focused aspects of a specific piece of news. Finally, the mined opinion topics of the dispositional factor and the detected situational factor are integrated in the Dynamic Comment Decoder module to generate comments.

Entities of comments appear in Percentage
only corresponding news 21%
corresponding news & history 55%
only history 20%
neither 14%
Table 2: Distribution of news-comment correspondence

Contributions. We summarize the main contributions of this paper as follows:

  • We position the problem of understanding news comment behavior by both situational and dispositional attribution.

  • We propose a novel encoder-decoder framework to model the comment generation process by combining the comment history and corresponding news.

  • The resultant dispositional and situational attribution is validated to enable applications like news aspect-opinion forecasting and reader-aware news summarization.

2 Notations and Problem Definition

Our goal is to understand news comment behavior by dispositional and situational attribution through a generative framework. Specifically, given a user, the model uses his/her historical comments to mine dispositional factors and detects situational factors from a specific piece of news, then generates a comment from the mined dispositional and situational factors. Let $\mathcal{U}=\{u_1,u_2,\ldots,u_N\}$ denote a set of users. For each user $u_n\in\mathcal{U}$, let $\mathcal{P}_{u_n}=\{Y_{u_n,1},Y_{u_n,2},\ldots,Y_{u_n,t},\ldots,Y_{u_n,T_{u_n}}\}$ include all comments $u_n$ posted before timestep $T_{u_n}$. Each comment $Y_{u_n,t}$ is a sequence of words $\{y_1,y_2,\ldots,y_l\}$, where $y_i\in\mathcal{V}$ and $l$ is the number of words in $Y_{u_n,t}$. Let $X=\{t,b_1,b_2,\ldots,b_m\}$ denote a piece of news that $u_n$ has not read before $T_{u_n}$, consisting of the news title $t$ and the $m$ sentences of the news body. Based on the above notations, we formally define the problem as follows:

Problem 1 (Dispositional and Situational Comment Attribution). Given the historical comments $\mathcal{P}_{u_n}$ of user $u_n\in\mathcal{U}$ and a specific piece of news $X$, the goal of Dispositional and Situational Comment Attribution is: (1) to mine the dispositional factor, which includes preferences regarding both aspect and opinion topics, from $\mathcal{P}_{u_n}$; (2) to detect the situational factor from news $X$; and (3) to generate comment $Y$ on news $X$ based on the dispositional and situational factors.

Figure 1: Overall architecture of the proposed DS-Attributor.

3 Methodology

We present the overall framework of DS-Attributor in Figure 1, which includes three main modules. The Dispositional Factor Encoder models the dispositional factors, i.e., the aspect and opinion topic preferences $pf_a$ and $pf_s$, from users' historical comments (see Section 3.1). The Situational Factor Encoder, given the aspect topic preference $pf_a$ and news $X$, produces a representation $s_i$ for each news sentence and measures the corresponding importance $g_i$ with a weighted aspect vector $v_a$ (see Section 3.2). Finally, in the Dynamic Comment Decoder, the user's opinion vector $v_s$ is obtained and incorporated with the $s_i$ and $g_i$ of each sentence to generate the observed comment (see Section 3.3). We elaborate the details of each module below.

3.1 Dispositional Factor Encoder

In this subsection, our goal is to mine the dispositional factor from users' historical comments. A comment is mainly composed of aspect and opinion terms (Pontiki et al., 2016); for example, in the sentence "Great China!", the aspect term is "China" and the opinion term is "Great". Therefore, we model the dispositional factor as the user's preferences over aspects and opinions.

Comment Disentanglement. We first pretrain a Comment Disentanglement module based on the Neural Topic Model (Dieng et al., 2020), which is used to extract aspect and opinion topic distributions from a comment. In addition, the aspect topic vectors $V_a\in\mathbb{R}^{k_a\times d}$ and opinion topic vectors $V_s\in\mathbb{R}^{k_s\times d}$ can also be obtained, where $d$ is the dimension of the topic vectors, and $k_a$ and $k_s$ are the numbers of aspect and opinion topics respectively. Specifically, as shown in the left of Figure 1, we represent each comment by its Bag-of-Words (BOW) feature vector $y$. We then use two parallel VAE-based structures to reconstruct the aspect BOW target $\widehat{y}_a$ and the opinion BOW target $\widehat{y}_s$ respectively, defined as:

$\widehat{y}_{a,i}=\begin{cases}y_i&\text{if the word corresponding to }y_i\text{ is an entity},\\0&\text{otherwise},\end{cases}$ (4)

$\widehat{y}_{s,i}=\begin{cases}y_i&\text{if the word corresponding to }y_i\text{ is an adjective or emotional word},\\0&\text{otherwise},\end{cases}$ (8)

where $y_i$, $\widehat{y}_{a,i}$, $\widehat{y}_{s,i}$ are elements of $y$, $\widehat{y}_a$, $\widehat{y}_s$ respectively. During inference, given a comment's BOW feature vector, we obtain both its aspect topic distribution $z_a$ and its opinion topic distribution $z_s$.
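The target construction in Eqns. (4) and (8) amounts to masking the comment BOW vector with an entity/part-of-speech filter. Below is a minimal NumPy sketch of this step with a toy vocabulary; the is_entity and is_opinion_word sets stand in for a real entity tagger and sentiment lexicon:

import numpy as np

vocab = ["china", "great", "jiangsu", "salute", "school"]
is_entity = {"china", "jiangsu", "school"}     # assumed NER output
is_opinion_word = {"great", "salute"}          # assumed sentiment lexicon

def disentangle_targets(y):
    # Keep entity counts for the aspect target (Eqn. (4)) and
    # adjective/emotional-word counts for the opinion target (Eqn. (8)).
    y_a = np.where([w in is_entity for w in vocab], y, 0)
    y_s = np.where([w in is_opinion_word for w in vocab], y, 0)
    return y_a, y_s

y = np.array([1, 1, 0, 0, 0])        # BOW of "Great China!"
y_a, y_s = disentangle_targets(y)    # y_a keeps "china"; y_s keeps "great"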

Historical Aspect-Opinion Modeling. For each user $u_n$, the historical sequences of both aspect and opinion topic distributions can be obtained by performing comment disentanglement on the historical comments $\mathcal{P}_{u_n}$. Denote $[z_a^1,z_a^2,\ldots,z_a^t,\ldots,z_a^T]$ and $[z_s^1,z_s^2,\ldots,z_s^t,\ldots,z_s^T]$ as the derived historical sequences of aspect and opinion topic distributions. We introduce two separate LSTMs to process these distribution sequences and output the user's preferences for aspect topics, $pf_a$, and opinion topics, $pf_s$, respectively. Specifically, the $t$-th hidden states are given by:

$pf_a^t = \mathrm{LSTM}_a(z_a^t, pf_a^{t-1})$, (9)
$pf_s^t = \mathrm{LSTM}_s(z_s^t, pf_s^{t-1})$. (10)

After recursive updating, we encode the user's dispositional preference in the user-aspect topic preference $pf_a$ and the user-opinion topic preference $pf_s$.
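As an illustration, Eqns. (9)-(10) can be realized with two independent recurrent encoders; a minimal PyTorch sketch follows (PyTorch and the batch layout are our assumptions, with $k_a$ = 40, $k_s$ = 6 and hidden size 64 taken from Section 4.1):

import torch
import torch.nn as nn

k_a, k_s, hidden = 40, 6, 64
lstm_a = nn.LSTM(input_size=k_a, hidden_size=hidden, num_layers=2, batch_first=True)
lstm_s = nn.LSTM(input_size=k_s, hidden_size=hidden, num_layers=2, batch_first=True)

T = 97                                                   # length of one user's history
z_a_seq = torch.softmax(torch.randn(1, T, k_a), dim=-1)  # aspect topic distributions
z_s_seq = torch.softmax(torch.randn(1, T, k_s), dim=-1)  # opinion topic distributions

_, (h_a, _) = lstm_a(z_a_seq)    # h_a: (num_layers, batch, hidden)
_, (h_s, _) = lstm_s(z_s_seq)
pf_a, pf_s = h_a[-1], h_s[-1]    # top-layer final states as preferences pf_a, pf_s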

3.2 Situational Factor Encoder

In this subsection, our goal is to detect the situational factor from the news. Since not all sentences contribute equally to motivating users to comment, we introduce an attention-based method that uses the weighted aspect vector $v_a$ to measure the importance of news sentences with respect to a specific user.

Hierarchical News Encoder. First, the news $X$ is embedded as $v_X$ by a hierarchical news encoder. Assume the news $X$ contains $L$ sentences and each sentence contains $N$ words; $w_{i,t}$ with $t\in[1,N]$ denotes the $t$-th word in the $i$-th sentence. Given a sentence, we first use a Bi-LSTM to obtain word representations from both directions:

$s_{i,t} = \text{Bi-LSTM}(w_{i,t}, s_{i,t-1})$. (11)

An attention mechanism (Vaswani et al., 2017) is introduced to aggregate the representations of the informative words into a sentence vector as follows:

$s_i = \sum_t \alpha_{i,t}\, s_{i,t}$, (12)
$\alpha_{i,t} = \mathrm{softmax}(u_{i,t}^{\top} u_w)$, (13)
$u_{i,t} = \tanh(W_w s_{i,t} + b_w)$. (14)

To get the news embedding, we aggregate the sentence vectors by attentive pooling as:

$v_X = \sum_i \beta_i\, s_i$, (15)
$\beta_i = \mathrm{softmax}(u_i^{\top} u_s)$, (16)
$u_i = \tanh(W_s s_i + b_s)$. (17)
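A minimal PyTorch sketch of Eqns. (11)-(17) is given below; the class name and input shapes are illustrative, with hidden size 128 as in Section 4.1:

import torch
import torch.nn as nn
import torch.nn.functional as F

class HierNewsEncoder(nn.Module):
    def __init__(self, emb_dim=300, hidden=128):
        super().__init__()
        self.word_rnn = nn.LSTM(emb_dim, hidden, bidirectional=True, batch_first=True)
        self.W_w = nn.Linear(2 * hidden, 2 * hidden)   # word-level attention, Eqn. (14)
        self.u_w = nn.Parameter(torch.randn(2 * hidden))
        self.W_s = nn.Linear(2 * hidden, 2 * hidden)   # sentence-level attention, Eqn. (17)
        self.u_s = nn.Parameter(torch.randn(2 * hidden))

    def attend(self, states, W, u):
        # Eqns. (12)-(14) / (15)-(17): score, normalize, weighted sum.
        alpha = F.softmax(torch.tanh(W(states)) @ u, dim=-1)
        return (alpha.unsqueeze(-1) * states).sum(-2)

    def forward(self, words):                          # words: (L, N, emb_dim)
        states, _ = self.word_rnn(words)               # (L, N, 2*hidden)
        s = self.attend(states, self.W_w, self.u_w)    # sentence vectors s_i: (L, 2*hidden)
        v_X = self.attend(s.unsqueeze(0), self.W_s, self.u_s).squeeze(0)
        return s, v_X

enc = HierNewsEncoder()
s, v_X = enc(torch.randn(12, 30, 300))  # 12 sentences of 30 word embeddings each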

Importance Measurement. Since $pf_a$ reflects the user's aspect preference for news content, we employ it to analyze the importance of the sentences in the news. Specifically, we first take $pf_a$ and $v_X$ as inputs to predict the aspect topic distribution $\widehat{z}_a$; the weighted aspect vector $v_a$ is then calculated as:

$v_a = \sum_j V_a(j)\,\widehat{z}_a(j)$, (18)
$\widehat{z}_a = \mathrm{softmax}(W_z[pf_a; v_X])$, (19)

where $V_a(j)$ is the $j$-th aspect topic vector obtained from the Comment Disentanglement module. Note that the true aspect topic distribution $z_a$ is available in the training stage, while during inference we predict the aspect topic distribution by Eqn. (19). Therefore, in order to learn $W_z$ during training, a KL term is added to the final loss function:

$\mathcal{L}_a = D_{KL}(\widehat{z}_a\,\|\,z_a)$. (20)

Similarly, $\mathcal{L}_s = D_{KL}(\widehat{z}_s\,\|\,z_s)$ is used to constrain $\widehat{z}_s$. Then, we employ another attention mechanism to measure the importance of sentences using the aspect vector. The importance score $g_i \in [0,1]$ for each sentence can be obtained by:

$g_i = \mathrm{softmax}(u_i'^{\top} u_s')$,
$u_i' = \tanh(W_g s_i + b_g)$.
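Putting Eqns. (18)-(20) and the importance attention together, a minimal PyTorch sketch of this module follows; shapes are illustrative, and using $v_a$ as the attention query $u_s'$ is our reading of the text:

import torch
import torch.nn as nn
import torch.nn.functional as F

k_a, d, d_s, L = 40, 300, 256, 12
pf_a = torch.randn(64)                  # user aspect preference (Section 3.1)
v_X = torch.randn(d_s)                  # news vector from the hierarchical encoder
s = torch.randn(L, d_s)                 # sentence vectors s_i
V_a = torch.randn(k_a, d)               # aspect topic vectors from disentanglement

W_z = nn.Linear(64 + d_s, k_a)
z_a_hat = F.softmax(W_z(torch.cat([pf_a, v_X])), dim=-1)   # Eqn. (19)
v_a = z_a_hat @ V_a                                        # Eqn. (18)

z_a = F.softmax(torch.randn(k_a), dim=-1)                  # true distribution (training only)
loss_a = F.kl_div(z_a.log(), z_a_hat, reduction="sum")     # D_KL(ẑ_a || z_a), Eqn. (20)

W_g = nn.Linear(d_s, d)                                    # importance attention
g = F.softmax(torch.tanh(W_g(s)) @ v_a, dim=0)             # importance scores g_i: (L,)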

3.3 Dynamic Comment Decoder

Considering the opinion topic preference, we design a comment decoder that dynamically integrates the opinion vector and the news context vector to generate comments. The decoder state is updated by:

$state_t = \mathrm{LSTM}_{dec}([c_t^X; e(y_{t-1}); M_t], state_{t-1})$, (21)

where $[\cdot;\cdot]$ denotes vector concatenation. The comment word is then sampled from the output distribution computed from the decoder state as:

$\tilde{y}_t \sim \mathrm{softmax}(W_y\, state_t)$. (22)

During training, the cross-entropy loss $\mathcal{L}_{ce}$ is employed as the optimization objective. The decoder takes the embedding of the previously decoded word $e(y_{t-1})$, the context vector $c_t^X$, and the dynamic opinion state $M_t$ as input to update its state. The context vector $c_t^X$ is a weighted sum of the encoder's sentence representations, calculated by:

$c_t^X = \sum_i \alpha_{t,i}\, s_i$, (23)
$\alpha_{t,i} = \dfrac{g_i \odot e_{t,i}}{\sum_j g_j \odot e_{t,j}}$, (24)
$e_{t,i} = \mathrm{softmax}(state_{t-1} W_{\alpha}\, s_i)$. (25)

The dynamic opinion state $M_t$ is initialized with the opinion vector $v_s$, which is calculated similarly to $v_a$ (see Eqn. (19)), and decays by a certain amount at each time step. This process is described as:

$M_t = g_t^u \odot M_{t-1}$, (26)
$g_t^u = \mathrm{sigmoid}(W_o\, state_t)$, (27)
$M_0 = v_s$, (28)
$\widehat{z}_s = \mathrm{softmax}(W_s[pf_s; \sum_i g_i s_i])$, (29)

where $\odot$ denotes element-wise multiplication. Once the decoding process is completed, the opinion state has been fully expressed through the context vectors, and the comment is generated.
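A minimal PyTorch sketch of one greedy decoding pass through Eqns. (21)-(27) follows; the dimensions, vocabulary size, and greedy argmax decoding are our assumptions:

import torch
import torch.nn as nn
import torch.nn.functional as F

d_emb, d_hid, d_s, d_op, L, V = 300, 512, 256, 300, 12, 20000
cell = nn.LSTMCell(d_s + d_emb + d_op, d_hid)   # Eqn. (21)
W_alpha = nn.Linear(d_hid, d_s, bias=False)     # bilinear attention, Eqn. (25)
W_y = nn.Linear(d_hid, V)                       # output projection, Eqn. (22)
W_o = nn.Linear(d_hid, d_op)                    # opinion decay gate, Eqn. (27)
emb = nn.Embedding(V, d_emb)

s = torch.randn(L, d_s)                         # sentence vectors
g = F.softmax(torch.randn(L), dim=0)            # importance scores g_i
M = torch.randn(d_op)                           # M_0 = v_s, Eqn. (28)
h, c = torch.zeros(1, d_hid), torch.zeros(1, d_hid)
y_prev = torch.tensor([0])                      # begin-of-sentence token id

for t in range(17):                             # avg. comment length is ~17 words
    e = F.softmax(W_alpha(h) @ s.t(), dim=-1).squeeze(0)   # e_{t,i}, Eqn. (25)
    alpha = g * e / (g * e).sum()                          # Eqn. (24)
    c_X = alpha @ s                                        # context vector, Eqn. (23)
    inp = torch.cat([c_X, emb(y_prev).squeeze(0), M]).unsqueeze(0)
    h, c = cell(inp, (h, c))                               # state update, Eqn. (21)
    y_prev = F.softmax(W_y(h), dim=-1).argmax(-1)          # greedy word, Eqn. (22)
    M = torch.sigmoid(W_o(h)).squeeze(0) * M               # opinion decay, Eqns. (26)-(27)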

The above three modules are jointly trained with the following overall loss function

$\mathcal{L} = \mathcal{L}_{ce} + \lambda_1 \mathcal{L}_a + \lambda_2 \mathcal{L}_s$, (30)

where $\lambda_1$ and $\lambda_2$ are hyperparameters balancing the different modules. We jointly train all components according to Eqn. (30). After dispositional and situational comment attribution, we obtain $\widehat{z}_a$, $\widehat{z}_s$, $g_i$, and the decoder attention $e_i$ (the mean of $e_{t,i}$ over decoding steps) to support the following experiments and applications.

4 Experiments

4.1 Experiments Setup

Datasets. Since existing news datasets do not include the user interaction history required for dispositional attribution, we construct a new dataset, DS-News, from NetEase News, one of the most popular online news platforms in China. We seed the crawl with 10 random users and collect users who commented on the same news articles via breadth-first search. For each user, we crawl his/her interaction history, which consists of a sequence of news-comment pairs. After removing users with too short an interaction history, we obtain 1,275 users with 124,918 comments in total. Table 1 (top) shows a news-comment instance. The statistics of DS-News are summarized in Table 3.

Dataset attributes number
Total number of users 1,275
Total number of news 97,937
Total number of comments 124,918
Avg. length of user histories 97.63
Avg. number of news words 382.77
Avg. number of comment words 17.10
Table 3: Statistics of DS-News

Compared Methods. In order to evaluate the effectiveness of the proposed DS-Attributor, we implemented the following baselines and DS-Attributor variants for comparison in terms of the news comment generation task.

  • Seq2seq (Qin et al., 2018): this model follows the framework of the seq2seq model with attention. We use the title together with the content as input.

  • Hierarchical-Attention (Yang et al., 2016): this model takes all the content sentences as input and applies hierarchical attention as the encoder to obtain the sentence vectors and document vector. An RNN decoder with attention is applied, with the document vector used as its initial state.

  • Graph2seq (Li et al., 2019): this model constructs the input news as a topic interaction graph, using a GCN as the encoder and an LSTM as the decoder to generate the news comment.

  • DS-Attributor (w/o IM): DS-Attributor without Importance Measurement.

  • DS-Attributor (w/o OV): DS-Attributor without integrating the opinion vector $v_s$.

Evaluation protocols. We use BLEU-1, BLEU-2 (Papineni et al., 2002), ROUGE-L (Lin, 2004), CIDEr (Vedantam et al., 2015), and METEOR (Banerjee and Lavie, 2005) as metrics to evaluate the performance of different models. The popular NLG evaluation tool nlg-eval (https://github.com/Maluuba/nlg-eval) is used to compute these metrics.
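For reference, a short usage sketch of nlg-eval follows; the call signature is taken from the tool's README, and the example strings are placeholders:

from nlgeval import NLGEval

# Skip the embedding-based metrics so only the overlap metrics used in
# Table 4 (BLEU, ROUGE-L, METEOR, CIDEr) are computed.
nlg = NLGEval(no_skipthoughts=True, no_glove=True)

hyps = ["congratulations this is a memorable day"]   # generated comments
refs = [["congratulations a memorable day indeed"]]  # ref_list[i][j]: i-th reference of hyp j
metrics = nlg.compute_metrics(ref_list=refs, hyp_list=hyps)
print(metrics)  # Bleu_1, Bleu_2, ROUGE_L, METEOR, CIDEr, ...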

Implementation details. For pretraining Comment Disentanglement, we use a vocabulary of the top 20k most frequent words in the entire dataset. The numbers of aspect topics and opinion topics are set to 40 and 6 respectively, and the dimensions of the latent topic vectors are both set to 300. We pretrain the module using Adam (Kingma and Ba, 2014) with learning rate 0.001. For Historical Aspect-Opinion Modeling, we use two separate two-layer LSTMs with hidden size 64 to model aspect and opinion topics respectively. For the sentence encoder, we use a two-layer Bi-LSTM with hidden size 128. For importance measurement, we employ an attention mechanism with hidden size 256. We use a two-layer LSTM with hidden size 512 as the decoder. For our method, $\lambda_1$ and $\lambda_2$ are both set to 0.4 (more implementation details and results from tuning the hyperparameters are available in the supplementary material). The batch size is set to 64. All parameters are optimized by the Adam optimizer with learning rate 0.001 and trained for 200 epochs with learning rate decay.

Methods BLEU-1 BLEU-2 ROUGE-L METEOR CIDEr
Seq2seq 0.101 0.021 0.091 0.046 0.029
Graph2seq 0.108 0.020 0.093 0.044 0.023
Hierarchical-Attention 0.102 0.022 0.092 0.044 0.037
DS-Attributor(w/o IM) 0.118 0.027 0.103 0.051 0.034
DS-Attributor(w/o OV) 0.121 0.027 0.094 0.053 0.034
DS-Attributor 0.125 0.029 0.108 0.054 0.039
Table 4: Evaluation results in terms of news comment generation

4.2 Quantitative Experimental Results

Quantitative evaluation results are shown in Table 4. The proposed DS-Attributor outperforms the baselines on all five evaluation metrics, demonstrating the advantage of exploring dispositional factors in modeling news comment behavior. In Table 6 and Table 7, we illustrate some example aspect and opinion topics discovered from news interaction history: aspect topics describe the different focuses and interests of users, and opinion topics help understand users' sentimental preferences. Among the baseline methods, Hierarchical-Attention generally performs better than Seq2seq and Graph2seq. A possible reason is that Hierarchical-Attention captures and aggregates the key information in the news through its hierarchical attention mechanism.

On all five evaluation metrics, DS-Attributor achieves superior performance to the two variants, showing the contributions of importance measurement and opinion integration. Key observations include: (1) The performance of DS-Attributor (w/o IM) decreases significantly on BLEU and METEOR, which indicates that leveraging the weighted aspect vector $v_a$ helps remove irrelevant information and detect users' focused aspects of specific news. (2) When the opinion vector is removed, DS-Attributor (w/o OV) performs poorly on ROUGE-L and CIDEr, which shows that mining users' opinion preferences does provide prior information for understanding sentimental tendency and thus helps accurately predict comment reactions.

Title: A college in Wuhan apologizes for the requisition of student dormitories: Improper disposal of items will be compensated
Body: The college issued a letter of apology on February 10 in response to the requisition of students’ dormitories. The college received a notice from the city government on February 7, and then requisitioned some dormitories as COVID-19 medical isolation sites by February 9. The college apologized for the improper disposal of students’ belongings and promised to compensate students for any loss of belongings after verification and disinfect the dormitories in the next semester. Experts: medical support. In recent days, a number of university dormitories in Wuhan have been requisitioned as quarantine observation points in response to the COVID-19 outbreak. For students’ personal belongings, many schools said they would be sealed up for special storage.
Seq2Seq: 我就想知道是什么时候的? (I just want to know when?)
Graph2Seq: 我觉得这就是在黑,因为我觉得是个什么原因 (I think it’s slander because I think it’s a reason)
Hierarchical-Attention: 这是要被封了吗? (Is this going to be blocked?)
Comment-1: 全国学校都是一个样,都是血泪了 (Schools all over the country are the same, sad)
Comment-2: 什么时候可以开学? (When can I go to school?)
Comment-3: 大逆不道!我想见宿舍! (Outrageous! I want to see the dorm!)
Table 5: Illustration of generated comments for one example news: (top) the example news; (middle) comments generated by the baseline methods; (bottom) comments generated by the proposed DS-Attributor for three different users. The news and comments are originally in Chinese and translated to English.
Topic No. Topic words
Aspect 1 fan, star, hero, entertainment
Aspect 4 player, football, champion, fans, team
Aspect 17 news, society, problem, comments
Aspect 20 virus, human, earth, Black, Wuhan
Aspect 38 teacher, school, student, university
Aspect 39 world, people, politics, protest, danger
Table 6: Example of discovered aspect topics.
Topic No. Topic words
Opinion 1 like, not bad, nice, pretty, delicious
Opinion 2 hope, protect, isolate, normal, alive
Opinion 3 development, hope, try hard, solve
Opinion 4 no, no way, not enough, disbelief
Table 7: Example of discovered opinion topics.

4.3 Case Study

To better understand how dispositional and situational attribution contribute to comment behavior, Table 5 visualizes the generated comments for one specific news article regarding the requisition of a university in Wuhan as a medical isolation site. Compared to the baselines, which only model the situational factors, DS-Attributor generates appropriate comments tailored to different users by considering the dispositional factors mined from their interaction histories. The generated comments for the three example users contain diverse and clearer focuses. Regarding comment-3, which expresses a complaint about the decision of dormitory requisition, we examine the corresponding situational and dispositional factors. In particular, for the situational factor, we highlight in blue the news sentence on which the user focuses most, i.e., the one with the highest attention value $g_i$. From the situational attribution we can see that the user is concerned about dormitories and personal belongings in this news. For the dispositional factor, from the aspect and opinion distributions $\widehat{z}_a$ and $\widehat{z}_s$, we find that this user has the highest preference for Aspect#38 and Opinion#4. As shown in Table 6 and Table 7, Aspect#38 talks about school, and Opinion#4 indicates negative sentiment. This gives rise to the dissatisfaction in the comment and helps detect the user's actual focus in the news.

5 Applications

By situational and dispositional attribution, the proposed DS-Attributor can enable applications beyond comment generation. In this section, we introduce two possible applications that employ the factors learned during situational and dispositional attribution.

5.1 News Aspect-Opinion Forecasting

In this subsection, we introduce a useful application that aggregates the predicted comments to forecast the audience's focus and opinions on future news. We illustrate this application with an example news article describing a large-scale street protest in Vietnam. Specifically, for the given news, 200 users are selected as test subjects, and we predict their focused aspect distributions $\widehat{z}_a$ and corresponding opinion distributions $\widehat{z}_s$. For simplicity, we obtain these two topic distributions for the 200 users and analyze the topics with the highest weights respectively.

We observe that most people concentrate on Aspect#39, which covers social topics (e.g., politics, protest). The keywords of the generated comments on Aspect#39 are shown in Figure 2 (left), and they are closely related to the news content. As for people's opinions on Aspect#39, we visualize the opinion distribution in Figure 2 (right). Most people express positive emotions, such as "protect" and "alive" in Opinion#2 and "hope" and "solve" in Opinion#3, while the others express opposition to this matter (e.g., "no", "disbelief"). Therefore, DS-Attributor makes it possible to predict people's reactions before or at the early stage of a news release. By examining the users from a certain community, we can also support fine-grained aspect-opinion forecasts. This will enable timely and effective public opinion management.

5.2 Reader-aware News Summarization

DS-Attributor derives users' aspect preferences as a by-product, which helps understand the subjective focus on news. Therefore, instead of the objective news summarization that most current studies conduct by only analyzing the correlation between news sentences, we can exploit the derived user aspect preference to support a novel subjective news summarization. Specifically, we introduce subjective user factors into the traditional objective news summarization solution by fusing the score $g_i$ (see Section 3.2) and the decoder attention $e_i$ (see Section 3.3) to update the similarity matrix $w(i,j)$ of standard TextRank (Mihalcea and Tarau, 2004) as

$w(i,j) = \alpha_1 w_s(i,j) + \alpha_2 w_g(i,j) + \alpha_3 w_e(i,j)$, (31)

where $w_s(i,j)$ is the cosine similarity of the two sentence vectors, and $w_g(i,j)$ is defined as

$w_g(i,j) = \dfrac{g_j\, w_s(i,j)}{\sum_k g_k\, w_s(i,k)}$, (32)

$w_e(i,j)$ is defined analogously to $w_g(i,j)$ with $e_i$ in place of $g_i$, and $\alpha_1$, $\alpha_2$, $\alpha_3$ are coefficients. The final sentence importance score is estimated by running TextRank on the fused matrix, as sketched below. With ROUGE-L as the evaluation metric, we compare three summarization strategies on 100 news articles: (1) Standard TextRank: extract the top-$k$ sentences as the summary without reader factors. (2) Single-user: randomly select one of 20 users' top-$k$ results $m$ times, and average the ROUGE-L scores. (3) Multi-user: randomly select $n$ users' summary results and choose sentences by voting each time; repeat $m$ times and average the ROUGE-L scores.
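A minimal NumPy sketch of this fusion and the subsequent TextRank pass follows; the coefficient values, the clipping of negative similarities, and the power-iteration details are our assumptions:

import numpy as np

def fused_similarity(s, g, e, a1=0.4, a2=0.3, a3=0.3):
    # Eqns. (31)-(32): fuse cosine similarity with g- and e-weighted variants.
    norm = s / np.linalg.norm(s, axis=1, keepdims=True)
    w_s = np.clip(norm @ norm.T, 0, None)       # keep similarities nonnegative
    w_g = (g[None, :] * w_s) / (g[None, :] * w_s).sum(axis=1, keepdims=True)
    w_e = (e[None, :] * w_s) / (e[None, :] * w_s).sum(axis=1, keepdims=True)
    return a1 * w_s + a2 * w_g + a3 * w_e

def textrank(w, damping=0.85, iters=50):
    # Standard TextRank power iteration over the fused similarity matrix.
    p = w / w.sum(axis=1, keepdims=True)
    r = np.ones(len(w)) / len(w)
    for _ in range(iters):
        r = (1 - damping) / len(w) + damping * p.T @ r
    return r

s = np.random.randn(12, 256)          # sentence vectors of one news article
g = np.random.dirichlet(np.ones(12))  # importance scores from Section 3.2
e = np.random.dirichlet(np.ones(12))  # mean decoder attention from Section 3.3
scores = textrank(fused_similarity(s, g, e))
top_k = np.argsort(-scores)[:3]       # indices of the summary sentences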

The evaluation results of different methods are shown in Figure 3.

Figure 2: News aspect-opinion forecasting result.
Figure 3: Reader-aware news summarization result.

From the results, we draw the following conclusions: (1) the reader-aware summarization strategies perform better than standard TextRank, because the subjective reader factor is useful for extracting the news article's highlights; (2) the multi-reader strategy achieves superior performance when $k$ is small, which shows that the common interest of multiple readers is beneficial for mining the main point of the news. As $k$ increases, users' interests disperse, and the multi-user strategy obtains performance close to the single-user strategy but still clearly outperforms the standard TextRank-based solution. Note that this evaluation is conducted with the news title as the ground truth. In practical scenarios, by exploiting the dispositional preference of a specific individual or group of users, we can develop applications such as customized and even personalized news summarization.

6 Conclusion

In this paper, we have proposed an encoder-decoder framework, DS-Attributor, to model the comment generation process by combining both situational and dispositional factors. Following this study, we are working in the following directions: (1) modeling comment attribution with news events, e.g., associating the discovered global aspect topics with local news event aspects; (2) exploring more applications that employ the derived situational and dispositional factors, e.g., customized news summarization and comment-driven news recommendation.

References

  • Banerjee and Lavie [2005] Satanjeev Banerjee and Alon Lavie. Meteor: An automatic metric for mt evaluation with improved correlation with human judgments. In Proceedings of the acl workshop on intrinsic and extrinsic evaluation measures for machine translation and/or summarization, pages 65–72, 2005.
  • Boltužić and Šnajder [2014] Filip Boltužić and Jan Šnajder. Back up your stance: Recognizing arguments in online discussions. In Proceedings of the First Workshop on Argumentation Mining, pages 49–58, 2014.
  • Dieng et al. [2020] Adji B Dieng, Francisco JR Ruiz, and David M Blei. Topic modeling in embedding spaces. Transactions of the Association for Computational Linguistics, 8:439–453, 2020.
  • Heider and Simmel [1944] Fritz Heider and Marianne L. Simmel. An experimental study of apparent behavior. American Journal of Psychology, 57:243–259, 1944.
  • Heider [2013] Fritz Heider. The psychology of interpersonal relations. Psychology Press, 2013.
  • Hou et al. [2017] Lei Hou, Juanzi Li, Xiao-Li Li, Jie Tang, and Xiaofei Guo. Learning to align comments to news topics. ACM Transactions on Information Systems (TOIS), 36(1):1–31, 2017.
  • Kingma and Ba [2014] Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
  • Li et al. [2019] Wei Li, Jingjing Xu, Yancheng He, Shengli Yan, Yunfang Wu, et al. Coherent comment generation for chinese articles with a graph-to-sequence model. arXiv preprint arXiv:1906.01231, 2019.
  • Lin [2004] Chin-Yew Lin. Rouge: A package for automatic evaluation of summaries. In Text summarization branches out, pages 74–81, 2004.
  • Mihalcea and Tarau [2004] Rada Mihalcea and Paul Tarau. Textrank: Bringing order into text. In Proceedings of the 2004 conference on empirical methods in natural language processing, pages 404–411, 2004.
  • Papineni et al. [2002] Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. Bleu: a method for automatic evaluation of machine translation. In Proceedings of the annual meeting of the Association for Computational Linguistics, pages 311–318, 2002.
  • Peng et al. [2020] Haiyun Peng, Lu Xu, Lidong Bing, Fei Huang, Wei Lu, and Luo Si. Knowing what, how and why: A near complete solution for aspect-based sentiment analysis. In AAAI, 2020.
  • Pontiki et al. [2016] Maria Pontiki, Dimitrios Galanis, Haris Papageorgiou, Ion Androutsopoulos, Suresh Manandhar, Mohammad Al-Smadi, Mahmoud Al-Ayyoub, Yanyan Zhao, Bing Qin, Orphée De Clercq, et al. Semeval-2016 task 5: Aspect based sentiment analysis. In International workshop on semantic evaluation, pages 19–30, 2016.
  • Qin et al. [2018] Lianhui Qin, Lemao Liu, Victoria Bi, Yan Wang, Xiaojiang Liu, Zhiting Hu, Hai Zhao, and Shuming Shi. Automatic article commenting: the task and dataset. arXiv preprint arXiv:1805.03668, 2018.
  • Vaswani et al. [2017] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. In Advances in neural information processing systems, pages 5998–6008, 2017.
  • Vedantam et al. [2015] Ramakrishna Vedantam, C Lawrence Zitnick, and Devi Parikh. Cider: Consensus-based image description evaluation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 4566–4575, 2015.
  • Wang et al. [2021] Wei Wang, Piji Li, and Hai-Tao Zheng. Generating diversified comments via reader-aware topic modeling and saliency detection. arXiv preprint arXiv:2102.06856, 2021.
  • Yan et al. [2021] Hang Yan, Junqi Dai, Xipeng Qiu, Zheng Zhang, et al. A unified generative framework for aspect-based sentiment analysis. arXiv preprint arXiv:2106.04300, 2021.
  • Yang et al. [2016] Zichao Yang, Diyi Yang, Chris Dyer, Xiaodong He, Alex Smola, and Eduard Hovy. Hierarchical attention networks for document classification. In Proceedings of the 2016 conference of the North American chapter of the association for computational linguistics: human language technologies, pages 1480–1489, 2016.
  • Yang et al. [2020] Fan Yang, Eduard Dragut, and Arjun Mukherjee. Predicting personal opinion on future events with fingerprints. In Proceedings of the International Conference on Computational Linguistics, pages 1802–1807, 2020.