TieFake: Title-Text Similarity and Emotion-Aware Fake News Detection

† Corresponding Author: Zhao Kang, [email protected]

Quanjiang Guo¹, Zhao Kang¹†, Ling Tian¹, Zhouguo Chen²
¹School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, China
²30th Research Institute of China Electronics Technology Group Corporation, Chengdu, China
[email protected], [email protected], [email protected], [email protected]

Abstract

Fake news detection aims to identify fake news that spreads widely on social media platforms and can negatively influence the public and the government. Many approaches have been developed to exploit relevant information from news images, text, or videos. However, these methods may suffer from the following limitations: (1) they ignore the inherent emotional information of the news, which could be beneficial since it reflects the subjective intentions of the authors; (2) they pay little attention to the relation (similarity) between the title and the textual information in news articles, which often use irrelevant titles to attract readers’ attention. To this end, we propose a novel Title-Text similarity and emotion-aware Fake news detection (TieFake) method that jointly models the multi-modal context information and the author sentiment in a unified framework. Specifically, we employ BERT and ResNeSt to learn the representations of text and images, respectively, and utilize a publisher emotion extractor to capture the author’s subjective emotion in the news content. We also propose a scaled dot-product attention mechanism to capture the similarity between title features and textual features. Experiments are conducted on two publicly available multi-modal datasets, and the results demonstrate that our proposed method can significantly improve the performance of fake news detection. Our code is available at https://github.com/UESTC-GQJ/TieFake.

Index Terms:

multi-modal learning, disinformation, social media, sentiment analysis

I Introduction

Nowadays, more and more people consume news through social media. A recent study [1] defines fake news “to be news articles that are intentionally and verifiably false, and could mislead readers.” Moreover, such content is written to deceive someone. An example of such fake news is shown in Figure 1. The image shown in the news is obviously photo-shopped to make it look similar to the news generally featured on a popular news channel like CNN. This image makes people believe that the news is true, but the experimenter himself quashed it (see Figure 2).

Figure 1: An example of fake news claiming that NASA will pay subjects $100,000 to stay in bed for 60 days (politifact.com).

Especially since the 2016 U.S. presidential election, the dissemination of “fake news” has adversely affected the public and the government. Based on a thorough analysis of 126,000 verified real and fraudulent news on Twitter from 2006 to 2017, Vosoughi et al. [2] pointed out that fake news and inaccurate information may spread faster and broader than fact-based news. According to crucial psychological and social science ideas [3], the more fake news articles circulate, the more likely social media users will disseminate and believe them due to repeated exposure or peer pressure. Due to the echo chamber effect, such levels of trust and beliefs can easily be magnified and sustained within social media [4]. Therefore, to prevent the dissemination of fake news on social media, extensive research has been done on the effective identification of fake news.

Fake news detection methods can be roughly grouped into content-based and social-context-based strategies [5]. The main difference between them is whether or not they rely on social context information, i.e., information about how the news has been propagated on social media, where many auxiliary details about the users involved and their connections/networks can be utilized. Many novel approaches [6, 7, 8, 9] have been proposed to exploit social context and social node information with graph neural networks (GNNs). Nevertheless, it is often difficult to detect fake news with these methods when the news has just been published and has not yet propagated, i.e., when no social context information is available. This motivates us to further explore the news content itself for fake news detection.

A news article often contains both textual and visual information. Existing content-based fake news detection methods either consider only textual information [10] or sentiment information [11, 12], or combine information derived from both modalities, which complement each other in detecting fake news [13, 14, 15]. Despite their promising performance, they ignore the latent emotion of authors conveyed by fake news. To attract public attention, authors of fake news tend to express strongly subjective emotions that differ markedly from the tone of general statements in news text [16]. They also attempt to catch the reader’s attention by using titles that are not relevant to the contents [17]. In light of these considerations, we propose a Title-Text similarity and emotion-aware Fake news detection (TieFake) method. Figure 3 shows the overall architecture of our proposed method. For each news article, we first adopt neural networks to automatically obtain the latent representations of its textual and visual information. In addition, the subjective emotion of the author is extracted to help predict fake news. We also apply scaled dot-product attention to the title and text, because a large “gap” often exists between the title and text of fake news when creators fabricate news to support non-factual scenes or statements. The proposed method therefore makes full use of the news text, the images, the subjective emotion of the author, and the similarity between title and text to improve the accuracy of fake news detection.

The main contributions of our work are summarized as follows:

• To the best of our knowledge, we present the first approach that utilizes the subjective emotion of the news author in detecting fake news;

• We propose a novel attention mechanism to explore the similarity between title and text;

• We conduct extensive experiments on two real-world datasets to demonstrate the effectiveness of our proposed method.

II Related Work

II-A Fake News Detection

Most existing work on fake news detection task treats it as a binary classification problem. Early methods [18, 19] design plenty of hand-crafted features to debunk fake news. These methods train a fake news classifier using text content features. Although these manually chosen features improve the performance of fake news detection, these approaches typically require extensive preprocessing and feature engineering. Recognizing and detecting fake news has become more complex as social media information has exploded in popularity. Researchers have put forth various practical methods [18, 15, 20, 5, 13], which can be briefly reviewed from single-modal (e.g., text or image) and multi-modal perspectives.

Existing methods[18, 21, 20, 5] for single-modal analysis mainly concentrate on extracting textual or visual elements from the news’ text content or image. For example, Yu et al. [22] use convolutional neural networks to obtain high-level interactions and critical features of related news. Recurrent neural networks are used by Ma et al. [19] to learn latent properties from the relevant textual news. In [23], the authors only exploit the rich visual information with different pixel domains and utilize a multi-domain visual neural network to detect fake news. However, social media platforms offer a wealth of multi-modal data (e.g., images, texts, and videos)[24, 25] that can complement each other and contribute to social media analysis[26, 27].

Researchers realize that multi-modal fusion features may be crucial in identifying fake news because of deep neural networks’ enormous success in learning picture and word representations. Multi-modality fake news detection has recently drawn a lot of interest. Several approaches [15, 28, 29, 30, 31, 32] conduct fake news detection based on multimedia content and obtain superior performance. Jin et al.[15] propose a multi-modality-based fake news detection model, which extracts the multi-modality information, including visual, textual, and social context features, and then fuses them by attention mechanism. Khattar et al.[28] propose a multimodal variational autoencoder that learns a shared representation of the text and images. Shivangi et al.[29] make use of the pre-trained BERT to learn text features and apply VGG-19 pre-trained on the ImageNet dataset to learn image features. Wang et al.[31] propose a novel knowledge-driven multimodal graph convolutional network to jointly model textual information, knowledge concepts, and visual information into a unified framework for fake news detection.

Although most existing approaches show promising performance on fake news detection task, they still fail to fully exploit the data. In this paper, we propose to take advantage of the author’s potential emotion and the similarity between the title and text.

II-B Attention Mechanism

Attention mechanisms have been shown to be effective in various tasks such as image captioning [33, 34], machine translation [35], and recommender systems [36]. Bahdanau et al. [35] first introduce attention to the machine translation task to allow the model to automatically search for the parts of a source sentence that are relevant to predicting a target word. Soon after, the Transformer [37] is proposed to solve the sequence-to-sequence problem; it replaces the LSTM with an attention structure and achieves state-of-the-art quality on the neural machine translation task. Recently, attention mechanisms have been incorporated into fake news detection methods. For example, Chen et al. [38] propose a deep attention model based on recurrent neural networks (RNNs) to selectively learn temporal hidden representations of sequential posts for identifying fake news. Motivated by these successful applications of the attention mechanism, we introduce a scaled dot-product attention mechanism on title features and textual features to compute the similarity between news title and text.

III Proposed Method

Problem Definition Given a news article $A=\{t,v,e\}$ consisting of textual information $t$ , visual information $v$ , and emotion information $e$ , we denote $\mathbf{R_{T}}\in\mathbb{R}^{d}$ , $\mathbf{R_{V}}\in\mathbb{R}^{d}$ , and $\mathbf{R_{E}}\in\mathbb{R}^{d}$ as the corresponding representations. Our goal is to predict whether $A$ is a fake news article ( $\hat{y}=0$ ) or a true one ( $\hat{y}=1$ ) by determining $\mathcal{M}:(\mathcal{M}_{t},\mathcal{M}_{v},\mathcal{M}_{e})\xrightarrow{(\theta_{t},\theta_{v},\theta_{e})}\hat{y}\in\{0,1\}$ , where $\theta$ are parameters to be learned.

III-A Textual feature extractor

To precisely capture the semantic meaning of words and their linguistic contexts, we employ the 12-encoder-layer version of Bidirectional Encoder Representations from Transformers (BERT [39]), which takes a sequence of words as input. We model a given text content $t$ as the textual embedding $R_{T_{bert}}$ = $\{\mathbf{T}_{1},\mathbf{T}_{2},\cdots,\mathbf{T}_{m}\}$ (where $m$ denotes the number of words) taken from the second-to-last output layer of the module, and pass the embedding through a fully connected layer to reduce it to the final dimension of length 32, i.e., $R_{T}\in\mathbb{R}^{32}$ . The operation of the fully connected layer in the textual feature extractor can be represented as:

R_{T}=\sigma(W_{vt}\cdot R_{T_{bert}})

(1)

where $\sigma$ is the ReLU activation function and $W_{vt}$ is the weight matrix of the fully connected layer. Similarly, we can use the same method to generate the feature vector of the title $R_{Ti}\in\mathbb{R}^{32}$ .
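As a rough illustration of Eq. (1), the following sketch applies a fully connected projection followed by ReLU to a stand-in BERT embedding. It is not the authors' implementation: the input size of 768 (the hidden size of BERT-base), the use of a single pooled vector, the random weights, and the helper names `linear`, `relu`, and `project_text_feature` are all assumptions made for illustration. Only the output length of 32 comes from the text.

```python
import random

def relu(vec):
    """Element-wise ReLU activation (sigma in Eq. 1)."""
    return [max(0.0, v) for v in vec]

def linear(weight, vec):
    """Multiply an (out_dim x in_dim) weight matrix by an in_dim vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weight]

def project_text_feature(bert_vec, weight):
    """Eq. (1): R_T = ReLU(W_vt . R_T_bert)."""
    return relu(linear(weight, bert_vec))

rng = random.Random(0)
BERT_DIM = 768  # assumption: hidden size of a 12-layer BERT-base model
OUT_DIM = 32    # final textual feature length stated in the paper

# Random stand-ins for a pooled BERT embedding and the learned matrix W_vt.
bert_vec = [rng.uniform(-1.0, 1.0) for _ in range(BERT_DIM)]
w_vt = [[rng.uniform(-0.05, 0.05) for _ in range(BERT_DIM)] for _ in range(OUT_DIM)]

r_t = project_text_feature(bert_vec, w_vt)
print(len(r_t))
```

The same projection, with its own weight matrix, would yield the title feature $R_{Ti}$.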

III-B Visual feature extractor

Given an image $V$ attached to a news text, we employ the pre-trained ResNeSt-50 [40]. On top of the last layer of ResNeSt-50, we add a fully connected layer to adjust the dimension of final visual feature representation to length 30, i.e., $R_{V}\in\mathbb{R}^{30}$ . During the joint training process with the textual feature extractor, the parameters of pre-trained ResNeSt-50 are kept static to avoid overfitting. The operation of the last layer in the visual feature extractor can be represented as:

R_{V}=\sigma(W_{vf}\cdot R_{V_{resnest}})

(2)

where $R_{V_{resnest}}$ is the visual feature representation obtained from pre-trained ResNeSt-50 and $W_{vf}$ is the weight matrix of the fully connected layer in the visual feature extractor.

III-C Emotion feature extractor

To comprehensively represent emotion, we employ the publisher emotion extractor [41] to obtain a variety of features from news contents, including the emotion category, emotional lexicon, emotional intensity, sentiment score, and other auxiliary features. The corresponding feature representations are denoted by $R_{E}^{cate}$ , $R_{E}^{lex}$ , $R_{E}^{int}$ , $R_{E}^{senti}$ , and $R_{E}^{aux}$ , respectively. Among them, the emotion category, emotional intensity, and sentiment score provide overall information, while the other two provide word- and symbol-level information. We concatenate all five kinds of features into $R_{E}$ with a final dimension of length 38, i.e.,

R_{E}=R_{E}^{cate}\oplus R_{E}^{lex}\oplus R_{E}^{int}\oplus R_{E}^{senti}\oplus R_{E}^{aux}

(3)

III-D Title-Text similarity extractor

In this part, we propose to apply the scaled dot-product attention mechanism on title features and textual features to capture how well the title and text are related to each other in the news. We utilize the attention mechanism for the title and textual features in two directions. Figure 4 shows the general design of our proposed attention mechanism.

When we use information from the text to compare against the title, we use the text vector representation $R_{T}$ as the Query and the title features $R_{Ti}$ as the Key and Value. Mathematically, the Query, Key, and Value are defined as:

Q=R_{T}\times W_{Q},\quad K=R_{Ti}\times W_{K},\quad V=R_{Ti}\times W_{V},\qquad W_{Q}\in\mathbb{R}^{d_{T}\times d},\ W_{K}\in\mathbb{R}^{d_{Ti}\times d},\ W_{V}\in\mathbb{R}^{d_{Ti}\times d},

(4)

where $d_{T}=d_{Ti}=d=32$ , and $W_{Q}$ , $W_{K}$ , $W_{V}$ are weight matrices.

The output matrix of the scaled dot-product attention applied on $Q$ , $K$ , and $V$ is calculated as:

Att_{T\xrightarrow{}Ti}=softmax(\frac{Q\times K^{T}}{\sqrt{d}})\times V.

(5)

where $Att_{T\xrightarrow{}Ti}$ is the attention’s output matrix when we use text vector representation $R_{T}$ as Query and title features $R_{Ti}$ as Key and Value.

Similarly, when we use information from the title to compare against the text, we obtain $Att_{Ti\xrightarrow{}T}$ , in which the title vector representation $R_{Ti}$ is used as the Query and $R_{T}$ is used as the Key and Value. After obtaining the two attention output matrices $Att_{T\xrightarrow{}Ti}$ and $Att_{Ti\xrightarrow{}T}$ , we pass them into a fully connected layer of size 32, which is the same as $d_{T}$ . Finally, we obtain two vectors $R_{T\xrightarrow{}Ti}$ and $R_{Ti\xrightarrow{}T}$ .
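The two attention directions of Eqs. (4) and (5) can be sketched in plain Python as below. This is only an illustration under stated assumptions: each side is treated as a small matrix of per-position features (6 text positions and 3 title positions, both hypothetical), because with a single 32-dimensional vector per side the softmax would act on one score and always return weight 1. The random weights, the use of separate projection matrices per direction, and the helper names `matmul`, `softmax_rows`, `attention`, and `rand_matrix` are likewise assumptions; only $d=32$ comes from the text.

```python
import math
import random

def matmul(a, b):
    """Plain-Python product of an (n x k) matrix a and a (k x m) matrix b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax_rows(matrix):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    result = []
    for row in matrix:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        result.append([e / total for e in exps])
    return result

def attention(query_feats, context_feats, w_q, w_k, w_v):
    """Eqs. (4)-(5): softmax(Q K^T / sqrt(d)) V, with Q from one side, K and V from the other."""
    q = matmul(query_feats, w_q)
    k = matmul(context_feats, w_k)
    v = matmul(context_feats, w_v)
    d = len(q[0])
    k_t = [list(col) for col in zip(*k)]
    scores = [[s / math.sqrt(d) for s in row] for row in matmul(q, k_t)]
    weights = softmax_rows(scores)
    return matmul(weights, v), weights

def rand_matrix(rows, cols, rng):
    """Random stand-in for learned weights or extracted features."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
D = 32
text_feats = rand_matrix(6, D, rng)   # hypothetical: 6 text positions
title_feats = rand_matrix(3, D, rng)  # hypothetical: 3 title positions

# Text as Query, title as Key/Value, and the reverse direction.
att_t_to_ti, w_t_to_ti = attention(
    text_feats, title_feats, rand_matrix(D, D, rng), rand_matrix(D, D, rng), rand_matrix(D, D, rng))
att_ti_to_t, w_ti_to_t = attention(
    title_feats, text_feats, rand_matrix(D, D, rng), rand_matrix(D, D, rng), rand_matrix(D, D, rng))

print(len(att_t_to_ti), len(att_t_to_ti[0]))
print(len(att_ti_to_t), len(att_ti_to_t[0]))
```

Each output has one row per Query position, so the two directions generally have different shapes before the fully connected layer maps them to length 32.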

III-E Fake news detector

We deploy a fully connected layer with softmax to predict whether a news article is fake or real. The five feature vectors above, $R_{T}$ , $R_{V}$ , $R_{E}$ , $R_{T\xrightarrow{}Ti}$ , and $R_{Ti\xrightarrow{}T}$ , are fused to generate the final news representation, denoted as:

R_{F}=R_{T}\oplus R_{V}\oplus R_{E}\oplus R_{T\xrightarrow{}Ti}\oplus R_{Ti\xrightarrow{}T}\in\mathbb{R}^{164}

(6)

where $\oplus$ is the concatenation operator. The fake news detector is built on top of the multi-modal feature extractors and takes $R_{F}$ as input. Finally, given a news article, its label $\hat{y}$ can be predicted by a fully connected layer with all learned parameters.
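The fusion in Eq. (6) and the softmax classifier can be sketched as follows. The feature lengths (32, 30, 38, 32, 32) and the label convention (0 for fake, 1 for true) come from the text; the random feature values, random classifier weights, and the helper names `concat`, `softmax`, and `predict` are assumptions for illustration, not the authors' code.

```python
import math
import random

def concat(*vectors):
    """Concatenation operator used in Eq. (6)."""
    fused = []
    for vec in vectors:
        fused.extend(vec)
    return fused

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict(fused, weight, bias):
    """Fully connected layer plus softmax; returns (label, class probabilities)."""
    logits = [sum(w * x for w, x in zip(row, fused)) + b for row, b in zip(weight, bias)]
    probs = softmax(logits)
    return probs.index(max(probs)), probs

rng = random.Random(0)
# Feature lengths stated in Sections III-A to III-D.
dims = {"R_T": 32, "R_V": 30, "R_E": 38, "R_T_to_Ti": 32, "R_Ti_to_T": 32}
features = {name: [rng.random() for _ in range(n)] for name, n in dims.items()}

r_f = concat(*features.values())

# Random stand-ins for the learned classifier parameters (two classes).
weight = [[rng.uniform(-0.1, 0.1) for _ in range(len(r_f))] for _ in range(2)]
bias = [0.0, 0.0]

label, probs = predict(r_f, weight, bias)  # 0 = fake, 1 = true
print(len(r_f), label)
```

The length check makes the arithmetic behind $\mathbb{R}^{164}$ explicit: 32 + 30 + 38 + 32 + 32 = 164.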

IV Experiment and Results

TABLE I: Data statistics.

News articles	Fake	True	Overall
PolitiFact	161	205	366
GossipCop	4,927	16,693	21,620

IV-A Experimental setup

Dataset. We use two reputable public benchmark datasets from FakeNewsNet [42] for our experiments. News articles are collected from PolitiFact and GossipCop, respectively. PolitiFact (politifact.com) is a well-known non-profit fact-checking website for political statements and reports in the U.S. GossipCop (gossipcop.com) is a website that fact-checks celebrity reports and entertainment stories published in magazines and newspapers. The PolitiFact and GossipCop datasets contain news articles published from May 2002 to July 2018 and from July 2000 to December 2018, respectively. Experts in the relevant fields provided the ground truth labels (fake or true), ensuring the accuracy of the news labels. In this work, we focus on detecting fake news by incorporating text and image information. Thus, we remove news articles without any text or image; statistics of the two resulting datasets are provided in Table I.

Evaluation metrics. Accuracy (Acc) is commonly used as the evaluation metric for binary classification tasks such as fake news detection. However, its reliability is significantly compromised when a dataset suffers from class imbalance. Therefore, in addition to Acc, we also report Precision (Pre), Recall (Rec), and the $F_{1}$ score.
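For reference, the four metrics can be computed from predictions as in the sketch below. The paper does not state which class is treated as positive, so `positive` is exposed as a parameter; the function name `binary_metrics` and the toy labels are hypothetical. The second call illustrates the imbalance issue mentioned above: a classifier that always predicts the majority class reaches high accuracy while recalling none of the minority class.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall, and F1 for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    acc = (tp + tn) / len(pairs)
    # Guard against empty denominators by reporting 0.0.
    pre = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * pre * rec / (pre + rec) if pre + rec else 0.0
    return {"acc": acc, "pre": pre, "rec": rec, "f1": f1}

# Toy example: 0 = fake, 1 = true, following the problem definition.
balanced = binary_metrics([0, 0, 0, 1, 1, 1, 1, 1], [0, 0, 1, 1, 1, 1, 1, 0])

# Imbalanced toy example: always predicting "true" on 9 true and 1 fake article,
# scored with the fake class as positive.
majority = binary_metrics([1] * 9 + [0], [1] * 10, positive=0)

print(balanced)
print(majority)
```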

Implementation details. In our experiments, we split each dataset into training and testing sets in an 8:2 ratio. In our proposed model, we keep the weights of the pre-trained BERT and ResNeSt fixed and use them as feature extractors, because preliminary experiments showed that fine-tuning BERT and ResNeSt did not improve the performance of our model. The hyper-parameters used in the experiments are as follows. The experiments are conducted on an AMD Ryzen 5800H CPU and an NVIDIA GeForce RTX 3060 GPU with 6 GB RAM. Our algorithms are implemented in the PyTorch deep learning framework [43] and are trained with the Adaptive Moment Estimation (Adam) [44] optimizer. We use the same batch size of 16 instances in the training stage for all methods, and the model is trained for 10 epochs with a learning rate of $10^{-4}$ .
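A minimal sketch of the 8:2 split and the stated training settings is given below. The paper does not say whether the split is shuffled or stratified, so the seeded random shuffle, the function name `train_test_split`, and the `config` dictionary layout are assumptions; the ratio, batch size, epoch count, learning rate, and the PolitiFact size of 366 articles (Table I) come from the text.

```python
import math
import random

def train_test_split(items, train_ratio=0.8, seed=42):
    """Shuffle with a fixed seed (an assumption) and cut at the given ratio."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]

# Training settings reported in the implementation details.
config = {"batch_size": 16, "epochs": 10, "learning_rate": 1e-4, "optimizer": "Adam"}

# PolitiFact has 366 articles after filtering (Table I); indices stand in for articles.
train_ids, test_ids = train_test_split(range(366))
steps_per_epoch = math.ceil(len(train_ids) / config["batch_size"])

print(len(train_ids), len(test_ids), steps_per_epoch)
```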

IV-B Baseline

We compare to the following baselines, which detect fake news using (i) textual (Sec.IV-B1), (ii) visual (Sec.IV-B2), or (iii) multi-modal information (Sec.IV-B3).

IV-B1 Text-based models

• CNN [22]: CNN employs a convolutional neural network to learn the feature representations for misinformation identification and early detection tasks by framing news text into a fixed-length sequence;

• GRU [19]: GRU is based on recurrent neural networks (RNN) for learning hidden representations; it uses a multilayer GRU network and treats the post as a variable-length time series;

• BERT [39]: BERT is a bidirectional Transformer encoder designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right contexts.

IV-B2 Image-based models

• VGG-19 [45]: VGG-19 is a widely used CNN with 19 layers for image classification. We use a fine-tuned VGG-19 as one of the baselines;

• ResNet-50 [46]: ResNet-50 is a widely used CNN with 50 layers in various feature extraction applications.

IV-B3 Multi-modal models

• MVAE [28]: MVAE uses a bimodal variational autoencoder coupled with a binary classifier for fake news detection;

• att-RNN [15]: att-RNN is a deep neural network model for multi-modal fake news detection. It employs LSTM and VGG-19 with an attention mechanism to fuse the textual, visual, and social-context features of news articles. We set the hyper-parameters the same and exclude the social-context features for a fair comparison;

• SpotFake [29]: SpotFake utilizes pre-trained language models (like BERT) to learn the textual information and employs VGG-19 (pre-trained on the ImageNet dataset) to obtain image features;

• EANN [13]: EANN can derive event-invariant features and thus assist in detecting fake news on newly arrived events. It consists of the multi-modal feature extractor, the fake news detector, and the post discriminator. For a fair comparison, we conduct experiments with a simplified version of EANN that excludes the post discriminator;

• SpotFake+ [30]: SpotFake+ is an advanced version of SpotFake that extracts the textual feature using a pre-trained XLNet model;

• SAFE [32]: SAFE extracts multi-modal (textual and visual) features of news content and their relationships through a similarity-aware multi-modal method for fake news detection.

TABLE II: Performance of various methods in detecting fake news.

Dataset	Methods	Acc	Pre	Rec	$\mathbf{F}_{1}$
PolitiFact	CNN	0.658	0.702	0.622	0.660
PolitiFact	GRU	0.681	0.667	0.632	0.644
PolitiFact	BERT	0.781	0.766	0.878	0.818
PolitiFact	VGG-19	0.458	0.492	0.473	0.482
PolitiFact	ResNet-50	0.485	0.478	0.501	0.489
PolitiFact	MVAE	0.726	0.761	0.678	0.717
PolitiFact	att-RNN	0.741	0.726	0.813	0.767
PolitiFact	SpotFake	0.770	0.753	0.795	0.770
PolitiFact	EANN	0.795	0.813	0.761	0.786
PolitiFact	SpotFake+	0.856	0.878	0.846	0.862
PolitiFact	SAFE	0.872	0.883	0.897	0.890
PolitiFact	TieFake	0.912	0.931	0.909	0.920
GossipCop	CNN	0.741	0.733	0.775	0.753
GossipCop	GRU	0.793	0.779	0.801	0.790
GossipCop	BERT	0.836	0.872	0.829	0.850
GossipCop	VGG-19	0.443	0.478	0.462	0.450
GossipCop	ResNet-50	0.454	0.469	0.458	0.463
GossipCop	MVAE	0.782	0.802	0.751	0.776
GossipCop	att-RNN	0.774	0.798	0.821	0.809
GossipCop	SpotFake	0.812	0.807	0.822	0.814
GossipCop	EANN	0.833	0.842	0.835	0.838
GossipCop	SpotFake+	0.864	0.859	0.882	0.870
GossipCop	SAFE	0.831	0.843	0.894	0.868
GossipCop	TieFake	0.892	0.887	0.902	0.894

IV-C Results

Experimental results in Table II further reveal several insightful observations.

(1) In both datasets, the strongest multi-modal models outperform the single-modal models. This validates the benefit of incorporating several kinds of information. For example, visual information can serve as complementary information to help detect fake news.

(2) Text-based models work much better than image-based models, demonstrating that textual features could be more helpful than visual ones in detecting fake news. The reason is that it is more challenging to learn the semantic meaning of visual features than textual features.

(3) SAFE outperforms all other baselines on the PolitiFact dataset because it jointly uses multi-modal (textual and visual) and relational information to learn the representation of news, which is more applicable to small-sample datasets. In addition, SpotFake+ achieves the best results among the baselines on the GossipCop dataset, indicating that pre-trained language models such as XLNet can extract better textual information to improve model performance, which is more applicable to large-sample datasets.

(4) The proposed TieFake outperforms all the baselines on both large- and small-sample datasets. Compared with SAFE on the PolitiFact dataset, our model improves by 4.0 percentage points in accuracy, 4.8 in precision, 1.2 in recall, and 3.0 in $F_{1}$ . Compared with SpotFake+ on the GossipCop dataset, our model improves by 2.8 percentage points in accuracy, 2.8 in precision, 2.0 in recall, and 2.4 in $F_{1}$ . This verifies that the proposed model can jointly capture multi-modal (textual and visual) and emotional information, which better characterizes the underlying representation of news content and improves the performance of fake news detection.
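The reported gains can be reproduced directly from Table II as absolute differences in percentage points. The snippet below copies the relevant rows from the table; the variable and function names are hypothetical.

```python
# Rows copied from Table II, in the order (Acc, Pre, Rec, F1).
table_ii = {
    "PolitiFact": {
        "SAFE": (0.872, 0.883, 0.897, 0.890),
        "TieFake": (0.912, 0.931, 0.909, 0.920),
    },
    "GossipCop": {
        "SpotFake+": (0.864, 0.859, 0.882, 0.870),
        "TieFake": (0.892, 0.887, 0.902, 0.894),
    },
}

def gains_in_points(ours, baseline):
    """Absolute improvement per metric, in percentage points."""
    return [round((o - b) * 100, 1) for o, b in zip(ours, baseline)]

politifact_gain = gains_in_points(
    table_ii["PolitiFact"]["TieFake"], table_ii["PolitiFact"]["SAFE"])
gossipcop_gain = gains_in_points(
    table_ii["GossipCop"]["TieFake"], table_ii["GossipCop"]["SpotFake+"])

print(politifact_gain)  # [4.0, 4.8, 1.2, 3.0]
print(gossipcop_gain)   # [2.8, 2.8, 2.0, 2.4]
```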

TABLE III: The performance of TieFake variants.

Dataset	Methods	Acc	Pre	Rec	$\mathbf{F}_{1}$
PolitiFact	TieFake-T	0.612	0.632	0.598	0.615
PolitiFact	TieFake-V	0.866	0.858	0.871	0.864
PolitiFact	TieFake-E	0.857	0.851	0.878	0.864
PolitiFact	TieFake-S	0.904	0.925	0.902	0.914
PolitiFact	TieFake	0.912	0.931	0.909	0.920
GossipCop	TieFake-T	0.593	0.588	0.623	0.605
GossipCop	TieFake-V	0.862	0.872	0.846	0.859
GossipCop	TieFake-E	0.854	0.869	0.851	0.860
GossipCop	TieFake-S	0.886	0.891	0.895	0.892
GossipCop	TieFake	0.892	0.887	0.902	0.894

IV-D Ablation study

The proposed TieFake contains several components, thus we additionally compare the variants of TieFake to show the impact of each component.

• TieFake-T: A variant of TieFake with the textual information removed;

• TieFake-V: A variant of TieFake with the visual information removed;

• TieFake-E: A variant of TieFake with the author’s potential emotion removed;

• TieFake-S: A variant of TieFake with the similarity between the title and text removed.

Results in Table III indicate that (1) integrating news’ textual information, visual information, and author’s potential emotion improves the performance; (2) the proposed method TieFake outperforms TieFake-E, which shows the effectiveness of introducing the potential emotion to our model; (3) the proposed method TieFake outperforms TieFake-S, which proves that mining the similarity between the title and the text is effective for detecting fake news; (4) textual information is much more critical than visual information and potential author emotion.

V Conclusion

In this paper, we propose a simple but effective framework named TieFake to detect fake news, which utilizes both textual and visual features of news content and investigates the subjective emotional features of authors. To our knowledge, it is the first model to exploit author emotion in the multi-modal fake news detection task. In particular, we propose a novel attention mechanism to learn the similarity between the title and text. Experimental results verify the effectiveness of our model. The proposed method can be extended in the future to consider more complex information, e.g., network and video information. Additionally, there is still room for improvement through more sophisticated fusion techniques that help explain how different modalities contribute to fake news detection.

VI Acknowledgment

This work was supported by the National Natural Science Foundation of China (No. 62276053) and the Sichuan Science and Technology Program (No. 22ZDYF3621).

References

[1] H. Allcott and M. Gentzkow, “Social media and fake news in the 2016 election,” Journal of Economic Perspectives, vol. 31, no. 2, pp. 211–36, May 2017. [Online]. Available: https://www.aeaweb.org/articles?id=10.1257/jep.31.2.211
[2] S. Vosoughi, D. Roy, and S. Aral, “The spread of true and false news online,” Science, vol. 359, no. 6380, pp. 1146–1151, 2018.
[3] X. Zhou and R. Zafarani, “A survey of fake news: Fundamental theories, detection methods, and opportunities,” ACM Computing Surveys (CSUR), vol. 53, no. 5, pp. 1–40, 2020.
[4] K. H. Jamieson and J. N. Cappella, Echo chamber: Rush Limbaugh and the conservative media establishment. Oxford University Press, 2008.
[5] K. Shu, A. Sliva, S. Wang, J. Tang, and H. Liu, “Fake news detection on social media: A data mining perspective,” ACM SIGKDD explorations newsletter, vol. 19, no. 1, pp. 22–36, 2017.
[6] F. Qian, C. Gong, K. Sharma, and Y. Liu, “Neural user response generator: Fake news detection with collective user intelligence.” in IJCAI, vol. 18, 2018, pp. 3834–3840.
[7] Y. Dai, L. Shou, M. Gong, X. Xia, Z. Kang, Z. Xu, and D. Jiang, “Graph fusion network for text classification,” Knowledge-Based Systems, vol. 236, p. 107659, 2022.
[8] R. Fang, L. Wen, Z. Kang, and J. Liu, “Structure-preserving graph representation learning,” in Proceedings of the IEEE International Conference on Data Mining (ICDM), Orlando, FL, USA, 2022.
[9] H. Liu, Z. Kang, L. Zhang, L. Tian, and F. Hua, “Document-level relation extraction with cross-sentence reasoning graph,” in Proceedings of The Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), 2023.
[10] X. Zhou, A. Jain, V. V. Phoha, and R. Zafarani, “Fake news early detection: A theory-driven model,” Digital Threats: Research and Practice, vol. 1, no. 2, pp. 1–25, 2020.
[11] O. Ajao, D. Bhowmik, and S. Zargari, “Sentiment aware fake news detection on online social networks,” in ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2019, pp. 2507–2511.
[12] A. Giachanou, P. Rosso, and F. Crestani, “Leveraging emotional signals for credibility detection,” in Proceedings of the 42nd international ACM SIGIR conference on research and development in information retrieval, 2019, pp. 877–880.
[13] Y. Wang, F. Ma, Z. Jin, Y. Yuan, G. Xun, K. Jha, L. Su, and J. Gao, “Eann: Event adversarial neural networks for multi-modal fake news detection,” in Proceedings of the 24th acm sigkdd international conference on knowledge discovery & data mining(KDD), 2018, pp. 849–857.
[14] Z. Jin, J. Cao, Y. Zhang, J. Zhou, and Q. Tian, “Novel visual and statistical image features for microblogs news verification,” IEEE transactions on multimedia, vol. 19, no. 3, pp. 598–608, 2016.
[15] Z. Jin, J. Cao, H. Guo, Y. Zhang, and J. Luo, “Multimodal fusion with recurrent neural networks for rumor detection on microblogs,” in Proceedings of the 25th ACM international conference on Multimedia(ACM MM), 2017, pp. 795–816.
[16] M. L. Newman, J. W. Pennebaker, D. S. Berry, and J. M. Richards, “Lying words: Predicting deception from linguistic styles,” Personality and social psychology bulletin, vol. 29, no. 5, pp. 665–675, 2003.
[17] A. Shrestha and F. Spezzano, “Textual characteristics of news title and body to detect fake news: A reproducibility study,” in Advances in Information Retrieval, D. Hiemstra, M.-F. Moens, J. Mothe, R. Perego, M. Potthast, and F. Sebastiani, Eds. Cham: Springer International Publishing, 2021, pp. 120–133.
[18] C. Castillo, M. Mendoza, and B. Poblete, “Information credibility on twitter,” in Proceedings of the 20th international conference on World wide web(WWW), 2011, pp. 675–684.
[19] J. Ma, W. Gao, P. Mitra, S. Kwon, B. J. Jansen, K.-F. Wong, and M. Cha, “Detecting rumors from microblogs with recurrent neural networks,” in 25th International Joint Conferences on Artificial Intelligence(IJCAI), 2016, pp. 3818–3824.
[20] S. Kwon, M. Cha, K. Jung, W. Chen, and Y. Wang, “Prominent features of rumor propagation in online social media,” in 2013 IEEE 13th International Conference on Data Mining, 2013, pp. 1103–1108.
[21] A. Gupta, P. Kumaraguru, C. Castillo, and P. Meier, “Tweetcred: Real-time credibility assessment of content on twitter,” 11 2014, pp. 228–243.
[22] F. Yu, Q. Liu, S. Wu, L. Wang, T. Tan et al., “A convolutional approach for misinformation identification.” in IJCAI, 2017, pp. 3901–3907.
[23] P. Qi, J. Cao, T. Yang, J. Guo, and J. Li, “Exploiting multi-domain visual information for fake news detection,” in 2019 IEEE International Conference on Data Mining (ICDM), 2019, pp. 518–527.
[24] S. Qian, T. Zhang, and C. Xu, “Multi-modal multi-view topic-opinion mining for social event analysis,” in Proceedings of the 24th ACM International Conference on Multimedia, 2016, p. 2–11.
[25] S. Qian, T. Zhang, C. Xu, and J. Shao, “Multi-modal event topic model for social event analysis,” IEEE Transactions on Multimedia, vol. 18, no. 2, pp. 233–246, 2016.
[26] S. Liu, S. Qian, Y. Guan, J. Zhan, and L. Ying, “Joint-modal distribution-based similarity hashing for large-scale unsupervised deep cross-modal retrieval,” in Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, 2020, p. 1379–1388.
[27] X. Wu, C.-W. Ngo, and A. G. Hauptmann, “Multimodal news story clustering with pairwise visual near-duplicate constraint,” IEEE Transactions on Multimedia, vol. 10, no. 2, pp. 188–199, 2008.
[28] D. Khattar, J. S. Goud, M. Gupta, and V. Varma, “Mvae: Multimodal variational autoencoder for fake news detection,” in The world wide web conference(WWW), 2019, pp. 2915–2921.
[29] S. Singhal, R. R. Shah, T. Chakraborty, P. Kumaraguru, and S. Satoh, “Spotfake: A multi-modal framework for fake news detection,” in 2019 IEEE fifth international conference on multimedia big data(BigMM). IEEE, 2019, pp. 39–47.
[30] S. Singhal, A. Kabra, M. Sharma, R. R. Shah, T. Chakraborty, and P. Kumaraguru, “Spotfake+: A multimodal framework for fake news detection via transfer learning (student abstract),” in Proceedings of the AAAI conference on artificial intelligence(AAAI), vol. 34, no. 10, 2020, pp. 13 915–13 916.
[31] Y. Wang, S. Qian, J. Hu, Q. Fang, and C. Xu, “Fake news detection via knowledge-driven multimodal graph convolutional networks,” in Proceedings of the 2020 International Conference on Multimedia Retrieval, ser. ICMR ’20, 2020, p. 540–547.
[32] X. Zhou, J. Wu, and R. Zafarani, “Safe: Similarity-aware multi-modal fake news detection,” in Advances in Knowledge Discovery and Data Mining(PAKDD), 2020, pp. 354–367.
[33] K. Xu, J. L. Ba, R. Kiros, K. Cho, A. Courville, R. Salakhutdinov, R. S. Zemel, and Y. Bengio, “Show, attend and tell: Neural image caption generation with visual attention,” in Proceedings of the 32nd International Conference on International Conference on Machine Learning - Volume 37, ser. ICML’15. JMLR.org, 2015, p. 2048–2057.
[34] L. Ren, G. Duan, T. Huang, and Z. Kang, “Multi-local feature relation network for few-shot learning,” Neural Computing and Applications, vol. 34, no. 10, pp. 7393–7403, 2022.
[35] D. Bahdanau, K. Cho, and Y. Bengio, “Neural machine translation by jointly learning to align and translate,” Computer Science, 2014.
[36] J. Chen, H. Zhang, X. He, L. Nie, W. Liu, and T.-S. Chua, “Attentive collaborative filtering: Multimedia recommendation with item- and component-level attention,” in Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, ser. SIGIR ’17. New York, NY, USA: Association for Computing Machinery, 2017, p. 335–344. [Online]. Available: https://doi.org/10.1145/3077136.3080797
[37] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, “Attention is all you need,” in Proceedings of the 31st International Conference on Neural Information Processing Systems, ser. NIPS’17. Red Hook, NY, USA: Curran Associates Inc., 2017, p. 6000–6010.
[38] T. Chen, X. Li, H. Yin, and J. Zhang, “Call attention to rumors: Deep attention based recurrent neural networks for early rumor detection,” in Trends and Applications in Knowledge Discovery and Data Mining, M. Ganji, L. Rashidi, B. C. M. Fung, and C. Wang, Eds. Cham: Springer International Publishing, 2018, pp. 40–52.
[39] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “Bert: Pre-training of deep bidirectional transformers for language understanding,” in Proceedings of 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies(NAACL), 2019, pp. 4171–4186.
[40] H. Zhang, C. Wu, Z. Zhang, Y. Zhu, H. Lin, Z. Zhang, Y. Sun, T. He, J. Mueller, R. Manmatha et al., “Resnest: Split-attention networks,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition(CVPR), 2022, pp. 2736–2746.
[41] X. Zhang, J. Cao, X. Li, Q. Sheng, L. Zhong, and K. Shu, “Mining dual emotion for fake news detection,” in Proceedings of the Web Conference 2021(WWW), 2021, pp. 3465–3476.
[42] K. Shu, D. Mahudeswaran, S. Wang, D. Lee, and H. Liu, “Fakenewsnet: A data repository with news content, social context, and spatiotemporal information for studying fake news on social media,” Big data, vol. 8, no. 3, pp. 171–188, 2020.
[43] A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer, “Automatic differentiation in pytorch,” 2017.
[44] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” CoRR, vol. abs/1412.6980, 2014.
[45] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” in International Conference on Learning Representations(ICLR), 2015.
[46] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE conference on computer vision and pattern recognition(CVPR), 2016, pp. 770–778.
