Enhancing Financial Sentiment Analysis with Expert-Designed Hint
Abstract
This paper investigates the role of expert-designed hints in enhancing sentiment analysis of financial social media posts. We explore the capability of large language models (LLMs) to empathize with writers' perspectives and analyze sentiment. Our findings reveal that an expert-designed hint, i.e., one pointing out the importance of numbers, significantly improves performance across various LLMs, particularly in cases requiring perspective-taking skills. Further analysis of tweets containing different types of numerical data demonstrates that including the expert-designed hint leads to notable improvements in sentiment analysis performance, especially for tweets with monetary-related numbers. Our findings contribute to the ongoing discussion on the applicability of Theory of Mind in NLP and open new avenues for improving sentiment analysis in financial domains through the strategic use of expert knowledge.
1 Introduction
Financial sentiment analysis, which classifies a given text into bullish or bearish categories, has long been a topic of research in NLP Baker and Wurgler (2007); Liu (2015); Xu and Cohen (2018). Here, bullish (bearish) indicates the investor's expectation that the price of the mentioned stock will rise (decline). Numbers have often been highlighted when dealing with financial textual data: some studies extract numerical information Chen et al. (2019a, b), others examine the role of numbers in financial documents to explore reasoning skills Zhu et al. (2021); Chen et al. (2021); Nan et al. (2022), and further research suggests that attending to numbers in financial documents can enhance downstream tasks such as volatility forecasting Yang et al. (2022); Shi et al. (2023). However, the relationship between financial sentiment analysis and numbers in text is rarely discussed. Building on this premise, this paper examines the efficacy of prompting LLMs to consider expert-designed hints, specifically the importance of numbers in understanding financial documents. Our findings indicate that such hints, e.g., emphasizing the relevance of numerical data, substantially enhance sentiment analysis performance across various LLMs.
Financial sentiment analysis has been a longstanding topic in Natural Language Processing (NLP) research. Some studies focus on writers' self-annotated labels, which are provided by writers when posting tweets Li and Shah (2017); Xing et al. (2020). Others predict readers' sentiment Agić et al. (2010); Yuan et al. (2020); Gaillat et al. (2018). Still others discuss the difference between writers' and readers' sentiment Maks and Vossen (2013); Berengueres and Castro (2017). For example, Chen et al. (2020) show, based on a survey of 10K financial tweets, that writers' and readers' financial sentiments may differ. This discrepancy provides a basis for discussing perspective-taking within sentiment analysis, specifically, the extent to which models, acting as readers, can deduce a writer's sentiment, especially when it is implicitly conveyed. This paper aims to bridge this gap by utilizing the Fin-SoMe dataset Chen et al. (2020), which comprises approximately 10,000 social media posts annotated from both the writer's and the reader's viewpoints. Our results suggest that the expert-designed hint activates perspective-taking and further enhances the performance of financial sentiment analysis.
In sum, this short paper aims to answer the following three research questions (RQ):
RQ1: Should LLMs be prompted to consider hints identified by experts, or are they inherently capable of recognizing such hints?
RQ2: To what extent can LLMs accurately apply perspective-taking ability when analyzing sentiments in financial social media data?
RQ3: Considering the importance of numerical data in financial documents, does the category of numbers affect sentiment analysis tasks?
2 Related Work
Theory of Mind (ToM) Baron-Cohen (1997); Baron-Cohen and Hammer (1997); Barnes-Holmes et al. (2004); Baron-Cohen et al. (2013) has long been a subject of interest and has recently regained attention. Perspective-taking, the ability to understand others' thoughts by empathizing with their viewpoints, is a concept extensively explored in psychology under ToM. LLMs have shown remarkable capabilities in grasping the semantics explicitly expressed in texts, and researchers in the NLP community have consequently begun to investigate whether models can empathize with human perspectives Liu et al. (2022); Li et al. (2023); van Dijk et al. (2023); Sclar et al. (2023); Sileo and Lernould (2023). ToM has been discussed from the perspectives of vision Liu et al. (2022), gaming Li et al. (2023), and psychology van Dijk et al. (2023), but discussions in sentiment analysis remain scarce. Although numerous studies replicate psychological experiments to assess language models' abilities, investigations into perspective-taking for enhancing the understanding of sentiment on social media are scant. We propose a novel task design based on an existing dataset and discuss perspective-taking in financial sentiment analysis.
3 Dataset
We use the Fin-SoMe dataset Chen et al. (2020) in our experiments. A total of 10,000 tweets were collected from the social media platform Stocktwits (https://stocktwits.com/), each labeled by the post's writer as either bullish or bearish. Reader sentiment was gauged by asking annotators to classify each tweet, based on its content, as bullish, bearish, or none; tweets that did not explicitly convey sentiment were labeled none. To examine perspective-taking ability, instances that received a bullish or bearish label from the post's writer but were not labeled bullish or bearish by the reader were deemed to require perspective-taking in this study. The dataset statistics are presented in Table 1. This research does not involve training models; the entire dataset is treated as a test set to assess models' proficiency in financial sentiment analysis. The experimental results on this dataset inform the discussions of RQ1 and RQ2.
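The perspective-taking subset described above can be derived mechanically from the two annotation layers. The sketch below illustrates the selection rule under an assumed record layout; the `writer`/`reader` field names and example posts are ours, not the dataset's actual schema.

```python
# Sketch of deriving the perspective-taking subset from Fin-SoMe-style
# annotations. Field names and example posts are illustrative.

def is_perspective_taking(writer_label: str, reader_label: str) -> bool:
    """A post requires perspective-taking when the writer tagged it
    bullish/bearish but the reader annotator could not recover that
    sentiment from the text alone (i.e., labeled it 'none')."""
    return (writer_label in {"bullish", "bearish"}
            and reader_label not in {"bullish", "bearish"})

posts = [
    {"text": "$XYZ to the moon", "writer": "bullish", "reader": "bullish"},
    {"text": "loaded 50k at 2.45", "writer": "bullish", "reader": "none"},
]
subset = [p for p in posts if is_perspective_taking(p["writer"], p["reader"])]
```

Applying this rule to the full dataset yields the 1,834 perspective-taking instances reported in Table 1.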
4 Methods
Our experiments aim to ascertain whether LLMs inherently possess the ability to recognize and utilize the nuanced hints that experts deploy when deciphering financial documents. Furthermore, we examine LLMs' performance on instances that require perspective-taking. We posit that if LLMs are inherently knowledgeable of and can apply these nuanced hints, their performance in sentiment analysis tasks should remain consistent, irrespective of whether the hints are explicitly highlighted in the prompt. Conversely, a notable discrepancy in performance would imply that LLMs lack the intrinsic capability to leverage these hints without explicit expert instruction.
Table 1: Dataset statistics.

Writer Sentiment | Whole | Perspective-Taking
Bullish | 8,573 | 1,557
Bearish | 1,427 | 277
Total | 10,000 | 1,834
Table 2: Experimental results (Micro-F1 / Weighted-F1; * denotes statistical significance under McNemar's test).

LLM | Approach | Whole Micro-F1 | Whole Weighted-F1 | Perspective-Taking Micro-F1 | Perspective-Taking Weighted-F1 | ≥1 Number Micro-F1 | ≥1 Number Weighted-F1
PaLM 2 | Simple Prompt | 80.84 | 84.49 | 70.77 | 76.96 | 80.61 | 84.29
PaLM 2 | CoT | 79.09 | 83.06 | 68.97 | 75.63 | 80.02 | 83.57
PaLM 2 | CoT + Hint | 80.38 | 83.88 | 72.90* | 78.30* | 81.32* | 84.39*
Gemini Pro | Simple Prompt | 68.26 | 77.48 | 42.80 | 55.71 | 69.06 | 78.01
Gemini Pro | CoT | 71.94 | 80.71 | 51.31 | 64.54 | 72.78 | 81.24
Gemini Pro | CoT + Hint | 74.45* | 81.66* | 54.20* | 65.91* | 75.32* | 82.18*
GPT-3.5 | Simple Prompt | 68.43 | 77.43 | 41.44 | 54.88 | 69.23 | 77.95
GPT-3.5 | CoT | 68.99 | 77.75 | 43.35 | 55.76 | 69.80 | 78.27
GPT-3.5 | CoT + Hint | 79.68* | 83.72* | 65.27* | 73.22* | 80.62* | 84.23*
GPT-4 | Simple Prompt | 78.01 | 84.59 | 54.80 | 67.68 | 78.93 | 85.13
GPT-4 | CoT | 73.35 | 82.04 | 48.64 | 62.57 | 74.21 | 82.58
GPT-4 | CoT + Hint | 81.86* | 86.27* | 62.00* | 72.11* | 82.82* | 86.79*
Table 3: GPT-4 analyses of an example tweet labeled bullish by its writer.

Tweet (Bullish): $MTBC haha whoever popped that fake ask above 2.60 earlier just loaded good 50k at 2.50 and 2.45!
Approach | Analysis | Sentiment
CoT | The user is commenting on a situation where someone apparently tried to manipulate the market price of $MTBC but ended up buying at a lower price. The user seems to be mocking those who fell for the trick. | Neutral
CoT + Hint | The sentiment of this tweet appears to be bullish for the person who managed to buy at lower prices, as they seem to have manipulated the market in their favor. … Overall, considering the tone of mockery and satisfaction in the tweet, it leans more towards bullish sentiment as it implies an expectation of profit from the lower purchase prices. | Bullish
Furthermore, our methodology incorporates not only the direct sentiment prediction approach, termed the Simple Prompt, but also the Chain-of-Thought (CoT) technique Wei et al. (2022), which elicits analytical reasoning before concluding with a prediction. To evaluate the potential performance enhancement from integrating nuanced hints, we augment our prompts with a directive emphasizing the importance of numerical data or statistics embedded within the tweets, as these elements could be pivotal in deducing the overarching sentiment. This approach is inspired by numerous studies on number comprehension and reasoning within financial texts Zhu et al. (2021); Chen et al. (2021); Nan et al. (2022); Yang et al. (2022); Shi et al. (2023), which collectively suggest that numerical information plays a critical role in understanding intentions or making informed decisions. In short, we add the sentence "Focus particularly on any numerical data or statistics present in the tweet, as these figures may be crucial in determining the overall sentiment." to the CoT prompt as the hint in our experiments.
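The three prompting conditions can be sketched as follows. Only the hint sentence is quoted verbatim from our prompt; the wording of the Simple and CoT templates here is a hypothetical paraphrase, not the exact templates used in the experiments.

```python
# Sketch of the three prompting conditions. Only HINT is verbatim from
# the paper; the Simple/CoT template wording below is illustrative.

HINT = ("Focus particularly on any numerical data or statistics present "
        "in the tweet, as these figures may be crucial in determining the "
        "overall sentiment.")

def build_prompt(tweet: str, approach: str) -> str:
    base = f"Tweet: {tweet}\nClassify the sentiment as bullish or bearish."
    if approach == "simple":
        return base
    # CoT: ask for step-by-step analysis before the final label.
    cot = base + ("\nFirst, provide your step-by-step analysis, "
                  "then give the final label.")
    if approach == "cot":
        return cot
    if approach == "cot+hint":
        return cot + "\n" + HINT  # CoT prompt plus the expert-designed hint
    raise ValueError(f"unknown approach: {approach}")
```

The same tweet is thus sent under all three conditions, and only the presence of the reasoning directive and the hint sentence varies.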
Our analysis encompasses four LLMs: PaLM 2 Anil et al. (2023), Gemini Pro (https://deepmind.google/technologies/gemini/), GPT-3.5, and GPT-4 (https://platform.openai.com/docs/models). To quantify their performance, we employ both Micro-F1 and Weighted-F1 scores. Furthermore, we utilize McNemar's test McNemar (1947) to ascertain the statistical significance of performance differences among the approaches, with a pre-specified significance threshold.
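A minimal implementation of McNemar's test on paired predictions might look as follows. This sketch uses the chi-square form with continuity correction (for one degree of freedom, the survival function is erfc(sqrt(x/2))); it is a generic formulation, not the paper's specific tooling.

```python
# McNemar's test for comparing two approaches evaluated on the same
# instances. Counts the discordant pairs and applies the chi-square
# form with continuity correction.
import math

def mcnemar_p(gold, pred_a, pred_b) -> float:
    """p-value under the null hypothesis that approaches A and B
    have equal error rates on the same test set."""
    # b: A correct, B wrong; c: A wrong, B correct (discordant pairs)
    b = sum(1 for g, a, p in zip(gold, pred_a, pred_b) if a == g and p != g)
    c = sum(1 for g, a, p in zip(gold, pred_a, pred_b) if a != g and p == g)
    if b + c == 0:
        return 1.0  # no discordant pairs: no evidence of a difference
    stat = (abs(b - c) - 1) ** 2 / (b + c)
    # chi-square survival function with df=1: erfc(sqrt(x/2))
    return math.erfc(math.sqrt(stat / 2))
```

Because the test conditions only on the discordant pairs, it is well suited to comparing prompting strategies that agree on most instances.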
5 Experimental Results
5.1 Overall Performance
Table 2 shows the experimental results. First, with the Simple Prompt, PaLM 2 outperforms all other LLMs. Second, the CoT approach does not invariably improve performance; it helps in two of the four LLMs (Gemini Pro and GPT-3.5). Third, adding the expert-designed hint consistently improves performance over CoT across all LLMs, and it significantly outperforms the other methods in three of the four LLMs. Although it marginally underperforms the Simple Prompt with PaLM 2, the difference is not statistically significant. These results answer RQ1 and reject our hypothesis: hints commonly utilized in prior research for analyzing financial documents, specifically numbers, are not automatically leveraged by LLMs without explicit guidance. In sum, a simple yet crucial hint provided by experts can significantly enhance sentiment analysis of financial social media data.
5.2 Perspective-Taking Subset
To address RQ2, we focus on the perspective-taking subset. The results in Table 2 first reveal that this subset is more challenging: performance is lower than on the entire dataset regardless of the LLM and approach applied. Additionally, the expert-designed hint consistently yields the best performance on this subset across all LLMs, suggesting that the hint helps activate the perspective-taking capability of LLMs. Table 3 presents an example tweet labeled bullish by its writer, along with the analyses and sentiment labels generated by GPT-4 under different approaches. This example first highlights the significance of numerical information in financial texts: ignoring the four numbers in this 18-token tweet would discard considerable information. Second, it shows that GPT-4 can comprehend the content and deliver a precise analysis with CoT, yet the analysis and sentiment label change once the hint is added; the model then recognizes the writer's tone and emotions and correctly aligns its sentiment label with the writer's. In sum, part of the improvement is attributable to the activation of perspective-taking, which answers RQ2.
5.3 Tweets with Numbers
Given that the expert-designed hint pertains to numbers, it is important to verify whether the hint indeed enhances performance on tweets containing numbers. We therefore assess all methods on the subset in which each tweet includes at least one number; the results are presented in Table 2. In general, performance on this subset surpasses that on the entire dataset, and CoT + Hint achieves the highest performance regardless of the LLM employed, suggesting that the hint significantly aids financial sentiment analysis. These results also reveal a marked distinction among methods with PaLM 2: although the performance gap between Simple Prompt and CoT + Hint is negligible on the entire dataset, the gain on the number-containing subset is significant. In summary, merely reminding models of the presence of numbers substantially facilitates financial sentiment analysis.
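The "contains at least one number" filter can be approximated with a simple digit pattern. The exact matching rule used to construct the subset is not specified in the paper, so the regex below is an assumption; it matches integers, decimals, and the digit part of shorthand like "50k".

```python
# Illustrative filter for tweets containing at least one number.
# The regex is an assumption, not the paper's exact rule.
import re

NUMBER_RE = re.compile(r"\d+(?:\.\d+)?")  # integers and decimals

def contains_number(tweet: str) -> bool:
    return NUMBER_RE.search(tweet) is not None

tweets = [
    "$MTBC just loaded good 50k at 2.50 and 2.45!",
    "to the moon",
]
with_numbers = [t for t in tweets if contains_number(t)]
```

A stricter variant might exclude digits inside cashtags or usernames, which this sketch does not attempt.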
5.4 Category of the Number
Given the significance of numbers in social media data, we further analyze the enhancement on tweets containing various types of numbers. To discuss RQ3, we compare the Fin-SoMe dataset with the FinNum dataset Chen et al. (2019a). The FinNum dataset comprises annotations for 8,868 numbers found in financial social media posts, categorized into seven types specifically designed for interpreting numbers in financial contexts: monetary, percentage, option, indicator, temporal, quantity, and product/version. We identified 6,493 tweets present in both datasets and used this subset to explore how the category of a number influences sentiment analysis performance.
We divided this set into seven groups based on the number-category labels in the FinNum dataset. The improvement is calculated as the difference in Micro-F1 between CoT + Hint and Simple Prompt. Table 4 presents the results. First, 37.53% of instances contain numbers in the Monetary category, and the improvement in this category is notably high compared with the other groups; this holds regardless of the LLM applied. The example in Table 3 also shows the importance of Monetary numbers: all four numbers in the tweet (2.60, 50k, 2.50, and 2.45) are monetary values. Second, improvement is observed in most groups, with only a few cases showing worse performance. This indicates that a simple hint can effectively guide LLMs toward a more comprehensive sentiment analysis, focused on the aspects considered important by experts.
Table 4: Micro-F1 improvement of CoT + Hint over Simple Prompt, by number category.

Category | Instance (%) | PaLM 2 | Gemini Pro | GPT-3.5 | GPT-4
Monetary | 37.53 | 11.98 | 42.35 | 59.37 | 22.64
Temporal | 30.23 | 0.80 | 18.16 | 32.85 | 7.74
Percent | 13.32 | -1.38 | 5.72 | 11.16 | 7.79
Quantity | 12.46 | 1.48 | 9.77 | 14.83 | 4.20
Indicator | 2.43 | 0.00 | 0.00 | 10.76 | 4.43
Option | 2.28 | -8.98 | 8.98 | 25.17 | 2.99
Product Number | 1.74 | 1.77 | 10.62 | 17.70 | 13.27
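The per-category improvement behind Table 4 can be sketched as the Micro-F1 gap between CoT + Hint and Simple Prompt, restricted to tweets whose FinNum annotation contains a given category. The data layout below is illustrative; note that for single-label classification, Micro-F1 reduces to accuracy, which keeps the sketch dependency-free.

```python
# Sketch of the per-category improvement computation for Table 4.
# Data structures are illustrative, not the datasets' actual formats.

def micro_f1(gold, pred):
    # For single-label classification, Micro-F1 equals accuracy.
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)

def improvement(gold, simple_pred, hint_pred, categories, target):
    """Micro-F1 gain (in points) of CoT + Hint over Simple Prompt on
    tweets annotated with the target FinNum category."""
    idx = [i for i, cats in enumerate(categories) if target in cats]
    g = [gold[i] for i in idx]
    s = [simple_pred[i] for i in idx]
    h = [hint_pred[i] for i in idx]
    return 100 * (micro_f1(g, h) - micro_f1(g, s))
```

Since a tweet may contain numbers from several categories, the same instance can contribute to more than one row of the table under this grouping.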
6 Conclusion
This study investigates the impact of employing an expert-designed hint on the performance of LLMs in financial sentiment analysis. We find that LLMs do not inherently utilize subtle hints crucial for sentiment analysis without explicit instruction. Introducing a simple, expert-derived hint that highlights the significance of numerical data substantially improves the models' capability to identify sentiments. This enhancement is especially notable in scenarios requiring perspective-taking, where the models must deduce the sentiment implied by the writer, highlighting the necessity of explicit guidance in financial sentiment analysis tasks.
Limitations
The limitations of this study are as follows.
Firstly, the findings of this study are primarily based on financial social media data, particularly from the Stocktwits platform. This focus may limit the generalizability of our conclusions to other domains or types of social media content. Future studies could explore whether the observed benefits of perspective-taking and expert-designed hints extend to other domains, such as healthcare or politics, where sentiment analysis is equally critical.
Secondly, this study simplifies the concept of perspective-taking. However, perspective-taking in human communication is a complex, multi-dimensional process that involves understanding emotional states, intentions, and contextual factors. Future work could aim to model these additional layers of complexity to achieve a more holistic understanding of sentiment in social media texts.
Another limitation is the focus on only four LLMs in our experiments. While these models are among the most advanced at the time of our study, the rapidly evolving field of natural language processing continually introduces new models that may offer different insights into the challenges of financial sentiment analysis. Testing our approach with a wider array of LLMs could provide a more comprehensive understanding of its effectiveness.
Lastly, our study’s focus on numerical data as a key element of financial sentiment analysis may overlook other important factors that influence sentiment interpretation, such as linguistic subtleties, cultural references, or domain-specific knowledge. Incorporating these dimensions into future research could provide a more holistic understanding of the challenges and opportunities in applying large language models to financial sentiment analysis.
References
- Agić et al. (2010) Željko Agić, Nikola Ljubešić, and Marko Tadić. 2010. Towards sentiment analysis of financial texts in Croatian. In Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC’10), Valletta, Malta. European Language Resources Association (ELRA).
- Anil et al. (2023) Rohan Anil, Andrew M Dai, Orhan Firat, Melvin Johnson, Dmitry Lepikhin, Alexandre Passos, Siamak Shakeri, Emanuel Taropa, Paige Bailey, Zhifeng Chen, et al. 2023. Palm 2 technical report. arXiv preprint arXiv:2305.10403.
- Baker and Wurgler (2007) Malcolm Baker and Jeffrey Wurgler. 2007. Investor sentiment in the stock market. Journal of economic perspectives, 21(2):129–151.
- Barnes-Holmes et al. (2004) Yvonne Barnes-Holmes, Louise McHugh, and Dermot Barnes-Holmes. 2004. Perspective-taking and theory of mind: A relational frame account. The Behavior Analyst Today, 5(1):15.
- Baron-Cohen (1997) Simon Baron-Cohen. 1997. Mindblindness: An essay on autism and theory of mind. MIT press.
- Baron-Cohen and Hammer (1997) Simon Baron-Cohen and Jessica Hammer. 1997. Parents of children with asperger syndrome: what is the cognitive phenotype? Journal of cognitive neuroscience, 9(4):548–554.
- Baron-Cohen et al. (2013) Simon Baron-Cohen, Helen Tager-Flusberg, and Michael Lombardo. 2013. Understanding other minds: Perspectives from developmental social neuroscience. OUP Oxford.
- Berengueres and Castro (2017) Jose Berengueres and Dani Castro. 2017. Differences in emoji sentiment perception between readers and writers. In 2017 IEEE International Conference on Big Data (Big Data), pages 4321–4328. IEEE.
- Chen et al. (2020) Chung-Chi Chen, Hen-Hsen Huang, and Hsin-Hsi Chen. 2020. Issues and perspectives from 10,000 annotated financial social media data. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 6106–6110.
- Chen et al. (2019a) Chung-Chi Chen, Hen-Hsen Huang, Hiroya Takamura, and Hsin-Hsi Chen. 2019a. Overview of the ntcir-14 finnum task: Fine-grained numeral understanding in financial social media data. In Proceedings of the 14th NTCIR Conference on Evaluation of Information Access Technologies, pages 19–27.
- Chen et al. (2019b) Chung-Chi Chen, Hen-Hsen Huang, Chia-Wen Tsai, and Hsin-Hsi Chen. 2019b. Crowdpt: Summarizing crowd opinions as professional analyst. In The World Wide Web Conference, pages 3498–3502.
- Chen et al. (2021) Zhiyu Chen, Wenhu Chen, Charese Smiley, Sameena Shah, Iana Borova, Dylan Langdon, Reema Moussa, Matt Beane, Ting-Hao Huang, Bryan Routledge, and William Yang Wang. 2021. FinQA: A dataset of numerical reasoning over financial data. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 3697–3711, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
- Gaillat et al. (2018) Thomas Gaillat, Manel Zarrouk, André Freitas, and Brian Davis. 2018. The SSIX corpora: Three gold standard corpora for sentiment analysis in English, Spanish and German financial microblogs. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), Miyazaki, Japan. European Language Resources Association (ELRA).
- Li et al. (2023) Huao Li, Yu Chong, Simon Stepputtis, Joseph Campbell, Dana Hughes, Charles Lewis, and Katia Sycara. 2023. Theory of mind for multi-agent collaboration via large language models. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 180–192, Singapore. Association for Computational Linguistics.
- Li and Shah (2017) Quanzhi Li and Sameena Shah. 2017. Learning stock market sentiment lexicon and sentiment-oriented word vector from StockTwits. In Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017), pages 301–310, Vancouver, Canada. Association for Computational Linguistics.
- Liu et al. (2022) Andy Liu, Hao Zhu, Emmy Liu, Yonatan Bisk, and Graham Neubig. 2022. Computational language acquisition with theory of mind. In The Eleventh International Conference on Learning Representations.
- Liu (2015) Shuming Liu. 2015. Investor sentiment and stock market liquidity. Journal of Behavioral Finance, 16(1):51–67.
- Maks and Vossen (2013) Isa Maks and Piek Vossen. 2013. Sentiment analysis of reviews: Should we analyze writer intentions or reader perceptions? In Proceedings of the International Conference Recent Advances in Natural Language Processing RANLP 2013, pages 415–419, Hissar, Bulgaria. INCOMA Ltd. Shoumen, BULGARIA.
- McNemar (1947) Quinn McNemar. 1947. Note on the sampling error of the difference between correlated proportions or percentages. Psychometrika, 12(2):153–157.
- Nan et al. (2022) Linyong Nan, Chiachun Hsieh, Ziming Mao, Xi Victoria Lin, Neha Verma, Rui Zhang, Wojciech Kryściński, Hailey Schoelkopf, Riley Kong, Xiangru Tang, Mutethia Mutuma, Ben Rosand, Isabel Trindade, Renusree Bandaru, Jacob Cunningham, Caiming Xiong, and Dragomir Radev. 2022. FeTaQA: Free-form table question answering. Transactions of the Association for Computational Linguistics, 10:35–49.
- Sclar et al. (2023) Melanie Sclar, Sachin Kumar, Peter West, Alane Suhr, Yejin Choi, and Yulia Tsvetkov. 2023. Minding language models’ (lack of) theory of mind: A plug-and-play multi-character belief tracker. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 13960–13980, Toronto, Canada. Association for Computational Linguistics.
- Shi et al. (2023) Ming-Xuan Shi, Chung-Chi Chen, Hen-Hsen Huang, and Hsin-Hsi Chen. 2023. Enhancing volatility forecasting in financial markets: A general numeral attachment dataset for understanding earnings calls. In Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics (Volume 2: Short Papers), pages 37–42, Nusa Dua, Bali. Association for Computational Linguistics.
- Sileo and Lernould (2023) Damien Sileo and Antoine Lernould. 2023. MindGames: Targeting theory of mind in large language models with dynamic epistemic modal logic. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 4570–4577, Singapore. Association for Computational Linguistics.
- van Dijk et al. (2023) Bram van Dijk, Marco Spruit, and Max van Duijn. 2023. Theory of mind in freely-told children’s narratives: A classification approach. In Findings of the Association for Computational Linguistics: ACL 2023, pages 12979–12993, Toronto, Canada. Association for Computational Linguistics.
- Wei et al. (2022) Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Fei Xia, Ed Chi, Quoc V Le, Denny Zhou, et al. 2022. Chain-of-thought prompting elicits reasoning in large language models. Advances in Neural Information Processing Systems, 35:24824–24837.
- Xing et al. (2020) Frank Xing, Lorenzo Malandri, Yue Zhang, and Erik Cambria. 2020. Financial sentiment analysis: An investigation into common mistakes and silver bullets. In Proceedings of the 28th International Conference on Computational Linguistics, pages 978–987, Barcelona, Spain (Online). International Committee on Computational Linguistics.
- Xu and Cohen (2018) Yumo Xu and Shay B. Cohen. 2018. Stock movement prediction from tweets and historical prices. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1970–1979.
- Yang et al. (2022) Linyi Yang, Jiazheng Li, Ruihai Dong, Yue Zhang, and Barry Smyth. 2022. Numhtml: Numeric-oriented hierarchical transformer model for multi-task financial forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 36, pages 11604–11612.
- Yuan et al. (2020) Chaofa Yuan, Yuhan Liu, Rongdi Yin, Jun Zhang, Qinling Zhu, Ruibin Mao, and Ruifeng Xu. 2020. Target-based sentiment annotation in Chinese financial news. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 5040–5045, Marseille, France. European Language Resources Association.
- Zhu et al. (2021) Fengbin Zhu, Wenqiang Lei, Youcheng Huang, Chao Wang, Shuo Zhang, Jiancheng Lv, Fuli Feng, and Tat-Seng Chua. 2021. TAT-QA: A question answering benchmark on a hybrid of tabular and textual content in finance. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 3277–3287, Online. Association for Computational Linguistics.