BeliN: A Novel Corpus for Bengali Religious News Headline Generation using Contextual Feature Fusion
Abstract
Automatic text summarization, particularly headline generation, remains a critical yet underexplored area for Bengali religious news. Existing approaches to headline generation typically rely solely on the article content, overlooking crucial contextual features such as sentiment, category and aspect. This limitation significantly hinders their effectiveness and overall performance. This study addresses this limitation by introducing a novel corpus, BeliN (Bengali Religious News) – comprising religious news articles from prominent Bangladeshi online newspapers, and MultiGen – a contextual multi-input feature fusion headline generation approach. Leveraging transformer-based pre-trained language models such as BanglaT5, mBART, mT5, and mT0, MultiGen integrates additional contextual features—including category, aspect, and sentiment—with the news content. This fusion enables the model to capture critical contextual information often overlooked by traditional methods. Experimental results demonstrate the superiority of MultiGen over the baseline approach that uses only news content, achieving a BLEU score of 18.61 and ROUGE-L score of 24.19, compared to baseline approach scores of 16.08 and 23.08, respectively. These findings underscore the importance of incorporating contextual features in headline generation for low-resource languages. By bridging linguistic and cultural gaps, this research advances natural language processing for Bengali and other underrepresented languages. To promote reproducibility and further exploration, the dataset and implementation code are publicly accessible at https://github.com/akabircs/BeliN.
keywords:
Bengali, Headline generation, Religious, News article, Feature fusion, Aspect, Sentiment, Transformer
[1]organization=Department of Computer Science and Engineering, Chittagong University of Engineering and Technology, city=Raozan, state=Chittagong, postcode=4349, country=Bangladesh
[2]organization=Department of Computer Science and Engineering, Bangladesh Army University of Science and Technology (BAUST), city=Saidpur, state=Nilphamari, postcode=5310, country=Bangladesh
[3]organization=School of Computing, Mathematics and Engineering, Charles Sturt University, city=Bathurst, state=NSW, postcode=2795, country=Australia
1 Introduction
A newspaper title serves as an essential element, often shaping the reader’s first impression by being both representative of the content and attention-grabbing [1]. Representative headlines play a crucial role in information retrieval systems, which prioritize keywords in headlines to enhance searchability and relevance [2]. Consequently, researchers have extensively explored automated techniques, such as text summarization, to generate compelling and accurate headlines from articles [3].
Text summarization systems can be categorized into two main approaches: extractive and abstractive techniques [4]. Early text summarization techniques predominantly relied on extractive processes, which identify and select significant portions of text, such as key phrases or sentences, directly from the document. These methods generate summaries by reproducing the most critical points verbatim, ensuring fidelity to the original content [5, 6]. Abstractive text summarization, a more recent development in the field of Automatic Text Summarization (ATS), generates concise summaries by reformulating key ideas from the original text. Unlike extractive methods, which replicate exact phrases, abstractive summarization produces a condensed script that captures the essence of the document in a clear and coherent manner [7, 4]. ATS aims to identify and appropriately prioritize the informative components of the source article, making it particularly useful for summarizing blogs, newspapers, and other text-based media [8].
Headline generation, a specialized form of text summarization, can also employ both extractive and abstractive approaches. However, the abstractive method generates headlines closer to real-world ones than the extractive technique: it captures the underlying meaning and context of the content, allowing greater flexibility and creativity in rephrasing, whereas the extractive approach relies on directly selecting portions of the input text [9]. Traditionally, headlines are generated solely based on the content of the article [10], a method that, while effective in capturing key information, limits the potential for engaging and compelling headline creation [11]. This approach tends to focus purely on summarizing the main points, often neglecting the broader context or emotional resonance required to capture a reader’s attention in today’s fast-paced media environment [12]. Furthermore, while there has been extensive research on headline generation for high-resource languages such as English [8], relatively few studies have focused on Bengali, particularly in the context of Bengali religious news [13, 14]. One notable study, Shironaam [13], focused on Bengali and incorporated additional contextual information, such as category, into the headline generation process. However, its sample of religious news was very small, and it did not consider other significant contextual elements, such as sentiment and aspect. As of 2024, Bengali is spoken by over 237 million native speakers and an additional 41 million second-language speakers, making it the fifth most spoken native language and the seventh most spoken language worldwide [15]. This widespread usage underscores Bengali’s significant role in global linguistic diversity.
In this paper, we introduce a novel Bengali religious news corpus, named BeliN, with a multi-input approach that incorporates additional features alongside the news content to generate more accurate and contextually relevant headlines. Specifically, we include the article’s category, content aspect, and sentiment as additional input features, as illustrated in Figure 1. This approach aims to narrow the domain and enhance the quality of the generated headlines. The rationale behind this method is that a headline should not only capture the essence of the article but also reflect its main context concisely. By integrating multiple inputs, we seek to provide a more comprehensive understanding of the article’s content, thereby producing more precise and relevant headlines. The inclusion of these features in the headline generation process is expected to improve the model’s performance and better reflect the underlying article’s themes [13]. This multi-input approach not only narrows the domain but also enriches the generated headlines with additional contextual information, making them more informative and representative of the article.

By leveraging these additional inputs, we aim to create a more robust and accurate headline-generation system for Bengali news articles, particularly within the domain of religious news. The proposed Bengali news headline generation system, named MultiGen, employs and evaluates state-of-the-art pre-trained transformer models such as mT5, mT0, mBART, and BanglaT5. Our experimental evaluations show that BanglaT5 outperforms the others, offering significant improvements in headline accuracy and contextual relevance. The key contributions of this work are as follows:
1. We have developed a novel dataset, BeliN, focused on the underexplored domain of religious news in the low-resource Bengali language. The dataset comprises 2,520 news articles and their corresponding headlines, making it valuable for various NLP tasks, including headline generation, text summarization, news categorization, sentiment analysis, and aspect classification.
2. We have proposed the MultiGen approach, which incorporates additional features such as category, aspect, and sentiment as auxiliary information to enhance the headline generation process. Our approach demonstrates the importance of integrating multiple inputs, achieving significant improvements over the baseline approach.
3. We have employed and evaluated the performance of state-of-the-art pre-trained models for generating compelling and attention-grabbing news headlines.
The remainder of this paper is organized as follows: Section 2 provides a comprehensive review of the related work that forms the foundation of this research. Section 3 introduces the BeliN corpus, presenting its key characteristics and statistical summary. The methodology and approaches adopted in this study are detailed in Section 4. Section 5 discusses the experimental setup, metrics, and models and presents the evaluation results. A detailed analysis, including a discussion of the findings and limitations, is presented in Section 6. Finally, Section 7 concludes the paper and outlines potential future directions.
2 Related Work
Generating news headlines has been a prominent area of research within the field of natural language processing (NLP) [16, 17]. Although significant advancements have been made in text summarization [18, 19, 20] and headline generation [10, 21], progress in low-resource languages, such as Bengali, remains limited [13, 22, 23, 24]. The task of generating headlines, particularly for religious news in Bengali, poses unique challenges due to the scarcity of annotated datasets [13].
Study | Dataset/Corpus | Language | Religious | Availability | Content | Category | Aspect | Sentiment | Task | Approach
[25] | Newsroom | English | – | Public | ✓ | – | – | – | S | A & E
[26] | CNN Daily Mail | English | – | Public | ✓ | – | – | – | S | A
[27] | CNN Corpus | English | – | Private | ✓ | – | – | – | S | E
[28] | CCSum | English | – | Public | ✓ | – | – | – | S | A
[29] | XSum | English | – | Public | ✓ | – | – | – | H | A
[30] | NewSHead | English | – | Public | ✓ | ✓ | – | – | H | A
[31, 32] | PENS | English | – | Public | ✓ | ✓ | – | – | H | A
[33] | CNN, New York Times [34] | English | – | Private | ✓ | – | – | – | H | A
[35] | DUC [36], Gigaword [37, 38] | English | – | Public | ✓ | – | – | – | H | A
[1] | Newsroom [25], Gigaword [37, 38] | English | – | Public | ✓ | – | – | – | S | A & E
[39] | Newsroom [25], Gigaword [37, 38], CNN Daily Mail [26] | English | – | Public | ✓ | – | – | – | S | A
[40] | SuDer | Turkish | – | Private | ✓ | – | – | – | H | A
[41] | AFRIHG | African languages | – | Private | ✓ | – | – | – | H | A
[42] | RIA [43], Lenta [44] | Russian | – | Public | ✓ | – | – | – | H | A
[45] | LCSTS | Chinese | – | Public | ✓ | – | – | – | S | A
[46] | Mukhyansh | Indic languages | – | Private | ✓ | – | – | – | H | A
[47] | Varta | Indic, English | – | Public | ✓ | – | – | – | H | A
[48] | LCSTS [45], XSum [29] | Chinese, English | – | Public | ✓ | – | – | – | H | A
[43] | RIA, New York Times [34] | Russian, English | – | Private (NYT) | ✓ | – | – | – | H | A
[22] | Own dataset | Bengali | – | Private | ✓ | – | – | – | H | A
[49] | XL-Sum | Bengali + 43 others | – | Public | ✓ | – | – | – | S | A
[50] | Potrika | Bengali | – | Public | ✓ | ✓ | – | – | H | A
[14] | BNAD | Bengali | ✓ | Public | ✓ | ✓ | – | – | H | A
[13] | Shironaam | Bengali | ✓ | Public | ✓ | ✓ | – | – | H | A
Ours | BeliN | Bengali | ✓ | Public | ✓ | ✓ | ✓ | ✓ | H | A
S = Summarization, H = Headline, A = Abstractive, E = Extractive
In high-resource languages like English and Chinese, a variety of datasets have supported significant advancements in summarization and headline generation research, as summarized in Table 1. Numerous datasets, including Newsroom [25], CNN Daily Mail [26], CNN Corpus [27], and CCSum [28], have been developed for summarization in English. While some of these datasets are publicly accessible, others remain private, as indicated in Table 1. Headline generation, as a specific subset of abstractive summarization, has also benefited from datasets such as XSum [29], NewSHead [30], PENS [31], New York Times [34], DUC [36], and Gigaword [37, 38], all designed for English. Most of these datasets utilize textual content to produce summaries or headlines, with two datasets, NewSHead [30] and PENS [31], explicitly incorporating category as a feature. While the majority of these works emphasize the abstractive approach, some also explore the extractive methodology.
Beyond English, several datasets have been developed for abstractive headline generation in other languages. These include SuDer [40] in Turkish, AFRIHG [41] in African languages, RIA [43] and Lenta [44] in Russian, and Mukhyansh [46] and Varta [47] in Indic languages. Additionally, LCSTS [45] has been created in Chinese for abstractive summarization. Notably, for non-English datasets, the focus has also primarily been on utilizing content alone for headline generation tasks.
In the Bengali language, several datasets have been developed for summarization and headline generation tasks. XL-Sum [49], a multilingual dataset covering 44 languages, includes Bengali and focuses on abstractive summarization using news content. Potrika [50] stands out as a substantial dataset offering a large collection of Bengali news samples with different categories, excluding religious news. BNAD [14] is another notable dataset in Bengali, which integrates category information with news articles for headline generation, though it includes only 1,275 religious news samples across its domains. Shironaam [13] is a large-scale dataset encompassing 13 news domains, including religious news, albeit with limited samples in this domain.
Among the previously mentioned research efforts, none of the non-Bengali datasets include the religious news domain. While some Bengali datasets, such as BNAD [14] and Shironaam [13], incorporate religious news along with categories, they provide no finer subdivisions of the religious content and no aspect or sentiment annotations for the news articles. In our proposed dataset, BeliN, we address this gap by focusing on the less explored religious news domain. We have compiled religious news samples from various online newspapers, integrating additional features such as category, aspect, and sentiment. The dataset includes five religious categories: Islam, Hinduism, Christianity, Buddhism, and others. Furthermore, it captures four distinct aspects: religious reports, festivals, education, and culture. Additionally, we annotate the sentiment polarity of the news articles as positive, negative, or neutral to further enhance the dataset’s utility.
Numerous headline-generation systems have been developed utilizing various datasets, with the majority relying solely on content as input for generating headlines [32, 51, 41, 42, 33, 30, 38, 1, 39, 48, 34]. This content-only approach has also been widely adopted in Bengali, as seen in research on text summarization [49], and headline generation [22] that utilize custom datasets. In the news content-only approach, the absence of linguistic context (i.e., the surrounding language or text that provides clarification and deeper meaning beyond the literal interpretation of words [52]) along with the lack of additional guidance, often leads to limited headline diversity and challenges in evaluation due to reliance on a single ground truth [53]. To address this limitation, incorporating additional contextual features is crucial for generating more nuanced and linguistically informed headlines. While Shironaam [13] introduced a multi-input framework by incorporating category and image captions alongside content, our proposed MultiGen approach goes a step further by integrating contextual features such as aspect and sentiment, in addition to content and category. By leveraging sentiment to reflect the natural tone of the news, our method aims to generate more robust and contextually accurate headlines. The MultiGen approach demonstrates superior performance compared to the baseline content-only systems, highlighting its effectiveness over existing methods like Shironaam [13].
In summary, while headline generation has seen substantial progress in high-resource languages, research in low-resource languages like Bengali remains underexplored. This study bridges the gap by introducing BeliN, a specialized dataset for Bengali religious news, and MultiGen, a state-of-the-art approach tailored for this domain. Together, they lay the groundwork for broader advancements in low-resource language processing and headline generation.
3 The BeliN Corpus
This section provides a comprehensive overview of the BeliN corpus (https://github.com/akabircs/BeliN), a meticulously curated dataset designed to advance the task of Bengali news headline generation. The corpus development process, as illustrated in Figure 2, follows a structured approach that integrates raw data collection, labeling, and statistical analysis to ensure a high-quality and contextually enriched dataset. The process begins with sourcing raw data from diverse Bengali news websites and religious news portals to achieve a representative dataset. This is followed by a detailed labeling methodology, where additional metadata such as categories, sentiments, and aspects are assigned to enrich the data. These auxiliary features enhance the context for generating precise and meaningful headlines. The final stage involves a thorough statistical analysis to evaluate the dataset’s composition and suitability for training and evaluating generative models. Each of these steps is systematically discussed in the following subsections.

3.1 Raw Data Collection
The raw data collection for the BeliN corpus was conducted with the objective of creating a robust dataset for Bengali news headline generation, specifically targeting religious news. Articles and corresponding headlines were manually gathered from a diverse set of Bengali news websites and religious news portals, ensuring the inclusion of high-quality and contextually relevant data. The sources for these articles are listed in Table 2, reflecting a wide coverage of topics and perspectives within the religious domain.
The manual collection approach was crucial in ensuring the integrity of the data, particularly in capturing nuanced contexts and maintaining linguistic authenticity. Unlike automated scraping techniques, this method allowed for the careful selection of articles and headlines that align with the focus of the BeliN corpus. This step laid the groundwork for the subsequent labeling process, where additional metadata was assigned to further enrich the dataset’s contextual depth.
Newspaper | URL |
Prothom Alo | https://www.prothomalo.com/religion |
Kaler Kantho | https://www.kalerkantho.com/online/Islamic-lifestylie |
Bangladesh Pratidin | https://www.bd-pratidin.com/islam |
NayaDiganta | https://www.dailynayadiganta.com/diganta-islami-jobon/133 |
Jugantor | https://www.jugantor.com/all-news/islam-life |
Daily Ittefaq | https://www.ittefaq.com.bd/religion |
Samakal | https://samakal.com/search?q=religion |
Dhaka Tribune | https://www.dhakatribune.com/topic/religion |
Bhorer Kagoj | https://www.bhorerkagoj.com/religion |
Jai Jai Din | https://www.jaijaidinbd.com/islam-and-religion |
Alokito Bangladesh | https://www.alokitoBengalidesh.com/islam |
Daily Inqilab | https://dailyinqilab.com/islamic-world |
Daily Vorer Pata | https://www.dailyvorerpata.com/cat.php?cd=293 |
Daily Khabar Patra | https://khoborpatrabd.com/?s=religion |
3.2 Dataset Labeling
Each article in the BeliN corpus was manually labeled for accuracy and consistency, categorized by religious affiliation, aspect, and sentiment. A subset was further annotated with predefined aspects and corresponding sentiment polarity to aid headline generation. The dataset features a diverse collection of religious news articles from various Bengali sources, structured into five key columns, detailed as follows:
1. Article: The full text of the news article.
2. Headline: The original headline of the news article.
3. Category: The religious affiliation of the news article: Islam, Hinduism, Christianity, Buddhism, or Others.
4. Aspect: The specific focus or theme of the article’s content, which can be one of the following:
   (a) Religious Report: Religious reports typically encompass a range of religious discussions, including sacred events and mythological and religious tales. These reports highlight significant news about religious communities, broader spiritual perspectives, and important religious philosophies. For example, “In Pakistan, a church was vandalized and set on fire. Two members of the community were arrested for blasphemy” (https://www.prothomalo.com/world/pakistan/i3riizd976).
   (b) Religious Festival: This section publishes news about religious festivals, ceremonies, and rituals. It includes significant news about various religious communities’ festivals, worship, and other religious practices. During festival times, news about events of different religious communities may also be featured in this section. For example, “Durga Puja is celebrated with grandeur in Abu Dhabi” (https://www.bd-pratidin.com/probash-potro/2023/10/22/932700).
   (c) Religious Education: This section focuses on news related to religious education and spiritual growth. It highlights updates from various educational institutions, religious schools, and policies on religious education. For example, “Those deeds by which one can attain paradise with the Prophet Muhammad (peace be upon him)” (https://www.kalerkantho.com/online/Islamic-lifestylie/2023/10/25/1330109).
   (d) Religious Culture: In this section, notable news about religious culture and religious figures is covered. Religious personalities, mythologies, and religious-cultural events are particularly highlighted here. For example, “A 10-day Islamic book fair begins in Mymensingh” (https://www.kalerkantho.com/online/Islamic-lifestylie/2023/10/05/1324183).
5. Sentiment: The sentiment of the article, which can be classified as:
   (a) Positive: This sentiment label indicates that the content of the article expresses favorable, supportive, or optimistic views toward the subject matter. For example, “Faith grows through contemplation and research” (https://www.kalerkantho.com/online/Islamic-lifestylie/2023/10/06/1324389).
   (b) Negative: This sentiment label indicates that the content of the article conveys unfavorable, critical, or pessimistic views toward the subject matter. For example, “The national wealth is at risk of self-destruction” (https://www.kalerkantho.com/online/Islamic-lifestylie/2023/09/02/1314294).
   (c) Neutral: This sentiment label indicates that the content of the article maintains an impartial, balanced, or indifferent stance, without showing strong positive or negative opinions. For example, “The rare coin of the Islamic era in Saudi Arabia” (https://www.kalerkantho.com/online/Islamic-lifestylie/2023/09/10/1316757).
A sample of the dataset has been given in Table 3. The BeliN corpus captures the complexity of religious news, reflecting diverse aspects and sentiments within the articles.
[Table 3: Sample entries from the BeliN corpus.]
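For illustration, a single BeliN record can be represented as a simple key-value mapping. The field names below follow the five columns described above; the values are hypothetical placeholders rather than actual corpus entries.

```python
# A hypothetical BeliN record; values are illustrative placeholders.
record = {
    "article":   "<full Bengali news text>",
    "headline":  "<original Bengali headline>",
    "category":  "Islam",      # Islam | Hinduism | Christianity | Buddhism | Others
    "aspect":    "Festival",   # Report | Festival | Education | Culture
    "sentiment": "Positive",   # Positive | Negative | Neutral
}
```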
3.3 Dataset Statistics
This subsection provides a detailed statistical analysis of the BeliN corpus, highlighting its composition and diversity. The dataset, specifically curated for religious news, spans multiple categories, aspects, and sentiment polarities. Such granularity ensures the dataset’s utility for training and evaluating generative models, offering a rich context for generating headlines. Table 4 presents the statistics of the BeliN corpus, which encompasses religious news across different categories. It includes counts for five major categories. Each category is further analyzed based on four aspects, and sentiment counts (positive, negative, and neutral) are provided for each aspect. The table concludes with a total count of 2520 entries in the dataset, with individual category counts distributed accordingly.
Category | Aspect: Report | Aspect: Festival | Aspect: Education | Aspect: Culture | Sentiment: Positive | Sentiment: Negative | Sentiment: Neutral | Total
Islam | 860 | 68 | 890 | 183 | 1457 | 299 | 245 | 2001 |
Hinduism | 135 | 67 | 16 | 24 | 128 | 58 | 56 | 242 |
Christianity | 7 | 12 | 7 | 2 | 19 | 5 | 4 | 28 |
Buddhism | 12 | 13 | 1 | 3 | 25 | 3 | 1 | 29 |
Others | 190 | 1 | 16 | 13 | 88 | 90 | 42 | 220 |
Total | 1204 | 161 | 930 | 225 | 1717 | 455 | 348 | 2520 |
Table 5 compares the features of Shironaam and BeliN in the religious domain. BeliN includes additional features like aspect and sentiment, while Shironaam includes topic words and image captions. BeliN also offers a larger number of samples (2,520 articles) than Shironaam (294 religious-news articles).
Features | Shironaam [13] | BeliN (this study) |
Article | ✓ | ✓ |
Headline | ✓ | ✓ |
Category | ✓ | ✓ |
Aspect | × | ✓ |
Sentiment | × | ✓ |
Topic words | ✓ | × |
Image caption | ✓ | × |
Total Samples | 294* | 2520 |
* Religious news |
Table 6 shows the quantitative statistics of Shironaam and BeliN based on the average number of words, sentences, and vocabulary size. The BeliN dataset demonstrates strong novelty in its n-grams, with 4.42% novel unigrams, 21.48% novel bigrams, 42.10% novel trigrams, and 56.47% novel 4-grams, as shown in Table 7 (a sketch of how these percentages can be computed follows the table). These results highlight the distinctiveness of headlines in comparison to the articles. While Shironaam shows slightly higher percentages of novel n-grams, BeliN still provides valuable insights into Bengali religious news, showcasing significant diversity in language use. This makes BeliN a valuable resource for research in Bengali language processing.

The figures presented illustrate the distribution of article and headline lengths in the dataset. Figure 3(a) shows the frequency of article lengths, measured in words, revealing the common word counts for articles. Figure 3(b) depicts the frequency of headline lengths, also measured in words, providing insight into the typical brevity or elaboration of headlines compared to the full articles. Together, these figures offer a visual representation of the structure and variation in article and headline lengths within the dataset.
Dataset | Article: Avg. words | Article: Avg. sentences | Article: Vocabulary | Headline: Avg. words | Headline: Avg. sentences | Headline: Vocabulary
Shironaam* [13] | 943.43 | 7.55 | 3,497 | 13.03 | 1.02 | 416 |
BeliN (this study) | 1001.18 | 32.75 | 9,750 | 17.13 | 1.06 | 1,410 |
* for religious news only |
Dataset | Unigram | Bigram | Trigram | 4-gram |
Shironaam* [13] | 4.50% | 22.48% | 45.19% | 60.22% |
BeliN (this study) | 4.42% | 21.48% | 42.10% | 56.47% |
* for religious news only |
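As referenced above, the novel n-gram percentages in Table 7 measure how many headline n-grams never appear in the paired article. A minimal sketch of this computation, assuming simple whitespace tokenization (the paper does not specify its tokenizer):

```python
def ngram_set(tokens, n):
    """All n-grams (as tuples) occurring in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def novel_ngram_pct(pairs, n):
    """Percentage of headline n-grams absent from the paired article.

    pairs: iterable of (article, headline) strings.
    """
    novel = total = 0
    for article, headline in pairs:
        article_ngrams = ngram_set(article.split(), n)
        head_tokens = headline.split()
        head_ngrams = [tuple(head_tokens[i:i + n])
                       for i in range(len(head_tokens) - n + 1)]
        novel += sum(1 for g in head_ngrams if g not in article_ngrams)
        total += len(head_ngrams)
    return 100.0 * novel / max(total, 1)

# e.g. novel_ngram_pct(belin_pairs, 2) should land near 21.48 for BeliN bigrams.
```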


The rich contextual information embedded in the BeliN corpus has significant potential for various natural language processing (NLP) tasks beyond headline generation. These include text generation, topic modeling, news categorization, and sentiment analysis of news headlines within the realm of religious news. By leveraging the detailed annotations and multi-aspect nature of the dataset, researchers and developers can create more sophisticated and human-like AI systems capable of understanding and generating content with high contextual awareness.
4 The MultiGen Approach
The traditional news content-only approach to headline generation faces several challenges. Relying solely on the news content often results in a lack of linguistic context and guidance, limiting headline diversity and creating evaluation difficulties due to dependence on a single ground-truth reference [53]. In this approach, the model is designed to take solely news content as input to generate headlines, which may not capture the full spectrum of possible interpretations or nuances of the article. Additionally, without the inclusion of contextual features, generated headlines may fail to align with the emotional tone or thematic aspects of the content, diminishing their relevance and overall quality.
To overcome these limitations, the MultiGen approach introduces a multi-input framework for Bengali news headline generation. Unlike the conventional approach that relies exclusively on news content, MultiGen integrates additional contextual features such as aspect, category, and sentiment. These features provide a more comprehensive understanding of the article, allowing the model to generate headlines that are contextually relevant and linguistically informed. Aspect and category help the model focus on specific narrative elements, while sentiment captures the emotional tone, ensuring that the generated headlines are better aligned with the article’s mood and message. This enriched approach improves both the diversity and quality of the generated headlines. Incorporating contextual features has also proven effective in other similar NLP tasks, including text classification [54], information retrieval [55], and sentiment analysis [56, 57].
By incorporating these additional features, MultiGen enhances the model’s ability to understand the nuances of the article, producing headlines that are both informative and contextually aligned with the content’s emotional tone. The overall framework of MultiGen, as illustrated in Figure 4, showcases the seamless integration of these diverse inputs within an encoder-decoder architecture, enabling more accurate and contextually aware headline generation.

The remainder of this section provides a detailed description of the MultiGen approach, starting with the preprocessing steps, followed by the fusion of multiple inputs, and concluding with the encoder-decoder architecture used to achieve enhanced headline generation performance.
4.1 Preprocessing
Preprocessing is a critical phase that prepares raw text data for input into generative models, ensuring the data is clean, consistent, and ready for effective processing. It involves two main tasks:
1. Text Normalization: This step uses the BUET normalizer [58] to standardize characters with Unicode NFKC. Non-textual elements like URLs and emojis are removed, excessive whitespace is managed, and redundant punctuation characters are reduced.
2. Input Processing: Text is formatted for model training by adding task-specific prefixes, such as “Summarize the Article as Headlines,” to guide the model. Appropriate tokenizers, like AutoTokenizer for BanglaT5 and mBART, are used to process articles. The text is truncated to 512 tokens for input and 64 tokens for headlines, ensuring computational efficiency. Finally, tokenized inputs and labels are combined into a dictionary for training, enhancing coherence and output quality. A sketch of both steps follows this list.
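A minimal sketch of both steps, using the csebuetnlp normalizer package and a Hugging Face tokenizer. The keyword arguments and checkpoint id reflect the public APIs of those libraries, and the English prefix mirrors the description above; exact settings in the paper's implementation may differ.

```python
from normalizer import normalize  # BUET normalizer (csebuetnlp/normalizer)
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("csebuetnlp/banglat5", use_fast=False)

def preprocess(article: str, headline: str) -> dict:
    # 1) Text normalization: Unicode NFKC, URLs and emojis stripped.
    article = normalize(article, unicode_norm="NFKC",
                        url_replacement="", emoji_replacement="")
    headline = normalize(headline, unicode_norm="NFKC")

    # 2) Input processing: task prefix plus truncation to 512/64 tokens.
    model_inputs = tokenizer("Summarize the Article as Headlines: " + article,
                             max_length=512, truncation=True)
    labels = tokenizer(headline, max_length=64, truncation=True)

    # Tokenized inputs and labels combined into one training dictionary.
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs
```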
4.2 Fusing Article with Category, Aspect, and Sentiment
The proposed multi-input approach enhances Bengali news headline generation by integrating multiple contextual signals—article ($A$), category ($C$), aspect ($P$), and sentiment ($S$)—into a unified input sequence. By leveraging this enriched input, the model benefits from a broader contextual understanding, enabling it to generate more precise and contextually aligned headlines. This approach constructs the input sequence by fusing these components with a [SEP] token, ensuring a clear distinction between different elements while preserving their individual contributions. In contrast to the baseline approach, which relies solely on article content, the proposed fusion approach incorporates additional contextual elements, significantly enhancing the model’s capacity to produce contextually nuanced headlines. Specifically, the input sequence ($X$) is defined as:

$$X = C \;[\mathrm{SEP}]\; P \;[\mathrm{SEP}]\; S \;[\mathrm{SEP}]\; A$$
This fusion strategy, referred to as MultiGen, is designed to enhance the quality of headline generation by tailoring the outputs to specific categories, aspects, and sentiment polarities. By capturing richer contextual information, this approach generates headlines that resonate more closely with the article’s intent and emotional tone, ultimately delivering a more engaging and context-aware reader experience.
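A minimal sketch of this fusion step. The metadata-first field order and the literal "[SEP]" string are assumptions on our part, since the exact serialization is an implementation detail:

```python
def fuse_inputs(article, category, aspect, sentiment, sep="[SEP]"):
    """Build the MultiGen input X = C [SEP] P [SEP] S [SEP] A."""
    return f"{category} {sep} {aspect} {sep} {sentiment} {sep} {article}"

# Example with hypothetical field values:
x = fuse_inputs(article_text, "Islam", "Festival", "Positive")
```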
4.3 Encoder-Decoder Architecture
The T5 (Text-to-Text Transfer Transformer) model employs an encoder-decoder architecture specifically designed for sequence-to-sequence tasks. This architecture consists of two primary components: the encoder and the decoder, both built using transformer layers. The encoder processes the input sequence and converts it into continuous hidden representations ($H$) that encapsulate both semantic and syntactic features. The decoder then utilizes these hidden representations to generate the output sequence by predicting tokens iteratively, conditioned on the encoder’s output and previously generated tokens.
Encoder
The encoder comprises a stack of transformer layers, each including multi-head self-attention mechanisms and feedforward neural networks. Given an input sequence, the encoder first embeds the tokens into continuous vector representations and applies positional encodings to capture the order of tokens. Successive transformer layers refine these representations by attending to various parts of the input, capturing both local and global dependencies. Mathematically, the encoder maps the input sequence ($X$) into hidden states ($H$) as follows:

$$H = \mathrm{Encoder}(X)$$
Decoder
The decoder also consists of transformer layers and incorporates a cross-attention mechanism to attend to the encoder’s output representations. During generation, the decoder predicts each token sequentially, leveraging the hidden states from the encoder and previously generated tokens. The output probabilities for the next token are computed using a softmax layer over the vocabulary. Formally, the decoder generates the output sequence ($Y$) conditioned on the encoder’s hidden states ($H$):

$$P(Y \mid X) = \prod_{t=1}^{|Y|} P\big(y_t \mid y_{<t}, H\big)$$
In the baseline approach, the encoder processes only the news content ($A$) to generate hidden states, which the decoder uses to produce the headline ($Y$):

$$H = \mathrm{Encoder}(A), \qquad Y = \mathrm{Decoder}(H)$$
In the proposed approach, the input sequence ($X$) includes additional contextual information—category ($C$), aspect ($P$), and sentiment ($S$)—concatenated with the news content ($A$) using [SEP] tokens:

$$X = C \;[\mathrm{SEP}]\; P \;[\mathrm{SEP}]\; S \;[\mathrm{SEP}]\; A$$
The encoder transforms this enriched input into hidden representations ($H$), which the decoder leverages to generate contextually enriched headlines:

$$H = \mathrm{Encoder}(X), \qquad Y = \mathrm{Decoder}(H)$$
This architecture’s ability to incorporate auxiliary information into the input sequence enables it to generate more accurate and contextually relevant headlines, demonstrating the effectiveness of the proposed enhancements.
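The notation above maps directly onto the Hugging Face seq2seq interface. The sketch below, assuming the BanglaT5 checkpoint from Table 8 and the fused input `x` from Section 4.2, extracts the encoder states ($H$) and decodes a headline ($Y$):

```python
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("csebuetnlp/banglat5", use_fast=False)
model = AutoModelForSeq2SeqLM.from_pretrained("csebuetnlp/banglat5")

enc = tokenizer(x, return_tensors="pt", max_length=512, truncation=True)

# H = Encoder(X): one hidden state per input token.
with torch.no_grad():
    H = model.get_encoder()(**enc).last_hidden_state

# Y = Decoder(H): beam-search generation conditioned on the encoder states.
y_ids = model.generate(**enc, max_new_tokens=64, num_beams=4)
headline = tokenizer.decode(y_ids[0], skip_special_tokens=True)
```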
5 Experimental Evaluation
5.1 Experimental Settings
For our Bengali religious news headline generation task, we employed a combination of hardware and software resources to ensure efficient model training and evaluation. Hardware resources included Google Colab Pro with an NVIDIA A100 GPU, Kaggle’s environment with two NVIDIA T4 GPUs, and a local machine equipped with an NVIDIA T4 GPU to maximize computational efficiency. On the software side, the project was developed on a Windows 11 machine with a 1TB HDD and 512GB SSD. We utilized TensorFlow and PyTorch for deep learning, NLTK for text preprocessing, and Hugging Face’s Transformers library for implementing encoder-decoder Transformer architectures. The dataset was split into training (1,870 samples, 74%), validation (150 samples, 6%), and testing (500 samples, 20%) subsets, allowing for thorough model training, hyperparameter tuning, and performance evaluation. This setup provided a solid foundation for the successful development and assessment of the headline generation models.
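The fixed split sizes above can be reproduced with scikit-learn; the random seed below is an assumption, since the paper does not report one.

```python
from sklearn.model_selection import train_test_split

# 2,520 samples -> 500 test (20%), 150 validation (6%), 1,870 train (74%).
train_val, test = train_test_split(samples, test_size=500, random_state=42)
train, val = train_test_split(train_val, test_size=150, random_state=42)
assert (len(train), len(val), len(test)) == (1870, 150, 500)
```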
5.2 Evaluation Metrics
To evaluate the performance of our developed system, we utilized several evaluation metrics. These metrics include BLEU, ROUGE-1, ROUGE-2, ROUGE-L, BERTScore and METEOR.
ROUGE (Recall-Oriented Understudy for Gisting Evaluation)
This metric is a widely used family of metrics for evaluating natural language processing tasks, particularly text summarization [59]. It measures the similarity between model-generated summaries and reference summaries based on n-gram overlap and the longest common subsequence (LCS). Three key ROUGE metrics are commonly employed: ROUGE-1 evaluates the overlap of unigrams (individual words) between the generated and reference summaries, calculating Precision and Recall. ROUGE-2 extends this to bi-grams (pairs of consecutive words), measuring the degree of structural similarity between the summaries. ROUGE-L, on the other hand, focuses on the longest common subsequence (LCS), which captures the longest sequence of words that appears in both summaries, regardless of order, providing a measure of structural alignment. Together, these ROUGE metrics offer a comprehensive view of content and structural similarity between generated and reference summaries, helping assess the effectiveness of the summarization model.
BLEU (Bilingual Evaluation Understudy)
BLEU [60] is a widely used metric for evaluating the quality of machine-translated text. It measures the similarity between model-generated and human-generated reference translations based on n-gram precision. BLEU is calculated by comparing the n-grams (sequences of n words) generated by the model to the n-grams in the reference translations. The BLEU score ranges from 0 to 1, with higher scores indicating better translation quality. The BLEU score can be expressed mathematically as:
$$\mathrm{BLEU} = \mathrm{BP} \cdot \exp\!\left(\sum_{n=1}^{N} w_n \log p_n\right) \quad (1)$$

where BP is the brevity penalty to account for shorter translations, $w_n$ is the weight assigned to n-grams of size $n$, and $p_n$ is the modified precision for n-grams of size $n$. The brevity penalty BP is calculated as:

$$\mathrm{BP} = \begin{cases} 1 & \text{if } c > r \\ e^{\,1 - r/c} & \text{if } c \le r \end{cases} \quad (2)$$

where $c$ is the length of the model output and $r$ is the effective reference length, which is the length of the reference translation closest to the length of the model output. BLEU is a valuable metric for evaluating the overall quality of machine-generated translations, but it has some limitations, particularly in capturing semantic similarity and fluency. It is best used in conjunction with other evaluation metrics for a comprehensive assessment of translation quality.
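As a quick numeric check of Equation (2) with arbitrary lengths: a 10-token output against a 12-token reference gives $\mathrm{BP} = e^{1 - 12/10} \approx 0.819$, capping BLEU even under perfect n-gram precision. The same arithmetic in code:

```python
import math

def brevity_penalty(c: int, r: int) -> float:
    """Equation (2): c = output length, r = effective reference length."""
    return 1.0 if c > r else math.exp(1 - r / c)

print(brevity_penalty(10, 12))  # ~0.8187
print(brevity_penalty(14, 12))  # 1.0 (output longer than reference)
```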
METEOR (Metric for Evaluation of Translation with Explicit ORdering)
This metric [61] is designed to evaluate machine translation by considering synonyms, stemming, and paraphrasing. It calculates precision and recall based on alignments between the generated and reference texts. The METEOR score is the harmonic mean of precision and recall, adjusted by a penalty for fragmentation. Mathematically, the METEOR score is given by:
$$F_{\mathrm{mean}} = \frac{10\,P\,R}{R + 9P} \quad (3)$$

$$\mathrm{METEOR} = F_{\mathrm{mean}} \cdot \left(1 - \mathrm{Penalty}\right), \qquad \mathrm{Penalty} = 0.5 \left(\frac{\#\text{chunks}}{\#\text{matched unigrams}}\right)^{3} \quad (4)$$
BERTScore
It evaluates the quality of text generation using contextual embeddings [62] from pre-trained BERT models. It calculates the cosine similarity between token embeddings of the generated and reference texts. BERTScore includes precision, recall, and F1-score based on these similarities. Mathematically, BERTScore is defined as:
$$P_{\mathrm{BERT}} = \frac{1}{|\hat{x}|} \sum_{\hat{x}_j \in \hat{x}} \max_{x_i \in x} x_i^{\top} \hat{x}_j \quad (5)$$

$$R_{\mathrm{BERT}} = \frac{1}{|x|} \sum_{x_i \in x} \max_{\hat{x}_j \in \hat{x}} x_i^{\top} \hat{x}_j \quad (6)$$

$$F_{\mathrm{BERT}} = 2\,\frac{P_{\mathrm{BERT}} \cdot R_{\mathrm{BERT}}}{P_{\mathrm{BERT}} + R_{\mathrm{BERT}}} \quad (7)$$

where $\hat{x}$ and $x$ represent the sets of token embeddings for the model output and reference translations, respectively.
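All of the above metrics can be computed with the Hugging Face `evaluate` library; a minimal sketch follows. Note that the default ROUGE tokenizer is English-oriented, so Bengali scores may warrant a custom tokenizer, and `lang="bn"` for BERTScore is our assumed setting rather than one stated in the paper.

```python
import evaluate

preds = ["<generated headline>"]   # model outputs
refs = ["<reference headline>"]    # ground-truth headlines

rouge = evaluate.load("rouge").compute(predictions=preds, references=refs)
bleu = evaluate.load("sacrebleu").compute(predictions=preds,
                                          references=[[r] for r in refs])
meteor = evaluate.load("meteor").compute(predictions=preds, references=refs)
bert = evaluate.load("bertscore").compute(predictions=preds,
                                          references=refs, lang="bn")

print(rouge["rougeL"], bleu["score"], meteor["meteor"],
      sum(bert["f1"]) / len(bert["f1"]))
```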
5.3 Pre-trained Language Models
The pre-trained language models employed for this task include both T5-based and BART-based models. These models are fine-tuned to transform Bengali news articles into concise, informative headlines. The T5-based models include BanglaT5, mT5, and mT0, while mBART represents the BART-based model. Detailed information about these models is provided in Table 8.
1. T5-based models: BanglaT5, mT5, and mT0 are based on the T5 architecture [63], which frames all NLP tasks as text-to-text problems. The T5 model uses a sequence-to-sequence framework where the encoder processes the input text and the decoder generates the output text. This approach allows for flexibility in handling various NLP tasks, such as translation, summarization, and text generation, by converting them into a text-to-text format.
2. BART-based model: mBART is based on the BART architecture [64], which utilizes a denoising autoencoder approach for pre-training. This model follows a sequence-to-sequence framework similar to T5 but incorporates a denoising objective during pre-training, where parts of the input are corrupted and the model learns to reconstruct the original text. This pre-training strategy helps the model become robust to noise and enhances its ability to generate coherent and contextually relevant text.
Model | Hugging Face link | Parameters | Pretrained on |
BanglaT5 [65] | https://huggingface.co/csebuetnlp/BanglaT5 | 247M | Bengali2B+ |
mT0-base [66] | https://huggingface.co/bigscience/mt0-base | 582M | mC4 |
mT5-Base [67] | https://huggingface.co/google/mt5-base | 582M | mC4 |
mBART-50 [68] | https://huggingface.co/facebook/mbart-large-50 | 610M | CC25 |
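All four checkpoints in Table 8 share the same seq2seq loading interface; a sketch using the repository ids listed above:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

CHECKPOINTS = {
    "BanglaT5": "csebuetnlp/banglat5",
    "mT0": "bigscience/mt0-base",
    "mT5": "google/mt5-base",
    "mBART": "facebook/mbart-large-50",
}

# Load each (tokenizer, model) pair for fine-tuning or inference.
models = {name: (AutoTokenizer.from_pretrained(ckpt),
                 AutoModelForSeq2SeqLM.from_pretrained(ckpt))
          for name, ckpt in CHECKPOINTS.items()}
```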
5.4 Hyper-parameter Tuning
Hyperparameter tuning is a critical step in optimizing the performance of generative models for headline generation. In this study, we experimented with various hyperparameter configurations for Bangla-T5, mBART, mT5, and mT0 to achieve the best results. Table 9 summarizes the hyperparameter search space and the selected configurations for each model.
Hyper-parameters | Hyper-parameter Space | Bangla-T5 | mBART | mT5 | mT0 |
Learning Rate | , , | ||||
Epochs | 3–10 | 5 | 5 | 5 | 5 |
Batch Size | 4, 8 | 8 | 8 | 8 | 8 |
Input Token Length | 512, 1024 | 512 | 512 | 512 | 512 |
Target Token Length | 16, 32, 64, 128 | 64 | 64 | 64 | 64 |
The selection of appropriate hyperparameters directly impacts the performance and efficiency of the models. The learning rate, for instance, was found to be a key factor in stabilizing training: Bangla-T5 and mT5 performed optimally with the same learning rate, mBART required a higher rate, and mT0 performed best with yet another setting. A batch size of 8 was chosen to balance computational requirements and model convergence, ensuring stable training. Input and target token lengths were also crucial for effectively handling the variability in article lengths and headline requirements. Setting the input token length to 512 ensured that the models could process sufficiently detailed content, while a target token length of 64 allowed the generation of concise and precise headlines without truncation. The number of epochs, fixed at 5, provided a balance between overfitting and undertraining for all models. These carefully chosen hyperparameters enabled the models to achieve strong performance in generating contextually relevant and coherent headlines, underscoring the importance of systematic hyperparameter tuning.
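A sketch of the corresponding fine-tuning configuration using Hugging Face's `Seq2SeqTrainingArguments`. The learning rate shown is a generic placeholder rather than the paper's selected per-model values; the remaining arguments mirror Table 9.

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="multigen-banglat5",
    learning_rate=5e-5,             # placeholder: per-model rates were tuned
    num_train_epochs=5,             # selected from the 3-10 search space
    per_device_train_batch_size=8,  # selected from {4, 8}
    per_device_eval_batch_size=8,
    predict_with_generate=True,     # decode headlines during evaluation
    generation_max_length=64,       # target token length
)
```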
5.5 Results
This section evaluates and analyzes the performance of the headline generation models developed in this research. We provide a comprehensive overview of the evaluation metrics employed, including ROUGE, BLEU, METEOR, and BERTScore. These metrics are widely used in natural language processing tasks to assess the quality of generated text. State-of-the-art (SOTA) analysis is a critical component of model evaluation, as it establishes a benchmark against which the performance of proposed models can be compared. By analyzing improvements in performance metrics relative to existing methods, researchers can demonstrate the effectiveness and advancements offered by their proposed approaches. The performance of the developed models was evaluated using a combination of these metrics. Table 10 reports the results of our models, comparing the proposed approaches against their respective baselines, and providing insights into the SOTA improvements.
Model | Approach | BLEU | ROUGE-1 | ROUGE-2 | ROUGE-L | BERTScore | METEOR
mT5 | Baseline | 10.31 | 13.47 | 4.22 | 13.03 | 69.34 | 9.80
mT5 | Proposed | 11.66 | 17.54 | 5.68 | 16.85 | 71.74 | 10.86
mT5 | SOTA | +13.1% | +30.4% | +34.6% | +29.1% | +3.5% | +10.8%
mT0 | Baseline | 12.08 | 18.84 | 7.10 | 17.95 | 70.34 | 13.90
mT0 | Proposed | 13.13 | 22.94 | 7.94 | 21.48 | 72.62 | 14.40
mT0 | SOTA | +8.7% | +21.2% | +11.8% | +19.0% | +3.2% | +3.6%
mBART | Baseline | 15.23 | 23.01 | 7.90 | 21.88 | 73.21 | 13.12
mBART | Proposed | 16.58 | 24.36 | 7.78 | 22.63 | 74.63 | 14.60
mBART | SOTA | +8.8% | +5.9% | -1.5% | +3.4% | +1.9% | +11.3%
BanglaT5 | Baseline | 16.08 | 22.84 | 7.97 | 23.08 | 73.57 | 15.40
BanglaT5 | Proposed | 18.61 | 26.70 | 10.60 | 24.19 | 75.12 | 16.65
BanglaT5 | SOTA | +15.7% | +17.0% | +33.0% | +4.8% | +2.1% | +8.1%
The proposed approach consistently outperforms the baseline across all models in terms of BLEU, ROUGE, BERTScore, and METEOR scores. The BLEU scores indicate a notable enhancement in fluency and coherence of the generated headlines, with the proposed models achieving significant improvements over their baselines. For instance, the mT5 model exhibits a BLEU score increase of 13.1%, while BanglaT5 shows a remarkable improvement of 15.7%, emphasizing the effectiveness of the proposed methods in producing high-quality outputs. The ROUGE scores further demonstrate the superiority of the proposed approach, revealing higher precision and recall compared to the baseline models. This reflects an improved relevance and informativeness in the generated headlines. For example, mT0’s proposed method achieves a ROUGE-1 score that is 21.2% better than its baseline, indicating that it captures more relevant content.

Additionally, BERTScore and METEOR metrics underscore the semantic accuracy and contextual relevance of the generated headlines. The improvements in these scores highlight the effectiveness of the proposed models in understanding and generating text that aligns well with human expectations. Among all the models, BanglaT5 stands out as the top performer in the proposed approach, achieving the highest scores across all metrics. The substantial improvements in BLEU, ROUGE, BERTScore, and METEOR suggest that BanglaT5 effectively leverages additional contextual information, such as aspect categories and sentiment analysis, to generate headlines that are not only accurate and informative but also contextually relevant.

Overall, the findings from this research indicate that the proposed models significantly enhance the quality of Bengali news headline generation, offering promising directions for future research and development in this area.
6 Discussion
6.1 Analyzing Generated Headlines
Table 11 illustrates sample-generated headlines, showcasing the impact of our MultiGen approach on generating high-quality, contextually relevant headlines for Bengali news articles. The proposed approach demonstrates significant improvements over the baseline, particularly in preserving the essence of the articles while ensuring linguistic fluency and contextual accuracy. The generated headlines from both the baseline and the proposed MultiGen approach were compared against the reference headlines to evaluate their quality and contextual relevance. The analysis reveals notable differences in the performance of the two models, particularly in their ability to align with the reference headlines.
[Table 11: Sample headlines generated by the baseline and proposed MultiGen approaches, compared against the reference headlines.]
The baseline model, while capable of generating coherent headlines, often fails to capture the nuanced meaning or context of the input article. For instance, in sample #1, the reference headline focuses on the etiquette of reciting the Qur’an. The baseline model, however, generates a different headline highlighting the virtues of the Qur’an, which does not address the context intended in the reference. In contrast, the proposed MultiGen approach successfully generates a headline emphasizing the manners of recitation, aligning more closely with the reference.
Similarly, in sample #2, the headline generated by the baseline approach incorrectly attributes lying as a characteristic of Islam, reflecting a lack of semantic understanding. Conversely, the proposed approach accurately conveys the importance of truthfulness, capturing the core message of the article and adhering more closely to the reference.
For sample #3, the reference headline highlights poverty alleviation through Zakat. While the baseline approach generates a headline related to wealth distribution in Islam, it does not focus on the specific theme of poverty alleviation. The MultiGen approach, however, generates a headline that aligns with the reference and integrates the article’s broader social and economic implications.
Sample #4 further illustrates the shortcomings of the baseline approach, which generates a headline that fails to capture the critical detail that today is Christmas Day. In contrast, the proposed approach successfully incorporates this temporal context, producing a headline that accurately conveys the date-related information and aligns closely with the reference.
Overall, the MultiGen approach consistently outperforms the baseline model by generating headlines that are contextually and temporally accurate, semantically rich, and closely aligned with the reference. This demonstrates the efficacy of incorporating additional contextual information such as category, aspect, and sentiment, enabling the proposed model to better understand and reflect the underlying themes of the input articles.
6.2 Findings and Observations
This section highlights key insights from the results presented in Table 10, with a focus on error analysis to identify areas where the headline generation models may have performed sub-optimally. Below are the primary observations:
Low BLEU Scores
The mT5 model, under the baseline approach, demonstrated relatively low BLEU scores compared to other models. This suggests that the generated headlines often lacked fluency or coherence, resulting in lower n-gram overlaps with the reference headlines.
Variability in ROUGE Scores
While the proposed approach generally outperformed the baseline across all models, variability in ROUGE scores was observed. For instance, the ROUGE-2 scores for mT5 and mBART under the proposed approach were slightly lower than other models, indicating difficulties in capturing bi-gram similarities effectively.
Performance Discrepancies
BanglaT5 consistently exhibited superior performance, particularly in terms of ROUGE scores, highlighting its ability to generate headlines that closely align with reference headlines. Conversely, mT0’s relatively lower BLEU and ROUGE scores suggest room for improvement in fluency and relevance.
Impact of Additional Context
The improved performance of the proposed approach, which leverages additional contextual features such as aspect categories and sentiment, underscores the significance of incorporating auxiliary inputs for headline generation. This approach facilitates a deeper understanding of article contexts, leading to more coherent and informative outputs.
Room for Improvement
Despite the promising results, there is potential for further improvement in headline generation. Refining model architectures, optimizing hyperparameters, and enhancing preprocessing techniques could mitigate observed shortcomings and elevate the quality of generated headlines.
6.3 Limitations and Future Work
Despite the encouraging results, the study encountered several limitations. A significant challenge was the scarcity of high-quality annotated datasets for Bengali, which constrained model training and evaluation, potentially limiting the generalizability of results. Hardware limitations also restricted fine-tuning large-scale generative models, impacting training efficiency and performance. Furthermore, the complexity of Bengali morphology and syntax introduced additional challenges, occasionally resulting in inaccuracies in the generated headlines [69].
Future work could address these limitations through innovative data augmentation techniques, exploration of advanced model architectures, and domain-specific customization. Expanding the dataset to include a wider variety of domains and incorporating user feedback in training loops could further enhance the models. Additionally, leveraging LLMs for real-time and multilingual headline generation could broaden the applicability and effectiveness of the approach [24]. This study provides a solid foundation for advancing headline generation in Bengali, offering valuable insights for future research in natural language processing and text generation.
7 Conclusion
This research work has explored the potential of a contextual multi-input feature fusion approach, using various generative models for religious news headline generation, with a particular focus on the Bengali language. Central to this work is the introduction of the novel BeliN corpus, a curated dataset of Bengali religious news articles and corresponding headlines. This dataset addresses the scarcity of resources for Bengali and serves as a foundational contribution to advancing natural language processing for low-resource languages. We have implemented and evaluated state-of-the-art pre-trained models, including mT5, mT0, mBART, and BanglaT5, within the proposed MultiGen approach, incorporating additional contextual information such as aspect, category, and sentiment analysis. Rigorous experimentation and detailed analysis demonstrate that the proposed approach significantly outperforms traditional baseline methods, achieving higher accuracy, coherence, and contextual relevance in generated headlines.
The findings underscore the importance of integrating contextual features in headline generation and highlight the efficacy of the BeliN corpus in enabling this advancement. This research contributes to natural language processing and offers practical insights for developing sophisticated text summarization systems in underrepresented languages, thereby promoting linguistic inclusivity and cross-cultural communication.
Data availability
Data and code used in this study are publicly available at https://github.com/akabircs/BeliN.
Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgement
We thank Sadia Tasnim (Begum Rokeya University, Rangpur) for assistance with collecting and annotating the dataset.
References
- Cai et al. [2023] P. Cai, K. Song, S. Cho, H. Wang, X. Wang, H. Yu, F. Liu, D. Yu, Generating user-engaging news headlines, in: Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics, 2023, pp. 3265–3280. doi:10.18653/v1/2023.acl-long.183.
- De Francisci Morales et al. [2012] G. De Francisci Morales, A. Gionis, C. Lucchese, From chatter to headlines: harnessing the real-time web for personalized news recommendation, in: Proceedings of the fifth ACM international conference on Web search and data mining, 2012, pp. 153–162. doi:10.1145/2124295.2124315.
- Koh et al. [2022] H. Y. Koh, J. Ju, M. Liu, S. Pan, An empirical survey on long document summarization: Datasets, models, and metrics, ACM computing surveys 55 (2022) 1–35. doi:10.1145/3545176.
- Rao et al. [2024] A. Rao, S. Aithal, S. Singh, Single-document abstractive text summarization: A systematic literature review, ACM Comput. Surv. 57 (2024). doi:10.1145/3700639.
- Banerjee et al. [2023] S. Banerjee, S. Mukherjee, S. Bandyopadhyay, P. Pakray, An extract-then-abstract based method to generate disaster-news headlines using a dnn extractor followed by a transformer abstractor, Information Processing & Management 60 (2023) 103291. doi:10.1016/j.ipm.2023.103291.
- Giarelis et al. [2023] N. Giarelis, C. Mastrokostas, N. Karacapilidis, Abstractive vs. extractive summarization: An experimental review, Applied Sciences 13 (2023). doi:10.3390/app13137620.
- Alomari et al. [2022] A. Alomari, N. Idris, A. Q. M. Sabri, I. Alsmadi, Deep reinforcement and transfer learning for abstractive text summarization: A review, Computer Speech & Language 71 (2022) 101276. doi:10.1016/j.csl.2021.101276.
- El-Kassas et al. [2021] W. S. El-Kassas, C. R. Salama, A. A. Rafea, H. K. Mohamed, Automatic text summarization: A comprehensive survey, Expert systems with applications 165 (2021) 113679. doi:10.1016/j.eswa.2020.113679.
- Ahuir et al. [2024] V. Ahuir, J.-A. Gonzalez, L.-F. Hurtado, E. Segarra, Abstractive summarizers become emotional on news summarization, Applied Sciences 14 (2024). doi:10.3390/app14020713.
- Ayana et al. [2017] Ayana, S.-Q. Shen, Y.-K. Lin, C.-C. Tu, Y. Zhao, Z.-Y. Liu, M.-S. Sun, Recent advances on neural headline generation, Journal of Computer Science and Technology 32 (2017) 768–784. doi:10.1007/s11390-017-1758-3.
- Hagar and Diakopoulos [2019] N. Hagar, N. Diakopoulos, Optimizing content with a/b headline testing: Changing newsroom practices, Media and Communication 7 (2019) 117–127.
- Banerjee and Urminsky [2024] A. Banerjee, O. Urminsky, The language that drives engagement: A systematic large-scale analysis of headline experiments, Marketing Science (2024).
- Akash et al. [2023] A. U. Akash, M. T. Nayeem, F. T. Shohan, T. Islam, Shironaam: Bengali news headline generation using auxiliary information, in: Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, Association for Computational Linguistics, Dubrovnik, Croatia, 2023, pp. 52–67. URL: https://aclanthology.org/2023.eacl-main.4.
- Saad et al. [2024] A. M. Saad, U. N. Mahi, M. S. Salim, S. I. Hossain, Bangla news article dataset, Data in Brief 57 (2024) 110874. doi:10.1016/j.dib.2024.110874.
- Eberhard et al. [2024] D. M. Eberhard, G. F. Simons, C. D. Fennig, Ethnologue: Languages of the World, twenty-seventh edition, https://www.ethnologue.com/, 2024. Accessed: 30 December 2024.
- Shaibani and Elnagar [2024] A. Y. Shaibani, A. M. Elnagar, A survey of text summarization and headline generation methods in Arabic, in: Proceedings of the 2024 9th International Conference on Machine Learning Technologies, ICMLT ’24, Association for Computing Machinery, New York, NY, USA, 2024, pp. 317–323. doi:10.1145/3674029.3674078.
- Zeyad and Biradar [2024] A. M. A. Zeyad, A. Biradar, Advancements in the efficacy of FLAN-T5 for abstractive text summarization: A multi-dataset evaluation using ROUGE and BERTScore, in: 2024 International Conference on Advancements in Power, Communication and Intelligent Systems (APCI), 2024, pp. 1–5. URL: https://api.semanticscholar.org/CorpusID:271748003.
- Yadav et al. [2023] A. K. Yadav, Ranvijay, R. S. Yadav, A. K. Maurya, State-of-the-art approach to extractive text summarization: a comprehensive review, Multimedia Tools and Applications 82 (2023) 29135–29197. doi:10.1007/s11042-023-14613-9.
- Bharathi Mohan et al. [2023] G. Bharathi Mohan, R. Prasanna Kumar, S. Parathasarathy, S. Aravind, K. B. Hanish, G. Pavithria, Text Summarization for Big Data Analytics: A Comprehensive Review of GPT-2 and BERT Approaches, Springer Nature Switzerland, Cham, 2023, pp. 247–264. doi:10.1007/978-3-031-33808-3_14.
- Cajueiro et al. [2023] D. O. Cajueiro, A. G. Nery, I. Tavares, M. K. D. Melo, S. A. dos Reis, L. Weigang, V. R. R. Celestino, A comprehensive review of automatic text summarization techniques: method, data, evaluation and coding, 2023. arXiv:2301.03403.
- Liu et al. [2018] T. Liu, H. Li, J. Zhu, J. Zhang, C. Zong, Review headline generation with user embedding, in: M. Sun, T. Liu, X. Wang, Z. Liu, Y. Liu (Eds.), Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data, Springer International Publishing, Cham, 2018, pp. 324–334. doi:10.1007/978-3-030-01716-3_27.
- Salehin et al. [2019] M. Salehin, A. Rafat, F. Khan, S. Abujar, Generating Bengali news headlines: An attentive approach with sequence-to-sequence networks, in: Proceedings of the 8th International Conference on System Modeling and Advancement in Research Trends (SMART), 2019, pp. 256–261. doi:10.1109/SMART46866.2019.9117554.
- Hayat et al. [2023] S. M. A. I. Hayat, A. Das, M. Hoque, Abstractive Bengali text summarization using transformer-based learning, in: 6th International Conference on Electrical Information and Communication Technology (EICT), 2023, pp. 1–6. doi:10.1109/EICT61409.2023.10427906.
- Kabir et al. [2024] M. Kabir, M. S. Islam, M. T. R. Laskar, M. T. Nayeem, M. S. Bari, E. Hoque, BenLLM-eval: A comprehensive evaluation into the potentials and pitfalls of large language models on Bengali NLP, in: N. Calzolari, M.-Y. Kan, V. Hoste, A. Lenci, S. Sakti, N. Xue (Eds.), Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), ELRA and ICCL, Torino, Italia, 2024, pp. 2238–2252. URL: https://aclanthology.org/2024.lrec-main.201.
- Grusky et al. [2018] M. Grusky, M. Naaman, Y. Artzi, Newsroom: A dataset of 1.3 million summaries with diverse extractive strategies, in: M. Walker, H. Ji, A. Stent (Eds.), Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), Association for Computational Linguistics, New Orleans, Louisiana, 2018, pp. 708–719. doi:10.18653/v1/N18-1065.
- Nallapati et al. [2016] R. Nallapati, B. Zhou, C. N. dos Santos, C. Gulcehre, B. Xiang, Abstractive text summarization using sequence-to-sequence RNNs and beyond, in: Proceedings of the 20th SIGNLL Conference on Computational Natural Language Learning, 2016, pp. 280–290. doi:10.18653/v1/K16-1028.
- Lins et al. [2019] R. D. Lins, H. Oliveira, L. Cabral, J. Batista, B. Tenorio, R. Ferreira, R. Lima, G. de França Pereira e Silva, S. J. Simske, The CNN-corpus: A large textual corpus for single-document extractive summarization, in: Proceedings of the ACM Symposium on Document Engineering 2019, DocEng ’19, Association for Computing Machinery, New York, NY, USA, 2019, pp. 1–10. doi:10.1145/3342558.3345388.
- Jiang and Dreyer [2024] X. Jiang, M. Dreyer, CCSum: A large-scale and high-quality dataset for abstractive news summarization, in: K. Duh, H. Gomez, S. Bethard (Eds.), Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), Association for Computational Linguistics, Mexico City, Mexico, 2024, pp. 7306–7336. doi:10.18653/v1/2024.naacl-long.406.
- Narayan et al. [2018] S. Narayan, S. B. Cohen, M. Lapata, Don’t give me the details, just the summary! topic-aware convolutional neural networks for extreme summarization, in: E. Riloff, D. Chiang, J. Hockenmaier, J. Tsujii (Eds.), Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, Brussels, Belgium, 2018, pp. 1797–1807. doi:10.18653/v1/D18-1206.
- Gu et al. [2020] X. Gu, Y. Mao, J. Han, J. Liu, Y. Wu, C. Yu, D. Finnie, H. Yu, J. Zhai, N. Zukoski, Generating representative headlines for news stories, in: Proceedings of The Web Conference 2020, WWW ’20, Association for Computing Machinery, New York, NY, USA, 2020, pp. 1773–1784. doi:10.1145/3366423.3380247.
- Ao et al. [2021] X. Ao, X. Wang, L. Luo, Y. Qiao, Q. He, X. Xie, PENS: A dataset and generic framework for personalized news headline generation, in: C. Zong, F. Xia, W. Li, R. Navigli (Eds.), Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Association for Computational Linguistics, Online, 2021, pp. 82–92. doi:10.18653/v1/2021.acl-long.7.
- Ao et al. [2023] X. Ao, L. Luo, X. Wang, Z. Yang, J.-H. Chen, Y. Qiao, Q. He, X. Xie, Put your voice on stage: Personalized headline generation for news articles, ACM Transactions on Knowledge Discovery from Data 18 (2023). doi:10.1145/3629168.
- Jin et al. [2020] D. Jin, Z. Jin, J. T. Zhou, L. Orii, P. Szolovits, Hooks in the headline: Learning to generate headlines with controlled styles, in: D. Jurafsky, J. Chai, N. Schluter, J. Tetreault (Eds.), Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, Online, 2020, pp. 5082–5093. doi:10.18653/v1/2020.acl-main.456.
- Sandhaus [2008] E. Sandhaus, The New York Times Annotated Corpus, 2008. URL: https://hdl.handle.net/11272.1/AB2/GZC6PL.
- Takase et al. [2016] S. Takase, J. Suzuki, N. Okazaki, T. Hirao, M. Nagata, Neural headline generation on Abstract Meaning Representation, in: J. Su, K. Duh, X. Carreras (Eds.), Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, Austin, Texas, 2016, pp. 1054–1059. doi:10.18653/v1/D16-1112.
- National Institute of Standards and Technology [2014] National Institute of Standards and Technology, Document Understanding Conferences, https://www-nlpir.nist.gov/projects/duc/data.html, 2014. Accessed: 30 December 2024.
- Graff and Cieri [2003] D. Graff, C. Cieri, English Gigaword, https://catalog.ldc.upenn.edu/LDC2003T05, 2003. doi:10.35111/0z6y-q265.
- Napoles et al. [2012] C. Napoles, M. Gormley, B. Van Durme, Annotated Gigaword, in: Proceedings of the Joint Workshop on Automatic Knowledge Base Construction and Web-Scale Knowledge Extraction, AKBC-WEKEX ’12, Association for Computational Linguistics, 2012, pp. 95–100. URL: https://aclanthology.org/W12-3018.pdf.
- Singh et al. [2021] R. K. Singh, S. Khetarpaul, R. Gorantla, S. G. Allada, Sheg: summarization and headline generation of news articles using deep learning, Neural Computing and Applications 33 (2021) 3251–3265. doi:10.1007/s00521-020-05188-9.
- Sen and Yanikoglu [2018] M. Sen, B. Yanikoglu, Document classification of SUDer Turkish news corpora, in: Proceedings of the 2018 26th Signal Processing and Communications Applications Conference (SIU), 2018, pp. 1–4. doi:10.1109/SIU.2018.8404790.
- Ogunremi et al. [2024] T. Ogunremi, S. S. Akojenu, A. Soronnadi, O. Adekanmbi, D. I. Adelani, AfriHG: News headline generation for African languages, in: 5th Workshop on African Natural Language Processing, 2024, p. 4. URL: https://openreview.net/forum?id=fw7g7pNUDl.
- Bukhtiyarov and Gusev [2020] A. Bukhtiyarov, I. Gusev, Advances of Transformer-Based Models for News Headline Generation, Springer, 2020, pp. 54–61. doi:10.1007/978-3-030-59082-6_4.
- Gavrilov et al. [2019] D. Gavrilov, P. Kalaidin, V. Malykh, Self-attentive model for headline generation, in: Advances in Information Retrieval (ECIR 2019), Springer, Cham, 2019. doi:10.1007/978-3-030-15719-7_11.
- Yutkin [2019] D. Yutkin, Lenta.Ru news dataset, https://github.com/yutkin/Lenta.Ru-News-Dataset, 2019. Accessed: 30 December 2024.
- Hu et al. [2015] B. Hu, Q. Chen, F. Zhu, LCSTS: A large scale Chinese short text summarization dataset, in: L. Màrquez, C. Callison-Burch, J. Su (Eds.), Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, Lisbon, Portugal, 2015, pp. 1967–1972. doi:10.18653/v1/D15-1229.
- Madasu et al. [2023] L. Madasu, G. Kanumolu, N. Surange, M. Shrivastava, Mukhyansh: A headline generation dataset for Indic languages, in: C.-R. Huang, Y. Harada, J.-B. Kim, S. Chen, Y.-Y. Hsu, E. Chersoni, P. A, W. H. Zeng, B. Peng, Y. Li, J. Li (Eds.), Proceedings of the 37th Pacific Asia Conference on Language, Information and Computation, Association for Computational Linguistics, Hong Kong, China, 2023, pp. 620–634. URL: https://aclanthology.org/2023.paclic-1.62.
- Aralikatte et al. [2023] R. Aralikatte, Z. Cheng, S. Doddapaneni, J. C. K. Cheung, Varta: A large-scale headline-generation dataset for Indic languages, in: A. Rogers, J. Boyd-Graber, N. Okazaki (Eds.), Findings of the Association for Computational Linguistics: ACL 2023, Association for Computational Linguistics, Toronto, Canada, 2023, pp. 3468–3492. doi:10.18653/v1/2023.findings-acl.215.
- Li et al. [2021] P. Li, J. Yu, J. Chen, B. Guo, HG-News: News headline generation based on a generative pre-training model, IEEE Access 9 (2021) 110039–110046. doi:10.1109/ACCESS.2021.3102741.
- Hasan et al. [2021] T. Hasan, A. Bhattacharjee, M. S. Islam, K. Mubasshir, Y.-F. Li, Y.-B. Kang, M. S. Rahman, R. Shahriyar, XL-Sum: Large-scale multilingual abstractive summarization for 44 languages, in: The Joint Conference of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (ACL-IJCNLP), Association for Computational Linguistics, 2021, pp. 4693–4703.
- Ahmad et al. [2022] I. Ahmad, F. Alqurashi, R. Mehmood, Potrika: Raw and balanced newspaper datasets in the Bangla language with eight topics and five attributes, 2022. arXiv:2210.09389.
- Karaca and Aydın [2023] A. Karaca, O. Aydın, Generating headlines for Turkish news texts with transformer architecture-based deep learning method, Gazi Üniversitesi Mühendislik-Mimarlık Fakültesi Dergisi 39 (2023) 485–495. doi:10.17341/gazimmfd.963240.
- Theledi and Pule [2024] K. Theledi, V. M. Pule, President’s speech and terminology used during the covid-19 pandemic: The interpretation of linguistic meaning in context and situational context, in: Public Health Communication Challenges to Minority and Indigenous Communities, IGI Global, 2024, pp. 92–107.
- Liu et al. [2020] D. Liu, Y. Gong, Y. Yan, J. Fu, B. Shao, D. Jiang, J. Lv, N. Duan, Diverse, controllable, and keyphrase-aware: A corpus and method for news multi-headline generation, in: B. Webber, T. Cohn, Y. He, Y. Liu (Eds.), Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Association for Computational Linguistics, Online, 2020, pp. 6241–6250. doi:10.18653/v1/2020.emnlp-main.505.
- Kiefer [2022] S. Kiefer, Case: Explaining text classifications by fusion of local surrogate explanation models with contextual and semantic knowledge, Information Fusion 77 (2022) 184–195. doi:10.1016/j.inffus.2021.07.014.
- Chen [2019] N. Chen, CI-SNF: Exploiting contextual information to improve SNF-based information retrieval, Information Fusion 52 (2019) 175–186. doi:10.1016/j.inffus.2018.08.004.
- Zhu et al. [2023] L. Zhu, Z. Zhu, C. Zhang, Y. Xu, X. Kong, Multimodal sentiment analysis based on fusion methods: A survey, Information Fusion 95 (2023) 306–325. doi:10.1016/j.inffus.2023.02.028.
- Aziz et al. [2023] A. Aziz, N. K. Chowdhury, M. A. Kabir, A. N. Chy, M. J. Siddique, MMTF-DES: A fusion of multimodal transformer models for desire, emotion, and sentiment analysis of social media data, 2023. arXiv:2310.14143.
- Hasan et al. [2020] T. Hasan, A. Bhattacharjee, K. Samin, M. Hasan, M. Basak, M. S. Rahman, R. Shahriyar, Not low-resource anymore: Aligner ensembling, batch filtering, and new datasets for Bengali-English machine translation, in: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Association for Computational Linguistics, 2020, pp. 2612–2623. doi:10.18653/v1/2020.emnlp-main.207.
- Lin [2004] C.-Y. Lin, ROUGE: A package for automatic evaluation of summaries, in: Text Summarization Branches Out, Association for Computational Linguistics, Barcelona, Spain, 2004, pp. 74–81. URL: https://aclanthology.org/W04-1013.
- Papineni et al. [2002] K. Papineni, S. Roukos, T. Ward, W.-J. Zhu, Bleu: a method for automatic evaluation of machine translation, in: P. Isabelle, E. Charniak, D. Lin (Eds.), Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, Philadelphia, Pennsylvania, USA, 2002, pp. 311–318. doi:10.3115/1073083.1073135.
- Banerjee and Lavie [2005] S. Banerjee, A. Lavie, METEOR: An automatic metric for MT evaluation with improved correlation with human judgments, in: J. Goldstein, A. Lavie, C.-Y. Lin, C. Voss (Eds.), Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization, Association for Computational Linguistics, Ann Arbor, Michigan, 2005, pp. 65–72. URL: https://aclanthology.org/W05-0909.
- Zhang et al. [2020] T. Zhang, V. Kishore, F. Wu, K. Q. Weinberger, Y. Artzi, BERTScore: Evaluating text generation with BERT, in: International Conference on Learning Representations, 2020. URL: https://openreview.net/forum?id=SkeHuCVFDr.
- Raffel et al. [2020] C. Raffel, N. Shazeer, A. Roberts, K. Lee, S. Narang, M. Matena, Y. Zhou, W. Li, P. J. Liu, Exploring the limits of transfer learning with a unified text-to-text transformer, Journal of Machine Learning Research 21 (2020) 1–67. URL: http://jmlr.org/papers/v21/20-074.html.
- Lewis et al. [2020] M. Lewis, Y. Liu, N. Goyal, M. Ghazvininejad, A. Mohamed, O. Levy, V. Stoyanov, L. Zettlemoyer, BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension, in: D. Jurafsky, J. Chai, N. Schluter, J. Tetreault (Eds.), Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, Online, 2020, pp. 7871–7880. doi:10.18653/v1/2020.acl-main.703.
- Bhattacharjee et al. [2022] A. Bhattacharjee, T. Hasan, W. U. Ahmad, R. Shahriyar, BanglaNLG: Benchmarks and resources for evaluating low-resource natural language generation in Bangla, 2022. arXiv:2205.11081.
- Muennighoff et al. [2023] N. Muennighoff, T. Wang, L. Sutawika, A. Roberts, S. Biderman, T. Le Scao, M. S. Bari, S. Shen, Z. X. Yong, H. Schoelkopf, X. Tang, D. Radev, A. F. Aji, K. Almubarak, S. Albanie, Z. Alyafeai, A. Webson, E. Raff, C. Raffel, Crosslingual generalization through multitask finetuning, in: A. Rogers, J. Boyd-Graber, N. Okazaki (Eds.), Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Association for Computational Linguistics, Toronto, Canada, 2023, pp. 15991–16111. doi:10.18653/v1/2023.acl-long.891.
- Xue et al. [2021] L. Xue, N. Constant, A. Roberts, M. Kale, R. Al-Rfou, A. Siddhant, A. Barua, C. Raffel, mT5: A massively multilingual pre-trained text-to-text transformer, in: K. Toutanova, A. Rumshisky, L. Zettlemoyer, D. Hakkani-Tur, I. Beltagy, S. Bethard, R. Cotterell, T. Chakraborty, Y. Zhou (Eds.), Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Association for Computational Linguistics, Online, 2021, pp. 483–498. doi:10.18653/v1/2021.naacl-main.41.
- Tang et al. [2021] Y. Tang, C. Tran, X. Li, P.-J. Chen, N. Goyal, V. Chaudhary, J. Gu, A. Fan, Multilingual translation from denoising pre-training, in: The Joint Conference of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (ACL-IJCNLP), Association for Computational Linguistics, 2021, pp. 3450–3466.
- Rahman and Mamun [2024] A. Rahman, A. Mamun, The rise of clickbait headlines: A study on media platforms from bangladesh, Athens Journal of Mass Media and Communications 10 (2024) 109–130. doi:10.30958/ajmmc.10-2-3.