Text Classification using Graph Convolutional Networks: A Comprehensive Survey
Abstract.
Text classification is a quintessential and practical problem in natural language processing with applications in diverse domains such as sentiment analysis, fake news detection, medical diagnosis, and document classification. A sizable body of recent work exists in which researchers have studied and tackled text classification from different angles with varying degrees of success. Graph convolutional network (GCN)-based approaches have gained a lot of traction in this domain over the last decade, with many implementations achieving state-of-the-art performance in more recent literature, thus warranting the need for an updated survey. This work aims to summarize and categorize various GCN-based text classification approaches with regard to their architecture and mode of supervision. It identifies their strengths and limitations and compares their performance on various benchmark datasets. We also discuss future research directions and the challenges that exist in this domain.
1. Introduction
The need for automatic text classification has been felt since the advent of digital documents. However, interest in this domain and the need to develop more efficient and robust techniques have grown rapidly in recent times, as we as a society continue to disseminate and consume exponentially growing quantities of textual information in the form of emails, blogs, and messages on social media. Moreover, vital government and military correspondence as well as commercial documentation is also often shared and preserved in the form of electronic text documents.
In machine learning, text classification is a task that entails the categorization of text, primarily documents, sentences, or phrases, into one or more predefined categories, themes, genres, or sentiments (Sun and Lim, 2001). Formally, if D is the set of documents (or texts) and C is the set of categories, then the objective of text classification is to assign each document d ∈ D to its most likely category c ∈ C (Aggarwal and Zhai, 2012; Ikonomakis et al., 2005).
1.1. Text Classification by Level of Abstraction
In general, text classification can be applied at multiple levels of document hierarchy as shown in Fig 1.
• Word level: This is the most common level of text classification, where the goal is to classify individual words into categories such as topics, sentiment, or named entities.
• Phrase level: This involves classifying groups of words that function as a unit, such as noun phrases or verb phrases. This level of granularity is usually required in applications pertaining to sentiment classification where it would not be correct to assume that the sentiment reflected in an entire document or paragraph would remain consistent (Litman, 1996; Tan et al., 2012). A frequent challenge while attempting classification at this level is the need for extensive manual annotation for each phrase and therefore, researchers have resorted to semi-supervised methods (Nam et al., 2009).
• Sentence level: At a higher level of abstraction, sentence classification seeks to predict labels for each complete sentence (Guo et al., 2019; Hassan and Mahmood, 2018; Wieting and Kiela, 2019). Similar to phrase-level classification, this also enables us to derive a more meaningful description of a document by inferring distinct ideas and arguments contained within sentences instead of considering the entire text in broad strokes. As an example, this can be useful in identifying not just different speakers in a written dialogue but also changes in their attitudes and emotions.
• Paragraph level: Paragraphs or subsections of a document are classified into a set of categories. A paragraph can have one or more sentences and can even span an entire document. Learning distributed vector representations for pieces of texts of variable length is an important research direction (Le and Mikolov, 2014).
• Document level: A significant bulk of text classification has been dedicated to document classification. At this level, the text classification algorithm predicts the classes of the entire document, i.e., the topic of an article or the genre of a book. In recent times, neural network-based approaches (Akhter et al., 2020; Yao et al., 2019) have dominated this domain. Researchers have also attempted to exploit BERT (Adhikari et al., 2019) for this purpose despite its prohibitively large number of parameters and computational overhead.
1.2. Applications of Text Classification
Text classification has broad applications in Natural Language Processing (NLP). The most frequent examples include:
• Sentiment Analysis: Sentiment analysis classifies text as positive, negative, or neutral. It is widely used in social media monitoring, customer feedback analysis, and product review categorization. Methods like Decision Trees, Naive Bayes, and Maximum Entropy have shown high performance (Hemalatha et al., 2013). Moreover, deep learning techniques using CNNs (Liao et al., 2017), RNNs (Can et al., 2018), and their combinations (Basiri et al., 2021; Wang et al., 2016a) have also achieved promising results.
• Content Moderation: Automated text classification is used to detect and flag inappropriate content on social media and online communities, including hate speech (Gambäck and Sikdar, 2017; Del Vigna et al., 2017), offensive language (Hajibabaee et al., 2022), adult content (Kim and Nam, 2006), and privacy protection (Liang et al., 2024), ensuring a safer user experience.
• Spam Filtering: Text classification helps prevent unwanted communications from reaching a user’s inbox by classifying messages as spam or not. Naive Bayes, along with SVM (Sculley and Wachman, 2007), Decision Trees (Sharma and Sahni, 2011), and Random Forests (Sjarif et al., 2019), have proven effective in spam filtering.
1.3. The Use of Machine Learning for Text Classification
Over the years, a wide variety of machine learning techniques, ranging from classic methods to deep neural networks, have been leveraged for the task of text classification. The more traditional approaches include Naïve Bayes (Dai et al., 2007; Chen et al., 2009), Support Vector Machines (SVM) (Sun et al., 2009), and Decision Trees (Bahassine et al., 2016), as well as various ensembles and combinations of these approaches (Onan, 2017). More recent works indicate a notable transition to different neural network architectures such as Recurrent Neural Networks (RNN) (Liu et al., 2016) and Long Short-Term Memory (LSTM) networks (Liu and Guo, 2019; Zhou et al., 2016). RNNs can assign more weight to previous data points in a sequence, which makes them suitable for understanding semantics and context while performing text classification. However, they tend to suffer from vanishing and exploding gradients, which makes it difficult for them to preserve long-term dependencies. An LSTM is a special kind of RNN that is not vulnerable to these problems and is better suited for such scenarios (Sherstinsky, 2020).
Convolutional Neural Networks (CNN) (O’Shea and Nash, 2015), although originally developed for image processing, have proven to be useful for text classification as well (Wang et al., 2018a; Liu et al., 2017a). In a typical CNN, the input undergoes convolution with learnable filters, which can be stacked to apply multiple filters to the input and produce multiple feature maps. To decrease computational load, CNNs use pooling to reduce the output size from one layer to the next in the network. The final layers in a CNN are typically fully connected.
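To make this concrete, below is a minimal sketch of a CNN text classifier in the spirit described above, written in PyTorch; the hyperparameters (embedding size, kernel sizes, filter counts) are illustrative assumptions rather than settings taken from any cited work.

```python
import torch
import torch.nn as nn

class TextCNN(nn.Module):
    """Minimal CNN text classifier: embed, convolve with several filter widths,
    max-pool over time, then a fully connected output layer."""
    def __init__(self, vocab_size, embed_dim=128, num_classes=2,
                 kernel_sizes=(3, 4, 5), num_filters=100):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        # One 1-D convolution per kernel size, each yielding num_filters feature maps.
        self.convs = nn.ModuleList(
            [nn.Conv1d(embed_dim, num_filters, k) for k in kernel_sizes])
        self.fc = nn.Linear(num_filters * len(kernel_sizes), num_classes)

    def forward(self, token_ids):                        # token_ids: (batch, seq_len)
        x = self.embedding(token_ids).transpose(1, 2)    # (batch, embed_dim, seq_len)
        # Convolve, apply ReLU, then max-pool each feature map down to a single value.
        pooled = [torch.relu(conv(x)).max(dim=2).values for conv in self.convs]
        return self.fc(torch.cat(pooled, dim=1))         # class logits
```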
More recently, transformer models have demonstrated great prowess in handling a wide range of NLP tasks. Transformers (Vaswani et al., 2017a) feature an encoder-decoder structure that relies on attention to generate an output. The encoder first maps an input sequence to a sequence of continuous representations. The decoder then combines these representations with its own output from the previous time step to generate the next output. Some of the more commonly used transformer-based models for text classification include BERT (Sun et al., 2019), XLNet (Chang et al., 2020), and RoBERTa (Briskilal and Subalalitha, 2022).
Bidirectional Encoder Representations from Transformers (BERT), in particular, has achieved impressive results in many NLP tasks, and text classification is no exception (Zeng et al., 2024). BERT consists of multiple encoder blocks from the transformer stacked atop one another. Unlike many other transformer models, BERT is able to incorporate context on both sides of a word to gain better results. Since BERT is pre-trained on general tasks, it does not require huge amounts of additional data to fine-tune it for a particular target task.
Graph Neural Networks (GNN) operate on graphs and are able to model global information in corpora. There exist several variations of GNNs, namely Graph Convolutional Networks (GCN) (Kipf and Welling, 2016a), Graph Attention Networks (GAT) (Veličković et al., 2017; Pal et al., 2020), Graph Auto-encoders (Kipf and Welling, 2016b), Gated GNNs (Li et al., 2015), and GraphSAGE (Hamilton et al., 2017). Besides these GNN architectures, there are also Graph Transformers (Yun et al., 2019; Zhang and Zhang, 2020) that embed a graph structure into the transformer architecture (Vaswani et al., 2017a), enabling learning from the entire graph rather than just the local neighborhood.
Among the various deep learning models that operate on graph data, GCNs, in particular, have shown great promise with regard to text classification (Yao et al., 2019). One of their key strengths is their ability to factor in the global information between words and concepts by performing convolution operations on neighboring nodes in a graph to aggregate information from a node’s neighbors and update that node’s representation.
While there exist several other surveys on the topic of general machine learning and deep learning text classification techniques (Aggarwal and Zhai, 2012; Ikonomakis et al., 2005; Kowsari et al., 2019; Minaee et al., 2021), as well as those pertaining to graph convolutional networks and graph neural networks (Zhang et al., 2019b; Zhou et al., 2020a; Wu et al., 2020b), this article, to the best of our knowledge, is the first attempt at extensively and exclusively reviewing various state-of-the-art methodologies using GCNs for text classification. Significant strides have been made in this area of research in the last few years and we have therefore placed particular emphasis on the more recent GCN-based text classification methods, their conception and various strengths and weaknesses.
1.4. Inductive vs. Transductive Learning
Text classification can be performed using two types of learning mechanisms:
• Inductive Learning: This mode of learning takes place in two phases, namely the training phase and the inference phase. In the training phase, we use a training set to build a machine learning model, while during inference, we generalize this model to a separate, previously unseen test set to predict its labels.
• Transductive Learning: In transductive learning, both training and inference occur simultaneously. In other words, both training and test data are available to us at the same time and we make use of patterns present in both sets and labels of the training set to infer the unknown labels of the test set.
Generally, transductive learning is more computationally expensive, as it requires the algorithm to be rerun to infer the labels of any new data points, whereas in inductive learning, we build a generalized predictive model beforehand that does not require retraining for inference.
1.5. Key Findings
From our comparative analysis of various GCN-based text classification approaches, several key findings have emerged:
• Initially, the focus was predominantly on supervised learning approaches. Early methods such as TextGCN (Yao et al., 2019) showcased remarkable performance and underscored the potential of GCNs in text classification tasks. These approaches laid the groundwork for subsequent innovations by demonstrating how graph-based representations could capture semantic relationships within text data.
• In terms of architecture, earlier innovations leveraged optimization-centric methods and multigraph approaches to enhance graph representations whereas more recent architectures integrate GCNs with advanced models such as BERT and other LLMs to further enhance text classification performance.
• As the field progressed, there was a noticeable shift towards semi-supervised methods, motivated by the need to leverage large amounts of unlabeled data, which is often more readily available than labeled datasets.
• More recently, the research focus has shifted towards self-supervised learning, reflecting a broader trend in machine learning towards minimizing the dependency on labeled data.
• Overall, the performance on benchmark datasets has improved significantly as models have evolved from basic GCNs to more complex and hybrid architectures over the years.
1.6. Criteria for Selection of Approaches and Datasets
In this survey, we adopted rigorous criteria to evaluate and compare different GCN-based approaches for text classification. Specifically, we focused on seminal works as well as recent papers from the last five years that are well-cited and published in reputable venues. This approach allowed us to incorporate both historical context and the latest advancements in the field. Datasets commonly used across multiple studies are included in this review to ensure consistent comparisons and to yield meaningful insights regarding their relative strengths and limitations.
1.7. Main Contributions of This Work
The key contributions of our work are outlined as follows:
• While there are several reviews on GCNs, this survey uniquely provides an exhaustive and categorical breakdown of GCN-based text classification techniques, focusing on the architecture and mode of supervision.
• Based on supervision, we categorize existing methodologies into supervised, semi-supervised, self-supervised and weakly supervised techniques.
• Based on architecture, we categorize these as fundamental techniques and GCN integration with generative models. In the latter, we discuss combinations of GCNs with RNNs, LSTMs, BERT, and LLMs, offering a structured understanding of the field.
• Detailed analysis and comparison of various methods based on their performance on popular benchmark datasets are provided to highlight their strengths and limitations.
• Future research directions and challenges in GCN-based text classification are discussed to guide further advancements in the field.
The remainder of this survey paper is organized as follows. Section 2 provides a detailed review of the existing surveys. Section 3 provides insight into various textual embeddings. Section 4 provides an overview of the GCN architectures for text classification. Section 5 categorizes GCN architectures based on integration with generative models. Section 6 categorizes existing GCN approaches as supervised, semi-supervised, self-supervised, and weakly supervised. Section 7 covers the performance comparison, datasets, metrics and analysis of results. Finally, we conclude this paper in Section 8 and also discuss future research directions.
2. Review of Existing Surveys
In recent years, many surveys have been published on various text classification techniques in general, as well as on those particularly focusing on the use of graph networks for this objective. These works, through effective comparisons of contemporary approaches, have not only helped other researchers catch up with recent developments in this domain but have also identified shortcomings and potential areas for future research. There exist several renowned works that aim to underscore the most quintessential developments with regard to text classification techniques. Some works cover a wide range of classic ML techniques and several DL approaches for text classification (Aggarwal and Zhai, 2012), (Ikonomakis et al., 2005), (Kowsari et al., 2019), and (Kadhim, 2019). Altınel et al. discuss a wide gamut of text classification techniques and break them down into different approaches based on domain knowledge, corpus analysis, deep learning, character sequence enhancement, and linguistic enrichment (Altınel and Ganiz, 2018). However, these works have not covered graph neural network-based techniques. Nevertheless, these surveys provide a foundational understanding of text classification and its evolution, and the key concepts they discuss still hold true.
In recent times, there has been a general transition to deep learning-based approaches for text classification as they have been found to better preserve the complex semantic relationships between words and documents as opposed to conventional ML techniques that use handcrafted features. Minaee et al. (Minaee et al., 2021) reviewed many deep learning models for text classification, including those based on recurrent neural networks, convolutional neural networks, attention-based mechanisms, and graph neural networks. While exhaustive with respect to its breadth, their survey leaves more to be desired in terms of depth. In contrast, Pham et al. (Pham et al., 2022) offer a comprehensive look at six GNN-based architectures for text classification published from 2019 to 2021, and highlight their potential as well as associated limitations and challenges.
Another category of surveys includes those with a primary focus on graph neural networks rather than text classification (Zhou et al., 2020a), (Wu et al., 2020b), and (Gupta et al., 2021). These papers provide a holistic overview of developments pertaining to graph neural networks but only glance over their use in text classification, as it was not the primary focus of these studies. Han et al. (Han et al., 2022) focus on analyzing the graph construction and learning mechanisms for text classification, rather than comparing performance against other baselines. Zhang et al. (Zhang et al., 2019b) and Ren et al. (Ren et al., 2022) delve much deeper and focus exclusively on GCNs, dissecting them by the type of convolution operations they use as well as their areas of application, among which text classification is discussed only briefly and as an introduction.
In comparison, this article provides a comprehensive and categorical review of a wide range of quintessential GCN-based approaches for text classification, along with their associated fundamentals and prospective uses. It primarily focuses on more recent state-of-the-art architectures that have yielded unprecedented results on benchmark datasets and traces their origin and evolution to earlier seminal GCN methodologies and further back to conventional machine learning and deep learning techniques. Fig 2 shows the distributions of research articles across different publishing venues as well as across different categories of algorithms based on supervision. It gives the reader a quick overview of the field regarding text classification using GCN-based approaches.
3. Overview of Textual Embeddings
Text embedding techniques generate high-dimensional vector representations for text, capturing semantic and syntactic relationships between words. These embeddings are crucial for tasks like text classification, sentiment analysis, and language translation, potentially improving performance and reducing training time. The choice of embeddings can significantly impact task success, as different embeddings capture various text aspects (Wang et al., 2018b; Shahmirzadi et al., 2019; Joulin et al., 2016; Kaibi et al., 2019).
• Term Frequency-Inverse Document Frequency (TF-IDF): Maps words into a feature space based on their frequency in a document and their rarity in the corpus. Since the vocabulary may contain millions of words, such models are hard to scale. Moreover, unlike contextualized embeddings (Devlin et al., 2018), TF-IDF is unable to account for the similarity between the words in a document since each word is encoded in isolation. (A brief Python sketch of obtaining TF-IDF, Word2Vec, and BERT embeddings follows this list.)
• Word2Vec: Word2Vec (Mikolov et al., 2013) represents words as vectors learned from context by predicting a target word based on the context words around it (Continuous Bag-of-Words) or by predicting the context words given the middle word (Skip-gram). Word2Vec can capture important semantic and syntactic relationships between words while being computationally efficient compared to other prediction-based embedding techniques.
• Doc2Vec: Doc2Vec (Le and Mikolov, 2014) extends Word2Vec to learn vector representations for entire documents or paragraphs, handling variable-length texts with fixed-length vectors. Analogous to Word2Vec’s Continuous Bag-of-Words and Skip-gram models, Doc2Vec has the distributed memory model and the distributed Bag-of-Words model, respectively, to learn distributed representations of documents.
• GloVe: Global Vectors (GloVe) (Pennington et al., 2014) generates word vectors by factorizing a global word-word co-occurrence matrix, capturing semantic and syntactic relationships between words and understanding word meaning across larger contexts through aggregated statistics.
• Graph Embeddings: Graph embeddings represent nodes and edges in a graph as high-dimensional vectors. While studies have demonstrated the ability of techniques like Word2Vec and GloVe to capture global connections between words in a language, these connections are still rather limited. Graph embeddings go beyond this and capture dependencies over a much longer range, leveraging the structural and semantic properties of the graph. Random walk-based methods such as DeepWalk (Perozzi et al., 2014) and Node2Vec (Grover and Leskovec, 2016) generate embeddings by performing random walks on the graph and using the sequence of nodes visited as the context for the node. Deep learning-based techniques such as GCNs (Kipf and Welling, 2016a) and GraphSAGE (Hamilton et al., 2017) generate embeddings using neural networks to aggregate information from the node’s local neighborhood.
• LLM-Based Embeddings: LLMs typically utilize embedding models that are integral to their architecture for representing words, tokens, or sub-words. These embeddings are learned jointly with the model during pre-training. In transformers such as GPT, tokens are initially input using basic embeddings (Vaswani et al., 2017a), which are then processed through multiple transformer layers to get contextual embeddings. The WordPiece tokenizer breaks words into smaller units called sub-word tokens (Wu et al., 2016; Yang et al., 2019), and each sub-word token is assigned an embedding vector. Similarly, SentencePiece (Kudo and Richardson, 2018) tokenization is used in models like T5 (Text-To-Text Transfer Transformer) (Raffel et al., 2020b). It segments text into smaller units and assigns embeddings to these units based on their context. Positional embeddings are added to word embeddings to encode the position or order of tokens in a sequence. They help the model understand the sequential structure of the input sequence. Models like ELMo (Sarzynska-Wawer et al., 2021) and GPT (OpenAI, 2023) use contextual embeddings, where each token’s embedding depends not only on the token itself but also on the entire input sentence. These embeddings capture richer semantic and syntactic information.
• BERT Embeddings: Based on the transformer architecture, Bidirectional Encoder Representations from Transformers (BERT) (Devlin et al., 2018) is an advanced pre-trained embedding model that generates unique vector representations that account for the context of a word within a sentence. A deeply bidirectional system, BERT can understand the context of words to the left and to the right of a given word in a sentence. This affords it more power than earlier models, which were unidirectional. BERT embeddings are based on the internal representations of the model learned during pre-training on a large corpus of text. The pre-training involves training the model on two tasks: masked language modeling (predict masked words in a sentence) and next sentence prediction (predict whether two sentences are adjacent). The embeddings generated by BERT capture both the meaning and syntax of words along with the relationship between words in a sentence. This makes them particularly useful for tasks that require understanding the context of words in a sentence, such as text classification and named entity recognition.
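As a concrete illustration of how some of these embeddings are obtained in practice, the snippet below uses common Python libraries (scikit-learn, Gensim, and Hugging Face Transformers); the toy corpus and parameter choices are illustrative assumptions, not prescriptions from the surveyed works.

```python
import torch
from sklearn.feature_extraction.text import TfidfVectorizer
from gensim.models import Word2Vec
from transformers import AutoTokenizer, AutoModel

corpus = ["graph networks classify text", "text classification with graph networks"]

# TF-IDF: one sparse vector per document; every word is encoded in isolation.
tfidf_matrix = TfidfVectorizer().fit_transform(corpus)

# Word2Vec (skip-gram): one dense vector per word, learned from local context windows.
w2v = Word2Vec([doc.split() for doc in corpus],
               vector_size=100, window=5, min_count=1, sg=1)
graph_vector = w2v.wv["graph"]

# BERT: contextual token vectors; here the [CLS] vector is used as a sentence summary.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased")
with torch.no_grad():
    outputs = bert(**tokenizer(corpus, padding=True, return_tensors="pt"))
cls_embeddings = outputs.last_hidden_state[:, 0, :]      # shape: (num_docs, 768)
```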
4. Overview of GCN Architectures for Text Classification
In this section, we provide an overview of Graph Convolutional Networks (GCNs) based on the findings of Kipf and Welling (Kipf and Welling, 2016a), along with some fundamental concepts. Additionally, we review some foundational GCN architectures that have significantly contributed to the advancement of GCNs for text classification.
4.1. GCN Preliminaries
Let G = (V, E) be a graph, where V and E represent the sets of nodes and edges respectively. Each node v_i ∈ V has a corresponding m-dimensional feature vector x_i. If we have n nodes, this translates to a feature matrix X ∈ R^(n×m) that serves as the input to the first GCN layer. In supervised classification, the labels of only a subset of nodes are known and the objective is to predict the remaining unknown labels.
The adjacency matrix A, the degree matrix D, and the normalized adjacency matrix Ã are often used as Graph Shift Operators (GSOs). A is a sparse matrix with non-zero entries, with all the diagonal elements set to one (A_ii = 1) since each node is assumed to be connected to itself. For its remaining elements, we set A_ij = e_ij for all (v_i, v_j) ∈ E, where e_ij is the edge weight between nodes v_i and v_j. D is a diagonal matrix with node degrees as its diagonal entries, i.e., D_ii = Σ_j A_ij. The normalized adjacency matrix is given by Ã = D^(-1/2) A D^(-1/2).
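For concreteness, the following NumPy sketch computes these graph shift operators for a small hypothetical weighted graph (the adjacency values are made up purely for illustration).

```python
import numpy as np

# Hypothetical weighted adjacency for a 3-node graph; diagonal set to 1 for self-loops.
A = np.array([[1.0, 0.5, 0.0],
              [0.5, 1.0, 0.2],
              [0.0, 0.2, 1.0]])

D = np.diag(A.sum(axis=1))                        # degree matrix, D_ii = sum_j A_ij
D_inv_sqrt = np.diag(1.0 / np.sqrt(np.diag(D)))   # D^(-1/2)
A_norm = D_inv_sqrt @ A @ D_inv_sqrt              # normalized adjacency D^(-1/2) A D^(-1/2)
```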
4.2. Graph Convolutional Networks
A GCN is a multilayer neural network that can be applied directly to graphs, yielding embedding vectors for nodes based on their connections with other nodes and the overall graph topology. For a single-layer GCN, a graph convolution can be represented as H^(1) = σ(Ã X W^(0)), where W^(0) is a learnable weight matrix for the first layer and σ is an activation function such as ReLU. Such a GCN can only update a node’s representation based on information captured from its immediate neighbors. To leverage information from distant neighbors, multiple GCN layers can be stacked as: H^(k+1) = σ(Ã H^(k) W^(k)), with H^(0) = X.
A k-layer GCN, as depicted in Figure 3, can thus allow message passing among nodes that are at most k hops away, which has a similar effect to an increased receptive field in deeper CNNs. Successive convolutions smoothen the resulting hidden representations locally along the edges of the graph, resulting in similar predictions among closely connected nodes. Typically, 2-layer GCNs have been used in the literature, as a higher number of layers yields diminishing returns and even worse performance due to over-smoothing. Predictions are made using a softmax classifier at the output, whereas the cross-entropy error over all labeled nodes is used as the loss function.
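A minimal PyTorch sketch of this two-layer propagation rule is given below; the hidden dimension and the mask selecting labeled nodes are illustrative, and the softmax is folded into the cross-entropy loss as is standard.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoLayerGCN(nn.Module):
    """Two-layer GCN: each layer propagates node features over the normalized adjacency."""
    def __init__(self, in_dim, hidden_dim, num_classes):
        super().__init__()
        self.W0 = nn.Linear(in_dim, hidden_dim, bias=False)
        self.W1 = nn.Linear(hidden_dim, num_classes, bias=False)

    def forward(self, A_norm, X):
        H1 = F.relu(A_norm @ self.W0(X))   # first hop: aggregate immediate neighbors
        return A_norm @ self.W1(H1)        # second hop: class logits for every node

# Training uses only the labeled subset (train_mask marks the labeled nodes):
# logits = model(A_norm, X)
# loss = F.cross_entropy(logits[train_mask], labels[train_mask])
```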
4.3. Fundamental GCN Approaches for Text Classification
Graph-based approaches for text classification have long existed (Angelova and Weikum, 2006). Defferrard et al. (Defferrard et al., 2016) were among the first to extend and demonstrate the efficacy of spectral graph convolutional neural networks for this task. They framed their problem as a semi-supervised node classification problem where labels were only available for a small subset of nodes and based it on the assumption that connected nodes likely share the same label. They leveraged a global graph with Word2Vec embeddings and applied fast K-localized convolutions.
Kipf and Welling (Kipf and Welling, 2016a) put forward a solution based on spectral GCNs featuring fast localized convolutions. They also considered transductive node classification in notably larger networks. They demonstrated that with a layer-wise propagation rule along with fewer parameters and operations, both scalability and classification performance can be improved in large-scale networks. This method proved to be a seminal work in this domain as it formalized GCNs as we know them and laid the foundation for all GCN methods to come. Besides being extensively applied to word-document graphs for text classification in subsequent literature, several works have also built upon this architecture for semi-supervised node classification, specifically by introducing a discriminative hierarchical convolutional mechanism (Jin et al., 2021b), incorporating high-order proximities using quantum information theory (Shah et al., 2021), integrating with a neural topic model (Yu et al., 2021), and including more robust privacy measures (Igamberdiev and Habernal, 2021).
Yao et al. (Yao et al., 2019) built upon the work of (Defferrard et al., 2016) and (Kipf and Welling, 2016a) by leveraging a single heterogeneous text graph for the entire corpus, and further extended these methods through the inclusion of document nodes and the representation of relationships among word and document nodes using suitable metrics, namely pointwise mutual information (PMI) for word-to-word relations and TF-IDF for word-to-document relations, as shown in Fig. 4. They then modeled this graph with a GCN (Kipf and Welling, 2016a) so it could automatically learn embeddings of both words and documents in conjunction, supervised by a small subset of labeled documents. The text graph construction method and application of a two-layer GCN presented in this paper have influenced many later works (Huang et al., 2019; Wu et al., 2019; Zhu and Koniusz, 2020; Lei et al., 2021; Wang et al., 2023a; Liu et al., 2020; Zhou et al., 2020b; Wang et al., 2021b; Dai et al., 2022; Wang et al., 2023b; Zhang et al., 2020; Wang et al., 2022a; Cai et al., 2020; Ma et al., 2021; Liu, 2022; Ye et al., 2020; Yang et al., 2021b; Zhao et al., 2022; Cui et al., 2022; Li et al., 2021; Liu et al., 2021; Ragesh et al., 2021; Wang et al., 2022b; Marreddy et al., 2022; Zhu et al., 2021a; Yang et al., 2022; Wu et al., 2023) as discussed in the following sections.
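The sketch below illustrates the flavor of this corpus-level graph construction, with TF-IDF weights on document-word edges and positive PMI over sliding windows on word-word edges; the tokenization, window size, and use of dense matrices are simplifying assumptions, and the published implementation differs in detail.

```python
import numpy as np
from math import log
from collections import Counter
from itertools import combinations
from sklearn.feature_extraction.text import TfidfVectorizer

def build_textgcn_adjacency(docs, window=5):
    """Build a (docs + words) x (docs + words) adjacency in the TextGCN style."""
    vectorizer = TfidfVectorizer()
    tfidf = vectorizer.fit_transform(docs)          # document-word weights
    vocab = vectorizer.get_feature_names_out()
    word_id = {w: i for i, w in enumerate(vocab)}

    # Count word and word-pair occurrences over sliding windows for PMI.
    win_count, word_count, pair_count = 0, Counter(), Counter()
    for doc in docs:
        tokens = [t for t in doc.lower().split() if t in word_id]
        for s in range(max(1, len(tokens) - window + 1)):
            win = set(tokens[s:s + window])
            win_count += 1
            word_count.update(win)
            pair_count.update(combinations(sorted(win), 2))

    n_docs, n_words = len(docs), len(vocab)
    A = np.eye(n_docs + n_words)                    # self-loops for all nodes
    A[:n_docs, n_docs:] = tfidf.toarray()           # document-word edges = TF-IDF
    A[n_docs:, :n_docs] = tfidf.toarray().T
    for (wi, wj), nij in pair_count.items():
        pmi = log(nij * win_count / (word_count[wi] * word_count[wj]))
        if pmi > 0:                                 # keep only positive PMI word-word edges
            i, j = n_docs + word_id[wi], n_docs + word_id[wj]
            A[i, j] = A[j, i] = pmi
    return A
```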
4.4. Integration of GCN with CNN
Some researchers have sought to use Convolutional Neural Networks (CNNs) with GCN, owing to the former’s ability to capture local contextual information effectively. Zeng et al. (Zeng et al., 2022) proposed a GCN-CNN boosting ensemble in which a CNN learned from examples misclassified by a GCN to improve performance. Similarly, in addition to presenting a hybrid GCN-Bi-LSTM model, Yang et al. (Yang et al., 2021a) also introduced variant models (ServeNet, C-LSTM) that used 1-D or 2-D CNN layers in addition to the Bi-LSTM layers to better capture local information.
Category | Description | Methods |
---|---|---|
Fundamental GCN Approaches | Seminal techniques that formalized GCNs for text classification and paved the way for further innovations. | Graph-CNN (2016) (Defferrard et al., 2016), GCN (2017) (Kipf and Welling, 2016a), TextGCN (2019) (Yao et al., 2019) |
Integration with CNNs | Architectures combining GCNs with CNNs to enhance feature extraction by leveraging graph structures and spatial hierarchies in text data. | TextGCN C-LSTM, TextGCN ServeNet (2021) (Yang et al., 2021a), GCN-CNN (2022) (Zeng et al., 2022) |
Integration with RNNs, LSTMs, GRUs | Architectures combining GCNs with RNNs and variants like LSTM and GRU to capture sequential dependencies and graph structures. | IGCN (2020) (Tang et al., 2020a), GCN-LSTM (2020) (Gao et al., 2020), GL-GCN (2021) (Zhu et al., 2021b), TextGCN Bi-LSTM (2021) (Yang et al., 2021a), BiGRU+GCN (2022) (Dong et al., 2022) |
Integration with Transformer | Methods combining GCNs with Transformer models to leverage self-attention mechanisms and capture long-range dependencies in text. | GTG (2023) (Liu et al., 2023), TLC-XML (2024) (Zhao et al., 2024) |
Integration with BERT | Hybrid models combining GCNs with BERT to capture both graph and contextual information for enhanced performance. | VGCN-BERT (2020) (Lu et al., 2020), GC-GCN-BERT (2021) (Gao and Huang, 2021), MGCN (2021) (Xue et al., 2021), R-GCN (2022) (Chen et al., 2022), BERT-GCN + MA (2022) (She et al., 2022), BERT-GCN, RoBERTa-GCN (2022) (Lin et al., 2021), HINT-G (2023) (Li et al., 2023) |
Integration with BERT+LSTMs | Models combining GCNs, BERT, and LSTMs to exploit graph structures, contextual embeddings, and sequential dependencies. | WordBERT-BiLSTM-SGCN (2021) (Zeyu et al., 2021), IMGCN (2022) (Xue et al., 2022) |
Integration with LLMs | Models that integrate GCNs with large language models to leverage extensive pre-trained knowledge for improved text processing. | Clip-GCN (2024) (Zhou et al., 2024), GCN+GPT (2024) (Chen et al., 2024), GCN+Llama2-13B (2024) (Li et al., 2024), Graph Aware Convolution + ChatGPT (2024) (Du et al., 2024) |
5. Integration of GCN with Generative Models
There is a notable body of work dedicated to improving the text classification performance of GCNs by augmenting them with various state-of-the-art generative models as shown in Table 1 and Fig. 5. The most common approaches involve integrating GCNs with recurrent neural networks (RNNs), LSTMs, and GRUs. Additionally, GCNs have been combined with large language models (LLMs), BERT and its many variants, showcasing the versatility and enhanced performance achieved through these hybrid architectures.
5.1. GCN Integration with RNNs, LSTMs, and GRUs
GCNs have been used in tandem with recurrent neural networks (RNN) and their variants, such as Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU). The primary goal here is to enable the overall architecture to effectively capture both long-range and short-range contextual dependencies. Tang et al. (Tang et al., 2020a) combined textual and part-of-speech features obtained using a Bi-LSTM with an adjacency matrix capturing the dependency relationships to address the problems of contextual dependency and lexical polysemy in GCNs. Gao et al. (Gao et al., 2020) also proposed a GCN and LSTM hybrid structure that combined the outputs of each GCN layer with an embedding generated by passing them through an LSTM. Meanwhile, Yang et al. (Yang et al., 2021a) leveraged a weighted combination of evaluation values from GCN and Bi-LSTM classifiers to improve text classification performance. Similarly, Zhu et al. (Zhu et al., 2021b) utilized a Bi-LSTM to capture the local structural information of a sentence, and a text graph with a GCN to model the global dependency information between words. Both global and local dependency structure signals were then fused using an attention mechanism and used to guide the training process.
In terms of performance, GRUs are generally faster and less complex than LSTMs because they have fewer gates. Thus, Dong et al. (Dong et al., 2022) opted to use a two-way GRU model in tandem with a GCN in their proposed architecture. They fed Word2Vec embeddings into a BiGRU layer, yielding a representation that captured the global contextual features and long-range dependencies within the text. This representation was then passed through a GCN layer to extract complex semantic relations and spatial feature information. The GCN output was then fed into a classifier that predicted the class label of the input text.
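The following PyTorch sketch conveys the general shape of such a pipeline, in which a bidirectional GRU encodes the token sequence before a graph convolution over a token-level adjacency; the adjacency construction, pooling, and dimensions are illustrative assumptions rather than the exact architecture of Dong et al.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BiGRUGCN(nn.Module):
    """Sequential encoder (BiGRU) followed by graph propagation (GCN) over token nodes."""
    def __init__(self, embed_dim, hidden_dim, num_classes):
        super().__init__()
        self.bigru = nn.GRU(embed_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.gcn_w = nn.Linear(2 * hidden_dim, hidden_dim, bias=False)
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, word_vectors, A_norm):
        # word_vectors: (seq_len, embed_dim) pre-trained Word2Vec vectors for one document
        # A_norm:       (seq_len, seq_len) normalized token-level adjacency
        ctx, _ = self.bigru(word_vectors.unsqueeze(0))         # (1, seq_len, 2*hidden_dim)
        h = F.relu(A_norm @ self.gcn_w(ctx.squeeze(0)))        # graph convolution over tokens
        return self.classifier(h.mean(dim=0))                  # pooled document logits
```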
5.2. GCN Integration with Transformer
Liu et al. (Liu et al., 2023) proposed a method that combined Transformer with GCN for improved semantic representation of documents. After the text graph underwent the first graph convolutional layer, its word nodes were fed into a transformer to capture the contextual and sequential information of the text. The graph was then augmented with the transformer’s outputs and passed through a second graph convolutional layer to ultimately yield the final classification results. TLC-XML (Zhao et al., 2024) is a Transformer-based model for extreme multi-label classification. It employs GCNs for cluster correlation learning.
5.3. GCN Integration with BERT
Lu et al. (Lu et al., 2020) augmented GCN’s capacity to model global information about the vocabulary of a language with BERT’s ability to capture the local contextual information within a sentence or document and proposed a solution that outperformed either of its individual components, as evidenced by their experiments on various state-of-the-art text classification datasets. They first built a GCN on the vocabulary graph based on word co-occurrence information, similar to approaches in (Defferrard et al., 2016; Kipf and Welling, 2016a; Wu et al., 2019), and then passed the word embedding and the relevant part of the graph embedding together to a self-attention encoder in BERT. This enabled them to interact and guide each other while learning the classifier such that the resulting representation could harness local and global information when performing text classification (see Fig. 6). Xue et al. (Xue et al., 2021) applied a GCN to a text graph constructed based on NPMI and WordNet to obtain a hidden state representation that could effectively capture global contextual dependencies and semantic information. An attention mechanism was then used to combine this with the local information extracted using BERT. Alternatively, Gao and Huang (Gao and Huang, 2021) and She et al. (She et al., 2022) used a gating mechanism to integrate BERT and GCN embeddings.
Chen et al. (Chen et al., 2022) proposed a relational graph convolutional network to process the semantic features obtained from the BERT representation as part of a text-based accident causal classification method. The R-GCN captured immediate syntactic neighbor information of each word and assigned different weights to different types of edges. They also introduced a gate mechanism to reduce the influence of false dependency edges caused by the domain gap.
Lin et al. (Lin et al., 2021) proposed a method that first constructed a heterogeneous graph for the corpus with nodes representing either words or documents, similar to TextGCN. However, the document node embeddings were initialized with pre-trained BERT representations. Thus, by jointly training the BERT and GCN modules, this model could take advantage of both BERT’s large-scale pretraining and GCN’s message propagation mechanism, the combined effectiveness of which had not been previously explored. The resulting model was able to obtain state-of-the-art performance on a wide range of text classification datasets. More recently, Li et al. (Li et al., 2023) used a pre-trained BERT model to initialize document nodes in their heterogeneous graph structure that also included word and entity nodes. They generated entity nodes by mapping the entities in the text to an exogenous knowledge base. While word-word and document-word edges were modeled as in (Yao et al., 2019), an attention mechanism was used to weigh document-entity edges and personalized PageRank to measure the semantic relatedness of entity-entity edges. An enhanced GCN with graph sampling and DropEdge techniques was then applied to mitigate the problems of neighbor explosion and noisy inputs in GCN, and the final node embeddings were used for text classification.
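As a simple illustration of how a graph branch and a BERT branch can be combined at prediction time in such jointly trained models, the sketch below interpolates their class distributions with a tunable coefficient; the function name and the default coefficient value are illustrative assumptions, and details differ from the published implementations.

```python
import torch
import torch.nn.functional as F

def combine_predictions(bert_logits, gcn_logits, lam=0.7):
    """Interpolate GCN and BERT predictions for the same documents.

    bert_logits, gcn_logits: (num_docs, num_classes) scores from each branch.
    lam: weight on the graph branch; lam=0 recovers pure BERT, lam=1 pure GCN.
    """
    p_bert = F.softmax(bert_logits, dim=-1)
    p_gcn = F.softmax(gcn_logits, dim=-1)
    return lam * p_gcn + (1.0 - lam) * p_bert
```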
5.4. GCN Integration with BERT+LSTMs
Several hybrid GCN architectures have also been explored in recent research. For instance, various authors have leveraged GCN, BERT, and Bi-LSTM simultaneously to improve text classification performance. Zeyu et al. (Zeyu et al., 2021) used BERT to obtain word representations from long texts and fed them into a Bi-LSTM model to capture their semantic relationships. A GCN was then applied to a graph constructed using these word features as nodes and the vector similarity between them as edges. Xue et al. (Xue et al., 2022) also used BERT embeddings as input features along with two graphs: one based on NPMI and WordNet, and the other based on dependency relationships. GCNs were trained on these graphs separately. The resulting hidden states were concatenated, fused with the input BERT features, and fed into a fusion model composed of two Bi-LSTM layers with an attention layer in between to generate the final features for classification.
5.5. GCN Integration with LLMs
The synergy between GCNs and LLMs in NLP enhances performance in text classification. Zhou et al. (Zhou et al., 2024) proposed Clip-GCN, a multimodal fake news detection model, which leveraged the CLIP (Radford et al., 2021) pre-training model to extract joint semantic features from image-text information. The model utilized an adversarial neural network to extract inter-domain invariant features and employed GCNs to capture intra-domain knowledge for detecting emergent news. Chen et al. (Chen et al., 2024) used gpt-3.5-turbo-0613 (OpenAI, 2023) as the LLM in graph machine learning for text classification. They investigated two approaches: LLMs-as-Enhancers and LLMs-as-Predictors. The former enhanced nodes’ text attributes with LLMs’ knowledge and generated predictions using MLP, GCN, and GAT, comparing their performance. Li et al. (Li et al., 2024) proposed a hybrid approach that combined Natural Language Inference (NLI) and Graph Convolutional Networks (GCNs) for ontology completion. The GCN models used ConCN concept embeddings as input features and achieved strong performance, particularly the variant combining the GCN with Llama2-13B (Touvron et al., 2023a). Du et al. (Du et al., 2024) proposed a Graph-aware Convolutional LLM method aimed at enabling LLMs to capture high-order relations within user-item graphs using textual data. This method employed the LLM as an aggregator in graph processing to facilitate a step-by-step understanding of graph-based information. Specifically, the approach leveraged ChatGPT to enhance descriptions by systematically exploring multi-hop neighbors layer by layer, thus progressively propagating information throughout the GCN.
5.6. Discussion
The integration of GCNs with generative models has emerged as a powerful approach for enhancing text classification performance. Various hybrid architectures have been developed, leveraging the strengths of GCNs in modeling relational structures and the advanced contextual understanding provided by models like BERT, LLMs, and LSTMs. This integration has consistently led to improvements in accuracy and robustness across diverse datasets. Early models combining GCNs with BERT, such as VGCN-BERT (Lu et al., 2020) and GC-GCN-BERT (Gao and Huang, 2021), successfully merged global and local text features, resulting in superior performance. Subsequent models, such as R-GCN (Chen et al., 2022) and BERT-GCN (Lin et al., 2021), further enhanced this synergy by introducing gating and attention mechanisms. The integration of GCNs with LLMs represents a more recent and cutting-edge development. Models like Clip-GCN (Zhou et al., 2024) and GCN+GPT (Chen et al., 2024) have utilized the extensive pre-trained knowledge of LLMs to enrich the text attributes within GCNs, significantly boosting performance in tasks such as fake news detection.
Incorporating GCNs with RNNs, including LSTM and GRU variants, has also proven effective. These architectures benefit from RNNs’ capability to model sequential dependencies, which complements GCNs’ graph-based relational understanding. Hybrid models like GCN-LSTM (Gao et al., 2020) and BiGRU+GCN (Dong et al., 2022) have leveraged this dual capability to capture both long-range dependencies and local contextual information, resulting in improved text classification outcomes. Beyond these combinations, GCNs have been integrated with a variety of other architectures, including CNNs and Transformers, to further enhance their performance. Models like GCN-CNN (Zeng et al., 2022) and GTG (Liu et al., 2023) benefit from CNNs’ ability to capture local features and Transformers’ capacity for contextual understanding, resulting in robust and accurate text classification systems. From early GCN-BERT combinations to the latest GCN-LLM integrations, the augmentation of GCNs with generative models has consistently driven advancements in text classification. By leveraging the complementary strengths of GCNs and various generative models, researchers have developed increasingly sophisticated and effective text classification techniques, leading to continuous improvements in performance across a wide range of applications.
6. Supervision-based Categorization of GCN Approaches
Category | Description | Sub-category | Methods
---|---|---|---
Supervised | Require labeled data for training and make predictions accordingly. | Optimization-centric | TL-GNN (2019) (Huang et al., 2019), SGC (2019) (Wu et al., 2019), SSGC (2021) (Zhu and Koniusz, 2020), NMGC (2021) (Lei et al., 2021), LDGCN (2023) (Wang et al., 2023a)
 | | Multigraph | TensorGCN (2020) (Liu et al., 2020), SK-GCN (2020) (Zhou et al., 2020b), GFN (2022) (Dai et al., 2022), KG-GCN (2023) (Wang et al., 2023b)
 | | Inductive | TextING (2020) (Zhang et al., 2020), InducT-GCN (2022) (Wang et al., 2022a)
 | | Multilabel | HBLA (2020) (Cai et al., 2020), LDGN (2021) (Ma et al., 2021), GCN-BERT (2022) (Liu, 2022)
 | | Classification with Class Imbalance | MMCT-GCN (2023) (Karajeh et al., 2023), GNN-AWB (2023) (Badiei et al., 2023), MCICIT (2024) (He et al., 2024)
 | | Extreme Text Classification | TLC-XML (2024) (Zhao et al., 2024)
 | | Multilingual | CLHG (2021) (Wang et al., 2021b), MSA-GCN (2024) (Mercha et al., 2024)
 | | Hierarchical | AMKI-HTC (2024) (Feng et al., 2024)
Semi-supervised | Use a small amount of labeled data and a large amount of unlabeled data to improve training. | Short Text Classification | STGCN (2020) (Ye et al., 2020), HGAT (2021) (Yang et al., 2021b), MP-GCN (2022) (Zhao et al., 2022), ST-TextGCN (2022) (Cui et al., 2022)
 | | Multigraph | TextGTL (2021) (Li et al., 2021)
 | | Zero-Shot Classification | ZS-TC (2021) (Liu et al., 2021)
 | | Inductive | HeteGCN (2021) (Ragesh et al., 2021), HDGAT (2024) (Lin et al., 2024)
 | | Document-Document Edge Definition | ME-GCN (2022) (Wang et al., 2022b)
 | | Multi-Task Classification | MT-TextGCN (2022) (Marreddy et al., 2022)
 | | Neighborhood-level Contrastive Learning | NNC-GCN (2024) (Xiao et al., 2024)
Self-supervised | Generate supervisory signals from input data to train without labels. | Multimodal Representation Learning | GCNW-FL (2021) (Zhu et al., 2021a)
 | | Contrastive Learning with Augmentation | CGA2-TC (2022) (Yang et al., 2022)
 | | Pre-Trained Language Model Integration | Cont-GCN-BERT, Cont-GCN-XLNet, Cont-GCN-RoBERTa (2023) (Wu et al., 2023)
Weakly Supervised | Use noisy or limited labels when fully labeled data is scarce or costly. | Multiple Instance Learning | GNN for MIL (2019) (Tu et al., 2019)
In this section, we summarize various GCN-based approaches that innovated upon the foundation laid by Yao et al. (Yao et al., 2019) and categorize them by their mode of supervision, i.e., supervised, semi-supervised, self-supervised, and weakly supervised (see Fig. 7 and Table 2).
6.1. Supervised Text Classification
In supervised text classification, the GCN model is trained using labeled data. More specifically, nodes of documents that form the training set make use of associated labels from a set of one or more predefined classes to train the model such that it can predict labels for unseen documents. During training, the model’s parameters are optimized using a loss function, such as Binary or Categorical Cross-Entropy, which measures the difference between the model’s predicted outputs and the true labels of the training examples.
The following approaches aim to build upon TextGCN in a multitude of ways, i.e., optimizing the architecture for improved efficiency, stacking multiple graphs to capture additional context, or addressing its limitations so it may effectively perform in inductive settings.
6.1.1. Optimization-centric Approaches
The aforementioned methods rely on building a single corpus-level graph, for which training can prove highly cumbersome and computationally taxing as the vocabulary grows. A large vocabulary size in turn increases the number of nodes, with an even greater increase in the number of edges. Consequently, the matrix operations for deriving graph shift operators, and their subsequent application in k-localized convolutions, also become computationally expensive. Researchers have attempted to address this issue and proposed various computationally efficient algorithms.
Huang et al. (Huang et al., 2019) refrained from applying convolution operations to a global graph, citing memory consumption as a possible problem. They instead produced document-level graphs for each input article by connecting word nodes within them. These more local representations were then shared globally, allowing their weights to be updated through a message-passing mechanism, where a node takes in the information from neighboring nodes to update its representation, essentially preserving context. Finally, representations of all the nodes were summarized in the graph to predict the results while consuming notably less memory.
Wu et al. (Wu et al., 2019) approached the issue of increasing computational complexity from another angle. They argued that GCNs were unnecessarily complex and computationally redundant owing to origins rooted in prior deep learning approaches. Their Simple Graph Convolution approach (SGC) attempted to reduce this excess complexity by removing non-linearities and in turn, collapsing weight matrices between successive layers of the GCN to yield a linear model. This essentially reduced the entire procedure to a simple feature propagation step (applying the K-th power of the normalized adjacency matrix in a single-layer neural network) followed by a standard logistic regression layer instead of softmax as in a standard GCN. Their results demonstrated that their simplified model was more computationally efficient and scalable than its nonlinear counterparts while still achieving similar and, in some cases, superior classification performance.
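In code, this reduction amounts to precomputing the K-hop feature propagation and fitting a linear classifier on top; a minimal sketch is shown below, where the train/test index variables and labels are assumed to come from the node split.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def sgc_features(A_norm, X, K=2):
    """Simple Graph Convolution: propagate features K hops with no nonlinearity."""
    S = X
    for _ in range(K):
        S = A_norm @ S            # equivalent to applying A_norm^K to X
    return S

# Hypothetical usage, assuming A_norm, X, y, train_idx, and test_idx are given:
# S = sgc_features(A_norm, X, K=2)
# clf = LogisticRegression(max_iter=1000).fit(S[train_idx], y[train_idx])
# preds = clf.predict(S[test_idx])
```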
Building upon this approach, Zhu and Koniusz (Zhu and Koniusz, 2020) also leveraged a linear model that enabled them to keep computational costs down. However, unlike SGC, their approach, Simple Spectral Graph Convolution (SSGC) was able to keep over-smoothing in check when a larger number of graph convolutional layers were applied while also preserving the large context of each node. This was done by preventing the largest neighborhoods from over-dominating while aggregating over neighborhoods of gradually increasing sizes.
Moreover, Lei et al. (Lei et al., 2021) also addressed overfitting concerns and improved computational efficiency while demonstrating similar performance to TextGCN by proposing a weight-sharing mechanism that enabled them to use the same weight matrix for different order graph convolutions. By then fusing different neighbor features from 1-hop to k-hops using a multi-hop neighbor information fusion mechanism, they were able to capture additional information without an increase in the number of parameters.
Wang et al. (Wang et al., 2023a) introduced a new discriminative objective function that minimized the intra-class distance and maximized the inter-class distance in the resultant features of the texts while also minimizing the classification cross-entropy loss function. Thus, by jointly training on these objectives, their model was able to learn representative embeddings while utilizing the intra and inter-class manifold structures inherent to the graph.
6.1.2. Multigraph Approaches
Some approaches leverage multiple graphs to capture additional contextual information from the corpus. Vashishth et al. (Vashishth et al., 2018) proposed capturing additional contextual information, i.e., semantic and syntactic context using multiple graphs. They used two GCNs to independently learn from semantic and syntactic graphs built from the same corpus and demonstrated the efficacy of combining them to perform various tasks. This idea was later built upon and applied to text classification by Liu et al. (Liu et al., 2020). Their proposed approach leveraged a graph tensor that captured semantic, syntactic, and sequential context of textual information using three separate heterogeneous graphs and used a GCN to jointly learn on each of them. This approach simultaneously performed intra-graph propagation to aggregate information from the neighbors of each node and inter-graph propagation to integrate the heterogeneous information across these graphs (Fig. 8).
Dai et al. (Dai et al., 2022) proposed a new approach that built corpus-level text graphs instead of defining each document as a node, as the latter would limit the method to transductive settings. For document embeddings, their framework merged word embeddings according to document-level structural information in real time. They built multiple corpus-level graphs to capture different views of structural information, applied graph convolutions to them, and fused their results to obtain a better decision boundary. Some works have leveraged multiple graphs to capture external information not explicitly expressed within the text. Zhou et al. (Zhou et al., 2020b) constructed syntactic and knowledge graphs, combined their adjacency matrices, and applied a GCN to them along with multi-head positional attention to enhance the sentence representation towards a given aspect. Wang et al. (Wang et al., 2023b) also constructed and fed two graphs based on syntactic dependency and entity relationships into separate GCN modules and fused their outputs to improve Chinese long-text classification performance.
6.1.3. Inductive Approaches
While most approaches for text classification are transductive, literature that tackles this problem using inductive processes does exist. Unlike transductive learning techniques, in which all the data is observed beforehand during training, inductive learning relies only on the training data to train the model, which is then applied to data it has never seen before.
One such approach was proposed by Zhang et al. (Zhang et al., 2020) to overcome the limitations posed by transductive GCN methods such as TextGCN. To learn fine-grained text-level word relations, they first built individual graphs for each document. Information from word nodes was then propagated to their neighbors via gated GCNs and aggregated into each document embedding, which was in turn used to obtain the final prediction. Their approach achieved good classification performance; however, its most significant gains were observed under inductive settings (see Fig. 9). Wang et al. (Wang et al., 2022a) also augmented transductive models like TextGCN and SGC with inductive learning. They generated document node representations from one-hot encoded word node vectors weighted by TF-IDF, using only the training set documents. A GCN was then trained with cross-entropy loss on these training document node representations. During testing, unidirectional propagation updated test document nodes by leveraging stored input and hidden layer representations from the training step.
6.1.4. Multilabel Classification
In multilabel text classification, an instance can be assigned multiple classes or labels. Cai et al. (Cai et al., 2020) proposed a hybrid network that uses a pre-trained BERT model to generate context-aware document representations, while a GCN was used to learn contextualized label embeddings. An attention mechanism was learned to assign label weights to each word, yielding a label-specific word representation. The context-aware and label-specific word features were then combined and fed into a Bi-LSTM for classification.
Liu and Bin (Liu, 2022) proposed a GCN and BERT-based framework for multilabel classification on Chinese government hotline event text. A GCN was applied to an abstract meaning representation-based graph to produce an event topic information vector. It was then fused with an event semantic information vector extracted using BERT to predict the label count. A memory network stored the event label semantic information and obtained a candidate set, which was then matched with the GCN-BERT fusion vector using an answer selection framework. The top k labels with maximum probability were selected as the output.
Ma et al. (Ma et al., 2021) learned label-specific text representations for the documents. However, to achieve this they extracted relevant semantic components for each of the target classes and used a dual graph GCN to model interactions among them based on the statistical label co-occurrence and dynamic reconstruction graph. The resulting component representations were used to predict the document labels.
6.1.5. Classification with Class Imbalance
Text classification tasks can also suffer from class imbalance. To address this, a multi-label classification approach for imbalanced clinical text has recently been proposed by He et al. (He et al., 2024). This approach leverages BioBERT, a pre-trained language model specialized for biomedical texts, to obtain fine-grained semantic features. To tackle class imbalance, it incorporates co-occurrence-based embeddings with an additional information-enhanced GCN to learn the final representations. Karajeh et al. (Karajeh et al., 2023) proposed the Multi-view Minority Class Text Graph Convolutional Network (MMCT-GCN), which addresses minority classes by capturing textual graph representations in addition to sequence-based text representations.
Badiei et al. (Badiei et al., 2023) combined GCNs and LSTMs to process text, addressing class imbalance with an adversarial loss framework. They used separate weight generators per class to adjust sample weights dynamically during training. Their approach increased weights for misclassified samples and decreased weights for correctly classified ones, enhancing classifier performance over epochs.
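The adversarial weight generator itself is more involved, but the core re-weighting idea can be sketched as follows: after each epoch, weights of misclassified samples are increased and those of correctly classified ones are decreased, and a weighted cross-entropy is minimized. The update factors and renormalization below are assumptions for illustration, not the authors' exact formulation.

```python
import torch
import torch.nn.functional as F

def update_sample_weights(weights, logits, labels, up=1.2, down=0.9):
    """Raise weights of misclassified samples, lower correctly classified ones."""
    correct = logits.argmax(dim=1).eq(labels)
    weights = torch.where(correct, weights * down, weights * up)
    return weights / weights.mean()            # keep the average weight at 1

def weighted_loss(logits, labels, weights):
    per_sample = F.cross_entropy(logits, labels, reduction="none")
    return (weights * per_sample).mean()

# toy usage: one training step followed by a weight update
logits = torch.randn(8, 3)
labels = torch.randint(0, 3, (8,))
weights = torch.ones(8)
loss = weighted_loss(logits, labels, weights)
weights = update_sample_weights(weights, logits.detach(), labels)
```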
6.1.6. Extreme Text Classification
Unlike traditional text classification where we might have 10-20 categories, extreme text classification deals with datasets that have hundreds of thousands, or even millions, of potential labels (Xiong et al., 2021). While there are a vast number of possible labels, there is often a scarcity of training data for each specific label, meaning there might be very few examples for some of the more uncommon labels. This type of classification can be useful for tasks like automatically tagging products with highly specific attributes on an e-commerce website, or categorizing scientific research papers within a very granular subject hierarchy. Various approaches to tackle extreme text classification include transfer learning, leveraging pre-trained language models, and developing new methods for handling imbalanced datasets. Transformers are also used for extreme multi-label text classification due to their effective text representation capabilities. Zhao et al. (Zhao et al., 2024) proposed TLC-XML, a Transformer-based model for extreme multi-label (XML) text classification. The model includes three modules: Partition, Matcher, and Ranker. In the Partition module, label correlation graphs are constructed using semantic and co-occurrence information, grouping strongly correlated labels into clusters. The Matcher module employs GCNs for cluster correlation learning, embedding these correlations into the classifier. The Ranker module improves label predictions by integrating raw predictions with information from neighboring labels.
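As an illustration of the label-graph construction used by Partition-style modules such as the one in TLC-XML, the sketch below derives a label-label adjacency from document-label co-occurrence statistics; the conditional-probability weighting and pruning threshold are assumptions, and the semantic component of the graph is omitted.

```python
import numpy as np

def label_cooccurrence_graph(label_matrix, threshold=0.1):
    """Derive a label-label adjacency from a binary document-label matrix.

    Edge weight = P(label_j | label_i), pruned below `threshold` so that
    only strongly correlated label pairs remain.
    """
    counts = label_matrix.T @ label_matrix               # co-occurrence counts
    label_freq = np.maximum(counts.diagonal(), 1)        # avoid division by zero
    cond_prob = counts / label_freq[:, None]             # P(j | i)
    adj = np.where(cond_prob >= threshold, cond_prob, 0.0)
    np.fill_diagonal(adj, 1.0)
    return adj

# toy document-label matrix: 3 documents, 4 labels
Y = np.array([[1, 1, 0, 0],
              [1, 0, 0, 1],
              [0, 1, 1, 0]])
print(label_cooccurrence_graph(Y))
```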
Dahiya et al. (Dahiya et al., 2021) developed the Siamese Extreme Multi-Label GCN model (SiameseXML), leveraging a probabilistic model that supports a modular approach combining Siamese architectures with powerful extreme GCN-based classifiers. They also designed a scalable training pipeline capable of handling tasks with up to 100 million labels. Jiang et al. (Jiang et al., 2021) introduced LightXML, which employed an end-to-end training method along with dynamic sampling of negative labels. Xiong et al. (Xiong et al., 2023) also proposed a transformer-based two-stage Extreme Multi-label Text Classification (XMTC) model.
6.1.7. Multilingual Text Classification
Multilingual text classification involves categorizing written texts in different languages. The Multilingual Sentiment Analysis GCN (MSA-GCN) (Mercha et al., 2024) used a GCN to capture both short-distance and long-distance semantics effectively. This approach employed a unified heterogeneous text graph and a moderately deep GCN to acquire predictive representations for all nodes, facilitating transfer learning across languages. Wang et al. (Wang et al., 2021b) also used a GCN to capture rich information contained within and across languages for cross-lingual text classification. They added part-of-speech tags to edges as well as direct connections between similar documents and machine-translated versions of the same document in a base text graph. A GCN was then applied to aggregate information from multiple subgraphs separated by different types of edges and learn a language-agnostic representation for the documents.
6.1.8. Hierarchical Text Classification
Hierarchical text classification is the process of categorizing a text into multiple hierarchical levels of categories. Feng et al. (Feng et al., 2024) proposed the Adaptive Micro-knowledge and Macro-Knowledge incorporation for Hierarchical Text Classification (AMKI-HTC) model, which integrated micro-knowledge to capture class-relevant keywords for discriminative representations and enhanced label graph accuracy with macro-knowledge. It incorporated a confidence maximization fusion strategy for adaptive aggregation of multi-view features.
6.2. Semi-supervised Text Classification
In a semi-supervised setting, the model is trained using a combination of labeled and unlabeled data. The labeled data guides learning of the relationship between inputs and outputs, as it does in supervised learning. However, semi-supervised approaches additionally use unlabeled data to improve the quality of the learned representation so that it is more robust and generalizes better. In practice, this can be achieved by minimizing a loss function that combines the classification loss for the labeled examples with a regularization term that encourages the model to learn similar representations for similar documents.
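A minimal sketch of such a combined objective is shown below: supervised cross-entropy on the labeled documents plus a graph-smoothness regularizer that pulls the representations of connected (i.e., similar) documents together. The specific Laplacian-style regularizer and the weighting factor are illustrative assumptions, not a particular published formulation.

```python
import torch
import torch.nn.functional as F

def semi_supervised_loss(logits, embeddings, labels, labeled_mask, adj, lam=0.1):
    """Cross-entropy on labeled nodes plus a graph-smoothness regularizer.

    adj: dense (N, N) adjacency; the regularizer penalizes the distance
    between embeddings of connected (i.e., similar) documents.
    """
    sup = F.cross_entropy(logits[labeled_mask], labels[labeled_mask])

    # Laplacian-style smoothness: sum_ij A_ij * ||h_i - h_j||^2
    diff = embeddings.unsqueeze(0) - embeddings.unsqueeze(1)     # (N, N, d)
    smooth = (adj * diff.pow(2).sum(-1)).sum() / adj.sum().clamp(min=1)

    return sup + lam * smooth

# toy usage with 6 documents, 2 of them labeled
logits, emb = torch.randn(6, 3), torch.randn(6, 8)
labels = torch.randint(0, 3, (6,))
mask = torch.tensor([True, True, False, False, False, False])
adj = (torch.rand(6, 6) > 0.5).float()
print(semi_supervised_loss(logits, emb, labels, mask, adj))
```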
6.2.1. Short Text Classification
TextGCN had previously been extended to short-text classification, namely for product query and product title classification, either by leveraging side information to construct the graph (Tayal et al., 2019) or by incorporating label dependencies in the output space (Tayal et al., 2020). Ye et al. (Ye et al., 2020) employed a topic model to obtain global short-text topic information to complement the word co-occurrence and document-word relation information used for graph construction. A GCN was applied to the short-text graph, and the resulting word and document node representations, together with pre-trained vectors obtained from BERT’s hidden layers, were input into a Bi-LSTM classifier.
Yang et al. (Linmei et al., 2019; Yang et al., 2021b) proposed inductive learning and multi-label classification of short texts using two steps: first, a flexible Heterogeneous Information Network (HIN) modeled short texts by capturing rich relations among them and augmenting them with additional information, such as topics and entities. Doing so helped alleviate semantic sparsity, combat noise, and make predictions with greater confidence in the downstream classification task. Then, their proposed Heterogeneous Graph Attention (HGAT) model embedded the HIN outputs based on a node and type-level attention mechanism. This dual-level attention mechanism considered the significance of not just neighboring nodes but also the different types of information associated with a particular node. The HGAT employed heterogeneous graph convolution to factor in the difference between various types of information and map them into an implicit shared space using their corresponding transformation matrices.
Zhao et al. (Zhao et al., 2022) proposed a Multi-head-Pooling-based GCN for more robust short text classification without the need for pre-trained word embeddings and with a lower computational overhead. They introduced three architectures focusing on the first-order nodes of isomorphic graphs, first and second-order nodes of isomorphic graphs, and first-order nodes of heterogeneous graphs, respectively. The key innovation of this approach was the use of a graph pooling method based on self-attention to evaluate and select important nodes from these multiple perspectives without an increase in trainable parameters. The authors demonstrated their model’s ability to capture rich semantic information in short texts and effectiveness across multiple benchmark datasets.
Cui et al. (Cui et al., 2022) designed a model to overcome the challenges of short text classification due to sparsity and limited labeled data. Instead of generating text samples, they opted for a more convenient self-training method that propagated labeled information to target samples through the graph structure. Their model added keywords to the training set and calculated the confidence of each word. Words with high confidence were identified automatically as pseudo-labeled data, and the confidence of each word was used to compute edge weights in the graph, reducing the impact of ambiguous words on classification performance.
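A condensed sketch of this style of self-training loop is given below: confident predictions on unlabeled nodes are promoted to pseudo-labels, and the per-node confidence is returned so it can be reused, for example as an edge weight. The confidence threshold and the model interface are assumptions, not the exact procedure of Cui et al.

```python
import torch
import torch.nn.functional as F

def add_pseudo_labels(model, features, adj, labels, labeled_mask, threshold=0.9):
    """Promote confident predictions on unlabeled nodes to pseudo-labels.

    model: any callable mapping (features, adj) to class logits.
    """
    with torch.no_grad():
        probs = F.softmax(model(features, adj), dim=1)
    conf, preds = probs.max(dim=1)

    new = (~labeled_mask) & (conf >= threshold)     # confident unlabeled nodes
    labels = labels.clone()
    labels[new] = preds[new]
    # conf can be reused later, e.g., to weight graph edges by confidence
    return labels, labeled_mask | new, conf

# toy usage with a random linear stand-in for a trained GCN
W = torch.randn(16, 4)
model = lambda x, a: a @ x @ W
x, a = torch.randn(10, 16), torch.eye(10)
labels = torch.zeros(10, dtype=torch.long)
mask = torch.zeros(10, dtype=torch.bool)
mask[:3] = True
labels, mask, conf = add_pseudo_labels(model, x, a, labels, mask, threshold=0.5)
```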
6.2.2. Multigraph Approaches
Li et al. (Li et al., 2021) argued against heterogeneous graphs as they increased the number of parameters by an unwarranted amount. Moreover, they claimed that previous graph construction methods relied solely on empirical design, had no theoretical foundations, and introduced additional problems like node redundancy, missing information, and error cascade propagation. Their approach leveraged multiple non-heterogeneous graphs and refined the graph topology to propagate information more effectively. It also incorporated attribute space interpolation based on dense substructure in graphs to predict low-entropy labels with high-quality feature nodes for data augmentation. Overall, they were able to reduce parameter complexity and make the trained model lightweight while still effectively capturing different types of context.
6.2.3. Zero-shot Classification
Liu et al. (Liu et al., 2021) proposed a novel method to achieve zero-shot text classification by connecting seen classes to unseen classes using semantic category knowledge from ConceptNet (Speer et al., 2017) and constructing a graph of all categories. Unseen classes could then be identified by information propagation through this connection. It was done by transferring category knowledge through convolution on the constructed graph and semi-supervised training using samples of the seen classes.
6.2.4. Inductive Approaches
Ragesh et al. (Ragesh et al., 2021) addressed the problem of learning efficient and inductive graph convolutional networks for text classification with many examples and features. This work featured a heterogeneous GCN architecture that integrated the best aspects of predictive text embedding (PTE) and TextGCN to derive document embeddings using compatible graphs across multiple layers. It decomposed TextGCN into simpler models that stored feature embeddings at various layers but had fewer parameters, allowing for faster training and better generalization performance when labeled data was scarce.
Lin et al. (Lin et al., 2024) proposed Heterogeneous Directed Graph Attention Networks (HDGAT) for text classification, which integrate a sentence transformer, a global attention mechanism (GAT), and Squeeze-and-Excitation Network (SENet)-based channel attention for multilevel semantic embedding and automatic learning of node connections.
6.2.5. Document-document Edge Definition
Wang et al. (Wang et al., 2022b) improved the performance of text classification by integrating a rich source of graph edge information from the entire text corpus. For the text graph, they used Word2Vec and Doc2Vec embeddings as word and document node features, respectively. They defined document-document edges in addition to word-word and word-document edges. Weights for word-word and document-document edges were inversely proportional to the distance between the feature values of the nodes they link, while word-document edges were weighted using TF-IDF values. The generated graph was then trained with their proposed model, which treated the edge features as multi-stream signals, with each stream performing a separate graph convolutional operation. Pooling was used at the output layer to further synthesize the multi-stream features of each node for final classification.

In multi-task text classification, the goal is to classify text data into multiple categories by sharing the knowledge and features learned from different related tasks, thereby improving the performance of each task by leveraging the similarities and differences among them and reducing the amount of labeled data required for each task. Marreddy et al. (Marreddy et al., 2022) proposed a novel semi-supervised multi-task text classification framework to address the challenges of applying GCNs to low-resource languages such as Telugu. It mainly comprised a graph autoencoder (GAE) and a multi-task GCN. The GAE learned low-dimensional word and sentence graph embeddings from word-sentence graph reconstruction, whereas the multi-task text GCN performed multi-task text classification using these latent sentence graph embeddings. Their approach achieved significant improvements on four text classification tasks: sentiment analysis, emotion identification, hate speech, and sarcasm detection.
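Returning to the edge definitions of Wang et al. (Wang et al., 2022b), the small sketch below shows how the three edge types could be weighted: TF-IDF for word-document edges and an inverse feature distance for word-word and document-document edges. The distance measure and the epsilon constant are illustrative assumptions rather than the authors' exact choices.

```python
import numpy as np

def inverse_distance_weight(x_i, x_j, eps=1e-6):
    """Edge weight inversely proportional to the distance between node features."""
    return 1.0 / (np.linalg.norm(x_i - x_j) + eps)

def word_document_weight(tf, df, num_docs):
    """Standard TF-IDF weight for a word-document edge."""
    return tf * np.log(num_docs / (1 + df))

# toy usage: two document feature vectors and one word statistic
d1, d2 = np.array([0.20, 0.90]), np.array([0.25, 0.80])
print(inverse_distance_weight(d1, d2))                      # document-document edge
print(word_document_weight(tf=3, df=10, num_docs=1000))     # word-document edge
```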
6.2.6. Neighborhood-level Contrastive Learning
Xiao et al. (Xiao et al., 2024) proposed a simple and efficient Neighbors-to-Neighbors Contrastive GCN (NNC-GCN) for semi-supervised classification. It built consistent multi-views using the topologies of the original input graphs and used an improved version of the Info Noise Contrastive Estimation (InfoNCE) (Oord et al., 2018) loss function. InfoNCE was adapted to neighborhood-level contrastive learning by treating the weighted neighborhood of a selected anchor as the positive sample set and the remaining nodes as negatives.
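A simplified version of such a neighborhood-level InfoNCE objective is sketched below: for each anchor in one view, its graph neighbors in the other view act as positives and all remaining nodes as negatives. Uniform weighting is assumed here for brevity, whereas NNC-GCN additionally weights the positive and negative sets.

```python
import torch
import torch.nn.functional as F

def neighbor_infonce(z1, z2, adj, temperature=0.5):
    """Neighbors-to-neighbors contrastive loss over two views z1 and z2.

    For each anchor node i in view 1, its neighbors (adj[i] > 0) in view 2
    act as positives and all remaining nodes in view 2 as negatives.
    """
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    sim = torch.exp(z1 @ z2.t() / temperature)      # (N, N) cross-view similarities

    pos_mask = (adj > 0).float()
    pos = (sim * pos_mask).sum(dim=1)               # similarity to neighbors
    denom = sim.sum(dim=1)                          # similarity to all nodes
    loss = -torch.log(pos / denom + 1e-12)

    has_pos = pos_mask.sum(dim=1) > 0               # anchors with at least one neighbor
    return loss[has_pos].mean()

# toy usage: two augmented views of a 6-node fully connected graph (no self-loops)
z1, z2 = torch.randn(6, 8), torch.randn(6, 8)
adj = torch.ones(6, 6) - torch.eye(6)
print(neighbor_infonce(z1, z2, adj))
```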
6.3. Self-supervised Text Classification
In self-supervised conditions, the model is able to learn a meaningful representation of input documents without any explicit supervision through training labels. Instead, the model is trained to predict certain hidden features of the input document based on some other unhidden aspect of the same document. More specifically, the model solves a pretext task from unlabeled data, such as predicting the next sentence in a document or reconstructing a corrupted version of the same document. By doing so, the model can learn to capture important semantic and syntactic features of the documents, which can be useful for downstream tasks such as text classification.
6.3.1. Multimodal Representation Learning
The GCNW-FL by Zhu et al. (Zhu et al., 2021a) learned multimodal word representations using GCNs by harnessing their ability to capture the relationships between different language modalities, such as phonetics and syntax. To train their model, they used a greedy strategy to update the modality-relation matrix in the GCN and effectively learn multimodal word representations by predicting the context of words from their phonetic and syntactic information. They evaluated it on a downstream text classification task and demonstrated its efficacy at capturing rich semantics through the learned word representations.
6.3.2. Contrastive Learning with Augmentation Strategies
Yang et al. (Yang et al., 2022) obtained a robust node representation through contrastive learning using noise and centrality-based augmentations. This enabled them to preserve essential connections between nodes while also reducing noise at the same time. They used nodes with the same label as multiple positive samples and assigned them to the anchor node while applying consistency training on unlabeled nodes to constrain model predictions. They used random node sampling for more efficient resource utilization while computing the contrastive loss.
6.3.3. Pre-trained Language Model Integration
Similar to previously discussed inductive methods (Zhang et al., 2020; Wang et al., 2022a), Wu et al. (Wu et al., 2023) addressed the limitation posed by the transductive nature of most GCN-based models and the challenge of deploying a GCN-based model in an online system, where new data is added continually, and the model needs to be updated to account for this change. They proposed a new ‘all-token-any-document (ATAD)’ paradigm that uses the vocabulary of a pre-trained language model such as BERT (Devlin et al., 2018), RoBERTa (Liu et al., 2019), or XLNet (Yang et al., 2019) to dynamically update the connections between documents and tokens in the graph, allowing the model to predict for previously unseen documents. They introduced a method for online updating without the need for labels. This approach fine-tuned an occurrence memory module and efficiently updated the network parameters using a self-supervised contrastive learning objective.
6.4. Weakly Supervised Approaches
Weak supervision is a form of machine learning where the model is trained using limited or imprecise labels. Instead of having access to a fully labeled dataset, weak supervision leverages various forms of indirect supervision to construct approximate labels. These forms can include domain knowledge, heuristics, and other semi-automated methods. Weak supervision is particularly useful in scenarios where obtaining fully labeled data is challenging, such as in large-scale datasets or specialized domains like natural language processing.
6.4.1. Multiple Instance Learning
Multiple Instance Learning (MIL) is a weakly supervised learning framework where the model receives a set of labeled bags, each containing multiple instances. The label is provided at the bag level, not the instance level, which means that the model must learn to predict the bag label based on the instances within it. MIL is particularly effective in scenarios where instance-level labels are not available but bag-level labels are. The MIL framework naturally fits various problem settings and as a result, it has been applied to various domains such as computer vision (Wu et al., 2014, 2015a; Babenko et al., 2010), natural language processing (Angelidis and Lapata, 2018; Pappas and Popescu-Belis, 2014; Amplayo et al., 2021; Wang et al., 2016b), anomaly detection (Sultani et al., 2018; Quellec et al., 2016), remote sensing (Liu et al., 2017b; Wang et al., 2011), and medical image analysis (Xu et al., 2014; Ilse et al., 2018; Liu et al., 2018). In text classification, MIL can be applied by considering documents as bags and sentences or paragraphs as instances. The goal is to classify the entire document (bag) based on the information contained within its sentences or paragraphs (instances). Traditional MIL approaches often treat instances as independent and identically distributed (i.i.d.), which overlooks the structural relationships between them. Thus, the combination of MIL with GCNs can be potentially advantageous in applications where preserving complex relationships between instances within bags is essential, such as text categorization (Tu et al., 2019), medical imaging (Yin et al., 2019), and speech classification (Zhang et al., 2021b).
Tu et al. (Tu et al., 2019) integrated MIL with GCNs by treating each bag as a graph, with instances as nodes connected by edges representing their relationships. A GCN was applied to this graph to learn node embeddings that captured the structural information of the instances. These embeddings were then aggregated to form a representation of the entire bag, which was used by a classifier to predict the bag-level label. By capturing the structural information within the bag, this method provided a more nuanced representation, improving classification performance.
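A compact sketch of this bag-as-graph idea follows: a single GCN layer over the instance nodes, mean pooling into a bag embedding, and a linear bag-level classifier. The layer sizes, pooling operator, and the identity adjacency in the toy example are illustrative assumptions, not the exact architecture of Tu et al.

```python
import torch
import torch.nn as nn

class MILGraphClassifier(nn.Module):
    """Treat a bag as a graph of instances and pool node embeddings into a bag label."""

    def __init__(self, in_dim, hid_dim, num_classes):
        super().__init__()
        self.gcn = nn.Linear(in_dim, hid_dim)    # weight of a single GCN layer
        self.clf = nn.Linear(hid_dim, num_classes)

    def forward(self, x, adj):
        # x: (num_instances, in_dim), adj: normalized (num_instances, num_instances)
        h = torch.relu(adj @ self.gcn(x))        # graph convolution: A X W
        bag = h.mean(dim=0)                      # aggregate instances into a bag embedding
        return self.clf(bag)                     # bag-level logits

# toy bag with 5 instances (e.g., sentences of one document)
x, adj = torch.randn(5, 16), torch.eye(5)        # identity as a placeholder adjacency
model = MILGraphClassifier(16, 32, 2)
print(model(x, adj).shape)                       # torch.Size([2])
```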
6.5. Discussion
The landscape of GCN approaches for text classification has evolved significantly, marked by a transition from traditional supervised methods to more sophisticated semi-supervised and self-supervised techniques. Initially, supervised approaches dominated the field, focusing on leveraging labeled data to train models. These early models, such as TextGCN (Yao et al., 2019), TL-GNN (Huang et al., 2019) and SGC (Wu et al., 2019), aimed to optimize computational efficiency and reduce complexity while enhancing classification performance.
As the field progressed, researchers explored multigraph approaches like TensorGCN (Liu et al., 2020), which utilized multiple graphs to capture richer contextual information, such as semantic and syntactic structures. This shift allowed models to better understand nuanced text relationships, leading to improved classification accuracy. Inductive approaches such as TextING (Zhang et al., 2020) and InducT-GCN (Wang et al., 2022a) also emerged to address the limitations of transductive methods, enhancing the models’ ability to generalize to unseen data without retraining. The development of multilabel classification techniques, including HBLA (Cai et al., 2020) and GCN-BERT (Liu, 2022), marked another significant advancement. These models handled the complexity of assigning multiple labels to a single instance, addressing challenges like label correlation and class imbalance. Recent innovations introduced specialized subcategories, like multilingual models (e.g., MSA-GCN (Mercha et al., 2024)) and hierarchical models (e.g., AMKI-HTC (Feng et al., 2024)), expanding the scope of GCN applications to more complex and varied classification tasks.
In subsequent years, there was a noticeable shift towards semi-supervised methods, motivated by the need to leverage large amounts of unlabeled data, which is often more readily available than labeled datasets. Notably, HeteGCN (Ragesh et al., 2021) demonstrated consistent high performance across various datasets, solidifying the importance of semi-supervised approaches in achieving robust and scalable text classification. More recently, the research focus has shifted towards self-supervised learning, reflecting a broader trend in machine learning towards minimizing the dependency on labeled data. Advanced architectures like Cont-GCN-BERT (Wu et al., 2023) represent cutting-edge research, utilizing self-generated supervisory signals to train models. These methods have shown remarkable performance by integrating self-supervised learning with powerful pre-trained language models.
Overall, there is a consistent trend of performance improvement across benchmarks. Early models focused on addressing basic computational and memory constraints, while later models incorporated more complex architectural innovations to capture richer contextual information and handle diverse classification challenges effectively. The semi-supervised and self-supervised approaches reflect a growing emphasis on scalability and adaptability, allowing GCNs to be applied to larger and more varied datasets without relying on labels.
7. Performance Comparison
To evaluate and compare different GCN-based approaches, we adopted a rigorous methodology:
• Selection Criteria: Approaches were selected based on their relevance, innovation, impact, and citations. Both foundational and recent high-impact works are included.
• Datasets: We utilized widely recognized benchmark datasets to ensure consistent and meaningful comparisons across methods. Details of these datasets are provided in Section 7.1.
• Metrics: While accuracy scores are the most abundantly reported metric across all approaches, macro-averaged F1 scores have also been considered wherever possible to highlight notable trends across different categories. We primarily reported test accuracy and sometimes test F1 scores in our analysis, but a more holistic discussion on metrics is provided in Section 7.2.
• Experimental Setup: We reviewed the experimental setups reported in the literature to ensure comparisons are fair, considering factors like training data, model configurations, and evaluation protocols. Some methods used different train-test splits, which we have reported accordingly.
• Comparative Analysis: Comparisons have been made by evaluating approaches within each category, namely supervised, semi-supervised, and self-supervised.
7.1. Datasets
In this section, we provide specifics on some of the most widely used datasets in relevant literature that have been used to benchmark the text classification performance of GCN methods. As results on these datasets have been reported across numerous studies, we can use them to provide a more meaningful comparison of approaches to highlight their relative strengths and limitations. Their statistics in the standard configuration (Yao et al., 2019) are also summarized in Table 3.
• 20 NG: The 20 NG dataset contains 18,846 newsgroup documents evenly categorized into 20 different categories, covering a broad spectrum of topics such as sports, politics, technology, and religion, among others. In total, 11,314 documents are in the training set and 7,532 documents are in the test set. In addition to text classification, this dataset has also been used for text clustering and out-of-distribution detection.
• Reuters: This dataset is a collection of documents that appeared on the Reuters newswire in 1987 and has been widely used for text classification. R8 and R52 are popular subsets of the Reuters dataset. The former has 8 news categories split into 5,485 training and 2,189 test documents, while the latter contains 52 categories split into 6,532 training and 2,568 test documents.
• Movie Review (MR): The MR dataset comprises movie reviews and is used primarily for sentiment analysis, i.e., determining whether a review is negative or positive. It contains 5,331 positive and 5,331 negative reviews.
• Ohsumed: Ohsumed collects medical abstracts from the MEDLINE database, tagged with one or more of 23 cardiovascular disease categories. Since most literature focuses on single-class classification, only the 7,400 single-category documents are retained, of which 3,357 make up the training set and 4,043 are in the test set.
• CoLA: The CoLA (Corpus of Linguistic Acceptability) dataset is a collection of English sentences labeled for grammatical acceptability. The labels are binary (i.e., a sentence is either linguistically acceptable or it is not). This dataset was introduced by Warstadt et al. (Warstadt et al., 2019) and consists of 10,000 sentences from various sources, including linguistic literature, standardized tests, and online forums. The publicly available version has 9,594 sentences in the training and development sets and 1,063 sentences in the test set.
• SST-2: First introduced by Socher et al. (Socher et al., 2013), this variant of the Stanford Sentiment Treebank (SST) dataset is a collection of movie reviews labeled with binary sentiment (positive or negative). The reviews are parsed into constituency trees and labeled with sentiment annotations for each sub-phrase, allowing for fine-grained analysis of the sentiment of the text. The dataset consists of over 11,000 sentences from movie reviews and is widely used as a benchmark for evaluating the performance of models on text classification tasks.
Dataset | Year | Docs | Train | Test | Words | Nodes | Classes | Avg. Length |
20 NG | 1995 | 18,846 | 11,314 | 7,532 | 42,757 | 61,603 | 20 | 221.3 |
R8 | 2004 | 7,674 | 5,485 | 2,189 | 7,688 | 15,362 | 8 | 65.7 |
R52 | 2004 | 9,100 | 6,532 | 2,568 | 8,892 | 17,992 | 52 | 69.8 |
Ohsumed | 1994 | 7,400 | 3,357 | 4,043 | 14,157 | 21,557 | 23 | 135.8 |
MR | 2005 | 10,662 | 7,108 | 3,554 | 18,764 | 29,426 | 2 | 20.4 |
CoLA (Warstadt et al., 2019) | 2019 | 9,594 | 8,551 | 1,043 | - | - | 2 | 7.7 |
SST-2 (Socher et al., 2013) | 2013 | 9,613 | 7,792 | 1,821 | - | - | 2 | 19.3 |
These datasets have been used as benchmarks for a plethora of applications in addition to text classification. Some of these have been summarized in Table 4.
Dataset | Task | Notable architectures | |
20 NG | Text Clustering, Topic Modelling | G-BAT (Wang et al., 2020) | |
Out-of-distribution Detection | 2-Layered GRU (Thulasidasan et al., 2021) | ||
Reuters | Multi-label Text Classification | HiddeN (Chatterjee et al., 2021) | |
Movie Review | Sentiment Analysis | VLAWE (Ionescu and Butnaru, 2019), EFL (Wang et al., 2021a) | |
Few-shot Learning | DART (Zhang et al., 2021a) | ||
Ohsumed | Information Retrieval | BERT+CONCEPT FILTER (Dervakos et al., 2021) | |
CoLA | Linguistic Acceptability | |
SST-2 | Sentiment Analysis | T5-11B (Raffel et al., 2020b), MT-DNN-SMART (Jiang et al., 2019b) | |
Few-shot Learning | DART |
7.2. Metrics
This section serves as a primer for the various metrics that have been used for text classification in the literature. In the definitions that follow, TP, FP, TN, and FN denote true positive, false positive, true negative, and false negative, respectively: Accuracy = (TP + TN) / (TP + FP + FN + TN), Precision = TP / (TP + FP), Recall = TP / (TP + FN), and F1-score = (2 × Precision × Recall) / (Precision + Recall). In the literature, test accuracy has primarily been used to evaluate the performance of various models. However, F1-score, Precision, and Recall are also used, depending on the exact nature of the application and the class distribution of a dataset.
In general, accuracy is a good measure when we have a balanced class distribution, whereas the F1-score is usually best for imbalanced classes. Precision is preferable when false positives are costly, whereas Recall should be chosen when false negatives are more costly than false positives. The F1-score represents a balance between Precision and Recall, as it is their harmonic mean. For multi-class classification problems, most text classification methods compute Precision, Recall, or F1-score for each class and report the macro average. Such averaging considers each class equally important and balances out the effect of majority classes.
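For illustration, macro-averaging can be computed per class and averaged with equal weight, as in the following plain-numpy sketch.

```python
import numpy as np

def macro_f1(y_true, y_pred, num_classes):
    """Per-class F1 averaged with equal weight for every class."""
    f1s = []
    for c in range(num_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        f1s.append(2 * precision * recall / (precision + recall)
                   if (precision + recall) else 0.0)
    return float(np.mean(f1s))

y_true = np.array([0, 0, 1, 2, 2, 2])
y_pred = np.array([0, 1, 1, 2, 2, 0])
print(macro_f1(y_true, y_pred, num_classes=3))   # each class counts equally
```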
7.3. Analysis of Results
This section compares the performance of various methods discussed in this review over a range of benchmark datasets. Primarily, comparisons have been made by evaluating approaches within each category, namely supervised, semi-supervised, and self-supervised. However, cross-category comparisons have also been presented for a more thorough understanding of the relative strengths and weaknesses of these approaches. While accuracy scores are the most abundantly reported metric across all of these approaches, macro-averaged F1-scores have also been considered wherever possible to highlight any notable trends across different categories. Metrics for all GCN methods have been categorically presented under different conditions in Tables 5 to 7. For each of these tables, if a set of results has been obtained under a different train/test split than in (Yao et al., 2019), its details can be found in the table notes. If for a given split, results have been obtained from papers other than the original paper of that approach, those have also been cited next to the method name.
[b] Method Year 20 NG R8 R52 Ohsumed MR CoLA SST-2 Acc. F1 Acc. F1 Acc. F1 Acc. F1 Acc. F1 Acc. F1 Acc. F1 TextGCN (Yao et al., 2019) 2019 86.3 85.6 97.1 92.4 93.6 65.2 68.4 59.1 76.7 76.8 52.3 52.3 82.4 80.5 TextGCN1 (Yao et al., 2019) 2019 77.6 85.6 81.4 50.9 60.5 TextGCN3 (Yao et al., 2019) 2019 80.9 80.5 94.0 78.3 89.4 47.3 56.3 36.7 74.6 74.5 TextGCN4 (Yao et al., 2019) 2019 11.9 86.3 48.5 16.1 62.2 TextGCN5 (Yao et al., 2019) 2019 40.2 23.2 70.3 70.2 TextGCN6 (Yao et al., 2019) 2019 41.6 27.4 59.1 59.0 TextGCN7 (Yao et al., 2019) 2019 91.2 78.9 22.3 53.4 SGC (Wu et al., 2019) 2019 88.5 97.2 94.0 68.5 75.9 SGC7 (Wu et al., 2019) 2019 89.6 77.3 24.7 60.2 TL-GNN (Huang et al., 2019) 2019 85.9 97.8 95.9 94.6 92.5 69.4 54.0 76.4 76.1 VGCN-BERT (Lu et al., 2020) 2020 55.8 98.0 95.4 95.9 70.2 86.4 86.4 83.7 80.5 91.9 91.9 TensorGCN (Liu et al., 2020) 2020 87.7 98.0 95.1 70.1 77.9 TensorGCN1 (Liu et al., 2020) 2020 78.6 86.2 82.3 52.2 61.3 TextING (Zhang et al., 2020) 2020 82.5 98.1 95.7 70.8 80.2 TextING5 (Zhang et al., 2020) 2020 41.8 24.9 69.9 69.7 TextING7 (Wang et al., 2022a) 2020 86.5 74.7 30.3 61.2 SSGC (Zhu and Koniusz, 2020) 2021 88.6 97.4 94.5 68.5 76.7 BERT-GCN (Lin et al., 2021) 2021 89.3 98.1 96.6 72.8 86.0 RoBERTa-GCN (Lin et al., 2021) 2021 89.5 98.2 96.1 72.8 89.7 NMGC-2 (Lei et al., 2021) 2021 86.6 97.3 94.4 69.2 76.2 TGCN-Bi-LSTM (Yang et al., 2021a) 2021 93.0 92.7 97.6 94.0 94.7 72.9 72.2 68.4 TGCN-C-LSTM (Yang et al., 2021a) 2021 93.2 93.0 97.6 93.6 94.4 71.3 72.6 68.4 TGCN-ServeNet (Yang et al., 2021a) 2021 92.9 92.7 97.9 94.6 94.9 74.2 72.1 68.4 MGCN (Xue et al., 2021) 2021 87.4 80.3 92.3 GFN (Dai et al., 2022) 2022 87.0 86.3 98.2 95.5 95.3 74.6 70.2 60.3 78.0 77.8 IMGCN (Xue et al., 2022) 2022 98.3 96.5 87.8 84.4 80.9 92.5 GCN-CNN (Zeng et al., 2022) 2022 98.5 96.4 71.9 87.6 BiGRU+GCN (Dong et al., 2022) 2022 86.8 85.5 97.1 93.4 93.9 70.7 68.4 62.2 77.6 77.5 InducT-GCN (Wang et al., 2022a) 2022 96.5 95.4 93.2 92.8 67.8 66.9 75.4 75.9 InducT-SGC7 (Wang et al., 2022a) 2022 90.5 80.5 31.1 60.2 InducT-GCN7 (Wang et al., 2022a) 2022 91.6 81.4 35.6 60.4 HINT-G (Li et al., 2023) 2023 87.7 98.2 95.0 72.7 78.2 GTG (Liu et al., 2023) 2023 87.0 85.7 97.2 93.7 94.5 71.2 69.7 62.8 77.2 77.0 LDGCN (Wang et al., 2023a) 2023 87.8 87.8 98.3 97.3 95.7 92.7 70.9 59.1 78.3 78.2
1. 20 labeled data per class as reported in (Li et al., 2021)
3. 20% stratified sample of training documents as reported in (Ragesh et al., 2021)
4. 1-99 train/test split as reported in (Wang et al., 2022b)
5. 10-90 train/test split as reported in (Cui et al., 2022)
6.
7. 5-95 train/test split as reported in (Wang et al., 2022a)
[b] Method Year 20 NG R8 R52 Ohsumed MR Acc. F1 Acc. F1 Acc. F1 Acc. F1 Acc. F1 STGCN (Ye et al., 2020) 2020 97.2 78.2 STGCN+BiLSTM (Ye et al., 2020) 2020 86.6 85.4 97.4 94.3 94.2 71.0 69.2 62.3 78.5 78.2 STGCN+BERT+BiLSTM (Ye et al., 2020) 2020 98.5 82.5 TextGTL1 (Li et al., 2021) 2021 80.1 87.1 83.2 54.1 62.4 ZS-TC2 (Liu et al., 2021) 2021 69.0 HeteGCN (F-X)3 (Ragesh et al., 2021) 2021 84.6 84.0 97.2 92.3 93.9 66.5 63.8 50.2 75.6 75.6 HeteGCN (X-TX-X)3 (Ragesh et al., 2021) 2021 84.1 83.4 97.3 92.9 93.2 61.3 65.3 54.0 75.5 75.5 HeteGCN (TX-X)3 (Ragesh et al., 2021) 2021 84.8 84.3 97.1 92.0 93.7 66.0 65.8 57.1 76.1 76.1 HeteGCN (F-X) (Ragesh et al., 2021) 2021 87.2 86.6 97.2 93.0 94.4 68.4 68.1 60.6 76.7 76.7 HeteGCN (X-TX-X) (Ragesh et al., 2021) 2021 86.3 85.6 97.3 93.4 93.3 56.6 66.7 58.0 77.6 77.6 HeteGCN (TX-X) (Ragesh et al., 2021) 2021 86.6 86.0 97.5 93.9 93.8 65.3 68.9 61.8 76.5 76.5 HGAT6 (Yang et al., 2021b) 2021 42.7 24.8 62.8 62.4 MP-GCN (Zhao et al., 2022) 2022 86.8 97.8 94.5 70.3 77.9 ME-GCN4 (Wang et al., 2022b) 2022 28.6 86.8 78.3 27.4 68.1 ST-TextGCN5 (Cui et al., 2022) 2022 42.4 25.1 72.4 72.4 NNC-GCN (Xiao et al., 2024) 2024 97.6 94.3
1. 20 labeled data per class as reported in (Li et al., 2021)
2. 25% of classes unseen as reported in (Liu et al., 2021)
3. 20% stratified sample of training documents as in (Ragesh et al., 2021)
4. 1-99 train/test split as reported in (Wang et al., 2022b)
5. 10-90 train/test split as reported in (Cui et al., 2022)
6.
[b] Method Year 20 NG R8 R52 Ohsumed MR IMDB Yelp Acc. Acc. F1 Acc. F1 Acc. F1 Acc. F1 Acc. Acc. GCNW-FL (Zhu et al., 2021a) 2021 68.5 62.7 CGA2-TC (Yang et al., 2022) 2022 97.8 94.2 94.5 73.3 70.6 65.0 77.8 77.3 CGA2-TC6 (Yang et al., 2022) 2022 50.7 41.9 52.0 51.8 Cont-GCN-BERT (Wu et al., 2023) 2023 89.4 98.3 96.9 73.1 86.4 Cont-GCN-XLNet (Wu et al., 2023) 2023 89.7 98.5 97.0 73.1 88.7 Cont-GCN-RoBERTa (Wu et al., 2023) 2023 90.1 98.6 96.6 73.4 91.3
6.
For text classification, Defferrard et al. (Defferrard et al., 2016) applied their approach to the 20 NG dataset. While it only attained second place behind the multinomial Naive Bayes classifier with an accuracy of 68.3%, it outperformed traditional fully connected networks while having fewer parameters. This approach was later formalized and further generalized for improved scalability and classification performance in large-scale networks by (Kipf and Welling, 2016a). (Yao et al., 2019) focused exclusively on text classification and proposed a GCN based on (Kipf and Welling, 2016a), formulated using a novel heterogeneous graph approach. Experiments on 20 NG, R8, R52, MR, and Ohsumed demonstrated that their proposed method was able to outperform existing state-of-the-art solutions without relying on any pre-trained word embeddings, especially when training data was scarce. This work would provide the basis and serve as a frequent benchmark for all subsequent approaches. Works that immediately followed, such as (Huang et al., 2019) and (Wu et al., 2019), sought to optimize the approach in (Yao et al., 2019), and while they delivered only incremental improvements in classification performance, their primary goal was clearly to propose more computationally efficient, scalable, and robust models. Later, (Zhu and Koniusz, 2020) additionally optimized this approach to handle over-smoothing caused by a higher number of graph convolutions using the Markov Diffusion Kernel, while more recently, (Wang et al., 2023a) opted for a different route by addressing the local intra-class diversity and local inter-class similarity that are implicitly encoded within the graph structure. In general, these approaches improved TextGCN performance in different ways without relying on pre-trained embeddings or any extrinsic knowledge sources. They are, however, primarily transductive in nature.

In terms of test accuracy, the most notable improvements in classification performance came when researchers attempted to rework (Yao et al., 2019) by either enriching the graph representation to capture more textual context (Liu et al., 2020) or by augmenting the GCN with other models/embeddings such as BERT (Lu et al., 2020; Lin et al., 2021) to achieve a solution that was, in theory, greater than the sum of its parts. Among these divergent approaches, the latter yielded the most promising results. In this regard, (Lin et al., 2021) has generally attained SOTA performance across multiple datasets (Table 5).
Another line of research attempted to extend existing transductive GCN-based approaches to inductive settings to allow for online testing, that is, to generalize patterns and relationships from the training data and make accurate predictions on new, unseen instances (Zhang et al., 2020; Wang et al., 2022a). These methods demonstrably outperformed TextGCN under inductive constraints such as limited labeled data. InducT-GCN, in particular, bested various GCN-based baselines and certain models using pre-trained embeddings owing to its better generalization capabilities. Similar to inductive approaches, semi-supervised approaches also demonstrate their efficacy on limited labeled datasets, as they compensate for this constraint by extracting supplementary information from unlabeled data to improve model performance. In the literature, researchers have employed a variety of subsetting and sampling techniques to demonstrate the effectiveness of their proposed models in limited labeled settings. While this makes it difficult to draw a definitive conclusion about the overall state of semi-supervised approaches, we can still gain valuable insights by considering common baselines.
As graph convolutions enable information sharing among neighboring nodes through the inherent message propagation mechanism, GCN models can transfer knowledge from labeled nodes to unlabeled ones and essentially leverage the underlying graph structure to improve their ability to make predictions on the unlabeled data. While this was successfully demonstrated early on for citation networks in (Kipf and Welling, 2016a), semi-supervised GCN approaches for modeling free text are fairly recent. The authors of (Li et al., 2021) claimed their approach to be the first to model free text under strict semi-supervised conditions and demonstrated their model’s performance gain over (Yao et al., 2019) and (Liu et al., 2020) when using 20 labeled samples per class for training. Different variations of (Ragesh et al., 2021) were also able to outperform (Yao et al., 2019) across the board with regard to test accuracy and F1 scores, albeit under different limited labeled data conditions (a 20% stratified sample of the training documents in (Yao et al., 2019)). However, the performance gains were not as pronounced in the large labeled data scenario. The same was also true for the approach in (Zhao et al., 2022), which reported slight gains over (Ragesh et al., 2021) under the same large labeled conditions. (Yang et al., 2021b), (Wang et al., 2022b), and (Cui et al., 2022) also exhibited improved performance over (Yao et al., 2019) across multiple datasets under their respective limited labeled conditions (Table 6).

Self-supervised GCN techniques for text classification have also begun to gain prominence in recent years. Like semi-supervised approaches, these are also able to leverage unlabeled data to improve model performance. (Yang et al., 2022) attained performance comparable to the best supervised approach that did not use pre-trained embeddings in the large labeled setting, and to that of (Yang et al., 2021b) in the same limited labeled setting. Moreover, (Wu et al., 2023) utilized various pre-trained language models along with their proposed ATAD scheme and reportedly outperformed all aforementioned methods in offline settings while also faring similarly well in online settings (Table 7). However, it should be noted that this approach is very recent and requires further validation and scrutiny by the research community at large. Nonetheless, it offers valuable insight into the trajectory of research in this domain and could be a promising avenue for future exploration.
8. Conclusion and Future Research Directions
Graph Convolutional Networks hold great promise for addressing text classification, as they have demonstrated impressive results in various studies and benchmarks. However, there are still many challenges and research directions to explore in order to improve their effectiveness and efficiency in this domain.
Deep graph learning with limited-labeled data or noisy data can hinder the performance of GCNs and their generalization in real-world scenarios. Conventional data augmentation techniques often fall short in addressing data scarcity and noise in graph structures. Researchers are exploring specialized augmentation methods, such as transforming the graph adjacency matrix or the node feature matrix, or using label enrichment (Ding et al., 2022). However, most existing approaches rely on handcrafted strategies based on performance on downstream tasks like text classification or recommendation (Volokhin et al., 2023), limiting their practical use without abundant labeled data. Developing dynamic data augmentation algorithms that automatically apply optimal transformations and perturbations is crucial. Additionally, preserving the semantics encoded within the graph structure is essential. In text graphs, nodes represent documents and words, while edges represent relationships based on criteria like word co-occurrence, TF-IDF, or semantic similarity. Effective augmentation strategies should maintain these semantics, potentially guided by pre-trained word embeddings (e.g., BERT, GPT), ontologies like WordNet, and syntactic and semantic rules. Graph diffusion algorithms can also facilitate data augmentation by iteratively updating node features based on neighbors, exploiting the global structure knowledge of the graph. This approach generates augmented samples with updated node features, enriching the dataset and improving the model’s robustness to variations in information propagation, enhancing its understanding of the graph’s structure.
Like graph convolution, graph diffusion also utilizes the graph structure to model information propagation and relationships between nodes. However, unlike the former which captures local patterns and relationships by aggregating information from a node’s immediate neighbors, graph diffusion focuses on integrating global influences and properties across the entire graph. Such local and global information integration is crucial for understanding the nuances of language and particularly beneficial in text classification tasks, as demonstrated by combining GCN with BERT models (Lu et al., 2020; Lin et al., 2021). Recent studies suggest implementing diffusion either as a preprocessing step to enhance the initial graph structure (Gasteiger et al., 2019) or as an adaptation within the GCN architecture (Jiang et al., 2019c), promoting a more robust and context-aware learning process. These works report improvements in node classification, underscoring the potential for improved performance on complex text classification tasks.
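As a sketch of diffusion used as a preprocessing step in the spirit of (Gasteiger et al., 2019), the snippet below computes the closed-form personalized-PageRank diffusion of a normalized adjacency matrix and sparsifies the result; the teleport probability and pruning threshold are assumed hyperparameters, and the dense inverse is only practical for small graphs.

```python
import numpy as np

def ppr_diffusion(adj, alpha=0.15, threshold=1e-4):
    """Personalized-PageRank diffusion of an adjacency matrix.

    Computes S = alpha * (I - (1 - alpha) * A_norm)^{-1}, where A_norm is
    the symmetrically normalized adjacency with self-loops, then prunes
    small entries so the diffused graph stays sparse.
    """
    n = adj.shape[0]
    adj = adj + np.eye(n)                                  # add self-loops
    d_inv_sqrt = np.diag(1.0 / np.sqrt(adj.sum(axis=1)))
    a_norm = d_inv_sqrt @ adj @ d_inv_sqrt

    diffused = alpha * np.linalg.inv(np.eye(n) - (1 - alpha) * a_norm)
    diffused[diffused < threshold] = 0.0
    return diffused

# toy 4-node path graph: diffusion adds weighted long-range edges
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
print(ppr_diffusion(A).round(3))
```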
For instance, improved data augmentation and diffusion techniques could significantly enhance performance in sentiment analysis, topic detection, and document classification. By effectively handling noisy and limited-labeled data, GCNs can be more robust in real-world applications, such as in large-scale social media monitoring, customer feedback analysis, and automated content moderation.
Integrating GCNs with LLMs like GPT is another promising direction. GCNs capture structural information in text data, while LLMs excel in understanding and generating human-like text. Combining these strengths can help create more robust, context-aware models. GCNs can enhance LLMs by embedding word and sentence relationships into the learning process, improving text classification, sentiment analysis, and other NLP tasks. This approach is especially effective in domains needing a deep understanding of semantic content and contextual relationships, such as legal document analysis, biomedical text mining, and social media analytics.
In the context of privacy protection, GCNs offer unique advantages. As data privacy becomes increasingly important, especially with regulations like GDPR and CCPA, it is crucial to develop models that can operate efficiently while ensuring user data is protected. (Liang et al., 2024) employed GCNs in a novel model that integrates text data and label correlations, utilizing a double-attention mechanism to significantly enhance detection performance for privacy disclosures in online posts. By leveraging GCNs’ ability to understand complex relationships among different types of private information, we can improve privacy detection tools for social media and similar platforms, effectively mitigating potential risks. This also makes GCNs suitable for applications in healthcare, finance, and other sectors handling sensitive information.
References
- (1)
- Adhikari et al. (2019) A Adhikari, A Ram, R Tang, and J Lin. 2019. Docbert: Bert for document classification. arXiv:1904.08398 (2019).
- Aggarwal and Zhai (2012) Charu C Aggarwal and ChengXiang Zhai. 2012. A survey of text classification algorithms. In Mining text data. Springer, 163–222.
- Akhter et al. (2020) M P Akhter, Z Jiangbin, I R Naqvi, M Abdelmajeed, A Mehmood, and M T Sadiq. 2020. Document-level text classification using single-layer multisize filters convolutional neural network. IEEE Access 8 (2020), 42689–42707.
- Altınel and Ganiz (2018) B. Altınel and M. Ganiz. 2018. Semantic text classification: A survey of past and recent advances. Inf. Proc. & Man. 54, 6 (2018), 129–153.
- Amplayo et al. (2021) R. K. Amplayo, S. Angelidis, and M. Lapata. 2021. Aspect-controllable opinion summarization. arXiv:2109.03171 (2021).
- Angelidis and Lapata (2018) S Angelidis and M Lapata. 2018. Multiple instance learning networks for fine-grained sentiment analysis. TA Comp. Ling. 6 (2018), 17–31.
- Angelova and Weikum (2006) Ralitsa Angelova and Gerhard Weikum. 2006. Graph-based text classification: learn from your neighbors. In Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval. 485–492.
- Babenko et al. (2010) Boris Babenko, Ming-Hsuan Yang, and Serge Belongie. 2010. Robust object tracking with online multiple instance learning. IEEE transactions on pattern analysis and machine intelligence 33, 8 (2010), 1619–1632.
- Badiei et al. (2023) Fatemeh Badiei, Makan Kananian, and S AmirAli Gh Ghahramani. 2023. Text Classification on Imbalanced Data using Graph Neural Networks and Adversarial Weight Balancer. In 2023 Asia-Pacific Conf. on Computer Science and Data Engineering (CSDE). IEEE, 01–06.
- Bahassine et al. (2016) Said Bahassine, Abdellah Madani, and Mohamed Kissi. 2016. An improved Chi-sqaure feature selection for Arabic text classification using decision tree. In 2016 11th International Conference on Intelligent Systems: Theories and Applications (SITA). IEEE, 1–5.
- Basiri et al. (2021) Mohammad Ehsan Basiri, Shahla Nemati, Moloud Abdar, Erik Cambria, and U Rajendra Acharya. 2021. ABCDM: An attention-based bidirectional CNN-RNN deep model for sentiment analysis. Future Generation Computer Systems 115 (2021), 279–294.
- Briskilal and Subalalitha (2022) J Briskilal and CN Subalalitha. 2022. An ensemble model for classifying idioms and literal texts using BERT and RoBERTa. Information Processing & Management 59, 1 (2022), 102756.
- Cai et al. (2020) Linkun Cai, Yu Song, Tao Liu, and Kunli Zhang. 2020. A hybrid BERT model that incorporates label semantics via adjustive attention for multi-label text classification. Ieee Access 8 (2020), 152183–152192.
- Can et al. (2018) E F Can, A E-Can, and F Can. 2018. Multilingual sentiment analysis: An RNN-based framework for limited data. arXiv:1806.04511 (2018).
- Chang et al. (2020) Wei-Cheng Chang, Hsiang-Fu Yu, Kai Zhong, Yiming Yang, and Inderjit S Dhillon. 2020. Taming pretrained transformers for extreme multi-label text classification. In Proc. ACM SIGKDD international conference on knowledge discovery & data mining. 3163–3171.
- Chatterjee et al. (2021) Soumya Chatterjee, Ayush Maheshwari, Ganesh Ramakrishnan, and Saketha Nath Jagaralpudi. 2021. Joint learning of hyperbolic label embeddings for hierarchical multi-label classification. arXiv:2101.04997 (2021).
- Chen et al. (2009) Jingnian Chen, Houkuan Huang, Shengfeng Tian, and Youli Qu. 2009. Feature selection for text classification with Naïve Bayes. Expert Systems with Applications 36, 3 (2009), 5432–5435.
- Chen et al. (2022) Zaili Chen, Kai Huang, Li Wu, Zhenyu Zhong, and Zeyu Jiao. 2022. Relational graph convolutional network for text-mining-based accident causal classification. Applied Sciences 12, 5 (2022), 2482.
- Chen et al. (2024) Zhikai Chen, Haitao Mao, Hang Li, Wei Jin, Hongzhi Wen, Xiaochi Wei, Shuaiqiang Wang, Dawei Yin, Wenqi Fan, Hui Liu, et al. 2024. Exploring the potential of large language models (llms) in learning on graphs. ACM SIGKDD Explorations Newsletter 25, 2 (2024), 42–61.
- Cui et al. (2022) Hongyan Cui, Gangkun Wang, Yuanxin Li, and Roy E Welsch. 2022. Self-training method based on GCN for semi-supervised short text classification. Information Sciences 611 (2022), 18–29.
- Dadgar et al. (2016) S M H Dadgar, M S Araghi, and M M Farahani. 2016. A novel text mining approach based on TF-IDF and Support Vector Machine for news classification. In 2016 IEEE International Conference on Engineering and Technology (ICETECH). IEEE, 112–116.
- Dahiya et al. (2021) Kunal Dahiya, Ananye Agarwal, Deepak Saini, K Gururaj, Jian Jiao, Amit Singh, Sumeet Agarwal, Purushottam Kar, and Manik Varma. 2021. Siamesexml: Siamese networks meet extreme classifiers with 100M labels. In Int. Conf. on ML. PMLR, 2330–2340.
- Dai et al. (2007) W Dai, G-R Xue, Q Yang, and Y Yu. 2007. Transferring naive bayes classifiers for text classification. In AAAI, Vol. 7. 540–545.
- Dai et al. (2022) Yong Dai, Linjun Shou, Ming Gong, Xiaolin Xia, Zhao Kang, Zenglin Xu, and Daxin Jiang. 2022. Graph fusion network for text classification. Knowledge-based systems 236 (2022), 107659.
- Defferrard et al. (2016) Michaël Defferrard, Xavier Bresson, and Pierre Vandergheynst. 2016. Convolutional neural networks on graphs with fast localized spectral filtering. Advances in neural information processing systems 29 (2016).
- Del Vigna et al. (2017) Fabio Del Vigna, Andrea Cimino, Felice Dell'Orletta, Marinella Petrocchi, and Maurizio Tesconi. 2017. Hate me, hate me not: Hate speech detection on facebook. In Proceedings of the first Italian conference on cybersecurity (ITASEC17). 86–95.
- Dervakos et al. (2021) E Dervakos, G Filandrianos, K Thomas, A Mandalios, C Zerva, and G Stamou. 2021. Semantic Enrichment of Pretrained Embedding Output for Unsupervised IR. In AAAI Spring Symposium: Combining Machine Learning with Knowledge Engineering, Vol. 2846.
- Devlin et al. (2018) Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv:1810.04805 (2018).
- Ding et al. (2022) K Ding, Z Xu, H Tong, and H Liu. 2022. Data augmentation for deep graph learning: A survey. SIGKDD Expl. NL 24, 2 (2022), 61–77.
- Dong et al. (2022) Yonghao Dong, Zhenmin Yang, and Hui Cao. 2022. A Text Classification Model Based on GCN and BiGRU Fusion. In Proceedings of the 8th International Conference on Computing and Artificial Intelligence. 318–322.
- Du et al. (2024) Yingpeng Du, Ziyan Wang, Zhu Sun, Haoyan Chua, Hongzhi Liu, Zhonghai Wu, Yining Ma, Jie Zhang, and Youchen Sun. 2024. Large Language Model with Graph Convolution for Recommendation. arXiv:2402.08859 (2024).
- Feng et al. (2024) Zijian Feng, Kezhi Mao, and Hanzhang Zhou. 2024. Adaptive micro-and macro-knowledge incorporation for hierarchical text classification. Expert Systems with Applications (2024), 123374.
- Gambäck and Sikdar (2017) Björn Gambäck and Utpal Kumar Sikdar. 2017. Using convolutional neural networks to classify hate-speech. In Proceedings of the first workshop on abusive language online. 85–90.
- Gao et al. (2020) Lingchao Gao, Jiakai Wang, Zhixian Pi, Huaixun Zhang, Xiao Yang, Peizhuo Huang, and Jiasong Sun. 2020. A hybrid GCN and RNN structure based on attention mechanism for text classification. In Journal of Physics: Conference Series, Vol. 1575. IOP Publishing, 012130.
- Gao and Huang (2021) Weiqi Gao and Hao Huang. 2021. A gating context-aware text classification model with BERT and graph convolutional networks. Journal of Intelligent & Fuzzy Systems 40, 3 (2021), 4331–4343.
- Gasteiger et al. (2019) J Gasteiger, S Weißenberger, and S Günnemann. 2019. Diffusion improves graph learning. Adv. in neural inf. processing systems 32 (2019).
- Grover and Leskovec (2016) Aditya Grover and Jure Leskovec. 2016. node2vec: Scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining. 855–864.
- Guo et al. (2019) H Guo, Y Mao, and R Zhang. 2019. Augmenting data with mixup for sentence classification: An empirical study. arXiv:1905.08941 (2019).
- Gupta et al. (2021) Atika Gupta, Priya Matta, and Bhasker Pant. 2021. Graph neural network: Current state of Art, challenges and applications. Materials Today: Proceedings 46 (2021), 10927–10932.
- Hajibabaee et al. (2022) P Hajibabaee, M Malekzadeh, M Ahmadi, M Heidari, A Esmaeilzadeh, R Abdolazimi, and H James Jr. 2022. Offensive language detection on social media based on text classification. In 2022 Computing and Communication Workshop and Conference. IEEE, 0092–0098.
- Hamilton et al. (2017) W Hamilton, Z Ying, and J Leskovec. 2017. Inductive representation learning on large graphs. Adv. in Neu. inf. proc. sys. 30 (2017).
- Han et al. (2022) S Han, Z Yuan, K Wang, S Long, and J Poon. 2022. Understanding graph convolutional networks for text classification. arXiv:16060 (2022).
- Hassan and Mahmood (2018) A R Hassan and A Mahmood. 2018. Conv. recurrent deep learning model for sentence classification. Ieee Access 6 (2018), 13949–13957.
- He et al. (2020b) P He, X Liu, J Gao, and W Chen. 2020b. Deberta: Decoding-enhanced bert with disentangled attention. arXiv:2006.03654 (2020).
- He et al. (2024) Yao He, Qingyu Xiong, Cai Ke, Yaqiang Wang, Zhengyi Yang, Hualing Yi, and Qilin Fan. 2024. MCICT: Graph convolutional network-based end-to-end model for multi-label classification of imbalanced clinical text. Bio. Sig. Proc. and Control 91 (2024), 105873.
- Hemalatha et al. (2013) I Hemalatha, GP Saradhi Varma, and A Govardhan. 2013. Sentiment analysis tool using machine learning algorithms. International Journal of Emerging Trends & Technology in Computer Science (IJETTCS) 2, 2 (2013), 105–109.
- Huang et al. (2019) L Huang, D Ma, S Li, X Zhang, and Houfeng Wang. 2019. Text level graph neural network for text classification. arXiv:1910.02356 (2019).
- Huang et al. (2022) Yen-Hao Huang, Yi-Hsin Chen, and Yi-Shin Chen. 2022. ConTextING: Granting Document-Wise Contextual Embeddings to Graph Neural Networks for Inductive Text Classification. In Proc. Int. Conf. on Computational Linguistics. 1163–1168.
- Igamberdiev and Habernal (2021) T Igamberdiev and I Habernal. 2021. Privacy-preserving graph convolutional networks for text classification. arXiv:2102.09604 (2021).
- Ikonomakis et al. (2005) M Ikonomakis, S Kotsiantis, and V Tampakas. 2005. Text classification using machine learning techniques. WSEAS Tr. Comp. 4, 8 (2005).
- Ilse et al. (2018) M Ilse, J Tomczak, and Max Welling. 2018. Attention-based deep multiple instance learning. In Int. Conf. on M. L.. PMLR, 2127–2136.
- Ionescu and Butnaru (2019) R Ionescu and A Butnaru. 2019. Vector of locally-aggregated word embeddings a novel document representation. arXiv:08850 (2019).
- Jiang et al. (2019c) Bo Jiang, Doudou Lin, Jin Tang, and Bin Luo. 2019c. Data representation and learning with graph diffusion-embedding networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 10414–10423.
- Jiang et al. (2019b) Haoming Jiang, Pengcheng He, Weizhu Chen, Xiaodong Liu, Jianfeng Gao, and Tuo Zhao. 2019b. Smart: Robust and efficient fine-tuning for pre-trained natural language models through principled regularized optimization. arXiv:1911.03437 (2019).
- Jiang et al. (2021) T Jiang, D Wang, L Sun, H Yang, Z Zhao, and F Zhuang. 2021. Lightxml: Transformer with dynamic negative sampling for high-performance extreme multi-label text classification. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 35. 7987–7994.
- Jin et al. (2021b) Di Jin, Xiangchen Song, Zhizhi Yu, Ziyang Liu, Heling Zhang, Zhaomeng Cheng, and Jiawei Han. 2021b. Bite-gcn: A new gcn architecture via bidirectional convolution of topology and features on text-rich networks. In ACM International Conference on Web Search and Data Mining. 157–165.
- Joulin et al. (2016) A Joulin, E Grave, P Bojanowski, and T Mikolov. 2016. Bag of tricks for efficient text classification. arXiv:1607.01759 (2016).
- Kadhim (2019) A Kadhim. 2019. Survey on supervised machine learning techniques for automatic text classification. Artificial Intelligence Review 52, 1 (2019), 273–292.
- Kaibi et al. (2019) Ibrahim Kaibi, Hassan Satori, et al. 2019. A comparative evaluation of word embeddings techniques for twitter sentiment analysis. In 2019 International Conference on Wireless Technologies, Embedded and Intelligent Systems (WITS). IEEE, 1–4.
- Karajeh et al. (2023) Ola Karajeh, Ismini Lourentzou, and Edward A Fox. 2023. Multi-view graph-based text representations for imbalanced classification. In International Conference on Theory and Practice of Digital Libraries. Springer, 249–264.
- Kim and Nam (2006) Y Kim and T Nam. 2006. An efficient text filter for adult web documents. In 2006 International Conference on Advanced Communication Technology, Vol. 1. IEEE, 3 pp.
- Kipf and Welling (2016a) Thomas N Kipf and Max Welling. 2016a. Semi-supervised classification with graph convolutional networks. arXiv:1609.02907 (2016).
- Kipf and Welling (2016b) Thomas N Kipf and Max Welling. 2016b. Variational graph auto-encoders. arXiv:1611.07308 (2016).
- Kowsari et al. (2019) Kamran Kowsari, Kiana Jafari Meimandi, Mojtaba Heidarysafa, Sanjana Mendu, Laura Barnes, and Donald Brown. 2019. Text classification algorithms: A survey. Information 10, 4 (2019), 150.
- Krishnalal et al. (2010) G Krishnalal, S Babu Rengarajan, and KG Srinivasagan. 2010. A new text mining approach based on HMM-SVM for web news classification. International Journal of Computer Applications 1, 19 (2010), 98–104.
- Kudo and Richardson (2018) Taku Kudo and John Richardson. 2018. Sentencepiece: A simple and language independent subword tokenizer and detokenizer for neural text processing. arXiv:1808.06226 (2018).
- Le and Mikolov (2014) Q Le and T Mikolov. 2014. Distributed representations of sentences and documents. In International Conference on Machine Learning. PMLR, 1188–1196.
- Lei et al. (2021) F Lei, X Liu, Z Li, Q Dai, and S Wang. 2021. Multihop neighbor information fusion graph convolutional network for text classification. Mathematical Problems in Engineering 2021 (2021), 1–9.
- Leistner et al. (2010) C Leistner, A Saffari, and H Bischof. 2010. MIForests: Multiple-instance learning with randomized trees. In ECCV. 29–42.
- Li et al. (2021) Chen Li, Xutan Peng, Hao Peng, Jianxin Li, and Lihong Wang. 2021. TextGTL: Graph-based Transductive Learning for Semi-supervised Text Classification via Structure-Sensitive Interpolation.. In IJCAI. 2680–2686.
- Li et al. (2023) Hui Li, Yan Yan, Shuo Wang, Juan Liu, and Yunpeng Cui. 2023. Text classification on heterogeneous information network via enhanced GCN and knowledge. Neural Computing and Applications 35, 20 (2023), 14911–14927.
- Li et al. (2024) Na Li, Thomas Bailleux, Zied Bouraoui, and Steven Schockaert. 2024. Ontology Completion with Natural Language Inference and Concept Embeddings: An Analysis. arXiv:2403.17216 (2024).
- Li et al. (2015) Yujia Li, Daniel Tarlow, Marc Brockschmidt, and Richard Zemel. 2015. Gated graph sequence neural networks. arXiv:1511.05493 (2015).
- Liang et al. (2024) Zhanbo Liang, Jie Guo, Weidong Qiu, Zheng Huang, and Shujun Li. 2024. When graph convolution meets double attention: online privacy disclosure detection with multi-label text classification. Data Mining and Knowledge Discovery (2024), 1–22.
- Liao et al. (2017) Shiyang Liao, Junbo Wang, Ruiyun Yu, Koichi Sato, and Zixue Cheng. 2017. CNN for situations understanding based on sentiment analysis of twitter data. Procedia computer science 111 (2017), 376–381.
- Lin et al. (2024) Mu Lin, Tao Wang, Yifan Zhu, Xiaobo Li, Xin Zhou, and Weiping Wang. 2024. A Heterogeneous Directed Graph Attention Network for inductive text classification using multilevel semantic embeddings. Knowledge-Based Systems 295 (2024), 111797.
- Lin et al. (2021) Yuxiao Lin, Yuxian Meng, Xiaofei Sun, Qinghong Han, Kun Kuang, Jiwei Li, and Fei Wu. 2021. Bertgcn: Transductive text classification by combining gcn and bert. arXiv:2105.05727 (2021).
- Linmei et al. (2019) H Linmei, T Yang, C Shi, H Ji, and X Li. 2019. Heterogeneous graph attention networks for semi-supervised short text classification. In Conference on Empirical Methods in Natural Language Processing and International Joint Conference on Natural Language Processing. 4821–4830.
- Litman (1996) Diane J Litman. 1996. Cue phrase classification using machine learning. Journal of Artificial Intelligence Research 5 (1996), 53–94.
- Liu (2022) Bin Liu. 2022. GCN-BERT and Memory Network Based Multi-Label Classification for Event Text of the Chinese Government Hotline. IEEE Access 10 (2022), 109267–109276.
- Liu et al. (2023) Boting Liu, Weili Guan, Changjin Yang, Zhijie Fang, and Zhiheng Lu. 2023. Transformer and Graph Convolutional Network for Text Classification. International Journal of Computational Intelligence Systems 16, 1 (2023), 161.
- Liu and Guo (2019) G Liu and J Guo. 2019. Bidirectional LSTM with attention mechanism and convolution layer for text classification. Neurocomputing 337 (2019), 325–338.
- Liu et al. (2017a) Jingzhou Liu, Wei-Cheng Chang, Yuexin Wu, and Yiming Yang. 2017a. Deep learning for extreme multi-label text classification. In Proceedings of the 40th international ACM SIGIR conference on research and development in information retrieval. 115–124.
- Liu et al. (2018) Mingxia Liu, Jun Zhang, Ehsan Adeli, and Dinggang Shen. 2018. Landmark-based deep multi-instance learning for brain disease diagnosis. Medical image analysis 43 (2018), 157–168.
- Liu et al. (2016) P Liu, X Qiu, and X Huang. 2016. Recurrent neural network for text classification with multi-task learning. arXiv:1605.05101 (2016).
- Liu et al. (2021) Tengfei Liu, Yongli Hu, Junbin Gao, Yanfeng Sun, and Baocai Yin. 2021. Zero-shot text classification with semantically extended graph convolutional network. In 2020 25th International Conference on Pattern Recognition (ICPR). 8352–8359.
- Liu et al. (2017b) Xu Liu, Licheng Jiao, Jiaqi Zhao, Jin Zhao, Dan Zhang, Fang Liu, Shuyuan Yang, and Xu Tang. 2017b. Deep multiple instance learning-based spatial–spectral classification for PAN and MS imagery. IEEE Transactions on Geoscience and Remote Sensing 56, 1 (2017), 461–473.
- Liu et al. (2020) Xien Liu, Xinxin You, Xiao Zhang, Ji Wu, and Ping Lv. 2020. Tensor graph convolutional networks for text classification. In Proceedings of the AAAI conference on artificial intelligence, Vol. 34. 8409–8416.
- Liu et al. (2019) Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. Roberta: A robustly optimized bert pretraining approach. arXiv:1907.11692 (2019).
- Lu et al. (2020) Zhibin Lu, Pan Du, and Jian-Yun Nie. 2020. VGCN-BERT: augmenting BERT with graph embedding for text classification. In European Conference on Information Retrieval. Springer, 369–382.
- Ma et al. (2021) Qianwen Ma, Chunyuan Yuan, Wei Zhou, and Songlin Hu. 2021. Label-specific dual graph neural network for multi-label text classification. In Proc. Annual Meeting of the ACL and Int. Joint Conf. on Natural Language Processing. 3855–3864.
- Marreddy et al. (2022) M Marreddy, S R Oota, L S Vakada, V C Chinni, and R Mamidi. 2022. Multi-task text classification using graph convolutional networks for large-scale low resource language. In 2022 International Joint Conference on Neural Networks. IEEE, 1–8.
- Mercha et al. (2024) El Mahdi Mercha, Houda Benbrahim, and Mohammed Erradi. 2024. Heterogeneous text graph for comprehensive multilingual sentiment analysis: capturing short-and long-distance semantics. PeerJ Computer Science 10 (2024).
- Mikolov et al. (2013) T Mikolov, K Chen, G Corrado, and J Dean. 2013. Efficient estimation of word representations in vector space. arXiv:1301.3781 (2013).
- Minaee et al. (2021) Shervin Minaee, Nal Kalchbrenner, Erik Cambria, Narjes Nikzad, Meysam Chenaghlu, and Jianfeng Gao. 2021. Deep learning–based text classification: a comprehensive review. ACM Computing Surveys (CSUR) 54, 3 (2021), 1–40.
- Nam et al. (2009) Sang-Hyob Nam, Seung-Hoon Na, Jungi Kim, Yeha Lee, and Jong-Hyeok Lee. 2009. Partially Supervised Phrase-Level Sentiment Classification. In International Conference on Computer Processing of Oriental Languages. Springer, 225–235.
- Onan (2017) Aytug Onan. 2017. Hybrid supervised clustering based ensemble scheme for text classification. Kybernetes (2017).
- Oord et al. (2018) Aaron van den Oord, Yazhe Li, and Oriol Vinyals. 2018. Representation learning with contrastive predictive coding. arXiv:03748 (2018).
- OpenAI (2023) OpenAI. 2023. GPT-3.5. https://chat.openai.com/chat Large language model.
- O’Shea and Nash (2015) Keiron O’Shea and Ryan Nash. 2015. An introduction to convolutional neural networks. arXiv:1511.08458 (2015).
- Pal et al. (2020) A Pal, M Selvakumar, and M Sankarasubbu. 2020. Multi-label text classification using attention-based graph neural network. arXiv:11644 (2020).
- Pappas and Popescu-Belis (2014) Nikolaos Pappas and Andrei Popescu-Belis. 2014. Explaining the stars: Weighted multiple-instance learning for aspect-based sentiment analysis. In Proceedings of the 2014 Conference on Empirical Methods In Natural Language Processing (EMNLP). 455–466.
- Peng et al. (2024) Yinbin Peng, Wei Wu, Jiansi Ren, and Xiang Yu. 2024. Novel GCN Model Using Dense Connection and Attention Mechanism for Text Classification. Neural Processing Letters 56, 2 (2024), 1–17.
- Pennington et al. (2014) Jeffrey Pennington, Richard Socher, and Christopher D Manning. 2014. Glove: Global vectors for word representation. In Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP). 1532–1543.
- Perozzi et al. (2014) Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. 2014. Deepwalk: Online learning of social representations. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining. 701–710.
- Pham et al. (2022) Phu Pham, Loan TT Nguyen, Witold Pedrycz, and Bay Vo. 2022. Deep learning, graph-based text representation and classification: a survey, perspectives and challenges. Artificial Intelligence Review (2022), 1–35.
- Quellec et al. (2016) Gwenolé Quellec, Mathieu Lamard, Michel Cozic, Gouenou Coatrieux, and Guy Cazuguel. 2016. Multiple-instance learning for anomaly detection in digital mammography. IEEE Transactions on Medical Imaging 35, 7 (2016), 1604–1614.
- Radford et al. (2021) A Radford, J W Kim, C Hallacy, A Ramesh, G Goh, S Agarwal, G Sastry, A Askell, P Mishkin, J Clark, et al. 2021. Learning transferable visual models from natural language supervision. In International conference on machine learning. PMLR, 8748–8763.
- Raffel et al. (2020b) C Raffel, N Shazeer, A Roberts, K Lee, S Narang, M Matena, Y Zhou, Wei Li, and P J Liu. 2020b. Exploring the limits of transfer learning with a unified text-to-text transformer. The Journal of Machine Learning Research 21, 1 (2020), 5485–5551.
- Ragesh et al. (2021) R Ragesh, S Sellamanickam, A Iyer, R Bairi, and V Lingam. 2021. Hetegcn: heterogeneous graph convolutional networks for text classification. In Proceedings of the 14th ACM international conference on web search and data mining. 860–868.
- Ren et al. (2022) H Ren, Wei Lu, Y Xiao, X Chang, X Wang, Z Dong, and D Fang. 2022. Graph convolutional networks in language and vision: A survey. Knowledge-Based Systems (2022), 109250.
- Sarzynska-Wawer et al. (2021) J Sarzynska-Wawer, A Wawer, A Pawlak, J Szymanowska, I Stefaniak, M Jarkiewicz, and L Okruszek. 2021. Detecting formal thought disorder by deep contextualized word representations. Psychiatry Research 304 (2021), 114135.
- Sculley and Wachman (2007) David Sculley and Gabriel M Wachman. 2007. Relaxed online SVMs for spam filtering. In Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval. 415–422.
- Shah et al. (2021) S M A Shah, H Ge, S A Haider, M Irshad, S M Noman, J A Meo, A Ahmad, and T Younas. 2021. A Quantum Spatial Graph Convolutional Network for Text Classification. Computer Systems Science and Engineering 36, 2 (2021), 369–382.
- Shahmirzadi et al. (2019) Omid Shahmirzadi, Adam Lugowski, and Kenneth Younge. 2019. Text similarity in vector space models: a comparative study. In 2019 18th IEEE international conference on machine learning and applications (ICMLA). IEEE, 659–666.
- Sharma and Sahni (2011) Aman Kumar Sharma and Suruchi Sahni. 2011. A comparative study of classification algorithms for spam email data analysis. International Journal on Computer Science and Engineering 3, 5 (2011), 1890–1895.
- She et al. (2022) Xiangrong She, Jianpeng Chen, and Gang Chen. 2022. Joint learning with BERT-GCN and multi-attention for event text classification and event assignment. IEEE Access 10 (2022), 27031–27040.
- Sherstinsky (2020) Alex Sherstinsky. 2020. Fundamentals of recurrent neural network (RNN) and long short-term memory (LSTM) network. Physica D: Nonlinear Phenomena 404 (2020), 132306.
- Sjarif et al. (2019) N N A Sjarif, N F M Azmi, S Chuprat, H M Sarkan, Y Yahya, and S M Sam. 2019. SMS spam message detection using term frequency-inverse document frequency and random forest algorithm. Procedia Computer Science 161 (2019), 509–515.
- Socher et al. (2013) R Socher, A Perelygin, J Wu, J Chuang, C D Manning, A Y Ng, and C Potts. 2013. Recursive deep models for semantic compositionality over a sentiment treebank. In Proc. Conf. on empirical methods in natural language processing. 1631–1642.
- Speer et al. (2017) Robyn Speer, Joshua Chin, and Catherine Havasi. 2017. Conceptnet 5.5: An open multilingual graph of general knowledge. In Proceedings of the AAAI conference on artificial intelligence, Vol. 31.
- Sultani et al. (2018) Waqas Sultani, Chen Chen, and Mubarak Shah. 2018. Real-world anomaly detection in surveillance videos. In Proceedings of the IEEE conference on computer vision and pattern recognition. 6479–6488.
- Sun and Lim (2001) A Sun and E-P Lim. 2001. Hierarchical text classification and evaluation. In Proc. Int. Conf. on Data Mining. IEEE, 521–528.
- Sun et al. (2009) Aixin Sun, Ee-Peng Lim, and Ying Liu. 2009. On strategies for imbalanced text classification using SVM: A comparative study. Decision Support Systems 48, 1 (2009), 191–201.
- Sun et al. (2019) Chi Sun, Xipeng Qiu, Yige Xu, and Xuanjing Huang. 2019. How to fine-tune bert for text classification?. In China national conference on Chinese computational linguistics. Springer, 194–206.
- Tan et al. (2012) Luke Kien-Weng Tan, Jin-Cheon Na, Yin-Leng Theng, and Kuiyu Chang. 2012. Phrase-level sentiment polarity classification using rule-based typed dependencies and additional complex phrases consideration. J. of Comp. Sci. and Tech. 27, 3 (2012), 650–666.
- Tang et al. (2020a) Hengliang Tang, Yuan Mi, Fei Xue, and Yang Cao. 2020a. An integration model based on graph convolutional network for text classification. IEEE Access 8 (2020), 148865–148876.
- Tayal et al. (2020) Kshitij Tayal, Saurabh Agrawal, Nikhil Rao, Xiaowei Jia, Karthik Subbian, and Vipin Kumar. 2020. Regularized graph convolutional networks for short text classification. (2020).
- Tayal et al. (2019) Kshitij Tayal, Nikhil Rao, Saurabh Agrawal, and Karthik Subbian. 2019. Short text classification using graph convolutional network. In NIPS workshop on Graph Representation Learning.
- Thulasidasan et al. (2021) Sunil Thulasidasan, Sushil Thapa, Sayera Dhaubhadel, Gopinath Chennupati, Tanmoy Bhattacharya, and Jeff Bilmes. 2021. An effective baseline for robustness to distributional shift. In 2021 IEEE Int. Conf. on Machine Learning and Appl. (ICMLA). IEEE, 278–285.
- Touvron et al. (2023a) Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Nikolay Bashlykov, Soumya Batra, Prajjwal Bhargava, Shruti Bhosale, et al. 2023a. Llama 2: Open foundation and fine-tuned chat models. arXiv:2307.09288 (2023).
- Tu et al. (2019) Ming Tu, Jing Huang, Xiaodong He, and Bowen Zhou. 2019. Multiple instance learning with graph neural networks. arXiv:04881 (2019).
- Vashishth et al. (2018) Shikhar Vashishth, Manik Bhandari, Prateek Yadav, Piyush Rai, Chiranjib Bhattacharyya, and Partha Talukdar. 2018. Incorporating syntactic and semantic information in word embeddings using graph convolutional networks. arXiv:1809.04283 (2018).
- Vaswani et al. (2017a) Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017a. Attention is all you need. Advances in neural information processing systems 30 (2017).
- Veličković et al. (2017) P Veličković, G Cucurull, A Casanova, A Romero, P Lio, and Y Bengio. 2017. Graph attention networks. arXiv:1710.10903 (2017).
- Volokhin et al. (2023) Sergey Volokhin, Marcus D Collins, Oleg Rokhlenko, and Eugene Agichtein. 2023. Augmenting Graph Convolutional Networks with Textual Data for Recommendations. In European Conference on Information Retrieval. Springer, 664–675.
- Wang et al. (2023a) Bolin Wang, Yuanyuan Sun, Yonghe Chu, Changrong Min, Zhihao Yang, and Hongfei Lin. 2023a. Local discriminative graph convolutional networks for text classification. Multimedia Systems (2023), 1–11.
- Wang et al. (2018b) Guoyin Wang, Chunyuan Li, Wenlin Wang, Yizhe Zhang, Dinghan Shen, Xinyuan Zhang, Ricardo Henao, and Lawrence Carin. 2018b. Joint embedding of words and labels for text classification. arXiv:1805.04174 (2018).
- Wang and Li (2022) H Wang and F Li. 2022. A text classification method based on LSTM and graph attention network. Connection Science 34, 1 (2022), 2466–2480.
- Wang et al. (2022b) Kunze Wang, Soyeon Caren Han, Siqu Long, and Josiah Poon. 2022b. ME-GCN: multi-dimensional edge-embedded graph convolutional networks for semi-supervised text classification. arXiv:2204.04618 (2022).
- Wang et al. (2022a) K Wang, S C Han, and J Poon. 2022a. InducT-GCN: Inductive Graph Convolutional Networks for Text Classification. arXiv:00265 (2022).
- Wang et al. (2020) Rui Wang, Xuemeng Hu, Deyu Zhou, Yulan He, Yuxuan Xiong, Chenchen Ye, and Haiyang Xu. 2020. Neural topic modeling with bidirectional adversarial training. arXiv:2004.12331 (2020).
- Wang et al. (2021a) Sinong Wang, Han Fang, Madian Khabsa, Hanzi Mao, and Hao Ma. 2021a. Entailment as few-shot learner. arXiv:2104.14690 (2021).
- Wang et al. (2018a) S Wang, M Huang, Z Deng, et al. 2018a. Densely connected CNN with feature attention for text classification. In IJCAI. 4468–4474.
- Wang et al. (2016b) Wei Wang, Yue Ning, Huzefa Rangwala, and Naren Ramakrishnan. 2016b. A multiple instance learning framework for identifying key sentences and detecting events. In Proc. of the ACM Int. on Conf. on Information and Knowledge Management. 509–518.
- Wang et al. (2016a) Xingyou Wang, Weijie Jiang, and Zhiyong Luo. 2016a. Combination of convolutional and recurrent neural network for sentiment analysis of short texts. In Proceedings of COLING 2016, the 26th international conference on computational linguistics: Technical papers. 2428–2437.
- Wang et al. (2023b) Yifei Wang, Yongwei Wang, Hao Hu, Shengnan Zhou, and Qinwu Wang. 2023b. Knowledge-Graph-and GCN-Based Domain Chinese Long Text Classification Method. Applied Sciences 13, 13 (2023), 7915.
- Wang et al. (2011) Zhuang Wang, Liang Lan, and Slobodan Vucetic. 2011. Mixture model for multiple instance regression and applications in remote sensing. IEEE Transactions on Geoscience and Remote Sensing 50, 6 (2011), 2226–2237.
- Wang et al. (2021b) Ziyun Wang, Xuan Liu, Peiji Yang, Shixing Liu, and Zhisheng Wang. 2021b. Cross-lingual text classification with heterogeneous graph neural network. arXiv:2105.11246 (2021).
- Warstadt et al. (2019) Alex Warstadt, Amanpreet Singh, and Samuel R Bowman. 2019. Neural network acceptability judgments. Transactions of the Association for Computational Linguistics 7 (2019), 625–641.
- Wieting and Kiela (2019) John Wieting and Douwe Kiela. 2019. No training required: Exploring random encoders for sentence classification. arXiv:10444 (2019).
- Wu et al. (2019) Felix Wu, Amauri Souza, Tianyi Zhang, Christopher Fifty, Tao Yu, and Kilian Weinberger. 2019. Simplifying graph convolutional networks. In International conference on machine learning. PMLR, 6861–6871.
- Wu et al. (2015a) Jiajun Wu, Yinan Yu, Chang Huang, and Kai Yu. 2015a. Deep multiple instance learning for image classification and auto-annotation. In Proceedings of the IEEE conference on computer vision and pattern recognition. 3460–3469.
- Wu et al. (2014) Jiajun Wu, Yibiao Zhao, Jun-Yan Zhu, Siwei Luo, and Zhuowen Tu. 2014. Milcut: A sweeping line multiple instance learning paradigm for interactive image segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition. 256–263.
- Wu et al. (2023) Tiandeng Wu, Qijiong Liu, Yi Cao, Yao Huang, Xiao-Ming Wu, and Jiandong Ding. 2023. Continual Graph Convolutional Network for Text Classification. arXiv:2304.04152 (2023).
- Wu et al. (2016) Y Wu, M Schuster, Z Chen, Q V Le, M Norouzi, W Macherey, M Krikun, Y Cao, Q Gao, K Macherey, et al. 2016. Google’s neural machine translation system: Bridging the gap between human and machine translation. arXiv:1609.08144 (2016).
- Wu et al. (2020b) Zonghan Wu, Shirui Pan, Fengwen Chen, Guodong Long, Chengqi Zhang, and S Yu Philip. 2020b. A comprehensive survey on graph neural networks. IEEE transactions on neural networks and learning systems 32, 1 (2020), 4–24.
- Xiao et al. (2024) Feng Xiao, Youfa Liu, and Jia Shao. 2024. NNC-GCN: Neighbours-to-Neighbours Contrastive Graph Convolutional Network for Semi-Supervised Classification. ACM Transactions on Knowledge Discovery from Data 18, 4 (2024), 1–18.
- Xiong et al. (2023) Jie Xiong, Li Yu, Xi Niu, and Youfang Leng. 2023. XRR: Extreme multi-label text classification with candidate retrieving and deep ranking. Information Sciences 622 (2023), 115–132.
- Xiong et al. (2021) Yuanhao Xiong, Wei-Cheng Chang, Cho-Jui Hsieh, Hsiang-Fu Yu, and Inderjit Dhillon. 2021. Extreme zero-shot learning for extreme text classification. arXiv:2112.08652 (2021).
- Xu et al. (2014) Yan Xu, Tao Mo, Qiwei Feng, Peilin Zhong, Maode Lai, I Eric, and Chao Chang. 2014. Deep learning of feature representation with multiple instance learning for medical image analysis. In 2014 ICASSP. IEEE, 1626–1630.
- Xue et al. (2021) Bingxin Xue, Cui Zhu, Xuan Wang, and Wenjun Zhu. 2021. An Integration Model for Text Classification using Graph Convolutional Network and BERT. In Journal of Physics: Conference Series, Vol. 2137. IOP Publishing, 012052.
- Xue et al. (2022) Bingxin Xue, Cui Zhu, Xuan Wang, and Wenjun Zhu. 2022. The Study on the Text Classification Based on Graph Convolutional Network and BiLSTM. In Proceedings of the 8th International Conference on Computing and Artificial Intelligence. 323–331.
- Yang et al. (2021a) Chunlian Yang, Yuchen Guo, Xiaowei Li, and Benhui Chen. 2021a. A Novel Method Using Local Feature to Enhance GCN for Text Classification. In 2021 11th International Conference on Intelligent Control and Information Processing (ICICIP). IEEE, 59–65.
- Yang et al. (2021b) Tianchi Yang, Linmei Hu, Chuan Shi, Houye Ji, Xiaoli Li, and Liqiang Nie. 2021b. HGAT: Heterogeneous graph attention networks for semi-supervised short text classification. ACM Transactions on Information Systems (TOIS) 39, 3 (2021), 1–29.
- Yang et al. (2022) Yintao Yang, Rui Miao, Yili Wang, and Xin Wang. 2022. Contrastive Graph Convolutional Networks with adaptive augmentation for text classification. Information Processing & Management 59, 4 (2022), 102946.
- Yang et al. (2019) Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Russ R Salakhutdinov, and Quoc V Le. 2019. Xlnet: Generalized autoregressive pretraining for language understanding. Advances in neural information processing systems 32 (2019).
- Yao et al. (2019) L Yao, C Mao, and Y Luo. 2019. Graph convolutional networks for text classification. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33. 7370–7377.
- Ye et al. (2020) Zhihao Ye, Gongyao Jiang, Ye Liu, Zhiyong Li, and Jin Yuan. 2020. Document and word representations generated by graph convolutional network and bert for short text classification. In ECAI 2020. IOS Press, 2275–2281.
- Yin et al. (2019) S Yin, Q Peng, H Li, Z Zhang, X You, H Liu, K Fischer, S L Furth, G E Tasian, and Y Fan. 2019. Multi-instance deep learning with graph convolutional neural networks for diagnosis of kidney diseases using ultrasound imaging. In UNSURE and CLIP Workshops 2019. 146–154.
- Yu et al. (2021) Zhizhi Yu, Di Jin, Ziyang Liu, Dongxiao He, Xiao Wang, Hanghang Tong, and Jiawei Han. 2021. AS-GCN: Adaptive semantic architecture of graph convolutional networks for text-rich networks. In 2021 IEEE International Conference on Data Mining (ICDM). IEEE, 837–846.
- Yun et al. (2019) S Yun, M Jeong, R Kim, J Kang, and H J Kim. 2019. Graph transformer networks. Advances in Neural Information Processing Systems 32 (2019).
- Zeng et al. (2024) Delong Zeng, Enze Zha, Jiayi Kuang, and Ying Shen. 2024. Multi-label text classification based on semantic-sensitive graph convolutional network. Knowledge-Based Systems 284 (2024), 111303.
- Zeng et al. (2022) F Zeng, N Chen, D Yang, and Z Meng. 2022. Simplified-boosting ensemble convolutional network for text classification. Neural Processing Letters 54, 6 (2022), 4971–4986.
- Zeyu et al. (2021) Zhou Zeyu, Wang Hao, Zhao Zibo, Li Yueyan, and Zhang Xiaoqin. 2021. Construction and Application of GCN Model for Text Classification with Associated Information. Data Analysis and Knowledge Discovery 5, 9 (2021), 31–41.
- Zhang and Zhang (2020) H Zhang and J Zhang. 2020. Text graph transformer for document classification. In Conference on Empirical Methods in Natural Language Processing.
- Zhang et al. (2021b) J Zhang, J Yao, Y Chu, and J Yan. 2021b. A Multiple Instance Learning Algorithm Using Graph Convolutional Network for Speech Content Classification. In Information Technology, Networking, Electronic and Automation Control Conference (ITNEC), Vol. 5. IEEE, 1480–1484.
- Zhang et al. (2021a) Ningyu Zhang, Luoqiu Li, Xiang Chen, Shumin Deng, Zhen Bi, Chuanqi Tan, Fei Huang, and Huajun Chen. 2021a. Differentiable prompt makes pre-trained language models better few-shot learners. arXiv:2108.13161 (2021).
- Zhang et al. (2019b) Si Zhang, H Tong, J Xu, and R Maciejewski. 2019b. Graph convolutional networks: a comprehensive review. Computational Social Networks 6, 1 (2019), 1–23.
- Zhang et al. (2020) Yufeng Zhang, Xueli Yu, Zeyu Cui, Shu Wu, Zhongzhen Wen, and Liang Wang. 2020. Every document owns its structure: Inductive text classification via graph neural networks. arXiv:2004.13826 (2020).
- Zhao et al. (2024) Fei Zhao, Qing Ai, Xiangna Li, Wenhui Wang, Qingyun Gao, and Yichun Liu. 2024. TLC-XML: Transformer with Label Correlation for Extreme Multi-label Text Classification. Neural Processing Letters 56, 1 (2024), 25.
- Zhao et al. (2022) Hongyu Zhao, Jiazhi Xie, and Hongbin Wang. 2022. Graph convolutional network based on multi-head pooling for short text classification. IEEE Access 10 (2022), 11947–11956.
- Zhou et al. (2020a) Jie Zhou, Ganqu Cui, Shengding Hu, Zhengyan Zhang, Cheng Yang, Zhiyuan Liu, Lifeng Wang, Changcheng Li, and Maosong Sun. 2020a. Graph neural networks: A review of methods and applications. AI Open 1 (2020), 57–81.
- Zhou et al. (2020b) Jie Zhou, Jimmy Xiangji Huang, Qinmin Vivian Hu, and Liang He. 2020b. Sk-gcn: Modeling syntax and knowledge via graph convolutional network for aspect-level sentiment classification. Knowledge-Based Systems 205 (2020), 106292.
- Zhou et al. (2016) Peng Zhou, Zhenyu Qi, Suncong Zheng, Jiaming Xu, Hongyun Bao, and Bo Xu. 2016. Text classification improved by integrating bidirectional LSTM with two-dimensional max pooling. arXiv:1611.06639 (2016).
- Zhou et al. (2024) Y Zhou, A Pang, and G Yu. 2024. Clip-GCN: an adaptive detection model for multimodal emergent fake news domains. Complex & Intelligent Systems (2024), 1–18.
- Zhu and Koniusz (2020) Hao Zhu and Piotr Koniusz. 2020. Simple spectral graph convolution. In International Conference on Learning Representations.
- Zhu et al. (2021a) W Zhu, S Liu, and C Liu. 2021a. Learning multimodal word representation with graph convolutional networks. Information Processing & Management 58, 6 (2021), 102709.
- Zhu et al. (2021b) Xiaofei Zhu, Ling Zhu, Jiafeng Guo, Shangsong Liang, and Stefan Dietze. 2021b. GL-GCN: Global and local dependency guided graph convolutional networks for aspect-based sentiment classification. Expert Systems with Applications 186 (2021), 115712.