A Few-shot Approach to Resume Information Extraction via Prompts
Abstract
The few-shot performance of prompt learning on text classification tasks has attracted the attention of the NLP community. This paper applies it to resume information extraction and improves existing methods for this task. We create manual templates and verbalizers tailored to resume texts and compare the performance of Masked Language Model (MLM) and Seq2Seq PLMs. We also refine the verbalizer design of Knowledgeable Prompt-tuning, which offers guidance for prompt design in other NLP tasks. We present the Manual Knowledgeable Verbalizer (MKV), a rule for constructing verbalizers for specific application scenarios. Our experiments show that templates and verbalizers built with the MKV rule are more effective and robust than existing methods, mitigate sample imbalance, and surpass current automatic prompt methods. This study underscores the value of tailored prompt learning for resume extraction and stresses the importance of custom-designed templates and verbalizers.
Keywords:
resume, prompt, few-shot learning, template, verbalizer, information extraction, text classification

1 Introduction
With the introduction of the Transformer architecture [19], large-scale language models pre-trained on unsupervised tasks have consistently achieved state-of-the-art results on a wide range of NLP tasks. The most prominent of these models include the encoder-based BERT [1], pre-trained with a Masked Language Model (MLM) objective and primarily used for classification tasks, and its modified version, RoBERTa [11]. Another example is the T5 model [13], which uses a Seq2Seq MLM pre-training method. The pre-training and fine-tuning paradigm was widely applied to various NLP downstream tasks until 2020, when a novel pre-training and prompt paradigm was proposed.
The first prompt-based approach for text classification tasks, which utilizes models to answer cloze questions and predict labels, was proposed for sentiment classification [16]. This approach was shown to achieve higher accuracy than the traditional fine-tuning paradigm, even with limited training data. The underlying principle of prompt learning involves using manually designed templates to wrap sentences, which are then masked to provide the target label relation words. This masked text is then input into the model, which predicts the corresponding label relation words. The development of manual prompt templates led to the exploration of automatic prompt generation methods. These methods are broadly categorized into two groups: discrete prompts and continuous prompts [9].
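To make the cloze formulation concrete, the following sketch (illustrative only, not the authors' code; the example sentence, template wording, and label words are hypothetical) shows how a masked language model can score one label relation word per class at the mask position:

```python
# Illustrative sketch of the cloze reformulation: wrap a sentence with a template,
# then compare the MLM's probabilities for one label relation word per class at the mask.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("roberta-large")
model = AutoModelForMaskedLM.from_pretrained("roberta-large")
model.eval()

sentence = "Led a team of five engineers building a payment platform."  # hypothetical resume line
prompt = f"{sentence} This sentence is talking about {tokenizer.mask_token}."

# one label relation word per class (hypothetical manual verbalizer)
label_words = {"experience": " experience", "education": " education", "skill": " skill"}

inputs = tokenizer(prompt, return_tensors="pt")
mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero().item()

with torch.no_grad():
    probs = model(**inputs).logits[0, mask_pos].softmax(dim=-1)  # vocab distribution at the mask

scores = {label: probs[tokenizer(w, add_special_tokens=False)["input_ids"][0]].item()
          for label, w in label_words.items()}
print(max(scores, key=scores.get))  # predicted class = label whose relation word is most probable
```

The same formulation underlies the methods compared later in this paper; only the way the template and the label words are obtained differs.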
As previously noted, prompt methods have demonstrated exceptional performance across various benchmark datasets. However, their efficacy in unique practical scenarios, such as information extraction from resumes, remains a question. This application is of great significance to businesses, given the daily influx of resumes. Though deep learning has facilitated automated resume screening, the resource demand for annotating resume texts for Pretrained Language Model (PLM) fine-tuning can be prohibitive. Prompt methods, requiring only a few labeled resume texts, present an economical solution, particularly for small businesses or niche industries. In this study, we leverage a seven-category sentence classification of English resume data [3]. This approach transforms the resume extraction task into a sentence classification task, aligns with the current research focus in prompt learning, and offers practical value, especially for organizations with limited resources. Moreover, the recurring specific words in resumes are ideal for constructing a Knowledgeable Verbalizer (KV), a technique known for its state-of-the-art (SOTA) results [6]. Thus, we focus on information extraction from resumes in this study.
[Figure 1: Overview of the prompt-learning framework used in this study: an input resume sentence is wrapped by a template containing a mask token, and a (knowledgeable) verbalizer maps the words predicted at the mask position to the class labels.]
As depicted in Figure 1, the sentence to be classified, denoted X, is passed to the wrapper class, which appends an indicative template sentence containing a mask token that stands for the label relation word to be predicted, e.g., "X this sentence is about mask". A sentence classification problem is thereby reframed as a fill-in-the-blank (mask token) task, which is exactly what a Masked Language Model (MLM) is pre-trained to solve. Prompting therefore aligns the downstream task more closely with the pre-training task, allowing rapid model adaptation on small datasets. The Knowledgeable Verbalizer (KV) extends the single label relation word of each class to a set of associated terms: the output probability of every relevant word at the mask position is computed, these probabilities are consolidated per class, and the class whose associated words are most probable is taken as the prediction for the sentence.

KV combined with a manually designed template has achieved superior scores on numerous datasets compared with the baselines of other prompt methods. Our initial hypothesis in this study was therefore that the combination of manually designed templates and KV is the most effective among current mainstream prompt methods, and we conducted comparison experiments to validate it. To assess the impact of different templates on the results, we devised several alternative templates for comparison; the results suggest that the efficacy of prompt-learning is strongly affected by template design. A baseline KV was created for resume text classification using the original Knowledgeable Prompt-tuning method [6] and compared with a KV designed according to our refined rules. We term the KV constructed following our proposed rule the Manual Knowledgeable Verbalizer (MKV). On the 25-shot and 50-shot tasks, performance increased markedly from 54.96 to 63.65 and from 59.72 to 76.53, respectively, indicating that our proposed MKV is better suited to the resume extraction task. Finally, to compare the encoder structure of RoBERTa (a BERT-style model) with the Seq2Seq structure of the T5 model in the context of prompt-learning, we conducted comparative experiments on the two models with 25/50/100-shot tasks and two distinct training techniques. In summary, our primary contributions include:
1. Development of a comprehensive set of prompt-learning techniques for the few-shot resume information extraction task.
2. Proposal of KV construction rules, based on the original Knowledgeable Prompt-tuning (KPT), that are more compatible with resume text. This provides insights for subsequent researchers to develop KVs for practical application scenarios.
3. Comparison of manual template construction with automatic template generation for the prompt method. We have demonstrated that, in the current state of prompt-learning for resume text, a method employing manually crafted prompts surpasses one using automatically generated prompts.
2 Related Work
The research related to this paper falls into two main groups: prompt templates and the construction of prompt verbalizers.
Prompt Template. Since PET (Pattern-Exploiting Training) [16] was proposed, a surge of prompt research has followed. A PET follow-up study showed that smaller models, such as BERT-base, are also capable of few-shot learning [17]. There are several other studies of discrete prompts [4][14][18]. The idea behind a discrete prompt is that the template consists of real words, which occupy discrete points in the embedding space. On the other hand, since token-level prompts can be generated automatically, some research has gone a step further by directly replacing the token-level prompt template with continuous vectors in the token representation and training these prompts instead of fine-tuning the model (a.k.a. continuous prompts) [8][7][12][5][10]. With continuous vectors, one can search for the best set of vectors to replace a discrete prompt template.
Prompt Verbalizer. The automatic construction of verbalizers for few-shot learning has been explored [15]. Several of the prompt methods mentioned in the previous paragraph also come with corresponding verbalizers, such as manual, soft, and automatic verbalizers. Finally, the KV used in this study has been shown to outperform other verbalizers [6].
3 Task Setup
In this work, we select the resume information sentence classification dataset as the experimental object, and we frame the task setting around a practical application scenario. Suppose an IT company needs to fine-tune its own resume information extraction model. First, a training dataset consisting of several hundred resumes must be constructed, which means annotating tens of thousands of sentences. This is a non-negligible cost for a small company; moreover, some small start-ups may not even have hundreds of resumes from which to create a training dataset. It is therefore essential to minimize the human-resource investment required to produce training datasets.
In summary, we set the task to the case of annotating only one or two resumes. The resume dataset was extracted from 15,000 original resumes, 1,000 of which were used as annotation targets [3] (https://www.kaggle.com/datasets/oo7kartik/resume-text-batch). The dataset contains 78,786 sentence samples in total across the 1,000 annotated resumes, so each resume contains about 79 sentences on average.
4 Prompt Design
In this section, we first design a series of prompt templates for resumes. We then present a KV construction method that differs from the original KPT, based on the textual characteristics of resumes (Figure 1).
4.1 Manual Template
For the manually designed templates, we followed two design approaches. One is the generic template commonly used in prior studies (e.g., "input sentence In this sentence, the topic is mask."). The other type of template is designed specifically for resume documents (e.g., "input sentence this sentence belongs in the mask section of the resume."). On this basis, we designed a series of templates specifically for resume text by inserting words such as "resume" and "curriculum vitae", which are strongly related to the text being classified. We anticipated that few-shot learning on resume text would improve with this specific design of templates and the MKV.
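For illustration, the two kinds of templates could be written as follows with OpenPrompt's ManualTemplate (a sketch assuming the OpenPrompt template syntax; the tokenizer is whichever PLM tokenizer is loaded for the experiment):

```python
# Sketch: generic vs. resume-specific manual templates in OpenPrompt's template syntax.
# {"placeholder":"text_a"} is the input sentence and {"mask"} the token to be predicted.
from transformers import AutoTokenizer
from openprompt.prompts import ManualTemplate

tokenizer = AutoTokenizer.from_pretrained("t5-large")  # any PLM tokenizer used in the experiments

generic_template = ManualTemplate(
    tokenizer=tokenizer,
    text='{"placeholder":"text_a"} In this sentence, the topic is {"mask"}.',
)
resume_template = ManualTemplate(
    tokenizer=tokenizer,
    text='{"placeholder":"text_a"} this sentence belongs in the {"mask"} section of the curriculum vitae.',
)
```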
4.2 Knowledgeable Verbalizer
In this study, the KV from the KPT paper is utilized [6]. In that work, the KV is constructed by introducing external knowledge. As an example, consider the class label "experience": using a related-word search site (https://relatedwords.org/), the words that frequently appear in conjunction with "experience" are tallied, and the top 100 words are selected to constitute its label word set. Seven such word sets, one per class label, make up the KV.
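A minimal sketch of the prediction step shared by KV and MKV is given below (mean aggregation shown here; KPT additionally calibrates and weights the label words, and the word sets in this sketch are illustrative rather than the actual KV):

```python
# Sketch of verbalizer aggregation: consolidate the mask-position probabilities of every
# label relation word in a class (mean here) and predict the class with the highest score.
def verbalizer_predict(probs, label_word_sets, tokenizer):
    """probs: softmaxed vocabulary distribution at the mask position (see the sketch in Sec. 1)."""
    class_scores = {}
    for label, words in label_word_sets.items():
        ids = [tokenizer(" " + w, add_special_tokens=False)["input_ids"][0] for w in words]
        class_scores[label] = sum(probs[i].item() for i in ids) / len(ids)
    return max(class_scores, key=class_scores.get)

# illustrative (not the actual) extended word sets for two of the seven classes
label_word_sets = {
    "experience": ["experience", "worked", "company", "project", "responsibilities"],
    "education":  ["education", "university", "degree", "bachelor", "graduated"],
}
```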
As shown in Figure 2, a "personal information" passage from an original resume was selected to illustrate the proposed MKV construction rules. In summary, we present two rules for selecting a label relation word: 1. the word frequently appears in sentences of the target class in resume text; 2. the word does not often occur in the other classes. The words marked with a gray background in the figure can thus be selected according to these two rules. Two of the words, "languages" and "address", are marked with a black background: "languages" often appears in the "skill" class in the context of programming languages, and "address" appears not only in personal information as a home address but also in the "experience" class as a company address. These two words are therefore not selected for the label relation word set of personal information. A small sketch of how the two rules can be read as corpus statistics follows the figure.
[Figure 2: A "personal information" passage from a resume illustrating the MKV selection rules; words on a gray background satisfy both rules, while "languages" and "address" (black background) are rejected.]
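The sketch below is our reading of the two rules as simple frequency statistics over the annotated sentences; the thresholds and helper names are hypothetical, and the final word choice in this work was made manually:

```python
# Sketch of the two MKV rules as corpus statistics: keep a word that is frequent in the
# target class (rule 1) and rare in every other class (rule 2). Thresholds are illustrative.
from collections import Counter

def candidate_mkv_words(sentences_by_class, target, min_target_freq=5, max_other_freq=2):
    target_counts = Counter(w.lower() for s in sentences_by_class[target] for w in s.split())
    other_counts = Counter(w.lower()
                           for label, sents in sentences_by_class.items() if label != target
                           for s in sents for w in s.split())
    return [w for w, c in target_counts.most_common()
            if c >= min_target_freq and other_counts[w] <= max_other_freq]

# e.g. candidate_mkv_words(sentences_by_class, "personal information") would keep words that
# cluster in that class while filtering out "languages" and "address", which also occur
# frequently under "skill" and "experience".
```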
5 Experimental Setup
This study aims to investigate the impact of various factors on the outcomes of few-shot resume learning, and to assess the efficacy of our proposed MKV construction rules for prompt learning on resume material. We performed multiple comparative tests between prompt templates and verbalizers, evaluated several prompting methods against MKV, and examined the effectiveness of two structurally distinct PLMs in few-shot resume learning. These experiments used a purpose-built resume dataset.
For efficient iteration across experiments, we used OpenPrompt (https://github.com/thunlp/OpenPrompt), an open-source framework for prompt-learning [2]. We used micro-averaged F1 (F1-micro) as the evaluation metric for all experiments. Given that few-shot learning typically allows only a limited number of training samples per class, we used a random seed to extract the training set from our unbalanced resume dataset. This approach maintained the original sample distribution, thereby preserving the inherent imbalance of the resume samples. Distinct random seeds were used for the 25/50/100-shot experiments.
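A sketch of how a single run might be assembled with OpenPrompt is shown below. The class and function names follow the OpenPrompt documentation, but the toy examples, label words, seed, and hyper-parameters are illustrative rather than the exact experimental configuration:

```python
# Sketch (assumed configuration) of one prompt-learning run with OpenPrompt.
import random
from sklearn.metrics import f1_score
from openprompt.plms import load_plm
from openprompt.data_utils import InputExample
from openprompt.prompts import ManualTemplate, ManualVerbalizer
from openprompt import PromptForClassification, PromptDataLoader

classes = ["experience", "personal information", "summary", "qualification",
           "education", "skill", "object"]

plm, tokenizer, model_config, WrapperClass = load_plm("t5", "t5-large")

template = ManualTemplate(
    tokenizer=tokenizer,
    text='{"placeholder":"text_a"} this sentence belongs in the {"mask"} section of the curriculum vitae.',
)
# stand-in for the MKV: truncated, illustrative label word sets (one list per class, in order)
verbalizer = ManualVerbalizer(
    tokenizer, classes=classes,
    label_words=[["experience", "worked", "company"], ["email", "phone"], ["summary"],
                 ["certified", "qualification"], ["university", "degree"],
                 ["skills", "proficient"], ["objective", "seeking"]],
)

# toy stand-ins for the labelled resume sentences; a fixed random seed draws the k-shot set
all_examples = [InputExample(guid=0, text_a="Led a payments project at ACME.", label=0),
                InputExample(guid=1, text_a="B.Sc. in Computer Science.", label=4)]
k_shot = 2  # 25 / 50 / 100 in the actual experiments, each with its own seed
train_examples = random.Random(1).sample(all_examples, k_shot)

train_loader = PromptDataLoader(dataset=train_examples, template=template, tokenizer=tokenizer,
                                tokenizer_wrapper_class=WrapperClass,
                                max_seq_length=256, batch_size=4)
model = PromptForClassification(plm=plm, template=template, verbalizer=verbalizer)

# ... standard training loop over train_loader omitted; predictions are then scored with
# micro-averaged F1, e.g. f1_score(y_true, y_pred, average="micro")
```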
Initially, we conducted two comparison experiments to determine the most effective template and Knowledgeable Verbalizer (KV). One experiment compared different manual templates, while the other compared our proposed MKV construction method with the original KV construction method. After establishing the performance of the Manual Template (MT) and the Manual Knowledgeable Verbalizer (MKV), we compared them with three other types of templates and verbalizers.
The methods we considered include: 1) the automatic method, in which the model generates discrete prompt template words automatically; 2) the soft prompt method, in which soft prompts are used by optimizing vectors in the embedding layer; 3) the P-tuning method, in which token-level templates are replaced with dense vectors that are trained to predict the masked words; and 4) the prefix prompt approach, in which task-specific continuous sequence prefixes are trained instead of the entire transformer model.
Finally, to investigate the fine-tuning and prompt-tuning performance of RoBERTa-large and T5-large, we conducted an additional experiment. The scores of the first three experiments were obtained after 4 training epochs, while the PLM comparison experiment used the best epoch selected on the test-set score. To evaluate the performance of KV and MKV under sample imbalance, we created and analyzed confusion matrices for both methods on the 50-shot test dataset.
Table 1: Comparison of different manual templates (0-shot).

| Method | Template | F1-score |
|---|---|---|
| MT+MKV | input sentence In this sentence, the topic is mask. | 33.45 |
| MT+MKV | input sentence this sentence is talking about mask. | 55.77 |
| MT+MKV | input sentence this sentence belongs in the mask section of the resume. | 61.32 |
| MT+MKV | input sentence this sentence belongs in the mask section of the curriculum vitae. | 62.09 |
6 Results and Analysis
6.1 Comparison Between Different Manual Templates
Table 1 shows the experimental results of comparing four distinct templates. To eliminate confounding variables such as the training samples, we adopted a 0-shot strategy, directly predicting the test set with the original parameters of the PLM, which isolates the effect of each template. The top two rows are universal prompt templates suitable for any classification task. In contrast, we developed the other two templates specifically for the resume dataset; incorporating "resume" into the templates notably improved the outcomes.
Another noteworthy observation is the 0.77-point score increment after substituting "resume" with "curriculum vitae". We hypothesize this arises from the polysemous nature of "resume" (e.g., as a noun, a summary; as a verb, to continue), which creates ambiguity when the word is used within the template sentence. Replacing "resume" with the unambiguous term "curriculum vitae" consequently improved the score.
Table 2: Comparison of the original KV and our MKV.

| Total label set size (KV method) | 25-shot | 50-shot | 100-shot |
|---|---|---|---|
| 700 (KV baseline) | 54.96 | 59.72 | 66.46 |
| 63 (ours, MKV) | 63.65 | 76.53 | 76.72 |
6.2 Comparison of Different Verbalizers
Next, we compare the performance of the original KV construction method and our proposed method. As shown in Table 2, the total label set size is the sum of the extended words over the seven categories. For the baseline, we follow the method in the original paper and obtain a set of words related to each sentence class by retrieving web pages with the class label as the query [6]. We then constructed a total of 63 MKV label relation words according to the rules proposed in Section 4.2. For the KV comparison in this section, we used the fourth manual template in Table 1 ("text this sentence belongs in the mask section of the curriculum vitae."). MKV is 8.69/16.81/10.26 points higher than the original KV on 25/50/100-shot, which demonstrates the effectiveness of our improved MKV.
6.3 Comparison of Different Prompt Methods
Table 3: F1-scores of the template and verbalizer combinations under 25/50/100-shot settings (MT: manual template, ST: soft template, PT: P-tuning template, PFT: prefix-tuning template; MKV: manual knowledgeable verbalizer, MV: manual verbalizer, SV: soft verbalizer, AutoV: automatic verbalizer).

| Template | Verbalizer | 25-shot | 50-shot | 100-shot |
|---|---|---|---|---|
| MT | MKV | 63.65 | 76.53 | 76.72 |
| MT | MV | 51.95 | 70.18 | 72.89 |
| MT | SV | 51.49 | 55.37 | 64.29 |
| MT | AutoV | 14.72 | 14.48 | 13.93 |
| ST | MKV | 63.92 | 70.42 | 73.91 |
| ST | MV | 39.93 | 57.91 | 65.42 |
| ST | SV | 33.69 | 53.60 | 60.29 |
| ST | AutoV | 14.72 | 14.48 | 13.93 |
| PT | MKV | 62.82 | 68.19 | 72.33 |
| PT | MV | 42.11 | 55.97 | 65.65 |
| PT | SV | 16.43 | 50.42 | 64.04 |
| PT | AutoV | 14.48 | 14.28 | 15.12 |
| PFT | MKV | 60.88 | 67.29 | 73.60 |
| PFT | MV | 40.36 | 55.62 | 62.73 |
| PFT | SV | 39.13 | 53.56 | 57.62 |
| PFT | AutoV | 14.85 | 14.65 | 13.96 |
Since the OpenPrompt framework divides the prompt-tuning process into two main parts, the Template and the Verbalizer (see Figure 1), these two parts can be combined freely. We therefore selected four representative templates and four verbalizer methods and compared their few-shot learning effectiveness with each other. As shown in Table 3, the MT+MKV combination achieved the best results in the 50/100-shot experiments; in the 50-shot experiment in particular, MT+MKV scored 6.11 points higher than the second-place ST+MKV. In the 25-shot experiment, the ST+MKV combination was slightly higher than MT+MKV. Overall, our proposed MT+MKV method outperforms the other combinations of the four templates and four verbalizers, further validating the effectiveness and robustness of the MKV constructed for the resume classification dataset.
6.4 Results on T5 and RoBERTa Model
Table 4: Fine-tuning vs. MT+MKV prompt-learning with RoBERTa and T5.

| Model | Method | Examples | F1-score |
|---|---|---|---|
| RoBERTa (baseline) | Fine-tune | 25 | 58.70 |
| RoBERTa (baseline) | Fine-tune | 50 | 66.10 |
| RoBERTa (baseline) | Fine-tune | 100 | 73.78 |
| T5 (baseline) | Fine-tune | 25 | 47.33 |
| T5 (baseline) | Fine-tune | 50 | 58.97 |
| T5 (baseline) | Fine-tune | 100 | 70.46 |
| RoBERTa | MT+MKV | 25 | 57.48 |
| RoBERTa | MT+MKV | 50 | 71.50 |
| RoBERTa | MT+MKV | 100 | 71.85 |
| T5 | MT+MKV | 25 | 63.65 |
| T5 | MT+MKV | 50 | 76.53 |
| T5 | MT+MKV | 100 | 78.01 |
In our final experiment, we compared the RoBERTa model, pre-trained with a Masked Language Model (MLM) objective in an encoder-only structure, and the T5 model, which uses an encoder-decoder structure, on the resume classification task. Unlike the previous experiments, where training was conducted for a fixed four epochs, we adjusted the training duration to the optimal number of epochs based on the test-set score. This better reflects the peak performance of both models under the two methods and allows a more accurate comparison of which model is more suitable for few-shot learning on the resume dataset.
As illustrated in Table 4, the T5 model underperforms the RoBERTa model in the 25/50/100-shot settings when both are trained with the fine-tuning approach. However, when trained with the MT+MKV prompt-learning method, the T5 model outperforms RoBERTa at 25/50/100 shots, with improvements of 6.17, 5.03, and 6.16 points respectively.
Additionally, when given more training samples, the RoBERTa model's fine-tuning result exceeds its prompt-learning result by 1.93 points at the 100-shot level. Interestingly, the score difference between the 50-shot and 100-shot settings for RoBERTa with the MT+MKV method is minimal despite the doubled sample size, suggesting limited efficiency in the use of larger samples in this setting. This observation aligns with the findings of [7], which propose that the performance of prompt-tuning improves as the number of parameters in the model increases.
6.5 Analysis of the Confusion Matrix
The “experience” category in a resume dataset features the largest sample size at 41,114, while “qualification” has the smallest at 974, leading to a notable imbalance. This imbalance challenges the use of few-shot learning models. Our proposed MKV method’s effectiveness in addressing this imbalance is demonstrated through a confusion matrix comparison in Figure 3.
[Figure 3: Confusion matrices on the 50-shot test set for (a) the original KV and (b) our MKV.]
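For reference, the matrices compared in Figure 3 can be produced with scikit-learn; the predictions below are toy stand-ins for the two models' outputs on the 50-shot test set:

```python
# Sketch: building the KV vs. MKV confusion matrices on the 50-shot test set
# (toy labels/predictions; plotting omitted).
from sklearn.metrics import confusion_matrix

classes = ["experience", "personal information", "summary", "qualification",
           "education", "skill", "object"]
y_true     = ["experience", "qualification", "skill", "summary"]                  # gold labels (toy)
y_pred_kv  = ["experience", "experience", "experience", "personal information"]   # KV-style errors
y_pred_mkv = ["experience", "qualification", "skill", "summary"]                  # MKV predictions (toy)

cm_kv  = confusion_matrix(y_true, y_pred_kv,  labels=classes)
cm_mkv = confusion_matrix(y_true, y_pred_mkv, labels=classes)
print(cm_kv)
print(cm_mkv)
```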
To begin with, the confusion matrix of the original KV method in Figure 3(a) reveals a substantial misclassification rate for the labels "summary", "qualification", "education", "skill", and "object". These labels account for a small proportion of the samples, many of which are erroneously classified into "experience" and "personal information", the categories with the largest proportion of samples. The distribution in the KV confusion matrix also follows a clear trend: the fewer the samples in a class, the fewer the correct classifications.
Conversely, the application of our proposed MKV method shows a considerable improvement in addressing classification errors associated with sample imbalance, as shown in Figure 3(b). This lends credence to the effectiveness of the MKV approach, crafted according to our rule, in maintaining high performance even in the presence of highly unbalanced samples.
7 Conclusion
In this study, we use the prompt technique on the resume dataset for few-shot learning. We created templates informed by resume sentence structures and assessed their utility. Additionally, we refined the construction of a knowledgeable verbalizer, relying on Knowledgeable Prompt Tuning (KPT). For this, we devised construction rules for the MKV, tailored to the textual features of resumes. Experimental evaluations demonstrate our MKV’s effectiveness and robustness. While the final outcomes were satisfactory, they also elucidated the constraints inherent in the utilized prompt methodology. It is anticipated that future endeavors will develop a more universal prompt approach, capable of addressing a variety of industries and accommodating diverse resume formats.
Acknowledgements
This research was partially supported by JSPS KAKENHI Grant Numbers JP23H00491 and JP22K00502.
References
- [1] Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: Pre-training of deep bidirectional transformers for language understanding. In: Proc. of NAACL (Jun 2019)
- [2] Ding, N., Hu, S., Zhao, W., Chen, Y., Liu, Z., Zheng, H., Sun, M.: OpenPrompt: An open-source framework for prompt-learning. In: Proc. of ACL (May 2022)
- [3] Gan, C., Mori, T.: Construction of english resume corpus and test with pre-trained language models. arXiv preprint arXiv:2208.03219 (2022)
- [4] Gao, T., Fisch, A., Chen, D.: Making pre-trained language models better few-shot learners. In: Proc. of ACL-IJCNLP (Aug 2021)
- [5] Hambardzumyan, K., Khachatrian, H., May, J.: WARP: Word-level Adversarial ReProgramming. In: Proc. of ACL-IJCNLP (Aug 2021)
- [6] Hu, S., Ding, N., Wang, H., Liu, Z., Wang, J., Li, J., Wu, W., Sun, M.: Knowledgeable prompt-tuning: Incorporating knowledge into prompt verbalizer for text classification. In: Proc. of ACL (May 2022)
- [7] Lester, B., Al-Rfou, R., Constant, N.: The power of scale for parameter-efficient prompt tuning. In: Proc. of EMNLP (Nov 2021)
- [8] Li, X.L., Liang, P.: Prefix-tuning: Optimizing continuous prompts for generation. In: Proc. of ACL-IJCNLP (Aug 2021)
- [9] Liu, P., Yuan, W., Fu, J., Jiang, Z., Hayashi, H., Neubig, G.: Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys 55(9), 1–35 (2023)
- [10] Liu, X., Ji, K., Fu, Y., Tam, W., Du, Z., Yang, Z., Tang, J.: P-tuning: Prompt tuning can be comparable to fine-tuning across scales and tasks. In: Proc. of ACL (May 2022)
- [11] Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., Stoyanov, V.: Roberta: A robustly optimized bert pretraining approach. arXiv preprint arXiv:1907.11692 (2019)
- [12] Qin, G., Eisner, J.: Learning how to ask: Querying LMs with mixtures of soft prompts. In: Proc. of NAACL (Jun 2021)
- [13] Raffel, C., Shazeer, N., Roberts, A., Lee, K., Narang, S., Matena, M., Zhou, Y., Li, W., Liu, P.J.: Exploring the limits of transfer learning with a unified text-to-text transformer. The Journal of Machine Learning Research 21(1), 5485–5551 (2020)
- [14] Reynolds, L., McDonell, K.: Prompt programming for large language models: Beyond the few-shot paradigm. In: Extended Abstracts of the 2021 CHI Conference on Human Factors in Computing Systems. pp. 1–7 (2021)
- [15] Schick, T., Schmid, H., Schütze, H.: Automatically identifying words that can serve as labels for few-shot text classification. In: Proc. of ICCL (Dec 2020)
- [16] Schick, T., Schütze, H.: Exploiting cloze-questions for few-shot text classification and natural language inference. In: Proc. of EACL (Apr 2021)
- [17] Schick, T., Schütze, H.: It’s not just size that matters: Small language models are also few-shot learners. In: Proc. of NAACL (Jun 2021)
- [18] Tam, D., Menon, R.R., Bansal, M., Srivastava, S., Raffel, C.: Improving and simplifying pattern exploiting training. In: Proc. of EMNLP (Nov 2021)
- [19] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. Advances in neural information processing systems 30 (2017)