
Metaphor Detection using Deep Contextualized Word Embeddings

Shashwat Aggarwal University of Delhi [email protected] Ramesh Singh National Informatics Centre [email protected]
Abstract

Metaphors are ubiquitous in natural language, and their detection plays an essential role in many natural language processing tasks, such as language understanding and sentiment analysis. Most existing approaches to metaphor detection rely on complex, hand-crafted and fine-tuned feature pipelines, which greatly limits their applicability. In this work, we present an end-to-end method composed of deep contextualized word embeddings, bidirectional LSTMs, and a multi-head attention mechanism to address the task of automatic metaphor detection. Unlike many existing approaches, our method requires only the raw text sequence as input to detect the metaphoricity of a phrase. We compare the performance of our method against existing baselines on two benchmark datasets, TroFi and MOH-X. Experimental evaluations confirm the effectiveness of our approach.

keywords:
Metaphors, Contextualized word embeddings, BERT, ELMo, Multi-head Attention, Bidirectional LSTMs, Raw text, Natural Language Processing

1 Introduction

A metaphor is a figurative form of expression that compares a word or phrase to an object or action to which it is not literally applicable, in order to explain an idea or suggest a likeness or analogy between them. Metaphors have been used extensively in all types of literature and writing, especially poetry and songs, to communicate complex feelings, emotions, and imagery to readers effectively. Metaphors are ubiquitous in natural language and help structure our understanding of the world, often without our conscious realization of their presence [1]. Given the prevalence and significance of metaphorical language, effective detection of metaphors plays an essential role in many natural language processing applications, such as language understanding, information extraction, and sentiment analysis.

However, automated detection of metaphorical phrases is a difficult problem, primarily for three reasons. First, there is a subjective component involved: the metaphoricity of an expression may vary across human annotators. Second, metaphors can be domain- and context-dependent. Third, there is a lack of annotated data, which is required to train supervised machine learning algorithms for accurate automated detection.

Most previous approaches to detecting metaphorical phrases have either relied on manual and lexical detection [2, 3, 4], which requires heavily hand-crafted features built from linguistic resources that are costly to obtain and greatly limit applicability, or have used supervised machine learning algorithms [5, 6, 7] with limited forms of linguistic context, for example only subject-verb-object triplets (e.g., cat eat fish). Although these techniques automate the detection of metaphorical phrases, their prediction accuracies lag behind the accuracies such techniques achieve on other text classification tasks.

Inspired by recent work in NLP and transfer learning, in this paper we present an end-to-end method composed of deep contextualized word embeddings, bidirectional LSTMs, and a multi-head attention mechanism to address some of the aforementioned limitations. Our method is notable in that, unlike many existing approaches, it requires only the raw text sequence as input and does not depend on any complex or fine-tuned feature pipelines.

2 Literature Survey

There has been significant work on the automatic detection and discovery of metaphorical phrases in natural language, ranging from traditional rule-based methods that rely on task-specific hand-coded lexical resources to recent statistical machine learning models.

One of the first attempts to detect metaphorical phrases in text automatically was by [8]. Their system, called met*, could discriminate between literalness, metonymy, and metaphor in the underlying text. [9] proposed a sentence clustering approach for non-literal language recognition using a similarity-based word sense disambiguation method; however, their approach focused only on metaphors expressed by a verb. [10] trained a maximum entropy classifier to discriminate between literal and metaphorical usage of a phrase. Their method achieved a high accuracy of 95.12%, but the majority of the verbs in their dataset were used metaphorically, making the task notably easier. [3] used the hyponymy (IS-A) relation in WordNet [11] and word bigram counts to predict metaphors in sentences. However, word bigram counts lose a great deal of information compared to verb-noun pairs, and their method does not deal with literal sentences.

[12] used statistical learning methods to identify metaphors automatically. Starting from a small seed set of manually annotated metaphorical expressions, their system generates a large number of metaphors of similar syntactic structure from a corpus. Other statistical approaches include [4], who use concreteness and abstractness to detect metaphors; [13], who apply Latent Dirichlet Allocation (LDA) based topic modeling for the automatic extraction of linguistic metaphors; and [14], who construct a metaphor detection system using a random forest classifier with conceptual semantic features such as abstractness, imageability, and semantic supersenses. Most of these approaches rely on external lexical, syntactic, or semantic linguistic resources to train classification models, which severely limits their applicability.

Recently, several deep learning based approaches have been introduced to tackle the problem of metaphor detection. Methods such as [15, 16, 7] train word embeddings to identify and detect metaphors and have shown gains on various benchmarks. [17] used bidirectional LSTMs with raw text, SVO triplets, and dependency subsequences as input features. The 2018 VUA Metaphor Detection Shared Task also introduced several LSTM- and CRF-based models [18, 19, 20, 21], augmented with linguistic features such as POS tags, lemmas, verb clusters, unigrams, and WordNet.

Furthermore, works such as [22, 23] have employed recently proposed contextualized language representation models to detect metaphors, although work in this direction remains limited. Moreover, the popular multi-head attention mechanism [24], which has been used extensively in domains such as speech processing and machine translation, has not yet been tried for the task of metaphor detection.

Along the lines of these recent works, in this paper we combine the contextualized language representation models ELMo [25] and BERT [26] with a multi-head attention mechanism and bidirectional LSTMs to detect metaphorical phrases automatically. We compare our proposed approach with existing baselines for the task of metaphor detection and classification.

The rest of the paper is organized as follows: in Section 3, we discuss our proposed approach, and in Sections 4 and 5, we describe the datasets we use and the experiments we perform on them.

3 Proposed Approach

We present an end-to-end method composed of deep contextualized word embeddings, bidirectional LSTMs and multi-head attention mechanism to address the task of automatic metaphor detection and classification. Our method takes the raw text sequence as input and does not depend on any complex, hand-crafted or fine-tuned feature pipelines.

Figure 1: Proposed model for metaphor detection from raw text sequences.

3.1 Contextualized Word Representations

Natural language processing (NLP) is a diversified field consisting of a wide variety of tasks such as text classification, named entity recognition, and question answering. However, most of these tasks come with limited datasets of a few thousand or a few hundred thousand human-labeled training examples. This shortage of training data severely affects the performance of modern deep learning-based NLP systems, which benefit from much larger amounts of data.

Recently, techniques such as [25, 26] have addressed this limitation by pretraining language representation models on enormous amounts of unannotated text and then transferring the learned representations to a downstream model of interest as contextualized word embeddings. These contextualized word representations have resulted in substantial accuracy improvements in numerous NLP tasks compared to training on those tasks from scratch. Following recent work, we also use contextualized word embeddings for our task of metaphor detection.
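As an illustration, contextualized token representations can be extracted as in the following sketch, which assumes the HuggingFace transformers library (the paper does not prescribe a particular implementation); bert-large-uncased yields the 1024-dimensional vectors referenced in Section 4.2.

```python
# Sketch: extracting contextualized word representations with BERT.
# Assumes the HuggingFace `transformers` library; details are illustrative.
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-large-uncased")
bert = BertModel.from_pretrained("bert-large-uncased")
bert.eval()

def embed(sentence: str) -> torch.Tensor:
    """Return one 1024-d contextualized vector per wordpiece token."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        outputs = bert(**inputs)
    return outputs.last_hidden_state  # shape: (1, seq_len, 1024)
```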

3.2 Proposed Model

Figure 1 shows the architecture of our proposed method for the task of metaphor classification. Each token $t_i$ of the input text is encoded by the contextualized word embeddings, ELMo or BERT, to obtain the corresponding contextualized word representation. These representations are fed as input to a bidirectional LSTM encoder network, which outputs an encoded representation at every timestep of the input sequence.

We add a multi-head attention mechanism on top of the encoder network to obtain a context vector $c$ representing the weighted sum of all the BiLSTM output states, as depicted in Equation 1, where $h$ is the number of attention heads used, $x_i$ is the hidden representation from the encoder network for token $t_i$, $a_i^j$ is the attention weight computed for token $t_i$ by attention head $j$, and $c_j$ is the context vector obtained from attention head $j$.

$$
\begin{split}
a_i^j &= \mathrm{softmax}_i\left(W_{a^j}\, x_i + b_{a^j}\right),\\
c_j &= \sum_{i=1}^{n} a_i^j\, x_i,\\
c &= W_o\,[c_1; c_2; \ldots; c_h] + b
\end{split}
\tag{1}
$$
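For concreteness, a minimal PyTorch rendering of Equation 1 might look as follows. This is an illustrative sketch, not the authors' released code; the module name is ours, and the output width of $W_o$ is our choice since the paper does not specify it.

```python
import torch
import torch.nn as nn

class MultiHeadAdditiveAttention(nn.Module):
    """Multi-head attention over encoder states, following Eq. (1):
    each head j scores every timestep with W_{a^j} x_i + b_{a^j},
    softmax-normalizes the scores over the sequence, forms the weighted
    sum c_j, then concatenates and projects the head contexts."""

    def __init__(self, hidden_dim: int, num_heads: int = 4):
        super().__init__()
        # One scalar score per timestep per head: W_{a^j} x_i + b_{a^j}
        self.score = nn.Linear(hidden_dim, num_heads)
        # W_o projects the concatenation [c_1; ...; c_h]
        self.out = nn.Linear(num_heads * hidden_dim, hidden_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, hidden_dim)
        weights = torch.softmax(self.score(x), dim=1)   # a_i^j, softmax over i
        # c_j = sum_i a_i^j * x_i for every head j
        contexts = torch.einsum("bsh,bsd->bhd", weights, x)
        return self.out(contexts.flatten(start_dim=1))  # c = W_o[c_1;...;c_h] + b
```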

Finally, the context vector $c$ is fed to a dense-layer decoder to predict the metaphoricity of the input phrase.
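Putting the pieces together, the overall pipeline (contextualized embeddings, BiLSTM encoder, multi-head attention, dense decoder) could be sketched as below. The embedding and hidden sizes follow Section 4.2; the two-way output layer (literal vs. metaphorical) is our assumption about the decoder.

```python
class MetaphorClassifier(nn.Module):
    """Sketch of the proposed pipeline. Contextualized embeddings (ELMo or
    BERT, 1024-d) are assumed to be precomputed, as in the snippet above."""

    def __init__(self, embed_dim: int = 1024, hidden_dim: int = 256,
                 num_heads: int = 4):
        super().__init__()
        self.encoder = nn.LSTM(embed_dim, hidden_dim,
                               batch_first=True, bidirectional=True)
        # BiLSTM states are 2*hidden_dim wide (forward + backward)
        self.attention = MultiHeadAdditiveAttention(2 * hidden_dim, num_heads)
        self.decoder = nn.Linear(2 * hidden_dim, 2)  # literal vs. metaphorical

    def forward(self, embeddings: torch.Tensor) -> torch.Tensor:
        # embeddings: (batch, seq_len, embed_dim)
        states, _ = self.encoder(embeddings)   # (batch, seq_len, 2*hidden_dim)
        context = self.attention(states)       # context vector c from Eq. (1)
        return self.decoder(context)           # class logits
```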

4 Experiments

4.1 Datasets

We evaluate our method on two benchmark datasets, TroFi [9] and MOH-X [6]. In Table 1, we report a summary of each dataset used in our study. The TroFi dataset consists of literal and nonliteral usages of 50 English verbs drawn from the 1987-89 Wall Street Journal (WSJ) Corpus. The MOH-X dataset, on the other hand, is an adaptation of the MOH dataset [27], which consists of simple and concise sentences derived from various news articles.

4.2 Implementation Details

We use 1024-dimensional ELMo and BERT vectors. The LSTM module has a 256-dimensional hidden state. We use 4 attention heads for the multi-head attention mechanism. We train the network using the Adam optimizer with learning rate $lr = 0.00003$, $\beta_1 = 0.9$, $\beta_2 = 0.999$, and a batch size of 16. The network is trained for 50 epochs.
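Under these hyperparameters, a training loop might be set up as follows (a sketch; `train_loader` is a hypothetical DataLoader yielding precomputed embeddings and labels in batches of 16):

```python
model = MetaphorClassifier()                 # sketched in Section 3.2
optimizer = torch.optim.Adam(model.parameters(),
                             lr=3e-5, betas=(0.9, 0.999))
criterion = nn.CrossEntropyLoss()

for epoch in range(50):                      # trained for 50 epochs
    for embeddings, labels in train_loader:  # hypothetical DataLoader
        optimizer.zero_grad()
        loss = criterion(model(embeddings), labels)
        loss.backward()
        optimizer.step()
```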

4.3 Baselines

We compare the performance of our method with four existing baselines. The first is a simple lexical baseline that classifies a phrase or token as metaphorical if it is annotated as metaphorical more frequently than as literal in the training set. The other baselines include a logistic regression classifier similar to the one employed by [28], a neural similarity network with skip-gram word embeddings [7], and a BiLSTM-based model combined with ELMo embeddings proposed in [22].
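The lexical baseline, in particular, reduces to simple counting over the training annotations; a minimal sketch (function and variable names are our own) is:

```python
from collections import Counter, defaultdict

def lexical_baseline(train_pairs, test_tokens):
    """Predict 'metaphor' for a token iff it is annotated metaphorically
    more often than literally in the training set."""
    counts = defaultdict(Counter)
    for token, label in train_pairs:   # label: 1 = metaphor, 0 = literal
        counts[token][label] += 1
    return [int(counts[tok][1] > counts[tok][0]) for tok in test_tokens]
```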

4.4 Evaluation Metrics

To compare the performance of our method with the baselines, we compute precision, recall, and F1 for the metaphor class, as well as the overall accuracy. Following prior work, we perform 10-fold cross-validation.
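Concretely, this evaluation could be implemented with scikit-learn as sketched below; `train_and_predict` is a hypothetical callable wrapping model training and inference for one fold.

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import StratifiedKFold

def cross_validate(X, y, train_and_predict, n_splits=10):
    """10-fold CV reporting mean P/R/F1 for the metaphor class (label 1)
    and mean overall accuracy."""
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    scores = []
    for train_idx, test_idx in skf.split(X, y):
        y_pred = train_and_predict(X[train_idx], y[train_idx], X[test_idx])
        p, r, f1, _ = precision_recall_fscore_support(
            y[test_idx], y_pred, average="binary", pos_label=1)
        scores.append((p, r, f1, accuracy_score(y[test_idx], y_pred)))
    return [float(np.mean(col)) for col in zip(*scores)]
```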

Table 1: Metaphor detection dataset statistics. "% Metaphor" refers to the sentence-level percentage of metaphorical examples.

Dataset   # Examples   % Metaphor   # Unique Verbs
TroFi     3,737        43%          50
MOH-X     647          49%          214

5 Results

In Table 2, we show the performance of all the baselines and our proposed method on the task of metaphor detection and classification for the two benchmark datasets (MOH-X and TroFi). Both of our proposed models outperform the existing baselines on both datasets using only the raw text as input features. The model with BERT embeddings performs better than the model with ELMo embeddings on both datasets. Contextualized word embeddings and the multi-head attention mechanism improve the performance of both models by a large margin.

In addition to comparing our methods with the existing baselines, we also show the receiver operating characteristic (ROC) curve in Figure 2, which reports the AUC value for the metaphor detection task and illustrates the diagnostic ability of the best method (ours w/ BERT) as its discrimination threshold is varied.

Table 2: Comparison of our proposed model with baselines on the metaphor detection and classification task.

Model               TroFi                      MOH-X
                    P     R     F1    Acc      P     R     F1    Acc
Lexical Baseline    72.4  55.7  62.9  71.4     39.1  26.7  31.3  43.6
Log. Regression     70.7  71.4  70.3  72.7     68.7  66.2  67.4  73.6
Rei (2017)          -     -     -     -        73.6  76.1  74.2  74.8
Gao (2018) - CLS    68.7  74.6  72.0  73.7     75.3  84.3  79.1  78.5
Gao (2018) - SEQ    70.7  71.6  71.1  74.6     79.1  73.5  75.6  77.2
Ours + ELMo         82.7  85.6  82.7  83.1     80.4  73.7  75.9  78.1
Ours + BERT         85.3  84.9  83.2  85.8     90.7  74.8  79.8  80.7
Figure 2: ROC curve showing the AUC value for the metaphor detection task for the best model.

Further, in Tables 3 and 4, we report a set of examples from the TroFi and MOH-X development sets, along with the gold annotation label, the predicted label from each of our methods (with ELMo and with BERT), and the corresponding prediction scores. The system confidently detects most metaphorical and literal phrases, such as "make the people's hearts glow." and "he pasted his opponent." However, the predicted output disagrees with the ground-truth annotations for some phrases, such as "the white house sits on pennsylvania avenue." and "bicycle suffered major damage." The majority of errors committed by our method involve a particular type of metaphor called personification. The metaphoricity of phrases such as "the white house sits on pennsylvania avenue." depends largely on a few words like "white house", which, if replaced by the name of a person, would change the metaphoricity of the phrase. The network occasionally gets confused when discriminating between different kinds of entities, such as people and things. Given additional information along with the raw text as input, for example part-of-speech or named-entity tags, the performance of the network could be improved.

Finally, we report some example phrases such as ”vesuvius erupts once in a while.” or ”the old man was sweeping the floor.” for which the model with BERT embeddings correctly detects the metaphoricity of the phrase, while the model with ELMo fails to do so.

Table 3: Examples from the TroFi development set, along with the gold label, predicted label, and the predicted score from our method with ELMo and BERT.

Input Phrase                                     Gold   Predicted         Score (metaphor class)
                                                        ELMo    BERT      ELMo     BERT
"make the people's hearts glow."                 1      1       1         0.995    0.999
"she leaned over the banister"                   0      0       0         0.001    0.001
"he pasted his opponent."                        1      1       1         0.999    0.999
"vesuvius erupts once in a while."               0      1       0         0.993    0.001
"the white house sits on pennsylvania avenue."   1      0       0         0.001    0.001

Table 4: Examples from the MOH-X development set, along with the gold label, predicted label, and the predicted score from our method with ELMo and BERT.

Input Phrase                                     Gold   Predicted         Score (metaphor class)
                                                        ELMo    BERT      ELMo     BERT
"this one speech could sink his candidacy."      1      1       1         0.999    0.999
"attach a drain hose to the radiator drain."     0      0       0         0.001    0.001
"the old man was sweeping the floor."            0      1       0         0.837    0.001
"the object then moved slowly away."             0      1       0         0.999    0.474
"bicycle suffered major damage."                 1      0       0         0.062    0.001

6 Conclusion

In this work, we presented an end-to-end method composed of deep contextualized word embeddings, bidirectional LSTMs, and a multi-head attention mechanism to address the task of automatic metaphor detection and classification. Our method requires only the raw text sequence as input and does not depend on any complex or fine-tuned feature pipelines. Our method establishes a new state of the art on both datasets for metaphor detection.

References

  • Lakoff and Johnson [1980] G. Lakoff, M. Johnson, Conceptual metaphor in everyday language., The Journal of Philosophy 77 (1980) 453–486.
  • Mason [2004] Z. J. Mason, Cormet: a computational, corpus-based conventional metaphor extraction system, Computational linguistics 30 (2004) 23–44.
  • Krishnakumaran and Zhu [2007] S. Krishnakumaran, X. Zhu, Hunting elusive metaphors using lexical resources., in: Proceedings of the Workshop on Computational approaches to Figurative Language, 2007, pp. 13–20.
  • Turney et al. [2011] P. D. Turney, Y. Neuman, D. Assaf, Y. Cohen, Literal and metaphorical sense identification through concrete and abstract context, in: Proceedings of the Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, 2011, pp. 680–690.
  • Tsvetkov et al. [2013] Y. Tsvetkov, E. Mukomel, A. Gershman, Cross-lingual metaphor detection using common semantic features, in: Proceedings of the First Workshop on Metaphor in NLP, 2013, pp. 45–51.
  • Shutova et al. [2016] E. Shutova, D. Kiela, J. Maillard, Black holes and white rabbits: Metaphor identification with visual features, in: Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2016, pp. 160–170.
  • Rei et al. [2017] M. Rei, L. Bulat, D. Kiela, E. Shutova, Grasping the finer point: A supervised similarity network for metaphor detection, arXiv preprint arXiv:1709.00575 (2017).
  • Fass [1991] D. Fass, met*: A method for discriminating metonymy and metaphor by computer, Computational Linguistics 17 (1991) 49–90.
  • Birke and Sarkar [2006] J. Birke, A. Sarkar, A clustering approach for nearly unsupervised recognition of nonliteral language, in: 11th Conference of the European Chapter of the Association for Computational Linguistics, 2006.
  • Gedigian et al. [2006] M. Gedigian, J. Bryant, S. Narayanan, B. Ciric, Catching metaphors, in: Proceedings of the Third Workshop on Scalable Natural Language Understanding, Association for Computational Linguistics, 2006, pp. 41–48.
  • Miller [1995] G. A. Miller, Wordnet: a lexical database for english, Communications of the ACM 38 (1995) 39–41.
  • Shutova et al. [2010] E. Shutova, L. Sun, A. Korhonen, Metaphor identification using verb and noun clustering, in: Proceedings of the 23rd International Conference on Computational Linguistics, Association for Computational Linguistics, 2010, pp. 1002–1010.
  • Heintz et al. [2013] I. Heintz, R. Gabbard, M. Srivastava, D. Barner, D. Black, M. Friedman, R. Weischedel, Automatic extraction of linguistic metaphors with lda topic modeling, in: Proceedings of the First Workshop on Metaphor in NLP, 2013, pp. 58–66.
  • Tsvetkov et al. [2014] Y. Tsvetkov, L. Boytsov, A. Gershman, E. Nyberg, C. Dyer, Metaphor detection with cross-lingual model transfer, in: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), volume 1, 2014, pp. 248–258.
  • Do Dinh and Gurevych [2016] E.-L. Do Dinh, I. Gurevych, Token-level metaphor detection using neural networks, in: Proceedings of the Fourth Workshop on Metaphor in NLP, 2016, pp. 28–33.
  • Köper and im Walde [2017] M. Köper, S. S. im Walde, Improving verb metaphor detection by propagating abstractness to words, phrases and individual senses, in: Proceedings of the 1st Workshop on Sense, Concept and Entity Representations and their Applications, 2017, pp. 24–30.
  • Sun and Xie [2017] S. Sun, Z. Xie, Bilstm-based models for metaphor detection, in: National CCF Conference on Natural Language Processing and Chinese Computing, Springer, 2017, pp. 431–442.
  • Mosolova et al. [2018] A. Mosolova, I. Bondarenko, V. Fomin, Conditional random fields for metaphor detection, in: Proceedings of the Workshop on Figurative Language Processing, 2018, pp. 121–123.
  • Wu et al. [2018] C. Wu, F. Wu, Y. Chen, S. Wu, Z. Yuan, Y. Huang, Neural metaphor detecting with cnn-lstm model, in: Proceedings of the Workshop on Figurative Language Processing, 2018, pp. 110–114.
  • Swarnkar and Singh [2018] K. Swarnkar, A. K. Singh, Di-lstm contrast: A deep neural network for metaphor detection, in: Proceedings of the Workshop on Figurative Language Processing, 2018, pp. 115–120.
  • Bizzoni and Ghanimifard [2018] Y. Bizzoni, M. Ghanimifard, Bigrams and bilstms two neural networks for sequential metaphor detection, in: Proceedings of the Workshop on Figurative Language Processing, 2018, pp. 91–101.
  • Gao et al. [2018] G. Gao, E. Choi, Y. Choi, L. Zettlemoyer, Neural metaphor detection in context, Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (2018) 607–613.
  • Mu et al. [2019] J. Mu, H. Yannakoudakis, E. Shutova, Learning outside the box: Discourse-level features improve metaphor identification, arXiv preprint arXiv:1904.02246 (2019).
  • Vaswani et al. [2017] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, I. Polosukhin, Attention is all you need, in: Advances in neural information processing systems, 2017, pp. 5998–6008.
  • Peters et al. [2018] M. E. Peters, M. Neumann, M. Iyyer, M. Gardner, C. Clark, K. Lee, L. Zettlemoyer, Deep contextualized word representations, in: Proc. of NAACL, 2018.
  • Devlin et al. [2018] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, Bert: Pre-training of deep bidirectional transformers for language understanding, arXiv preprint arXiv:1810.04805 (2018).
  • Mohammad et al. [2016] S. Mohammad, E. Shutova, P. Turney, Metaphor as a medium for emotion: An empirical study, in: Proceedings of the Fifth Joint Conference on Lexical and Computational Semantics, 2016, pp. 23–33.
  • Klebanov et al. [2016] B. B. Klebanov, C. W. Leong, E. D. Gutierrez, E. Shutova, M. Flor, Semantic classifications for detection of verb metaphors, in: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), volume 2, 2016, pp. 101–106.