Inferential Text Generation with Multiple Knowledge Sources and Meta-Learning (Appendix)
1 Dataset
Brief statistics and a comparison of the two datasets are given in Table 1.
Table 1: Statistics of the Event2Mind and ATOMIC datasets.

| | Event2Mind | ATOMIC |
|---|---|---|
| # of relations | 3 | 9 |
| # of events | 24,716 | 24,313 |
| # of triplets | 171,291 | 877,108 |
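For concreteness, each triplet pairs an event and a relation with a free-text inference. The snippet below is a hypothetical illustration in the style of these datasets (the relation name follows ATOMIC's conventions), not an actual entry.

```python
# A hypothetical (event, relation, inference) triplet in the style of ATOMIC.
triplet = ("PersonX makes PersonY's coffee", "xIntent", "to be helpful")
```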
2 Model Training
Here we list our training details. Word embeddings are initialized with GloVe [pennington2014glove] and ELMo [peters2018deep]. We apply dropout with a rate of 0.2 to the word embeddings, and the dimension of the encoder hidden state is 100. We set the maximum number of knowledge triples to 30. Model parameters are updated with Adam, using learning rates of 0.0001 and 0.0002 on the Event2Mind and ATOMIC datasets, respectively. For MAML training, we set the step size to 0.001, the weight to 0.01, and the batch size to 64 in both experiments. Model hyperparameters are tuned on the development set.
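The sketch below shows how this configuration could be wired up: an encoder with the stated dropout and hidden size, and one first-order MAML update using the stated inner step size. The module and function names (`Encoder`, `maml_step`, `loss_fn`, the batch placeholders) are hypothetical, and first-order MAML is an assumption about the meta-update; only the hyperparameter values come from the text.

```python
# A minimal sketch under the assumptions above, not the paper's implementation.
import copy
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Placeholder GRU encoder; the full model additionally uses GloVe/ELMo
    embeddings and attention over up to 30 retrieved knowledge triples."""
    def __init__(self, vocab_size, embed_dim=300, hidden_dim=100, dropout=0.2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)  # initialized from GloVe in practice
        self.drop = nn.Dropout(dropout)                   # dropout rate 0.2 on word embeddings
        self.rnn = nn.GRU(embed_dim, hidden_dim, batch_first=True)

    def forward(self, tokens):
        states, _ = self.rnn(self.drop(self.embed(tokens)))
        return states

def maml_step(model, loss_fn, support_batch, query_batch, outer_opt, inner_lr=1e-3):
    """One first-order MAML update: adapt a copy of the model on the support
    batch with a single SGD step of size `inner_lr` (0.001 in the text), then
    update the meta-parameters with the query loss of the adapted copy."""
    adapted = copy.deepcopy(model)
    inner_opt = torch.optim.SGD(adapted.parameters(), lr=inner_lr)

    # Inner loop: one gradient step on the support batch.
    inner_opt.zero_grad()
    loss_fn(adapted, support_batch).backward()
    inner_opt.step()

    # Outer loop: evaluate the adapted copy on the query batch.
    adapted.zero_grad()
    query_loss = loss_fn(adapted, query_batch)
    query_loss.backward()

    # First-order approximation: copy the adapted model's gradients back to
    # the meta-parameters, ignoring second-order terms.
    outer_opt.zero_grad()
    for p, q in zip(model.parameters(), adapted.parameters()):
        p.grad = q.grad.clone()
    outer_opt.step()
    return query_loss.item()
```

The outer optimizer would be Adam with the dataset-specific learning rate, e.g. `torch.optim.Adam(model.parameters(), lr=1e-4)` for Event2Mind and `lr=2e-4` for ATOMIC, with batches of 64 tasks.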