Inferential Text Generation with Multiple Knowledge Sources and Meta-Learning (Appendix)
1 Dataset
Brief statistics and a comparison of the two datasets are given in Table 1.
Table 1: Statistics of the Event2Mind and ATOMIC datasets.

| | Event2Mind | ATOMIC |
|---|---|---|
| # of relations | 3 | 9 |
| # of events | 24,716 | 24,313 |
| # of triplets | 171,291 | 877,108 |
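For concreteness, each triplet pairs an event and a relation with a free-text inference. The snippet below is a hypothetical illustration in the style of these datasets (the relation name follows ATOMIC's conventions), not an actual entry.

```python
# A hypothetical (event, relation, inference) triplet in the style of ATOMIC.
triplet = ("PersonX makes PersonY's coffee", "xIntent", "to be helpful")
```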
2 Model Training
Here we list our training details. Word embeddings are initialized with GloVe [pennington2014glove] and ELMo [peters2018deep]. We apply dropout with a rate of 0.2 to the word embeddings, and the dimension of the encoder hidden state is 100. We set the maximum number of knowledge triples to 30. Model parameters are updated with Adam, using learning rates of 0.0001 and 0.0002 on the Event2Mind and ATOMIC datasets, respectively. For MAML training, we set the step size to 0.001, the weight to 0.01, and the batch size to 64 in both experiments. Model hyperparameters are tuned on the development set.
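The sketch below shows how this configuration could be wired up: an encoder with the stated dropout and hidden size, and one first-order MAML update using the stated inner step size. The module and function names (`Encoder`, `maml_step`, `loss_fn`, the batch placeholders) are hypothetical, and first-order MAML is an assumption about the meta-update; only the hyperparameter values come from the text.

```python
# A minimal sketch under the assumptions above, not the paper's implementation.
import copy
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Placeholder GRU encoder; the full model additionally uses GloVe/ELMo
    embeddings and attention over up to 30 retrieved knowledge triples."""
    def __init__(self, vocab_size, embed_dim=300, hidden_dim=100, dropout=0.2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)  # initialized from GloVe in practice
        self.drop = nn.Dropout(dropout)                   # dropout rate 0.2 on word embeddings
        self.rnn = nn.GRU(embed_dim, hidden_dim, batch_first=True)

    def forward(self, tokens):
        states, _ = self.rnn(self.drop(self.embed(tokens)))
        return states

def maml_step(model, loss_fn, support_batch, query_batch, outer_opt, inner_lr=1e-3):
    """One first-order MAML update: adapt a copy of the model on the support
    batch with a single SGD step of size `inner_lr` (0.001 in the text), then
    update the meta-parameters with the query loss of the adapted copy."""
    adapted = copy.deepcopy(model)
    inner_opt = torch.optim.SGD(adapted.parameters(), lr=inner_lr)

    # Inner loop: one gradient step on the support batch.
    inner_opt.zero_grad()
    loss_fn(adapted, support_batch).backward()
    inner_opt.step()

    # Outer loop: evaluate the adapted copy on the query batch.
    adapted.zero_grad()
    query_loss = loss_fn(adapted, query_batch)
    query_loss.backward()

    # First-order approximation: copy the adapted model's gradients back to
    # the meta-parameters, ignoring second-order terms.
    outer_opt.zero_grad()
    for p, q in zip(model.parameters(), adapted.parameters()):
        p.grad = q.grad.clone()
    outer_opt.step()
    return query_loss.item()
```

The outer optimizer would be Adam with the dataset-specific learning rate, e.g. `torch.optim.Adam(model.parameters(), lr=1e-4)` for Event2Mind and `lr=2e-4` for ATOMIC, with batches of 64 tasks.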