MegaByte: Predicting Million-byte Sequences with Multiscale Transformers

Lili Yu, Dániel Simig, Colin Flaherty, Armen Aghajanyan, Luke Zettlemoyer, Mike Lewis

Machine Learning, ICML