Hierarchical Autoregressive Modeling Boosts Memory Efficiency in Language Generation
TL;DR: Researchers propose PHOTON, a hierarchical autoregressive architecture that replaces flat token-by-token scanning with multi-resolution, top-down context access to reduce…