Allen Institute: EMO, an MoE language model with semantic modularity that emerges naturally from data
EMO is a new mixture-of-experts (MoE) language model from the Allen Institute with 1B active and 14B total parameters, trained on 1 trillion tokens. Its experts self-organize into semantic domains: with only 25% of the experts active, the performance loss is just 1%.
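To make the "subset of experts" idea concrete, here is a minimal sketch of generic top-k MoE routing with an optional mask that keeps only some experts. It is not EMO's actual router, which the summary above does not describe. The function `top_k_route`, the expert count of 16, the top-k of 2, and the random router scores are all assumptions chosen for illustration.

```python
import math
import random


def top_k_route(logits, k, allowed=None):
    """Return (expert_index, weight) pairs for the k highest-scoring experts.

    logits:  one router score per expert (hypothetical values in this sketch).
    allowed: optional set of expert indices that are kept; all others are ignored.
    """
    candidates = range(len(logits)) if allowed is None else sorted(allowed)
    # Pick the k allowed experts with the largest router scores.
    chosen = sorted(candidates, key=lambda i: logits[i], reverse=True)[:k]
    # Softmax over the chosen experts only, so their weights sum to 1.
    peak = max(logits[i] for i in chosen)
    exps = [math.exp(logits[i] - peak) for i in chosen]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(chosen, exps)]


NUM_EXPERTS = 16  # assumed for illustration, not EMO's real expert count
TOP_K = 2         # assumed for illustration

rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]

# Routing over all experts.
full = top_k_route(logits, TOP_K)

# Routing restricted to 25% of the experts (here simply the first four),
# standing in for a subset that specializes in one semantic domain.
subset = set(range(NUM_EXPERTS // 4))
restricted = top_k_route(logits, TOP_K, allowed=subset)

print("all experts:", full)
print("25% subset: ", restricted)
```

If experts really do specialize by domain, the restricted routing should pick nearly the same experts as the full routing for in-domain tokens, which is one plausible reading of why dropping most experts costs so little.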