new paper showing tree-based routers might be helpful
specialized language routers + MoE (mixture-of-experts) layers for language translation, etc.
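
rough sketch of what a tree-based router feeding an MoE layer could look like. The notes above don't say how the paper builds its router, so everything here is an assumption for illustration: a two-level tree (top level picks a branch, e.g. a language group; second level picks an expert inside that branch), hard top-1 choice at each level, and toy linear experts. The names `TreeRouter`, `route`, and `moe_layer` are hypothetical, not from the paper.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def argmax(values):
    return max(range(len(values)), key=values.__getitem__)


class TreeRouter:
    """Hypothetical two-level router: choose a branch, then an expert in it."""

    def __init__(self, dim, n_branches, experts_per_branch, seed=0):
        rng = random.Random(seed)

        def rand_vec():
            return [rng.gauss(0.0, 1.0) for _ in range(dim)]

        # One scoring vector per branch (assumed: branch ~ language group).
        self.branch_w = [rand_vec() for _ in range(n_branches)]
        # One scoring vector per expert, grouped by branch.
        self.leaf_w = [
            [rand_vec() for _ in range(experts_per_branch)]
            for _ in range(n_branches)
        ]

    def route(self, x):
        """Return (branch index, expert index, gate weight) for input x."""
        branch_p = softmax([dot(w, x) for w in self.branch_w])
        b = argmax(branch_p)
        leaf_p = softmax([dot(w, x) for w in self.leaf_w[b]])
        e = argmax(leaf_p)
        # Gate weight is the probability of the chosen root-to-leaf path.
        return b, e, branch_p[b] * leaf_p[e]


def moe_layer(router, experts, x):
    """Apply only the routed expert (a dim x dim matrix), scaled by the gate."""
    b, e, gate = router.route(x)
    weight = experts[b][e]
    return [gate * dot(row, x) for row in weight]


if __name__ == "__main__":
    dim, n_branches, per_branch = 4, 2, 3
    rng = random.Random(1)
    router = TreeRouter(dim, n_branches, per_branch)
    # Toy experts: random linear maps, one per (branch, expert) slot.
    experts = [
        [[[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(dim)]
         for _ in range(per_branch)]
        for _ in range(n_branches)
    ]
    token = [1.0, 0.5, -0.3, 2.0]
    print(router.route(token))
    print(moe_layer(router, experts, token))
```

the point of the tree is that each token is scored against `n_branches + experts_per_branch` vectors instead of all `n_branches * experts_per_branch` experts; whether that matches the paper's motivation is not confirmed by these notes.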