Search papers, labs, and topics across Lattice.
5 papers from Apple ML Research on Architecture Design (Transformers, SSMs, MoE)
Outlier tokens in Diffusion Transformers aren't just extreme values; they corrupt local patch semantics, and can be tamed with Dual-Stage Registers to boost image generation quality.
Forget coarse sequence-level hacks: LenVM lets you precisely dial in token generation length, boosting a 7B model's length accuracy from 30.9 to 64.8 and crushing closed-source rivals.
Forget full KV caches: randomly routing attention across layers during training lets you drastically cut memory without hurting performance, and sometimes even helps.
Text-to-video generation gets a 1.58x speed boost with CalibAtt, a training-free method that exploits consistent sparsity patterns in attention layers.
Ditch slow, external segmentation pipelines: TrajTok learns trajectory tokens end-to-end, boosting video understanding while staying lean and adaptable.