By reflecting on its own reasoning, ReflectRM achieves a +10.2-point improvement in mitigating positional bias over leading generative reward models, making it a far more stable evaluator.
Models can learn to distinguish on their own between tasks that require rigorous planning and those suited to direct generation in creative writing, unlocking a new level of meta-cognitive ability.
Multimodal embeddings get a serious upgrade with CoCoA, a new pre-training method that forces models to compress all input information into a single token for reconstruction, leading to substantial quality gains.
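The core idea behind compress-then-reconstruct pre-training can be illustrated with a minimal numpy sketch: the whole input sequence is squeezed into one summary vector, and a decoder is trained to rebuild every token from that vector alone. All names, dimensions, and the linear encoder/decoder here are illustrative assumptions, not CoCoA's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (hypothetical, chosen for illustration only)
seq_len, d_model = 8, 16

# A sequence of token embeddings standing in for the multimodal input
tokens = rng.normal(size=(seq_len, d_model))

# "Compression": collapse the sequence into a single vector, here via a
# linear projection of the mean-pooled tokens (stand-in for a real encoder)
W_compress = rng.normal(size=(d_model, d_model)) * 0.1
summary = tokens.mean(axis=0) @ W_compress          # shape: (d_model,)

# "Reconstruction": the decoder must rebuild every token embedding
# from that one summary vector alone
W_decode = rng.normal(size=(d_model, seq_len * d_model)) * 0.1
reconstruction = (summary @ W_decode).reshape(seq_len, d_model)

# The reconstruction error is the quantity pre-training would minimize;
# a good summary vector is one from which the input can be recovered
loss = float(np.mean((reconstruction - tokens) ** 2))
print(summary.shape, reconstruction.shape)
```

The bottleneck is the point of the exercise: because only one vector survives compression, minimizing the reconstruction loss pressures that vector to retain all of the input's information.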