LLMs can reason better and generate more diverse outputs by projecting negative samples onto a positive subspace during reinforcement learning.
Test-time RL's vulnerability to noisy pseudo-labels is amplified by group-relative advantage estimation, but can be mitigated with a surprisingly simple debiasing and denoising approach.
Current audio-language models are surprisingly bad at controlling and interpreting subtle vocal cues, failing in nearly half of situational dialogue scenarios.
EVT achieves 86.6% top-1 accuracy on ImageNet-1k without extra training data, redefining the potential of Vision Transformers in computer vision.
Overconfident tokens, often missed by entropy-based methods, carry surprisingly dense corrective signals in on-policy distillation, allowing for near-baseline performance with <10% of tokens.
Robots can now learn contact-rich manipulation skills much as humans do, by feeling the forces involved, thanks to a new multimodal interface that captures synchronized visual, tactile, and force data.
A principled framework for General World Models reveals the limitations of current systems and the architectural requirements for future progress.
Overconfident errors in RLVR monopolize probability mass and suppress exploration, but a confidence-aware penalty fixes this and boosts mathematical reasoning performance.