PPO can be made sample-efficient and stable for long-horizon reasoning in LLMs by treating the problem as a sequence-level contextual bandit, sidestepping the need for computationally expensive multi-sampling.
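The bandit framing above can be sketched in a few lines: the whole response is one "action", so there is a single log-prob ratio and a single advantage per sampled sequence instead of per-token credit assignment. This is a minimal illustration with toy numbers, not the paper's implementation; the function name and values are invented here.

```python
import math

def ppo_bandit_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Average clipped PPO surrogate over a batch, one scalar per sequence."""
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(lp_new - lp_old)          # sequence-level ratio
        clipped = max(min(ratio, 1 + clip_eps), 1 - clip_eps)
        total += min(ratio * adv, clipped * adv)   # pessimistic (clipped) bound
    return -total / len(advantages)                # negate: we minimize

# Toy batch: summed log-probs of three full responses under the new and old
# policies, plus bandit advantages (reward minus a baseline), one each.
loss = ppo_bandit_loss(
    logp_new=[-10.0, -12.5, -9.8],
    logp_old=[-10.2, -12.0, -9.8],
    advantages=[1.0, -0.5, 0.3],
)
print(round(loss, 4))
```

Because each sequence contributes exactly one ratio and one advantage, there is no long-horizon value bootstrapping to destabilize training.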
By prioritizing diversity over accuracy in experience replay, DyJR significantly boosts LLM reasoning performance in RL, outperforming GRPO and other baselines without sacrificing training efficiency.
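The general idea of diversity-weighted replay can be sketched as follows: items whose embeddings sit far from the rest of the buffer get higher sampling probability, so near-duplicate trajectories are down-weighted. This is only an illustration of the principle; DyJR's actual prioritization scheme is defined in the paper, and the embeddings below are toy values.

```python
import math
import random

def diversity_weights(embeddings):
    """Weight each buffer item by its mean Euclidean distance to the others."""
    weights = []
    for i, e in enumerate(embeddings):
        dists = [math.dist(e, other)
                 for j, other in enumerate(embeddings) if j != i]
        weights.append(sum(dists) / len(dists))
    total = sum(weights)
    return [w / total for w in weights]  # normalize to sampling probabilities

# Toy buffer: two near-duplicate trajectories and one outlier.
buffer = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0)]
probs = diversity_weights(buffer)
# The outlier at (5.0, 5.0) gets the largest sampling probability,
# so replay draws favor it over the near-duplicates.
sampled = random.choices(buffer, weights=probs, k=4)
print(probs)
```

Contrast this with accuracy-prioritized replay, which would concentrate sampling on high-reward items even when they are redundant.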
Achieve better video editing without retraining by dynamically locking background features based on a "hallucination metric" that detects when the diffusion model is about to go astray.
LLMs can boost autonomous driving behavior classification accuracy to over 94% by fusing numerical time-series data with high-level semantic features.
Achieve accurate single-shot 3D imaging of specular surfaces by intelligently fusing polarization and structured illumination cues using a physics-informed deep learning approach.