This work shows that robust preference alignment benefits from targeted interventions for different noise types rather than uniform regularization, and proposes wDPO, an LLM alignment approach based on hierarchical winsorization.
Merging RL experts effectively requires balancing sharp, informative signals with stable, dispersed components, a challenge that ResMerge addresses with spectral techniques.
You can detect prompt injection attacks in screenshot-based web agents with 8x speedup and no extra memory by looking for telltale visual "smoothness" and reversed text polarity.
LLMs can be made more accurate *and* more trustworthy with a clever post-training method that selectively amplifies only the reasoning steps that progressively build confidence in the correct answer.