LLMs that ace code generation often fail to grasp intended program semantics, as evidenced by a stark performance decline when generating executable behavioral specifications on the new CodeSpecBench benchmark.
LRMs can be made more efficient and accurate by strategically adjusting their output length based on task difficulty, leading to a better accuracy-length trade-off.
By progressively refining the reward signal according to the distribution of model confidence, DistriTTRL achieves significant performance gains in RL, better aligning internal information between training and test time while mitigating reward hacking.
Instead of directly aligning to a flawed pseudo-source domain in test-time adaptation, a semantic bridge approach significantly boosts performance by first rectifying the pseudo-source using universal semantics.
By modeling the distribution of confidence scores, DistriVoting significantly boosts the accuracy of large reasoning models, outperforming existing confidence-based selection methods across diverse benchmarks.
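The summary above does not spell out how DistriVoting models the confidence distribution, but the baseline it improves on can be illustrated. The sketch below shows a generic confidence-weighted majority vote over sampled answers — a common confidence-based selection scheme, not the paper's own algorithm; the function name and sample data are hypothetical.

```python
from collections import defaultdict

def confidence_weighted_vote(samples):
    """Pick the answer with the largest total confidence mass.

    `samples` is a list of (answer, confidence) pairs from repeated
    model runs. This is a simple confidence-weighted vote, used here
    only to illustrate the kind of baseline DistriVoting outperforms.
    """
    totals = defaultdict(float)
    for answer, conf in samples:
        totals[answer] += conf
    # The answer backed by the most cumulative confidence wins.
    return max(totals, key=totals.get)

# Hypothetical sampled answers with per-sample confidences.
samples = [("42", 0.9), ("41", 0.6), ("42", 0.7), ("40", 0.95)]
print(confidence_weighted_vote(samples))  # "42" (0.9 + 0.7 = 1.6)
```

Plain majority voting would also pick "42" here, but the weighted variant additionally resists cases where one wrong answer is sampled often with low confidence each time.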
Forget difficulty-based heuristics: InSight leverages weighted mutual information to select RL training data, boosting LLM reasoning and alignment with up to 2.2x speedup.