LLM agents can appear to reason well (high entropy) while completely ignoring the input, and mutual information is a far better metric for catching this failure.
Row-normalized optimizers can match Muon's performance on large language models while being faster in large-token and low-loss regimes, offering a practical alternative for pre-training.
Exploiting the geometry of belief space escapes the curse of horizon and memory in offline POMDPs, yielding tighter error bounds and improved sample efficiency in off-policy evaluation.