University of Illinois Urbana-Champaign
LLMs may outshine human instructors in feedback ratings, but their complexity masks critical differences in targeted sentence feedback.
DFlare achieves up to a 5.52x speedup in LLM inference by allowing draft layers to independently leverage richer target knowledge, breaking through previous capacity constraints.
A 440MB multilingual translation model now rivals commercial APIs, opening the door for performant on-device translation.
Speculative decoding gets a throughput boost of up to 4.32x by using reinforcement learning to dynamically balance drafting and verification.