Language models can learn directly from real-world user interactions, boosting performance without human annotations or simulated environments.
By rethinking RLHF, MicroCoder-GRPO enables smaller code-generation models to rival larger counterparts, achieving significant performance gains and revealing 34 training insights.
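GRPO-style training (group-relative policy optimization) drops RLHF's learned value critic and instead normalizes rewards across a group of completions sampled for the same prompt. Below is a minimal NumPy sketch of that advantage computation, assuming unit-test pass rate as the reward; the reward values are made up, and MicroCoder-GRPO's full objective (clipping, KL penalty) is not shown.

```python
import numpy as np

def grpo_advantages(rewards: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Group-relative advantages: score each sampled completion against
    the mean/std of its own group, so no value network is needed."""
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# Hypothetical rewards for 4 completions of one coding prompt,
# e.g. the fraction of unit tests each one passes.
rewards = np.array([0.0, 0.25, 1.0, 0.75])
print(grpo_advantages(rewards))  # failing samples get negative advantage
```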
Forget massive datasets: targeted training on a smaller, carefully curated dataset of challenging competitive programming problems yields 3x faster gains in code-generation performance.
Forget unimodal tasks: UniM throws down the gauntlet for truly unified multimodal AI, demanding that models juggle any combination of text, image, audio, video, code, document, and 3D inputs and outputs in a single, interleaved stream.
1.58-bit LLMs are surprisingly more resilient to sparsity than their full-precision counterparts, opening new avenues for extreme compression.
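For context, 1.58 bits per weight corresponds to ternary values in {-1, 0, +1} (log2 3 ≈ 1.58). Here is a minimal sketch of BitNet-b1.58-style "absmean" ternary quantization combined with plain magnitude pruning; the function names are ours, and the intuition in the comments is illustrative rather than the paper's own analysis.

```python
import numpy as np

def ternary_quantize(w: np.ndarray):
    """BitNet-b1.58-style 'absmean' quantization: scale by the mean
    absolute weight, then round and clip into {-1, 0, +1}."""
    scale = np.abs(w).mean() + 1e-8
    return np.clip(np.round(w / scale), -1, 1), scale

def magnitude_prune(w: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the given fraction of smallest-magnitude weights."""
    k = int(sparsity * w.size)
    thresh = np.partition(np.abs(w).ravel(), k - 1)[k - 1]
    return np.where(np.abs(w) <= thresh, 0.0, w)

rng = np.random.default_rng(0)
w = rng.normal(size=(256, 256))

q, _ = ternary_quantize(w)
print("zeros after ternary quantization alone:", (q == 0).mean())

# Intuition (ours, not the paper's): many pruned weights would have
# rounded to 0 anyway, so ternary models lose less to added sparsity.
q_pruned, _ = ternary_quantize(magnitude_prune(w, 0.5))
print("zeros after 50% pruning + quantization:", (q_pruned == 0).mean())
```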
Unlock 33% faster LLM inference on commodity GPUs with SlideSparse, which finally brings hardware-accelerated (2N-2):2N sparsity to the masses, bridging the accuracy gap left by NVIDIA's strict 2:4 pruning.
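An N:M pattern keeps at most N nonzero weights in every group of M consecutive weights: NVIDIA's hardware-supported 2:4 forces 50% density, while the (2N-2):2N family is progressively looser (4:6 is 67% dense, 6:8 is 75%), which is where the claimed accuracy recovery comes from. Below is a minimal magnitude-based sketch just to make the pattern concrete; SlideSparse's actual kernels and pruning criterion are not shown, and prune_n_of_m is a hypothetical helper.

```python
import numpy as np

def prune_n_of_m(w: np.ndarray, n: int, m: int) -> np.ndarray:
    """Structured N:M pruning: in every group of m consecutive weights,
    keep the n largest-magnitude entries and zero the rest."""
    assert w.size % m == 0, "tensor size must be a multiple of m"
    groups = w.reshape(-1, m)
    # Indices of the (m - n) smallest-magnitude weights in each group.
    drop = np.argsort(np.abs(groups), axis=1)[:, : m - n]
    pruned = groups.copy()
    np.put_along_axis(pruned, drop, 0.0, axis=1)
    return pruned.reshape(w.shape)

rng = np.random.default_rng(0)
w = rng.normal(size=2400)  # divisible by 4, 6, and 8

for n, m in [(2, 4), (4, 6), (6, 8)]:  # strict 2:4 vs. looser (2N-2):2N
    density = np.count_nonzero(prune_n_of_m(w, n, m)) / w.size
    print(f"{n}:{m} pattern -> density {density:.2f}")
```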
Speculative decoding gets a throughput boost of up to 4.32x by using reinforcement learning to dynamically balance drafting and verification.
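In speculative decoding, a cheap draft model proposes k tokens and the large target model verifies them in a single pass; k is exactly the drafting-versus-verification knob an RL policy could tune per step. A minimal greedy-verification sketch with toy stand-in models follows, all names hypothetical; the paper's acceptance rule and RL controller will differ.

```python
import random

def speculative_step(draft_next, target_next, context, k):
    """One draft-then-verify step of greedy speculative decoding: the
    draft model proposes k tokens, and the target model accepts the
    longest prefix it agrees with, plus one token of its own."""
    proposal, ctx = [], list(context)
    for _ in range(k):
        t = draft_next(ctx)
        proposal.append(t)
        ctx.append(t)

    accepted, ctx = [], list(context)
    for t in proposal:
        if target_next(ctx) != t:
            break
        accepted.append(t)
        ctx.append(t)
    # Whether it rejects or accepts everything, the target emits one more token.
    accepted.append(target_next(ctx))
    return accepted

# Toy stand-ins for real models (hypothetical): the draft agrees with
# the deterministic target about 80% of the time.
random.seed(42)
target_next = lambda ctx: len(ctx) % 50
draft_next = lambda ctx: target_next(ctx) if random.random() < 0.8 else -1

out = speculative_step(draft_next, target_next, context=[1, 2, 3], k=4)
print(f"one target pass emitted {len(out)} tokens: {out}")
# An RL controller, as in the paper, would choose k per step to trade
# draft cost against the expected number of accepted tokens.
```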
A 1-bit LLM can match the performance of full-precision models, promising huge gains in efficiency.
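A minimal sketch of what 1-bit weights mean for a linear layer in the BitNet style: keep only sign(W) plus a single per-tensor scale, so the matmul runs over {-1, +1} values. The error printed on random weights is illustrative only; matching full precision, per the claim above, requires training the model with quantization in the loop.

```python
import numpy as np

def binarize(w: np.ndarray):
    """1-bit weight quantization, BitNet-style: keep only the sign of
    each weight plus one per-tensor scale (the mean absolute value)."""
    return np.sign(w), np.abs(w).mean()

rng = np.random.default_rng(0)
w = rng.normal(size=(512, 512))
x = rng.normal(size=(1, 512))

w_bin, alpha = binarize(w)
full = x @ w                   # full-precision matmul
approx = alpha * (x @ w_bin)   # {-1, +1} matmul plus a single rescale

# Relative error on random weights (illustrative): closing this gap is
# what quantization-aware training of the 1-bit model is for.
print(f"relative error: {np.linalg.norm(full - approx) / np.linalg.norm(full):.2f}")
```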