Lattice AI Research

Research focus

Inference & Quantization (2)Reasoning & Chain-of-Thought (1)Tool Use & Agents (1)Natural Language Processing (1)

Frequent co-authors

Z. Li (2)Zuchao Li (2)Yi Zhao (1)Yajuan Peng (1)

Papers (2)

Apr 16, 2026

Yi Zhao +73d ago·also WHU

TrigReason: Trigger-Based Collaboration between Small and Large Reasoning Models

Achieve LLaMA-level reasoning accuracy with 44% lower latency and 73% lower API costs by strategically offloading work from large to small models only when needed.

Yi Zhao, Yajuan Peng, Cam-Tu Nguyen +5

Inference & Quantization Reasoning & Chain-of-Thought Tool Use & Agents

3d ago

RACER: Retrieval-Augmented Contextual Rapid Speculative Decoding

LLM inference gets a 2x speed boost without training, thanks to a clever technique that merges retrieval with logit-based speculation.

Zihong Zhang, Zihong Zhang, Zuchao Li +5

Inference & Quantization Natural Language Processing Recommendation & Information Retrieval

Hai Zhao

Publication activitypapers/week, last 8 weeks

Research focus

Frequent co-authors

Papers (2)

Search

Hai Zhao

Publication activitypapers/week, last 8 weeks

Research focus

Frequent co-authors

Papers (2)