Search papers, labs, and topics across Lattice.
3
0
6
2
By dynamically injecting frequency-aware n-gram features, X-GRAM achieves state-of-the-art accuracy with smaller embedding tables, offering a practical path to scaling memory-augmented architectures.
Ditching caches for compiler-managed data streams, Li Auto's M100 architecture achieves higher utilization than GPUs on autonomous driving tasks, hinting at a new path for efficient AI inference.
Forget months of architecture search: this hardware co-design framework slashes the time to days and beats Qwen2.5-0.5B's perplexity by 19% at the same latency on NVIDIA Jetson Orin.