Ran Zilberstein

Papers on Lattice

Total citations

Topics

h-index

Publication activitypapers/week, last 8 weeks

Research focus

Architecture Design (Transformers, SSMs, MoE) (3)Inference & Quantization (3)Multimodal Models (1)Speech & Audio (1)

Frequent co-authors

Pavlo Molchanov (3)Nave Assaf (3)Tomer Asida (3)Yonatan Geifman (3)

Papers (4)

Apr 27, 2026

NVIDIA4d ago·also Amazon Science

Nemotron 3 Nano Omni: Efficient and Open Multimodal Intelligence

Multimodal models can now handle audio natively with improved efficiency, achieving state-of-the-art results in complex tasks like document understanding and agentic computer use.

Nvidia Amala Sanjay Deshmukh, K. Chumachenko, Tuomas Rintamaki +200

Architecture Design (Transformers, SSMs, MoE)Multimodal Models Speech & Audio

Apr 14, 2026

AI22w ago·also NVIDIA, UT Austin, Waterloo, Xiaomi Robotics +1

Nemotron 3 Super: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning

Nemotron 3 Super proves you can achieve comparable accuracy to existing 120B models, but with significantly higher inference throughput, by combining Mamba, Attention, and Mixture-of-Experts.

Aakshita Chandiramani, Aaron Blakeman, Abdullahi Olaoye +481

Architecture Design (Transformers, SSMs, MoE)Inference & Quantization Tool Use & Agents

Feb 12, 2026

NVIDIAFeb 12, 2026·also Technion

Extending Puzzle for Mixture-of-Experts Reasoning Models with Application to GPT-OSS Acceleration

You can slash LLM inference costs without sacrificing quality by strategically pruning experts, quantizing, and swapping full attention for windowed attention, as demonstrated on gpt-oss-120B.

A. Bercovich, Nir Ailon, Vladimir Anisimov +21

Architecture Design (Transformers, SSMs, MoE)Inference & Quantization Open-Source Models & Weights

Feb 10, 2026

Talor Abramovich +7Feb 10, 2026

SPEED-Bench: A Unified and Diverse Benchmark for Speculative Decoding

Synthetic data can significantly overestimate the real-world throughput gains from speculative decoding, highlighting the critical need for benchmarks like SPEED-Bench that use diverse, production-realistic workloads.

Talor Abramovich, Maor Ashkenazi, Izzy Putterman +5

Eval Frameworks & Benchmarks Inference & Quantization

Search

Ran Zilberstein

Publication activitypapers/week, last 8 weeks

Research focus

Frequent co-authors

Papers (4)