Nemotron 3 Super shows that combining Mamba, Attention, and Mixture-of-Experts can match the accuracy of existing 120B models while delivering significantly higher inference throughput.
Audio-language models can now reason about 30-minute-long audio clips with timestamp-grounded intermediate steps, unlocking a new level of fine-grained understanding.
A 30B MoE model can now achieve gold-medal-level performance on the IMO, IOI, and ICPC, rivaling frontier models with 20x more parameters.
Current multimodal models are surprisingly bad at understanding long, complex videos, struggling to integrate audio, visual, and text cues even for basic reasoning tasks.
Training trillion-parameter Mixture-of-Experts models just got a whole lot faster: Megatron Core now sustains over 1 PFLOP/s per GPU on NVIDIA's latest hardware.
Forget hand-crafted datasets: a new synthetic data pipeline lets smaller LLMs beat giants at terminal tasks.