The paper introduces TorchAO, a PyTorch-native framework for end-to-end model optimization via quantization and sparsity techniques. It supports FP8 quantized training, QAT, PTQ, and 2:4 sparsity, using a novel tensor subclass abstraction to represent backend-agnostic low-precision data types such as INT4, INT8, and various FP8 formats. TorchAO integrates with tools like TorchTitan, TorchTune, HuggingFace, and vLLM to provide a unified training-to-serving workflow, exemplified by its use in quantizing the Llama 3.2 1B/3B and LlamaGuard3-8B models.
Quantize your models with ease: TorchAO offers a PyTorch-native, end-to-end workflow for model optimization using quantization and sparsity, already powering quantized Llama 3.2 and LlamaGuard3.
We present TorchAO, a PyTorch-native model optimization framework that leverages quantization and sparsity to provide an end-to-end, training-to-serving workflow for AI models. TorchAO supports a variety of popular model optimization techniques, including FP8 quantized training, quantization-aware training (QAT), post-training quantization (PTQ), and 2:4 sparsity, and leverages a novel tensor subclass abstraction to represent a variety of widely used, backend-agnostic low-precision data types, including INT4, INT8, FP8, MXFP4, MXFP6, and MXFP8. TorchAO integrates closely with the broader ecosystem at each step of the model optimization pipeline, from pre-training (TorchTitan) to fine-tuning (TorchTune, Axolotl) to serving (HuggingFace, vLLM, SGLang, ExecuTorch), connecting an otherwise fragmented space in a single, unified workflow. TorchAO has enabled recent launches of the quantized Llama 3.2 1B/3B and LlamaGuard3-8B models and is open-source at https://github.com/pytorch/ao/.
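To make the post-training quantization idea concrete, the sketch below shows symmetric per-tensor INT8 quantization in plain Python. This is a minimal illustration of the underlying arithmetic only, not TorchAO's actual API; the function names `quantize_int8` and `dequantize_int8` are hypothetical.

```python
# Illustrative sketch of symmetric per-tensor INT8 quantization, the kind of
# transformation PTQ frameworks apply to weights. Hypothetical helper names;
# not TorchAO's API.

def quantize_int8(values):
    """Map floats to INT8 codes in [-127, 127] using one symmetric scale."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale

def dequantize_int8(codes, scale):
    """Recover approximate float values from INT8 codes."""
    return [c * scale for c in codes]

weights = [0.5, -1.0, 0.25, 0.9]
codes, scale = quantize_int8(weights)
recovered = dequantize_int8(codes, scale)
# Round-trip error per element is at most half a quantization step (scale / 2).
```

In practice, frameworks like TorchAO apply such transforms per-channel or per-group and hide the packed low-precision storage behind a tensor subclass, so quantized models still run through standard PyTorch operators.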