This paper investigates data engineering strategies for training large language models to improve their terminal capabilities, focusing on synthetic data generation and training techniques. The authors introduce Terminal-Task-Gen, a pipeline for generating synthetic terminal tasks, and Terminal-Corpus, a large-scale dataset created using this pipeline. Nemotron-Terminal models (8B, 14B, 32B), initialized from Qwen3 and trained on this dataset, score substantially higher on Terminal-Bench 2.0, which the authors present as evidence that their data engineering approach is effective.
Forget hand-crafted datasets: a new synthetic data pipeline lets smaller LLMs match much larger models at terminal tasks.
Despite rapid recent progress in the terminal capabilities of large language models, the training data strategies behind state-of-the-art terminal agents remain largely undisclosed. We address this gap through a systematic study of data engineering practices for terminal agents, making two key contributions: (1) Terminal-Task-Gen, a lightweight synthetic task generation pipeline that supports seed-based and skill-based task construction, and (2) a comprehensive analysis of data and training strategies, including filtering, curriculum learning, long context training, and scaling behavior. Our pipeline yields Terminal-Corpus, a large-scale open-source dataset for terminal tasks. Using this dataset, we train Nemotron-Terminal, a family of models initialized from Qwen3 (8B, 14B, 32B) that achieve substantial gains on Terminal-Bench 2.0: Nemotron-Terminal-8B improves from 2.5% to 13.0%, Nemotron-Terminal-14B improves from 4.0% to 20.2%, and Nemotron-Terminal-32B improves from 3.4% to 27.4%, matching the performance of significantly larger models. To accelerate research in this domain, we open-source our model checkpoints and most of our synthetic datasets at https://huggingface.co/collections/nvidia/nemotron-terminal.
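To make the size of the reported gains easier to compare across model scales, the short sketch below recomputes them from the figures quoted in the abstract. It assumes the lower number in each pair is the score of the Qwen3 starting point, which the abstract implies but does not state outright; the names `REPORTED`, `gains`, and `summarize` are illustrative and do not come from the paper.

```python
# Terminal-Bench 2.0 scores (%) quoted in the abstract, as (before, after).
# Assumption: "before" is the Qwen3 model each Nemotron-Terminal starts from.
REPORTED = {
    "8B": (2.5, 13.0),
    "14B": (4.0, 20.2),
    "32B": (3.4, 27.4),
}


def gains(before, after):
    """Return (absolute gain in percentage points, ratio after/before)."""
    return round(after - before, 1), round(after / before, 2)


def summarize(scores=REPORTED):
    """Map each model size to its (absolute gain, ratio) pair."""
    return {size: gains(before, after) for size, (before, after) in scores.items()}


if __name__ == "__main__":
    for size, (points, ratio) in summarize().items():
        print(f"{size}: +{points} points, {ratio}x the starting score")
```

Under that reading, the absolute gain grows with model size (about 10.5, 16.2, and 24.0 percentage points), and the 32B model shows the largest relative improvement, at roughly eight times its starting score.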