
MIT CSAIL
MIT's Computer Science and Artificial Intelligence Laboratory. One of the largest and oldest AI labs in academia.
www.csail.mit.edu
Recent Papers
This paper investigates how team dynamics influence the choice of open-source software (OSS) projects, based on a survey of 198 OSS practitioners. The study finds that communication-related team dynamics such as responsiveness and clarity are consistently prioritized, but that their relative importance varies with contributor motivations such as gaining reputation or networking. The findings indicate that aligning team dynamics with contributor motivations is important for understanding project selection behavior and for designing better project recommendation systems.
Empirically demonstrates that team dynamics, particularly communication-related aspects, significantly influence OSS project selection, with the relative importance of specific dynamics varying based on contributor motivations.
This paper analyzes quantized matrix multiplication (MatMul) for efficient LLM deployment, considering both generic and weight-only quantization scenarios. It derives information-theoretic rate-distortion tradeoffs and benchmarks practical quantization schemes like absmax INT and floating-point against these limits, quantifying their rate loss. The authors then introduce "WaterSIC," a waterfilling-based quantization scheme for weight-only quantization that outperforms existing methods like GPTQ by adapting rate allocation to the covariance matrix, achieving near-optimal performance within 0.25 bits/entry of the information-theoretic limit.
Introduces WaterSIC, a novel waterfilling-based quantization scheme for weight-only matrix multiplication that achieves near-optimal rate-distortion performance and improves upon existing methods like GPTQ.
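For readers unfamiliar with the absmax INT baseline mentioned in this summary, the following is a minimal Python sketch of symmetric absmax integer quantization. It is a generic illustration only, not the paper's code and not WaterSIC; the function names and the sample weights are assumptions made for the example.

```python
def absmax_quantize(values, bits=8):
    """Symmetric absmax quantization: the largest magnitude maps to qmax."""
    qmax = 2 ** (bits - 1) - 1  # 127 for 8-bit signed integers
    absmax = max(abs(v) for v in values)
    if absmax == 0:
        # All-zero input: any scale works, so use 1.0.
        return [0] * len(values), 1.0
    scale = absmax / qmax
    # Round to the nearest integer code and clamp to [-qmax, qmax].
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate real values."""
    return [c * scale for c in codes]


def mse(a, b):
    """Mean squared error, a simple distortion measure."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


# Hypothetical weights, chosen only for illustration.
weights = [0.02, -0.5, 0.31, 1.27, -0.08]
codes, scale = absmax_quantize(weights, bits=8)
recon = dequantize(codes, scale)
distortion = mse(weights, recon)
```

Here the rate is fixed at `bits` per entry and the distortion is whatever the rounding produces. The summary describes WaterSIC as instead adapting the rate allocation to the covariance matrix, which this sketch does not attempt.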
This study establishes self-supervised learning (SSL) as a promising paradigm for ECG analysis, particularly in settings with limited annotated data, improving the accessibility, generalizability, and fairness of AI-driven cardiac diagnostics across diverse clinical environments and questions.
This paper introduces DreamWaQ++, a multimodal reinforcement learning framework that fuses proprioceptive and exteroceptive information for robust quadrupedal locomotion in complex environments. The approach trains a controller capable of agile navigation across challenging terrains like rough ground, steep slopes, and high stairs, while also exhibiting resilience to out-of-distribution scenarios. Key to the success is the fusion of proprioceptive feedback with exteroceptive data to enable obstacle avoidance and adaptive gait planning.
Introduces a resilient multimodal reinforcement learning framework, DreamWaQ++, that effectively fuses proprioception and exteroception for robust quadrupedal locomotion in challenging environments.
The paper introduces GPT-5, a unified system comprising a fast, general-purpose model and a deeper reasoning model, managed by a real-time router trained on user feedback and performance metrics. GPT-5 demonstrates improved performance on benchmarks, faster response times, and enhanced utility for real-world queries, with significant reductions in hallucinations, improved instruction following, and minimized sycophancy. The system incorporates "safe-completions" for safety and is treated as High capability in the Biological and Chemical domain under OpenAI's Preparedness Framework, triggering associated safeguards.
Introduces a unified GPT-5 system with a real-time router that dynamically selects between a fast, general-purpose model and a deeper reasoning model based on query characteristics, optimizing for speed and accuracy.
The International AI Safety Report 2025's Second Key Update analyzes the current state of AI risk management and technical mitigations employed by researchers, companies, and governments. It highlights advancements in training safer models and monitoring outputs while acknowledging uncertainties in the effectiveness of these measures and their variability across applications. The report aims to inform policymakers, researchers, and the public about progress and remaining gaps in AI safety.
Synthesizes recent developments in AI risk management and technical risk mitigation strategies, identifying both progress and persistent gaps in ensuring the safety of general-purpose AI systems.
The paper introduces a Dual Loop Data Cleaning (DLDC) method that automatically generates high-quality remote sensing image-text training data by leveraging contrastive multimodal quality evaluations. DLDC uses an external generation loop (EGL), based on a multimodal foundation model, for layout description and an internal evaluation loop (IEL), based on contrastive learning metrics, to assess image-text matching. Fine-tuning text-to-image (T2I) models on the cleaned dataset yields significant improvements in image generation quality, as evidenced by substantial reductions in FID, increases in CLIP and RemoteCLIP scores, and improved downstream segmentation performance.
Introduces a dual-loop data cleaning method (DLDC) that automatically generates high-quality remote sensing image-text training data, eliminating the need for manual annotation.
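The idea of scoring image-text matching with a contrastive metric can be sketched as a simple similarity filter. This is a hedged illustration, not the DLDC implementation: the embeddings, the `threshold` value, and the function names are all assumptions, and a real system would obtain embeddings from a contrastive model such as CLIP.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def filter_pairs(pairs, threshold=0.5):
    """Split (name, image_embedding, text_embedding) triples by similarity.

    Pairs at or above the threshold are kept; the rest are rejected and
    could, for example, be sent back for a new description.
    """
    kept, rejected = [], []
    for name, img_emb, txt_emb in pairs:
        if cosine(img_emb, txt_emb) >= threshold:
            kept.append(name)
        else:
            rejected.append(name)
    return kept, rejected


# Toy 2-D embeddings, invented for the example.
pairs = [
    ("scene_a", [1.0, 0.0], [0.9, 0.1]),  # caption aligned with image
    ("scene_b", [1.0, 0.0], [0.0, 1.0]),  # caption unrelated to image
]
kept, rejected = filter_pairs(pairs)
```

A fixed threshold is the simplest possible acceptance rule; the summary does not say how the internal evaluation loop actually decides which pairs to accept.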
This paper introduces gpt-oss-120b and gpt-oss-20b, two open-weight reasoning models built using a mixture-of-experts transformer architecture and trained via large-scale distillation and reinforcement learning. These models are optimized for agentic capabilities, including research browsing and tool use, and utilize a chat format for instruction following. The authors demonstrate strong performance on mathematics, coding, and safety benchmarks and release the model weights and related resources under an Apache 2.0 license.
Introduces and releases the weights for gpt-oss-120b and gpt-oss-20b, two open-weight reasoning models with strong agentic capabilities and performance across diverse benchmarks.
The paper introduces ChartGen, a fully automated pipeline for generating synthetic chart image-code pairs to improve chart understanding in vision-language models (VLMs). ChartGen leverages a VLM to reconstruct seed chart images into Python scripts and then uses a code-oriented LLM to iteratively augment these scripts, creating a diverse dataset. The authors generated 222.5K unique chart image-code pairs and used a held-out evaluation set to benchmark six open-weight VLMs, demonstrating significant room for improvement in chart-to-code reconstruction.
Introduces ChartGen, a novel code-guided synthetic chart generation pipeline that significantly expands the availability of chart image-code pairs for training and evaluating VLMs.
The authors developed and compared two open-source foundation models for ECG interpretation: DeepECG-SSL, a self-supervised model pretrained with contrastive learning and masked lead modeling, and DeepECG-SL, a supervised model. Both models were trained on over 1 million ECGs to predict 77 cardiac conditions and were evaluated on multiple datasets for ECG interpretation and digital biomarker tasks. DeepECG-SSL outperformed DeepECG-SL on digital biomarker tasks with limited labeled data, demonstrating the potential of self-supervised learning for ECG analysis, while both models showed minimal performance disparities across age and gender.
Demonstrates the efficacy of self-supervised learning for ECG analysis, particularly in low-data regimes, by developing and evaluating DeepECG-SSL, an open-source foundation model that outperforms its supervised counterpart on digital biomarker tasks.
This paper introduces a visual question answering (VQA) system that combines VisualBERT, ViLT, cross-modal memory networks, memory-augmented attention, and vision-language pre-training models (Flamingo, BLIP) for improved multimodal fusion and dynamic memory retrieval. The system addresses complex reasoning by adapting to novel question types through few-shot learning. Experiments on VQA v2.0 report 80% accuracy, surpassing LSTM-CNN and attention-only baselines, alongside improved BLEU scores and precision-recall metrics.
Demonstrates a modular VQA architecture that integrates multiple deep learning techniques to achieve state-of-the-art performance on complex reasoning tasks.
This paper introduces an Adaptable Reinforcement Learning-oriented Multifaceted Data Combination (AdRL-MDC) system to train a robotic hand for gaming, aiming to improve accuracy and consistency in motion management. The system integrates an adaptable training process for ensemble classification, a reinforcement learning paradigm for robot intelligence, and a multifaceted data combination framework. Experimental results demonstrate that the CNN-based ensemble framework achieves high accuracy with efficient computation, and the depth vision-oriented CNN classification algorithm attains 100% recognition accuracy.
Introduces an adaptable reinforcement learning framework (AdRL-MDC) that combines CNNs and RL to achieve high accuracy and robustness in robotic hand motion control for gaming.

