Open-Source Models & Weights
Open-weight model releases, reproducibility, model licensing, and community-driven AI development.
Recent Papers
This paper presents an empirical study of AI coding agent contributions in open-source Android and iOS mobile app development by analyzing 2,901 AI-authored pull requests (PRs) from 193 GitHub repositories. The study reveals that Android projects receive more AI-authored PRs and exhibit higher acceptance rates compared to iOS, with routine tasks showing higher acceptance rates than structural changes. The analysis also indicates an initial improvement followed by a decline in PR resolution time on Android, providing insights into the evolving impact of AI agents on OSS mobile projects.
Empirically characterizes the effects of AI coding agents on open-source Android and iOS mobile app projects by analyzing PR acceptance behaviors across platforms, agents, and task categories.
The paper introduces DHPLT, a large-scale multilingual diachronic corpus comprising web-crawled data from 41 languages across three time periods (2011-2015, 2020-2021, 2024-present). The authors leverage web crawl timestamps as a proxy for document creation time, providing 1 million documents per time period per language. They also provide pre-computed word embeddings and lexical substitutions to facilitate semantic change modeling research, addressing the scarcity of such resources for many languages.
Introduces DHPLT, a novel multilingual diachronic corpus with pre-computed embeddings and lexical substitutions, designed to facilitate research in semantic change modeling across 41 languages.
This paper extends crosscoder model diffing to cross-architecture comparisons, enabling the unsupervised discovery of behavioral differences between LLMs with different architectures. They introduce Dedicated Feature Crosscoders (DFCs), an architectural modification to improve the isolation of unique features in one model compared to another. Applying this technique, they identify features such as CCP alignment in Qwen3-8B and Deepseek-R1-0528-Qwen3-8B, American exceptionalism in Llama3.1-8B-Instruct, and a copyright refusal mechanism in GPT-OSS-20B.
Introduces Dedicated Feature Crosscoders (DFCs), an architectural modification to enhance crosscoder model diffing for isolating features unique to individual models in cross-architecture comparisons.
The authors extend the Puzzle post-training neural architecture search framework to optimize the gpt-oss-120B model, creating gpt-oss-puzzle-88B, by combining heterogeneous MoE expert pruning, selective attention replacement, FP8 quantization, and post-training reinforcement learning. This optimized model achieves significant per-token throughput speedups (up to 2.82X on a single H100 GPU) while maintaining or slightly exceeding the parent model's accuracy across various benchmarks. The paper advocates for request-level efficiency metrics to account for varying token counts and demonstrates that gpt-oss-puzzle-88B improves request-level efficiency by up to 1.29X.
Introduces a pipeline combining heterogeneous MoE expert pruning, selective attention replacement, FP8 quantization, and post-training reinforcement learning within the Puzzle framework to optimize large language models for inference.
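The request-level efficiency metric the paper advocates can be sketched with simple arithmetic: per-token throughput alone overstates the gain if the optimized model emits more tokens per request. A minimal illustration (all numbers below are hypothetical, not the paper's measurements):

```python
# Illustrative sketch: request-level efficiency weights per-token throughput
# by how many tokens each model emits for the same request.

def request_level_speedup(base_tokens, base_tok_per_s, opt_tokens, opt_tok_per_s):
    """Ratio of end-to-end request latencies: base_time / optimized_time."""
    base_time = base_tokens / base_tok_per_s
    opt_time = opt_tokens / opt_tok_per_s
    return base_time / opt_time

# A 2x per-token speedup shrinks to 1.5x at the request level when the
# optimized model emits 4/3 as many tokens for the same request.
print(request_level_speedup(300, 50.0, 400, 100.0))  # -> 1.5
```

This is why a model with 2.82X per-token throughput can report a smaller 1.29X request-level gain.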
This paper investigates the impact of model and data scaling on multilingual machine translation (MT) performance using open large language models (LLMs). The authors adapt Gemma3 models via continual pretraining and instruction finetuning, creating MiLMMT-46, a model covering 46 languages. Results demonstrate that MiLMMT-46 surpasses existing open-source SOTA models and rivals proprietary systems like Google Translate and Gemini 3 Pro in multilingual translation quality.
Demonstrates that scaling model size and training data via continual pretraining and instruction finetuning significantly improves the multilingual translation capabilities of open LLMs, achieving performance competitive with proprietary systems.
The paper introduces PatientHub, a unified framework to standardize the creation, composition, and deployment of simulated patients for training counselors and scaling therapeutic assessment using Large Language Models. PatientHub addresses the fragmentation in existing patient simulation approaches by providing standardized data formats, prompts, and evaluation metrics, thus improving reproducibility and enabling fair comparisons. The authors demonstrate PatientHub's utility through case studies, showcasing standardized cross-method evaluation, seamless integration of custom evaluation metrics, and the prototyping of new simulator variants.
Introduces PatientHub, a modular framework that unifies patient simulation by standardizing data formats, prompts, and evaluation metrics to facilitate reproducibility and fair comparison of different methods.
This paper investigates the effectiveness of using small language models (SLMs) as judges to improve code generation, particularly in scenarios where large language models (LLMs) may underperform. The authors train and evaluate several state-of-the-art SLMs to discriminate between correct and incorrect code implementations, focusing on classification accuracy. Results demonstrate that modern SLMs, even without execution-based information, outperform previous approaches and achieve comparable performance to much larger LLMs when used as code rankers, offering a cost-effective alternative for code generation.
Demonstrates that modern small language models can effectively serve as code correctness judges and rankers, achieving performance competitive with much larger language models at a significantly reduced cost.
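The judge-as-ranker setup described above reduces to best-of-n selection: score each candidate implementation with the judge and keep the top-scoring one. A minimal sketch with a stub judge (the scoring rule here is invented for illustration, not the paper's SLM classifier):

```python
# Minimal sketch of judge-based code reranking: a small model scores each
# candidate's probability of being correct; the top-scoring candidate wins.

def rank_candidates(candidates, judge):
    """Return candidates sorted by the judge's correctness score, best first."""
    scored = [(judge(code), code) for code in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [code for _, code in scored]

# Stub judge standing in for an SLM classifier; here it simply prefers
# candidates that contain a return statement.
toy_judge = lambda code: 1.0 if "return" in code else 0.0

best = rank_candidates(["x = 1", "def f(n): return n + 1"], toy_judge)[0]
print(best)  # prints: def f(n): return n + 1
```

In the paper's setting the judge is a trained SLM rather than a heuristic, but the selection loop is the same.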
The paper introduces VIRENA, a virtual platform designed for controlled experimentation within realistic social media environments, addressing limitations in data access and ethical constraints in studying online dynamics. VIRENA allows researchers to simulate feed-based platforms and messaging apps, enabling interactions between human participants and LLM-powered AI agents with configurable personas. The platform's no-code interface facilitates manipulation of content moderation, scheduling of stimuli, and execution of experiments, making it accessible for studying human-AI interaction, moderation interventions, and group deliberation.
Introduces VIRENA, a novel virtual platform enabling controlled social media experiments with human and AI participants, featuring a no-code interface and realistic platform simulations.
This paper investigates the influence of team dynamics on OSS project selection by surveying 198 OSS practitioners. The study reveals that communication-related team dynamics like responsiveness and clarity are consistently prioritized, but the relative importance varies based on contributor motivations such as gaining reputation or networking. The findings demonstrate that aligning team dynamics with contributor motivations is crucial for understanding project selection behavior and designing better project recommendation systems.
Empirically demonstrates that team dynamics, particularly communication-related aspects, significantly influence OSS project selection, with the relative importance of specific dynamics varying based on contributor motivations.
The paper introduces a RAG pipeline and a two-layer prompting strategy to extract actionable recommendations (ReACTs) for improving OSS sustainability from software engineering literature. The authors systematically explore open LLMs and prompting techniques to derive candidate ReACTs from ICSE and FSE papers, followed by a filtering and refinement stage to ensure quality and extract supporting evidence. The pipeline generates 1,922 ReACTs, of which 1,312 meet strict quality criteria, providing a structured and scalable approach to translating research findings into practical guidance for OSS projects.
Introduces a novel RAG pipeline leveraging LLMs to extract and structure evidence-based, actionable recommendations (ReACTs) from software engineering literature for improving OSS project sustainability.
This paper introduces zk-compilation, a novel approach to verifiable software provenance by executing a compiler within a zero-knowledge virtual machine (zkVM). This method generates both the compiled output and a cryptographic proof that the compilation was performed on the claimed source code with the specified compiler. The authors demonstrate the feasibility of zk-compilation using the RISC Zero zkVM and the ChibiCC C compiler, evaluating it on synthetic programs, OpenSSL, and libsodium source files, showing strong security guarantees against various attacks.
Introduces and demonstrates zk-compilation, a novel method for verifiable software provenance using zero-knowledge virtual machines.
The paper introduces GeoFormer, a Swin Transformer-based framework for jointly estimating building height (BH) and footprint (BF) from Sentinel-1/2 imagery and open DEM data. By using a geo-blocked splitting strategy for training and evaluation across 54 diverse cities, the authors address the challenge of cross-city generalization in urban data estimation. GeoFormer achieves a BH RMSE of 3.19 m and a BF RMSE of 0.05, demonstrating significant improvements over CNN baselines and strong cross-continent transferability.
Introduces GeoFormer, a novel Swin Transformer-based architecture, for joint building height and footprint estimation from multi-source satellite imagery, achieving state-of-the-art accuracy and generalization across diverse urban environments.
The paper introduces StealthRL, a reinforcement learning framework that generates adversarial paraphrases to evade AI-text detectors. StealthRL trains a paraphrase policy using Group Relative Policy Optimization (GRPO) with LoRA adapters on Qwen-3B, optimizing for both detector evasion and semantic similarity. Experiments across six attack settings and three detector families demonstrate StealthRL's ability to achieve near-zero detection rates (0.001 TPR@1%FPR) and high attack success rates (99.9%), even transferring to unseen detector families.
Demonstrates a reinforcement learning approach, StealthRL, for generating adversarial paraphrases that effectively evade multiple AI-text detectors, revealing shared vulnerabilities across detector architectures.
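The dual objective described above — detector evasion plus semantic preservation — can be captured by a scalar reward of the kind GRPO optimizes. A hedged sketch (the linear weighting and values below are assumptions for illustration, not StealthRL's exact reward):

```python
# Hypothetical combined RL reward in the spirit of StealthRL: reward rises as
# the detector score falls and as the paraphrase stays close to the source.

def reward(detector_score, similarity, alpha=0.5):
    """detector_score and similarity are in [0, 1]; higher reward is better."""
    evasion = 1.0 - detector_score
    return alpha * evasion + (1.0 - alpha) * similarity

# A paraphrase the detector barely flags (0.1) while keeping similarity 0.9
# earns more reward than one that evades fully but drifts in meaning (0.4).
print(reward(0.1, 0.9))  # -> 0.9
print(reward(0.0, 0.4))  # -> 0.7
```

Tuning alpha trades off evasion strength against meaning preservation; the paper's policy optimizes both jointly rather than via a fixed linear mix.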
The paper introduces Private Mask Pre-Training (PMP), a pre-training framework designed to create foundation models that are broadly usable but resistant to unauthorized fine-tuning. PMP concentrates representation learning into a sparse, privately masked subnetwork, releasing only the final dense weights. This induces a mismatch between the fine-tuning objective and the pre-training geometry for those without the mask, thereby limiting adaptation gains.
Introduces Private Mask Pre-Training (PMP) to build foundation models that are robust against unauthorized fine-tuning by concentrating representation learning in a private, sparse subnetwork.
The paper introduces Soft-Verified Efficient Repository Agents (SERA), a supervised finetuning method for efficiently training coding agents specialized to private codebases. SERA leverages Soft Verified Generation (SVG) to create thousands of synthetic trajectories from a single repository, enabling rapid and cost-effective specialization. The resulting SERA models achieve state-of-the-art performance among fully open-source models, matching the performance of models like Devstral-Small-2 at a fraction of the cost compared to reinforcement learning or previous synthetic data methods.
Introduces Soft Verified Generation (SVG), a novel method for generating synthetic code trajectories that enables efficient supervised finetuning of coding agents specialized to private codebases.
The authors introduce Muse, an open-source system for long-form song generation with fine-grained style conditioning, addressing the lack of reproducibility in academic research due to unavailable training data. They release a dataset of 116k fully licensed synthetic songs with lyrics and style descriptions paired with SunoV5-synthesized audio. Muse, a Qwen-based language model finetuned with discrete audio tokens, achieves competitive performance in phoneme error rate, text-music style similarity, and audio aesthetic quality, demonstrating controllable segment-level generation.
Releases Muse, a fully open-source system for long-form song generation, along with a licensed synthetic dataset and training/evaluation pipelines, to enable reproducible research.
This paper replicates Anthropic's mechanistic interpretability work using sparse autoencoders (SAEs) on Llama 3.1 to extract and steer human-interpretable features, stress-testing the generalizability of these methods. The authors successfully reproduce basic feature extraction and steering, but find significant fragility in feature steering, sensitivity to various parameters, and difficulty in distinguishing thematically similar features. The study concludes that current SAE-based interpretability methods lack the systematic reliability needed for safety-critical applications, suggesting a shift towards prioritizing reliable model output prediction and control.
Demonstrates the fragility and limitations of current SAE-based mechanistic interpretability techniques for Llama 3.1, particularly regarding feature steering and thematic feature differentiation.
This paper introduces the Agentic Learning Ecosystem (ALE), an open-source infrastructure comprising ROLL (a post-training framework), ROCK (a sandbox environment manager), and iFlow CLI (an agent framework), designed to streamline agentic model development. They release ROME, an agent trained within ALE on over a million trajectories, utilizing data composition protocols for complex behavior synthesis and a novel Interaction-Perceptive Agentic Policy Optimization (IPA) algorithm for improved long-horizon training. Empirical evaluations on benchmarks like SWE-bench Verified and Terminal Bench Pro demonstrate ROME's strong performance, validating the effectiveness of the ALE ecosystem.
Introduces the Agentic Learning Ecosystem (ALE) and the ROME agent, demonstrating a complete open-source pipeline for training and evaluating agentic models with improved long-horizon stability through Interaction-Perceptive Agentic Policy Optimization (IPA).
The paper introduces Moxin 7B, a fully open-source LLM developed with complete transparency in training, datasets, and implementation details. To extend Moxin's capabilities, the authors developed three variants: Moxin-VLM (vision-language), Moxin-VLA (vision-language-action), and Moxin-Chinese. Experiments demonstrate that these models achieve strong performance in their respective domains, leveraging open-source frameworks and data.
Introduces Moxin, a fully transparent and open-source LLM, along with its multimodal and multilingual variants, promoting a collaborative research environment.
The paper introduces CUVIRIS, a new dataset of ISO/IEC 29794-6 compliant visible light iris images captured via a custom Android application with real-time quality assessment, and benchmarks two iris recognition systems on this dataset. They also present LightIrisNet, a MobileNetV3-based segmentation model for on-device deployment, and adapt IrisFormer, a transformer-based matcher, to the visible light domain. Experiments demonstrate that the open-source OSIRIS system achieves a TAR of 97.9% at FAR = 0.01 on CUVIRIS, and IrisFormer, trained only on UBIRIS.v2, achieves an EER of 0.057%, indicating the feasibility of accurate smartphone-based iris recognition under controlled conditions.
Provides an open-source framework including a quality-assured VIS iris image dataset, a lightweight segmentation model, and a VIS-adapted transformer-based matcher to advance smartphone-based iris recognition.
The paper introduces MiniLingua, a 1-billion parameter multilingual LLM trained from scratch on 13 European languages, addressing the limitations of larger, English-centric models. MiniLingua aims to balance language coverage with instruction-following capabilities in a smaller, more efficient model. The instruction-tuned version of MiniLingua outperforms EuroLLM on summarization, classification, and question answering tasks, while remaining competitive on open-ended generation.
Demonstrates that a small, multilingual LLM trained from scratch can outperform larger models with similar training approaches on instruction-following tasks and remain competitive on open-ended generation.
The paper introduces a weighted transparency framework based on the EU AI Act and Stanford Transparency Index to evaluate AI model documentation, addressing the current fragmentation and inconsistency. They developed an automated multi-agent pipeline leveraging LLMs to extract documentation and score completeness across 50 models, revealing significant gaps, especially in safety-critical categories. The evaluation shows frontier labs achieve higher compliance (around 80%) compared to other providers (below 60%), highlighting areas for improvement in AI transparency.
Introduces a novel weighted transparency framework and automated evaluation pipeline to systematically assess and score the completeness of AI model documentation.
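A weighted transparency score of the kind the framework computes reduces to a weighted average of per-category completeness, with heavier weights on safety-critical categories. A sketch with invented categories and weights (not the paper's actual rubric):

```python
# Illustrative weighted transparency score: each documentation category gets a
# completeness fraction; safety-critical categories carry more weight.

def transparency_score(completeness, weights):
    """Weighted average of per-category completeness, in [0, 1]."""
    total = sum(weights.values())
    return sum(completeness[c] * w for c, w in weights.items()) / total

weights = {"data": 1.0, "compute": 1.0, "safety": 2.0}        # hypothetical
completeness = {"data": 0.9, "compute": 0.8, "safety": 0.5}   # hypothetical
print(round(transparency_score(completeness, weights), 3))  # -> 0.675
```

Note how the low safety score drags the total well below the unweighted mean of 0.733, which is the point of weighting safety-critical categories more heavily.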
The paper introduces MixtureKit, an open-source framework designed to facilitate the construction, training, and analysis of Mixture-of-Experts (MoE) models using pre-trained or fine-tuned models. MixtureKit implements three MoE methods: Traditional MoE, BTX (fine-grained token routing), and BTS (trainable stitch layers for information exchange). Experiments on multilingual code-switched data demonstrate that BTX-based models built with MixtureKit outperform dense baselines, showcasing the framework's utility.
Introduces MixtureKit, a modular open-source framework that simplifies the creation, training, and visualization of Mixture-of-Experts models with multiple routing strategies.
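The "Traditional MoE" routing that MixtureKit implements can be illustrated with a toy token-level top-k gate: each token's gate scores select its top-k experts, whose outputs are mixed by normalized gate weight. A pure-Python stand-in (not MixtureKit's API):

```python
# Toy top-k expert routing for a single token and scalar "hidden state".

def route_token(gate_scores, expert_fns, x, k=2):
    """Run the k best-scoring experts on x and blend their outputs."""
    top = sorted(range(len(gate_scores)), key=lambda i: gate_scores[i], reverse=True)[:k]
    norm = sum(gate_scores[i] for i in top)
    return sum(gate_scores[i] / norm * expert_fns[i](x) for i in top)

experts = [lambda x: x + 1.0, lambda x: 2.0 * x, lambda x: -x]  # toy experts
# Gates favor experts 0 and 1; expert 2 is never executed for this token,
# which is where MoE sparsity saves compute.
print(round(route_token([0.6, 0.4, 0.0], experts, 10.0), 2))  # -> 14.6
```

BTX-style fine-grained token routing applies the same idea per token across experts built from separately trained models, while BTS instead exchanges information through trainable stitch layers.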
The paper introduces PCMind-2.1-Kaiyuan-2B, a 2B parameter open-source LLM, designed to improve training efficiency under resource constraints. They employ a Quantile Data Benchmarking method for data mixing, Strategic Selective Repetition for high-quality data leverage, and a Multi-Domain Curriculum Training policy for sample ordering. Kaiyuan-2B achieves competitive performance with state-of-the-art open-source models while using optimized data preprocessing and architectural modifications for FP16 stability.
Introduces a novel training methodology for resource-constrained LLMs, combining quantile data benchmarking, strategic selective repetition, and multi-domain curriculum training.
This paper investigates the application of Large Language Models (LLMs) to mutual fund portfolio optimization and risk-adjusted asset allocation, aiming to enhance traditional financial decision-making. The authors employed a Retrieval-Augmented Generation (RAG) pipeline, integrating real-time economic data with standard financial optimization techniques, to guide LLMs in generating investment strategies. The study found that the Zypher 7B model outperformed Microsoft Phi 2 and Mistral 7B, consistently producing strategies that maximized investment returns while delivering superior risk-adjusted results.
Demonstrates the efficacy of using LLMs, particularly Zypher 7B, within a RAG framework to generate superior risk-adjusted mutual fund portfolio allocations compared to other LLMs.
The paper introduces K2-V2, a fully open large language model (LLM) designed with a focus on reasoning adaptation, conversation, and knowledge retrieval. K2-V2 is claimed to outperform Qwen2.5-72B and approach the performance of Qwen3-235B, positioning it as a leading open-weight model in its size class. The model is trained with explicit infusion of domain knowledge, reasoning skills, long-context understanding, and tool use, and the authors release the full training history and data composition to facilitate continuous training.
Presents K2-V2, a high-performing, fully open LLM specifically engineered for enhanced reasoning capabilities through targeted training data and methodology.
The authors developed COPE, a Chain-of-Thought (CoT) Outcome Prediction Engine based on sequential open-source LLaMA-3-8B models, to predict 90-day functional outcomes after acute ischemic stroke (AIS) from unstructured clinical notes. COPE first generates clinical reasoning and then outputs a modified Rankin Scale (mRS) prediction. COPE achieved comparable performance to GPT-4.1 and outperformed ClinicalBERT, Clinical ML, and a single-step LLM, demonstrating its potential as a lightweight, interpretable, and privacy-preserving solution for outcome prediction.
Introduces COPE, a novel two-step Chain-of-Thought framework leveraging open-source LLaMA-3-8B models for predicting stroke outcomes from clinical notes.
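COPE's two-step flow — elicit reasoning first, then condition the prediction on it — can be sketched schematically. The stub "model" and prompt wording below are placeholders to show the data flow, not the authors' LLaMA-3-8B setup:

```python
# Schematic two-step Chain-of-Thought prediction: step 1 elicits clinical
# reasoning from the note, step 2 conditions the mRS prediction on it.

def predict_mrs(note, model):
    reasoning = model(f"Summarize the clinical reasoning for: {note}")
    answer = model(f"Given the reasoning: {reasoning}\nPredict the 90-day mRS (0-6):")
    return reasoning, answer

# Stub model: returns a canned response per step to demonstrate the flow.
def stub_model(prompt):
    return "severe deficit noted" if prompt.startswith("Summarize") else "4"

reasoning, mrs = predict_mrs("Patient with left MCA occlusion...", stub_model)
print(mrs)  # prints 4
```

The key design choice is that the second call sees the generated reasoning, making the final mRS prediction both conditioned on and explainable by it.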
This paper fine-tunes the open-weight Mistral 7B LLM on the Araneum Slovacum VII Maximum corpus (5.3B tokens) to create Mistral-SK-7b, a specialized Slovak language model. The motivation is to address the lack of high-quality, open-source LLMs for low-resource languages like Slovak, where commercial models are proprietary. The resulting Mistral-SK-7b exhibits significantly improved grammatical correctness and contextual coherence in Slovak, eliminating issues like code-switching and repetition loops present in the original Mistral 7B.
Demonstrates the effective adaptation of a state-of-the-art LLM for a low-resource language through fine-tuning on a large, relevant corpus, resulting in a publicly available model with improved performance.
The authors introduce Z-Image, a 6B-parameter image generation foundation model based on a Scalable Single-Stream Diffusion Transformer (S3-DiT) architecture, designed to be efficient and accessible. They optimize the model lifecycle through data curation and training curriculum, achieving full training in 314K H800 GPU hours and developing Z-Image-Turbo with sub-second inference latency and consumer-grade hardware compatibility via few-step distillation and reward post-training. Z-Image demonstrates comparable or superior performance to larger models in photorealistic image generation and bilingual text rendering, while significantly reducing computational costs.
Introduces an efficient 6B-parameter image generation model, Z-Image, that rivals the performance of much larger proprietary models, demonstrating state-of-the-art results with significantly reduced computational overhead.
This paper analyzes the Hugging Face Model Hub download history from June 2020 to August 2025, encompassing 851,000 models and 2.2B downloads, to understand concentration dynamics in the open model economy. The study reveals a shift away from US industry dominance by Google, Meta, and OpenAI towards unaffiliated developers, community organizations, and Chinese industry players like DeepSeek and Qwen. The analysis also identifies trends in model properties, including increased model size, multimodal generation, quantization, and MoE architectures, alongside decreased data transparency.
Provides a comprehensive longitudinal analysis of the open-weight AI model ecosystem, revealing shifts in economic power and model characteristics.
The paper introduces HunyuanVideo 1.5, an 8.3B parameter open-source video generation model achieving state-of-the-art visual quality and motion coherence. This is accomplished through data curation, a DiT architecture with selective and sliding tile attention (SSTA), glyph-aware text encoding for improved bilingual understanding, progressive pre-training and post-training, and an efficient video super-resolution network. The model supports both text-to-video and image-to-video generation across various durations and resolutions, demonstrating superior performance compared to existing open-source alternatives.
Introduces a highly efficient video generation model that achieves state-of-the-art performance with a relatively small parameter count, making it accessible for use on consumer-grade hardware.
The paper introduces HuggingR$^4$, a novel framework for selecting optimal AI models from large repositories like Hugging Face by framing model selection as an iterative reasoning process. HuggingR$^4$ integrates Reasoning, Retrieval, Refinement, and Reflection to decompose user intent, retrieve candidates, refine selections, and validate results. Experiments on a new benchmark of 14,399 user requests demonstrate that HuggingR$^4$ significantly outperforms existing methods in workability and reasonability while reducing token consumption.
Introduces a progressive reasoning framework, HuggingR$^4$, that iteratively selects AI models from large repositories by synergistically integrating reasoning, retrieval, refinement, and reflection.
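The iterative select-and-validate loop described above can be sketched as follows. All components are toy stubs standing in for LLM-backed modules, not HuggingR$^4$'s implementation:

```python
# Sketch of an iterative reason-retrieve-refine-reflect loop: if reflection
# rejects the pick, the loop retries with the feedback folded into the query.

def select_model(request, retrieve, refine, reflect, max_rounds=3):
    feedback = ""
    for _ in range(max_rounds):
        candidates = retrieve(request + feedback)
        pick = refine(candidates)
        ok, feedback = reflect(request, pick)
        if ok:
            return pick
    return pick  # best effort after max_rounds

# Toy stubs: retrieval misses until reflection's feedback mentions "vision".
retrieve = lambda q: ["clip-vit"] if "vision" in q else ["bert-base"]
refine = lambda cands: cands[0]
reflect = lambda req, pick: (pick == "clip-vit", " vision")

print(select_model("classify images", retrieve, refine, reflect))  # prints clip-vit
```

Reflection feeding back into retrieval is what distinguishes this loop from one-shot retrieval-then-rank pipelines, and is where the reported gains in workability plausibly come from.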
The paper introduces a cost-efficient pipeline for training domain-specific small language models (SLMs) by combining guided synthetic data generation from a seed corpus with bottom-up domain data curation. This pipeline leverages Domain-Adaptive Pretraining (DAPT), Domain-Specific Fine-tuning (DSFT), and Direct Preference Optimization (DPO). The authors demonstrate the effectiveness of their approach by training DiagnosticSLM, a 3B-parameter model for fault diagnosis, which achieves up to 25% accuracy improvement over larger open-source models on a newly introduced DiagnosticMCQ benchmark and performs competitively on other diagnostic tasks.
Introduces a guided data generation and training pipeline for creating domain-specific small language models that outperforms larger general-purpose models in specialized tasks.
The AlphaFold Protein Structure Database (AFDB) has been updated to align with the UniProt 2025_03 release, expanding its structural coverage to include isoforms and underlying multiple sequence alignments. A redesigned entry page enhances usability by integrating annotations with an interactive 3D viewer and introducing dedicated domains and summary tabs. This update reinforces AFDB as a key resource for exploring protein sequence-structure relationships.
Enhances the AlphaFold Protein Structure Database by updating its structural coverage, redesigning the user interface for improved accessibility, and integrating annotations with an interactive 3D viewer.
The authors introduce OpenPyRo-A1, a low-cost (approximately $14K) bimanual humanoid robot with 0.2mm repeatability and 5kg payload per arm, designed to address the scarcity of affordable dual-arm platforms for embodied AI research. They also present a Python-first distributed control framework, installable via pip, to facilitate teleoperation, data collection, and policy deployment. Imitation learning experiments, integrating the robot with perception models, motion planning, and a large language model, demonstrate the platform's stability, user-friendliness, and high precision.
Introduces a complete open-source, low-cost bimanual robot platform, OpenPyRo-A1, along with a Python-based control framework to democratize research in dual-arm manipulation and embodied AI.
The paper introduces Instella, a family of fully open 3B parameter language models trained on publicly available data, addressing the lack of transparency in high-performing LLMs. Instella achieves state-of-the-art performance among fully open models of comparable size, despite using fewer pre-training tokens. The authors also release Instella-Long (128K context) and Instella-Math (reasoning-focused) variants, demonstrating the versatility of the base model.
Introduces Instella, a family of fully open 3B parameter language models, achieving state-of-the-art performance among fully open models and demonstrating competitive results with leading open-weight models of comparable size.
The authors introduce Llama-Embed-Nemotron-8B, a new open-weights text embedding model achieving state-of-the-art results on the MMTEB benchmark. The model is trained on a novel data mix of 16.1 million query-document pairs, combining public datasets with synthetically generated data from open-weight LLMs. Key findings include the effectiveness of their data mix, the impact of different contrastive loss implementations, and the benefits of instruction-aware training for various embedding tasks, especially in multilingual scenarios.
Presents a high-performing, fully open-source text embedding model, Llama-Embed-Nemotron-8B, along with comprehensive ablation studies on data mixing, loss functions, and synthetic data generation strategies.
This paper investigates the applicability of open-source LLM frameworks, including both large-scale and lightweight models, for automating penetration testing tasks relevant to commercial security assessments. The study identifies both the potential and limitations of these frameworks in addressing fundamental challenges in penetration testing. The authors propose a practical approach to overcome key limitations and demonstrate the potential of LLM-based frameworks in real-world penetration testing scenarios.
Demonstrates the practical application of open-source LLM frameworks for penetration testing, highlighting their capabilities and limitations, and proposes solutions to address identified challenges.
The paper introduces OpenMENA, an open-source memristor interfacing system designed for energy-efficient edge AI applications, featuring a reproducible hardware interface, a firmware-software stack with high-level APIs, and a Voltage-Incremental Proportional-Integral (VIPI) programming method. OpenMENA enables weight transfer and on-device adaptation by mitigating device non-idealities through chip-in-the-loop fine-tuning. The system's efficacy is demonstrated through digit recognition and a real-world robot obstacle-avoidance task, showcasing its ability to map localization inputs to motor commands.
Introduces OpenMENA, the first fully open-source memristor interfacing system with integrated hardware, firmware, and software components for edge AI applications.
The authors developed LOGICAL, a PII removal system for clinical notes, by fine-tuning a Generalist and Lightweight Named Entity Recognition (GLiNER) model on a dataset of psychiatric hospital EHRs. This approach addresses the limitations of LLMs, such as high computational costs and data privacy risks, especially in low-resource settings. The fine-tuned GLiNER model achieved a micro-average F1-score of 0.980, outperforming other methods like Gemini-Pro-2.5, while operating efficiently on a standard laptop.
Demonstrates that a fine-tuned, specialized transformer model (GLiNER) provides a more accurate, computationally efficient, and secure solution for PII removal from clinical notes compared to larger LLMs and cloud-based services.
This paper analyzes the framing of AI openness in 223 news articles from the U.S., France, and China, revealing inconsistencies and oversimplifications in media portrayals. The study finds that inaccurate terminology, misleading information, and a binary "open vs. closed" framing impede effective communication about AI openness. The authors highlight the media's focus on a limited number of models and call for the AI community to contribute to a more nuanced and accurate public discourse.
Reveals how media coverage of AI openness is often inaccurate, oversimplified, and heterogeneous across news sources, hindering effective communication and potentially misinforming public opinion.
The authors introduce Honey-Data-15M, a high-quality SFT dataset of 15M QA pairs enhanced with dual-level CoT, and HoneyPipe, a data curation pipeline built on the DataStudio framework. They trained Bee-8B on Honey-Data-15M, achieving state-of-the-art performance among fully open MLLMs, rivaling semi-open models like InternVL3.5-8B. This work demonstrates the importance of high-quality data for developing competitive fully open MLLMs.
Introduces a comprehensive suite of resources, including a high-quality SFT dataset (Honey-Data-15M), a data curation pipeline (HoneyPipe), and a competitive 8B MLLM (Bee-8B), to advance fully open MLLMs.
The paper introduces AndesVL, a suite of mobile-side Multimodal Large Language Models (MLLMs) ranging from 0.6B to 4B parameters, built on Qwen3 LLMs and various visual encoders to address the limitations of deploying large cloud-based MLLMs on edge devices. AndesVL achieves competitive performance against similar-scale models on diverse benchmarks, including text-rich image understanding and VQA. The authors also present a 1+N LoRA architecture and a Quantization-Aware LoRA Fine-Tuning (QALFT) framework, along with deployment optimizations such as a cache eviction algorithm (OKV), speculative decoding, and compression, demonstrating significant speedups and memory reduction on mobile devices.
Introduces AndesVL, a suite of mobile-optimized MLLMs, along with a novel quantization-aware LoRA fine-tuning framework and memory optimization techniques, enabling efficient deployment and inference on edge devices.
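The OKV algorithm's details are not given above, but KV cache eviction schemes generally bound memory by dropping low-importance cached tokens. A generic, score-based sketch (the ranking criterion here is an assumption, not the paper's method):

```python
# Illustrative sketch of score-based KV cache eviction. OKV's actual
# criterion is not specified in the summary; this generic variant keeps
# the tokens that have accumulated the highest attention scores.

def evict(cache, scores, budget):
    """cache: list of (key, value) per token position; scores:
    accumulated attention each token has received; budget: max tokens
    to keep. Preserves the original order of surviving tokens."""
    if len(cache) <= budget:
        return cache
    ranked = sorted(range(len(cache)), key=lambda i: scores[i], reverse=True)
    keep = sorted(ranked[:budget])
    return [cache[i] for i in keep]

cache = [("k0", "v0"), ("k1", "v1"), ("k2", "v2"), ("k3", "v3")]
scores = [0.9, 0.1, 0.5, 0.7]
print(evict(cache, scores, budget=2))  # keeps positions 0 and 3
```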
This paper investigates the evolution of vocabulary embedding geometry in LLMs during training by correlating input and output embeddings of Pythia 12B and OLMo 7B with semantic, syntactic, and frequency-based metrics using representational similarity analysis. The study reveals that vocabulary embedding geometry rapidly aligns with semantic and syntactic features early in training. Furthermore, high-frequency and function words converge faster than low-frequency words, which retain initial bias.
Demonstrates that linguistic structure emerges rapidly in vocabulary embeddings during LLM training, with distinct convergence rates based on word frequency and function.
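Representational similarity analysis (RSA), as used above, compares two representations of the same items by correlating their pairwise-distance matrices rather than the raw vectors. A self-contained sketch with toy embeddings (the vectors are illustrative; the paper's distance and correlation choices may differ):

```python
# Minimal sketch of representational similarity analysis (RSA): build a
# pairwise-distance matrix for each representation, then correlate the
# two matrices' upper triangles. Toy 2-D "embeddings" are illustrative.
import math

def dist_upper(vectors):
    """Upper triangle of the pairwise Euclidean distance matrix."""
    n = len(vectors)
    return [math.dist(vectors[i], vectors[j])
            for i in range(n) for j in range(i + 1, n)]

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Two toy "embedding spaces" for the same four words.
space_a = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (2.0, 2.0)]
space_b = [(0.0, 0.1), (1.1, 0.0), (0.1, 1.0), (2.1, 2.0)]
rsa_score = pearson(dist_upper(space_a), dist_upper(space_b))
print(round(rsa_score, 2))  # near 1.0: the two geometries agree
```

Because RSA operates on distances, it is invariant to rotations of either space, which is what makes it suitable for tracking how embedding geometry aligns with external metrics over training.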
Apriel-1.5-15B-Thinker, a 15B parameter multimodal model, achieves competitive performance through a three-stage training methodology involving depth upscaling, staged continual pre-training with synthetic data for enhanced visual reasoning, and high-quality text-only supervised fine-tuning with reasoning traces. The model attains a score of 52 on the Artificial Analysis Intelligence Index, matching DeepSeek-R1-0528, and performs comparably to Gemini-2.5-Flash and Claude 3.7 Sonnet on image benchmarks, demonstrating that targeted training can bridge capability gaps without relying on massive scale or reinforcement learning. This work highlights the effectiveness of data-centric continual pre-training for multimodal reasoning, particularly for organizations with limited computational resources.
Demonstrates that a carefully designed, data-centric continual pre-training approach, including depth upscaling and targeted synthetic data generation, can enable a 15B parameter model to achieve frontier-level multimodal reasoning performance competitive with much larger models.
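Depth upscaling, mentioned above, typically grows a pretrained model by duplicating a contiguous span of its transformer layers rather than training a deeper model from scratch. A structural sketch (the span chosen and any re-initialization scheme are assumptions; the paper's exact recipe is not reproduced here):

```python
# Illustrative sketch of depth upscaling: grow a transformer by
# duplicating a contiguous span of its layers. Which span to copy and
# how to re-initialize it are design choices not specified here.

def depth_upscale(layers, start, end):
    """Insert a copy of layers[start:end] directly after the original
    span, preserving overall layer order."""
    return layers[:end] + layers[start:end] + layers[end:]

base = [f"layer_{i}" for i in range(8)]          # an 8-layer toy model
upscaled = depth_upscale(base, start=2, end=6)   # duplicate layers 2-5
print(len(base), "->", len(upscaled))            # 8 -> 12
```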
This paper investigates the collaborative practices in open large language model (LLM) development by conducting semi-structured interviews with developers from 14 open LLM projects. It identifies that collaboration extends beyond the models themselves to include datasets, benchmarks, and compute partnerships, and that developers are driven by diverse motivations, including democratizing AI and promoting open science. The study also reveals five distinct organizational models employed by open LLM projects, varying in centralization and community engagement.
Systematizes the landscape of open LLM development by characterizing collaboration types, developer motivations, and organizational models across a diverse set of open LLM projects.
The paper investigates the data requirements for reasoning in sub-billion parameter language models, challenging the assumption that massive datasets (>10T tokens) are necessary. The authors demonstrate that by carefully curating and resampling open-source datasets to ~2T tokens, strong reasoning abilities can emerge with significantly less data. The resulting MobileLLM-R1 models achieve state-of-the-art performance among open-source sub-billion parameter models, even surpassing larger models trained on much larger datasets.
Demonstrates that strong reasoning capabilities can emerge in sub-billion parameter language models with significantly less data than previously believed by carefully curating and resampling open-source datasets.
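"Resampling" a data mix usually means assigning each source a weight and repeating (or subsampling) it so the weighted mix fits a fixed token budget. A hypothetical sketch of that bookkeeping (source names, token counts, and weights below are all made up for illustration; they are not the paper's mix):

```python
# Hypothetical sketch of quality-weighted dataset resampling: each
# source gets a sampling weight, and the epochs per source are scaled
# so the weighted mix fits a fixed token budget. All numbers made up.

def resample(sources, budget):
    """sources: list of (name, tokens, weight). Returns epochs per
    source so weighted token counts sum to roughly `budget`."""
    weighted = sum(tokens * weight for _, tokens, weight in sources)
    scale = budget / weighted
    return {name: round(weight * scale, 2) for name, _, weight in sources}

sources = [
    ("web_filtered", 1_500, 1.0),   # token counts in billions, made up
    ("code", 300, 2.0),             # upweight code for reasoning
    ("math", 100, 3.0),             # upweight math even more
]
print(resample(sources, budget=2_000))
```

An epoch count above 1.0 means a source is seen multiple times during training, which is how a small high-quality corpus can contribute disproportionately to a ~2T-token budget.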
The paper red-teams OpenAI's GPT-OSS-20B model in Hausa, a low-resource language, to evaluate its safety alignment. It demonstrates that minimal prompting can induce the model to generate harmful, culturally insensitive, and factually inaccurate content, particularly when using polite language that exploits reward hacking. The study reveals critical vulnerabilities, including the model's false assumptions about the safety of common toxins and its inability to distinguish between raw and processed foods, highlighting the need for improved safety tuning in low-resource languages.
Demonstrates that OpenAI's GPT-OSS-20B model exhibits significant safety alignment failures and biases when used in Hausa, a low-resource language, due to insufficient safety tuning.
This paper investigates the impact of Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO), and a combined SFT+DPO approach on the safety and helpfulness of the OPT-350M language model using the Anthropic Helpful-Harmless RLHF dataset. The study introduces three reward-model-derived metrics—Harmlessness Rate (HmR), Helpfulness Rate (HpR), and Combined Alignment Score (CAS)—to evaluate the models. Results indicate that the combined SFT+DPO model achieves the best performance across all alignment metrics, surpassing individual SFT and DPO models.
Demonstrates that combining SFT and DPO yields superior safety and helpfulness alignment compared to using either technique alone for the OPT-350M model.
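The summary above names the three metrics but not their formulas. A plausible reading, sketched below, is that HmR and HpR are the fractions of responses whose reward-model score clears a threshold, with CAS combining the two; the threshold and the combination rule are assumptions, not the paper's definitions:

```python
# Sketch of reward-model-derived alignment metrics. The paper's exact
# definitions are not reproduced here: this assumes HmR/HpR are the
# fractions of responses whose reward clears a threshold, and CAS is
# their mean. Scores and the threshold of 0.0 are illustrative.

def rate(scores, threshold=0.0):
    return sum(s > threshold for s in scores) / len(scores)

harmless = [0.8, -0.2, 0.5, 0.9]   # harmlessness rewards per response
helpful = [0.3, 0.7, -0.1, 0.6]    # helpfulness rewards per response

hmr = rate(harmless)               # Harmlessness Rate
hpr = rate(helpful)                # Helpfulness Rate
cas = (hmr + hpr) / 2              # Combined Alignment Score (assumed mean)
print(hmr, hpr, cas)               # 0.75 0.75 0.75
```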
The paper introduces Dream-Coder 7B, a discrete diffusion language model for code generation capable of any-order generation, adapting its decoding strategy to the coding task. The authors convert a pretrained autoregressive model into a diffusion model using a continuous-time weighted cross-entropy objective, and address padding issues with random truncation and a padding penalty during supervised fine-tuning. The model is further refined using reinforcement learning with verifiable rewards on a curated prompt set, achieving 21.4% pass@1 on LiveCodeBench.
Introduces a novel approach to code generation by adapting a pretrained autoregressive model into a discrete diffusion model capable of any-order generation.
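Any-order generation means the model need not decode left to right; one common scheme for diffusion LMs is to repeatedly fill the masked position the model is most confident about. A toy sketch of that decoding loop (fixed "confidences" stand in for model predictions; this is a generic illustration, not Dream-Coder's actual decoder):

```python
# Toy sketch of any-order (confidence-ordered) decoding, the kind of
# flexible generation a discrete diffusion LM allows. A real model
# rescores all masked positions each step; fixed predictions stand in.

MASK = "<mask>"

def decode_any_order(tokens, predictions):
    """predictions: position -> (token, confidence). Repeatedly fill
    the most confident masked position until none remain; returns the
    order in which positions were filled."""
    order = []
    while MASK in tokens:
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        best = max(masked, key=lambda i: predictions[i][1])
        tokens[best] = predictions[best][0]
        order.append(best)
    return order

tokens = [MASK, MASK, MASK]
predictions = {0: ("def", 0.6), 1: ("f", 0.9), 2: ("():", 0.8)}
print(decode_any_order(tokens, predictions), tokens)
```

Here position 1 is filled first despite being mid-sequence, which is exactly the freedom an autoregressive decoder lacks.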