MBZUAI
Arabic LLMs can speak the language of finance, but they often fail to reason about it, especially on causal reasoning and generation tasks.
Forget tedious multi-turn dialogues: Co-FactChecker's "trace-editing" lets human experts directly shape an LLM's reasoning process, leading to higher-quality claim verification.
LLMs may nail Text-to-SQL execution accuracy, but SQLStructEval reveals they often generate wildly different query structures for the same question, raising serious reliability concerns.
Deferring to a larger LLM only when a smaller LLM is uncertain can match the performance of the larger model alone while slashing inference costs (a minimal sketch of this deferral loop appears at the end of this list).
LLMs can achieve more consistent and reliable cross-jurisdictional financial reporting by acting as constrained verifiers within a structured, agentic workflow, rather than as free-form generators.
Detecting AI-generated code is harder than you think: even state-of-the-art detectors fail to reliably identify machine-written code, especially when faced with distribution shifts or adversarial attacks.
Training VLMs on a unified, multilingual, multitask meme dataset reveals that robust meme understanding requires multimodal training and is highly sensitive to dataset-specific overfitting.
A new open-source Hindi LLM, Nanda, outperforms existing models of similar scale by strategically balancing Hindi and English training data.
LLM360 K2 opens up the black box of large language model training, offering a 65B-parameter model that beats LLaMA-65B while using fewer resources, all under a fully transparent, open-source framework.
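The deferral recipe in the cascade item above is simple enough to sketch. Below is a minimal Python illustration, assuming a hypothetical interface in which each model call returns an answer plus a confidence score (e.g., mean token probability); the model stubs and the 0.8 threshold are placeholders for illustration, not the paper's actual setup.

    from dataclasses import dataclass
    from typing import Callable

    @dataclass
    class ModelOutput:
        answer: str
        confidence: float  # assumed: a score in [0, 1], e.g., mean token probability

    def cascade(query: str,
                small_model: Callable[[str], ModelOutput],
                large_model: Callable[[str], ModelOutput],
                threshold: float = 0.8) -> ModelOutput:
        """Answer with the small model; defer to the large one only when uncertain."""
        out = small_model(query)
        if out.confidence >= threshold:
            return out  # cheap path: the small model is confident enough
        return large_model(query)  # expensive path: reserved for hard queries

    # Toy usage with stubs standing in for real LLM calls.
    small = lambda q: ModelOutput("Paris", 0.95 if "France" in q else 0.40)
    large = lambda q: ModelOutput("large-model answer", 0.99)
    print(cascade("What is the capital of France?", small, large).answer)  # small model answers
    print(cascade("Something trickier?", small, large).answer)             # deferred to large model

Because most queries clear the threshold, the average cost stays close to the small model's, while accuracy on the hard queries tracks the large one, which is the trade-off the paper's result describes.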