BITS Pilani, India
LLM judges can be subtly manipulated by framing the consequences of their decisions, leading to biased evaluations even when the content being judged remains constant.
LLM judges are far less reliable on individual examples than aggregate metrics suggest: up to 67% of documents show judgment inconsistencies, and some criteria like fluency are essentially unjudgeable.
LLMs hit a hard wall in algebraic reasoning, choking on problems with just 20-30 parallel branches regardless of model size, suggesting an architectural bottleneck, not just a capacity issue.
LLMs struggle to master even simple board games like Ludo, agreeing with optimal game-theory strategies less than half the time and exhibiting inconsistent behavior based on prompt framing.
LLMs can reliably judge the correctness of time-series explanations, even when their own explanations are wrong, opening the door to reference-free evaluation.