Iryna Gurevych

Contextual prompts can significantly boost stance detection accuracy, but only if the right type of context is chosen—LLM-generated descriptions shine while user metadata may backfire.

Tilman Beck, Shakib Yazdani, Simon Kruschinski +2

Natural Language Processing

Jun 4, 2026·also Independent, National Research Center for Applied Cybersecurity ATHENE

Many Circuits, One Mechanism: Input Variation and Evaluation Granularity in Circuit Discovery

Structural differences in circuits may be misleading, as they often reflect interchangeable mechanisms rather than distinct functionalities.

Alireza Bayat Makou, Jingcheng Niu, Subhabrata Dutta +1

Interpretability & Mechanistic Interp

May 1, 2026

Indraneil Paul +3·May 1, 2026·also National Research Center for Applied Cybersecurity ATHENE, TU Darmstadt

Themis: Training Robust Multilingual Code Reward Models for Flexible Multi-Criteria Scoring

Current code reward models are myopic, mostly rewarding functional correctness, but Themis-RM learns to score code across multiple criteria and languages, opening the door to more nuanced and useful code generation.

Indraneil Paul, Glavavs Glavas, Glavaš Glavas +1

Code Generation & Program Synthesis·Eval Frameworks & Benchmarks·RLHF & Preference Learning

Apr 15, 2026

Apr 15, 2026·also National Research Center for Applied Cybersecurity ATHENE

Co-FactChecker: A Framework for Human-AI Collaborative Claim Verification Using Large Reasoning Models

Forget tedious multi-turn dialogues: Co-FactChecker's "trace-editing" lets human experts directly shape an LLM's reasoning process, leading to higher quality claim verification.

Dhruv Sahnan, Subhabrata Dutta, Tanmoy Chakraborty +2

Natural Language Processing·Reasoning & Chain-of-Thought·Tool Use & Agents

Mar 17, 2026

Aniket Pramanick +2·Mar 17, 2026·also National Research Center for Applied Cybersecurity ATHENE, TU Darmstadt

ClaimFlow: Tracing the Evolution of Scientific Claims in NLP

Most scientific claims in NLP die in obscurity, and even the survivors are more likely to be subtly reshaped than outright validated or debunked.

Aniket Pramanick, Saif M. Mohammad, Iryna Gurevych

Eval Frameworks & Benchmarks·Natural Language Processing·Scientific Discovery & Drug Design

Papers (6)