Shafiq Joty

Automatically generated Multi-Agent Systems are not only outperformed by Single-Agent Systems but also exhibit architectural inefficiencies that challenge the very foundations of multi-agent design principles.

Prathyusha Jwalapuram, Hehai Lin, Chuyuan Li +7

Distributed Systems & Hardware · Eval Frameworks & Benchmarks

Jun 11, 2026·also Salesforce AI

Reward Modeling for Multi-Agent Orchestration

OrchRM slashes training costs while boosting orchestration accuracy, proving that self-supervised reward modeling can revolutionize multi-agent coordination.

King Yeung Tsang, Vishal Venkataramani, Haizhou Shi +4

RLHF & Preference Learning

Apr 21, 2026·also UAlberta

Lost in Translation: Do LVLM Judges Generalize Across Languages?

LVLM judges, despite excelling in English, exhibit surprisingly inconsistent and unreliable behavior when evaluating content in other languages, revealing a critical blind spot in current alignment and evaluation pipelines.

Md Tahmid Rahman Laskar, Mohammed Saidul Islam, Mir Tafseer Nayeem +5

Eval Frameworks & Benchmarks · Multimodal Models · Natural Language Processing

Mar 16, 2026·also A*STAR

VIBEPASS: Can Vibe Coders Really Pass the Vibe Check?

LLMs can generate syntactically correct tests, but their ability to *reason* about code faults is surprisingly poor, hindering autonomous debugging.

Srijan Bansal, Fangkai Jiao, Yilun Zhou +3

Code Generation & Program Synthesis · Eval Frameworks & Benchmarks · Tool Use & Agents

Feb 23, 2026·also Salesforce AI, ZJU

SkillOrchestra: Learning to Route Agents via Skill Transfer

SkillOrchestra slashes the learning costs of AI agent orchestration by up to 700x while improving performance by explicitly modeling agent skills and costs, offering a more scalable and interpretable alternative to RL-based methods.

Jiayu Wang, Yifei Ming, Zixuan Ke +3

RLHF & Preference Learning · Tool Use & Agents

Feb 18, 2026

References Improve LLM Alignment in Non-Verifiable Domains

Reference-guided LLM evaluators can boost alignment in non-verifiable domains, enabling self-improvement to rival reward model training.

Kejian Shi, Yixin Liu +6

Constitutional AI & AI Ethics · Eval Frameworks & Benchmarks · RLHF & Preference Learning

Papers (7)