Jiajun Wang

Forget dumb context stuffing: LongSeeker shows that agents which strategically *edit* their own memory solve web search tasks with far greater reliability.

Yijun Lu, Rui Ye, Yuwen Du +3

Reasoning & Chain-of-Thought · Recommendation & Information Retrieval · Tool Use & Agents

Apr 13, 2026

Daoli Xu +22 · Apr 13, 2026 · also Friedrich-Alexander-Universität, HIT, HKU, Micro-Intelligence +3

LoViF 2026 Challenge on Human-oriented Semantic Image Quality Assessment: Methods and Results

A new dataset, SeIQA, offers a benchmark to evaluate how humans perceive semantic loss in degraded images, pushing beyond traditional quality metrics.

Daoli Xu, Guoqiang Xiang, Chengyu Zhuang +20

Computer Vision · Eval Frameworks & Benchmarks · Multimodal Models

Mar 2, 2026

OpenAI · Mar 2, 2026 · also Michael Pokorny, SJTU

MIST-RL: Mutation-based Incremental Suite Testing via Reinforcement Learning

LLMs can verify code more effectively by focusing on test case utility rather than sheer quantity, achieving a 28.5% higher mutation score with 19.3% fewer tests.

Sicheng Zhu, Jiajun Wang, Jiajun Wang +2

Code Generation & Program Synthesis · Eval Frameworks & Benchmarks · RLHF & Preference Learning