Bilkent University
Evaluating LLM-powered software engineering tools is fundamentally broken, as traditional metrics fail to capture the nuanced, non-deterministic nature of their outputs.
Automated evaluations of code review bots disagree with developer feedback nearly 40% of the time, revealing that developer actions are driven by workflow pressures, not just code quality.