KAIST
Even the best LLM judges miss cultural faux pas that are obvious to locals, reaching an F1 score of only 52% on a new benchmark.
LLMs can be made significantly more helpful and less cautious on sensitive topics by using fine-grained feedback that pinpoints specific errors in content, logic, and appropriateness.