Nanyang Normal University
LLMs armed with RAG can leap from 35% to 100% accuracy on complex toxicology reasoning tasks, suggesting a potent recipe for reliable scientific knowledge processing.
Current LLM agent safety benchmarks miss over 20% of unsafe behaviors, which go undetected even in agents that pass the benchmark.
Coding agents struggle to maintain faithfulness to specifications that emerge gradually over long interactions, losing significant implementation fidelity compared to single-shot specifications.