A more robust evaluation framework for jailbreak methods, combining a curated harmful-question dataset, detailed case-by-case evaluation guidelines, and a scoring system built on those guidelines, provides fairer and more stable evaluation.
LLM agents can actually get *better* at coding when you strip the unnecessary fluff from their skills, a "less-is-more" effect.