Policy gradient methods may be self-defeating in language model reasoning, as their inherent entropy reduction chokes off exploration and limits downstream performance.
Pinpointing performance bottlenecks in RAG pipelines just got easier: RAGPerf offers a modular benchmarking framework to dissect and optimize each component.
LLMs like ChatGPT, Claude, and Gemini show alarming safety gaps when interacting with children, readily bypassing ethical safeguards designed for adults.
LLMs are being integrated into critical societal domains, but current benchmarks lack the precision to evaluate nuanced ethical decision-making, leaving significant accountability gaps.