MBZUAI
Arabic LLMs can speak the language of finance, but they often fail to reason about it, especially on causal reasoning and generation tasks.
Forget tedious multi-turn dialogues: Co-FactChecker's "trace-editing" lets human experts directly shape an LLM's reasoning process, leading to higher-quality claim verification.
LLMs may nail Text-to-SQL execution accuracy, but SQLStructEval reveals they often generate wildly different query structures for the same question, raising serious reliability concerns.
Deferring to a larger LLM only when a smaller LLM is uncertain can match the performance of the larger model alone while slashing inference costs (a minimal sketch of this deferral loop appears at the end of this list).
LLMs can achieve more consistent and reliable cross-jurisdictional financial reporting by acting as constrained verifiers within a structured, agentic workflow, rather than as free-form generators.
Detecting AI-generated code is harder than you think: even state-of-the-art detectors fail to reliably identify machine-written code, especially when faced with distribution shifts or adversarial attacks.
Training VLMs on a unified, multilingual, multitask meme dataset reveals that robust meme understanding requires multimodal training and is highly sensitive to dataset-specific overfitting.
A new open-source Hindi LLM, Nanda, outperforms existing models of similar scale by strategically balancing Hindi and English training data.
LLM360 K2 opens up the black box of large language model training, offering a 65B-parameter model that beats LLaMA-65B while using fewer resources, all under a fully transparent, open-source framework.
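The deferral recipe in the cascade item above is simple enough to sketch. Below is a minimal Python illustration, assuming a hypothetical interface in which each model call returns an answer plus a confidence score (e.g., mean token probability); the model stubs and the 0.8 threshold are placeholders for illustration, not the paper's actual setup.

    from dataclasses import dataclass
    from typing import Callable

    @dataclass
    class ModelOutput:
        answer: str
        confidence: float  # assumed: a score in [0, 1], e.g., mean token probability

    def cascade(query: str,
                small_model: Callable[[str], ModelOutput],
                large_model: Callable[[str], ModelOutput],
                threshold: float = 0.8) -> ModelOutput:
        """Answer with the small model; defer to the large one only when uncertain."""
        out = small_model(query)
        if out.confidence >= threshold:
            return out  # cheap path: the small model is confident enough
        return large_model(query)  # expensive path: reserved for hard queries

    # Toy usage with stubs standing in for real LLM calls.
    small = lambda q: ModelOutput("Paris", 0.95 if "France" in q else 0.40)
    large = lambda q: ModelOutput("large-model answer", 0.99)
    print(cascade("What is the capital of France?", small, large).answer)  # small model answers
    print(cascade("Something trickier?", small, large).answer)             # deferred to large model

Because most queries clear the threshold, the average cost stays close to the small model's, while accuracy on the hard queries tracks the large one, which is the trade-off the paper's result describes.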