Chinese Academy of Sciences, University of Chinese Academy of Sciences
Achieving state-of-the-art multilingual reranking without the burden of extensive task-specific annotations could revolutionize how we deploy AI across diverse languages and domains.
DICE transforms long-document retrieval by effectively preserving critical information from chunks, achieving up to a 60% increase in retrieval accuracy for documents over 4,000 tokens.
Training LLMs on data detoxified with HSPD slashes toxicity by more than half, outperforming existing methods that only address toxicity during or after training.
Just one carefully crafted poisoned document can cripple an LLM's reasoning abilities in retrieval-augmented generation.
Neural retrievers' preference for LLM-generated text isn't an inherent flaw, but rather a learned bias from artifacts present in training data, offering a path to debiasing without architectural changes.
PRISM-$\Delta$ improves prompt highlighting in LLMs, steering models to focus on relevant text spans with better accuracy and fluency, even in long contexts.
CoCoA, a new pre-training method for multimodal embeddings, forces models to compress all input information into a single token for reconstruction, leading to substantial quality gains.