Reinforcement learning boosts multimodal performance, raising task scores and creating unexpected synergies between image generation and editing.
Sparse visual prompts generated by LoRSP achieve robust adaptation with significantly fewer parameters, challenging the efficiency of traditional dense prompting methods.
Current VIP identification methods miss the forest for the trees, leading to "Temporal Importance Shift," but a new model leveraging spatio-temporal cues and rationale generation closes the gap.
LLM search agents are often just verifying pre-existing knowledge, not truly searching, and a new benchmark shows their performance plummets when up-to-date information is required.
Seemingly harmless fine-tuning data can stealthily nudge LLMs toward unsafe behavior by subtly shifting model parameters in "danger-aligned" directions.
LLMs can be backdoored to "think well but answer wrong," even while generating seemingly correct reasoning traces, making attacks far harder to detect.
Forget training separate models for each pedestrian attribute dataset – a single Transformer can now handle RGB images, video sequences, and even event streams with comparable accuracy to specialized methods.
Social robots can now autonomously orchestrate complex tasks with improved efficiency and emotional alignment, thanks to a novel fast-slow thinking LLM framework.
GTokenLLMs suffer from a text-dominant bias, but RGLM offers a way to fix this by reconstructing graph information directly from the LLM's graph token outputs.
Achieve state-of-the-art low-field MRI enhancement by explicitly modeling and correcting the intensity distribution shift between low-field and high-field domains using a differentiable Sinkhorn optimal transport module within a diffusion framework.
A nested Mixture-of-Experts architecture lets neural operators pre-trained on diverse PDEs transfer more effectively to downstream tasks.
Event cameras can significantly boost the robustness of pre-trained OCR models for kilometer marker recognition in challenging metro environments, even under GNSS-denied conditions.
Unlock SOTA performance in long-horizon search tasks with REDSearcher, a framework that slashes the cost of training by strategically synthesizing complex tasks and boosting core LLM capabilities *before* reinforcement learning.
A single tokenizer, UniWeTok, now handles both high-fidelity image reconstruction and complex semantic understanding for multimodal LLMs, outperforming existing methods with far less training data.