Stop blind drawing: giving MLLMs eyes to see their work-in-progress boosts SVG generation performance.
Encrypting spatial data no longer means sacrificing query privacy: BRASP achieves both access and search pattern hiding for Boolean range queries.
Achieve full-attention accuracy with 10x operator speedup and 4.7x throughput improvement in long-context LLM inference by overlapping KV cache transfers with computation.
RL's success in boosting VLM reasoning hides a critical flaw: it crushes the model's ability to explore diverse solutions, leading to premature convergence and hindering scalability.
Ditch the diffusion vs. autoregressive debate: this VLA framework uses diffusion to *draft* actions and an autoregressive model to *verify* them, boosting real-world success by nearly 20%.
By explicitly modeling and predicting non-stationary factors in both time and frequency domains, TimeAPN significantly boosts the accuracy of long-term time series forecasting, outperforming existing normalization techniques.
Current LMMs can't reliably turn complex images into code, failing to preserve structural integrity even in relatively simple scenarios, as shown by the new Omni-I2C benchmark.
By explicitly modeling visibility, VSDiffusion generates more geometrically plausible and realistic shadows, outperforming prior methods on a challenging image composition task.
Spotting coordinated fake reviewers just got easier: a new graph learning method boosts detection accuracy by adaptively weighing network diversity and similarity.
Forget satellite-specific hacks: FoundPS achieves state-of-the-art pansharpening performance with a single model robust to diverse sensors and scenes.
Forget training separate models for every remote sensing modality pair: Any2Any learns a single latent space for unified translation, even generalizing to unseen modality combinations.
You can cut MLLM hallucinations in remote sensing tasks without any training by cleverly exploiting the model's own attention mechanisms to focus on relevant image regions.
By pruning and quantizing the KV cache, XStreamVGGT achieves a remarkable 4.42x memory reduction and 5.48x speedup in streaming 3D reconstruction without sacrificing performance.
Achieve superior LLM pruning performance by first nudging models toward sparsity-friendliness *before* applying any weight removal.
PIME leverages prototype-guided Monte Carlo Tree Search to extract compact, neuroscientifically validated brain subnetworks predictive of disorders, outperforming standard deep learning approaches in both accuracy and interpretability.
Robots can now adapt to dynamic environments with minimal human involvement by learning from a world model and force-torque feedback, achieving state-of-the-art manipulation performance.
Individuals can now demand a tamper-proof, verifiable record of every action taken by AI agents operating on their own devices, thanks to a new sovereignty kernel.
Forget global coordinates: EgoPush lets mobile robots rearrange multiple objects using only an egocentric camera and learned object relationships, even in cluttered environments.
By ditching node alignment, this random-walk method cracks the code for classifying highly variable brain networks, boosting accuracy in distinguishing Alzheimer's from Lewy Body Dementia.
LLM code copilots are put to the test with SecCodeBench-V2, a new benchmark revealing their security vulnerabilities across 22 CWE categories and five programming languages.
MLLMs struggle to effectively zoom into relevant details in ultra-high-resolution remote sensing imagery, but a new staged training framework can teach them when and where to focus for substantial accuracy gains.