Leading Asian AI research university. Active across NLP, computer vision, and multimodal learning.
Contextual grounding in defect classification can elevate accuracy to over 98%, transforming a traditionally ambiguous task into a precise science.
MetaSeq achieves a 45% improvement in response accuracy for acoustic metamaterial design by treating structures as sequences, revolutionizing how we approach inverse design in this field.
Muon outperforms Adam and SGD by yielding features that are not only more robust but also transfer more effectively across tasks.
A novel unified energy framework corrects distribution shifts in diffusion models, outperforming traditional auto-regressive methods.
Multiplex semantic networks reveal that creativity is not a one-dimensional trait but a complex interplay of diverse cognitive tasks, with significant implications for how we assess and understand creative potential.
High-fidelity simulations in Real-IKEA reveal that robust manipulation strategies can be learned by prioritizing mechanical advantage over fragile friction-based methods.
Gaze behaviors learned through reinforcement alone can lead to unprecedented humanoid locomotion capabilities, including a record 1.2m gap traversal.
SkillComposer enables language models to self-evolve skills in real time, achieving gains of up to +4.5 on agent tasks compared to larger models.
Vortex achieves up to 4.7 times higher throughput for large language models, revolutionizing how researchers can prototype and evaluate sparse attention algorithms.
A quantum algorithm can uncover rare events with unprecedented efficiency, achieving a quadratic speedup in sampling that classical methods cannot match.
Achieving LPN hardness with inverse-polynomial noise rates opens the door to new public-key encryption schemes that were once thought impossible.
Cosine alignment in vision-language models may mislead researchers, as it correlates negatively with accuracy, revealing that latents are often bypassed in reasoning.
Even well-reviewed code can harbor subtle bugs, as demonstrated by critical flaws in Go's extended GCD implementation that compromise RSA key generation.
Base LLMs can predict their own output quality with surprising accuracy, revealing a latent self-evaluation capability that can be harnessed with minimal data.
Muon outperforms Adam by leveraging lower Normalized Directional Sharpness, revealing a critical geometric insight into optimizer efficiency.
EvoNote outperforms human-generated health notes 89.6% of the time while slashing correction production time from hours to minutes.
UCE enables LLM agents to evolve their knowledge dynamically, achieving a staggering 96.3% success rate in complex tasks by leveraging a structured experience library.
Achieving a mean recall of 98.85%, EEG-FuseFormer sets a new benchmark for seizure onset prediction by effectively leveraging transformer-based feature fusion.
Transition-level edge information can dramatically enhance routing model performance, cutting ATSP-1000 gaps by over a third.
Noise in multi-behavior recommendation can be effectively mitigated through a novel spectral filtering approach that enhances representation purity and reliability.
RDMF outperforms traditional multimodal fusion methods by leveraging reaction-diffusion processes to dynamically align video and text, revealing emergent patterns that enhance moment retrieval.
Integrating raster and vector data could revolutionize geospatial AI, unlocking richer insights from Earth observation data.
Prediction accuracy alone can mask critical failures in operator fidelity, as our spectral audit uncovers hidden instabilities and inconsistencies in neural operator networks.
Superquadric representations enable robots to achieve unprecedented accuracy in object-level reconstruction and navigation, outperforming state-of-the-art methods in cluttered environments.
SCAPO achieves accurate articulated pose estimation from a single 3D observation without any ground-truth supervision, setting a new benchmark in self-supervised learning for articulated objects.
Achieving globally aligned 4D reconstructions, TROPHIES outperforms existing methods by integrating human dynamics with scene geometry in a single framework.
Segment-level explainable forensics can drastically enhance our ability to detect and interpret localized manipulations in lengthy AI-generated videos.
Achieving photorealistic 3D human avatars from a single image in under a second could revolutionize virtual reality and gaming applications.
Merging RL experts effectively requires balancing sharp, informative signals with stable, dispersed components, a challenge that ResMerge addresses with innovative spectral techniques.
Achieving a 2.31x speedup in GNN training on heterogeneous CPU-NPU platforms could redefine efficiency benchmarks in graph learning.
Over 54% of actions taken by leading LLM coding agents in realistic projects result in harmful safety violations, exposing critical gaps in current safety alignment.
FineVerify boosts GPT-5-mini's accuracy by 8.2 points with just four sampled trajectories, outperforming standard scaling methods.
dMoE slashes the memory footprint of Mixture-of-Experts Diffusion LLMs by up to 80% without sacrificing performance, finally making them practical.
Forget domestic data – cross-market signals hidden in annual reports can significantly boost return prediction, especially when transferring insights from the US to Japan.
Uncover hidden biases in ranking systems: this new method reverse-engineers group-specific bonuses that influence candidate rankings even when sensitive features are unobserved.
Forget short-sighted compression: Future Forcing anticipates future query needs in autoregressive video generation, boosting long-horizon consistency by up to 1.49 on VBench-Long without any training.
LLM-based recommendation systems can now dynamically adjust the granularity of knowledge graph retrieval, boosting performance by adapting to the complexity of user queries.
Flow-based imitation learning can be significantly improved by distilling both rewards and actions on-policy, enabling more robust and generalizable policies, especially with limited or noisy demonstrations.
LLM agents trained with simulated user and tool noise not only become more robust in messy real-world environments, but also surprisingly improve on clean, idealized benchmarks.
The best LLM to answer a question isn't always the best LLM to *teach* the answer, and matching the "difficulty" of the explanation to the student's current abilities yields better learning.
Current LLM agents still struggle to infer and leverage user preferences from fragmented, real-world interactions, revealing a substantial gap between their capabilities and the demands of personalized decision-making.
RotMoLE's rotational gating unlocks more representational power from low-rank MoE architectures, even when expert diversity is limited.
Text watermarks can now survive even aggressive paragraph-level paraphrasing, thanks to a new self-anchoring technique that breaks the robustness-quality tradeoff.
LLMs can now diagnose diseases with the transparency of formal logic, offering verifiable reasoning chains that clinicians can audit and refine.
Free exploration in multi-armed bandits can lead to sharp phase transitions in accumulated regret, offering significant savings compared to standard regret minimization.
Weak-to-strong reward models can ace the test but still fail in the real world, revealing a hidden brittleness in current preference learning approaches.
Current AI models for liver fibrosis staging can match expert radiologists in some settings, but real-world clinical deployment is still hampered by data heterogeneity and label imbalance.
Forget red-teaming, POLARIS automatically turns safety policies into attack strategies, finding more LLM vulnerabilities with verifiable traceability.
Diffusion LLMs can achieve up to 6.1x higher throughput than autoregressive models by dynamically adjusting decoding granularity based on real-time load, a feat unattainable with fixed-block approaches.
Watermarking agent memories is now possible without performance degradation or reliance on logs, enabling snapshot-only attribution even after memory migration or leakage.
Robots can now perform complex, contact-rich tasks with significantly smoother and more continuous motions by learning high-frequency action chunks in a latent space.
Achieve spatially precise control in FPS world models by injecting actions locally, without segmentation labels, enabling zero-shot generalization across games.
Robust UAV visual servoing is now possible even with intermittent visual data and near target constraint saturation, thanks to a novel TC-MPC framework.
LLMs already know more about diverse cultures than you think; this paper unlocks that knowledge by prompting in multiple languages and aligning the responses.
Forget clunky pipelines: this multi-agent system crafts compelling short dramas from a single sentence, nailing narrative pacing and spatial consistency in ways LLMs alone can't.
Reference patches, typically discarded in software-engineering agent training, can be distilled into latent process graphs to guide trajectory curation, leading to more effective and efficient learning.
Ophthalmic VQA models can be made more accurate and transparent by explicitly grounding them in spatially localized lesion evidence, a crucial step towards clinical interpretability.
Adaptive evaluation exposes a substantial vulnerability gap, revealing that existing defenses may underestimate the capabilities of distillation attacks.
LLMs can track depression severity from counseling transcripts more effectively by combining clinical signals with turn-level embeddings and symptom-specific predictors, even when dealing with limited data or longitudinal context.
Compressing KV caches with multi-granularity representations boosts long video QA accuracy without sacrificing memory or speed.
Forget expensive data curation: a simple, training-free entropy metric lets you train LLMs on just 20% of your reasoning data without sacrificing performance.
Forget scalar rewards: GenEvolve distills structured visual experiences from successful and failed generation trajectories, enabling token-level supervision for self-improving image generation agents.
Naively quantizing autoregressive video diffusion models tanks performance due to exponentially increasing error accumulation across frames and heterogeneous outlier patterns, but Q-ARVD solves it.
Achieve series-level cinematic remaking with Soap2Soap, a multi-agent framework that maintains narrative fidelity and character consistency across hundreds of shots, outperforming commercial video generation APIs.
LLM agents can now maintain long-term memories with 6x higher throughput thanks to a novel hierarchical temporal indexing approach that avoids costly full-state rewrites.
LLMs can now generate high-performance CUDA attention kernels that outperform hand-optimized code, thanks to a novel lift-transfer-lower approach that leverages expert knowledge.
By embedding whole-slide images in a hybrid hyperbolic-Euclidean space, BatMIL unlocks superior classification performance compared to traditional Euclidean-only methods, revealing the importance of geometric awareness in capturing complex tissue organization.
Open-sourcing a VLA model that beats closed-source giants on embodied reasoning tasks could finally make real-world robot deployment practical.
VideoLLMs leak training data: a novel black-box attack recovers membership with surprisingly high accuracy (AUC=0.68) by probing generation brittleness across temperatures.
Forget imperfect inversions – ResetEdit lets you edit generated images with the same latent that created them, unlocking unprecedented precision and control.
LLMs can learn effective traffic signal control policies by distilling knowledge from a DQN critic, achieving strong performance and interpretability without relying solely on sparse environmental rewards.
You can detect prompt injection attacks in screenshot-based web agents with an 8x speedup and no extra memory by looking for telltale visual "smoothness" and reversed text polarity.
The fragmented field of world modeling can now be unified under a "levels x laws" taxonomy, revealing critical gaps in autonomous model revision and decision-centric evaluation.
Stop writing incomplete tests: TestGeneralizer can automatically expand your existing tests to cover 31% more scenarios and catch more bugs.
Pocket-sized VLA models can now achieve state-of-the-art robot manipulation performance by pre-training on a curated multimodal dataset and injecting manipulation-relevant representations into the action space.
A low-cost, compact sensor provides continuous vision-tactile feedback, enabling robots to "see" and "feel" their way through dexterous manipulation tasks.
Contact-aware reconstruction transforms how we achieve realistic human-scene interactions in 3D environments, correcting artifacts that have plagued previous methods.
LLM agents suffer from the same Actor-Observer Asymmetry that plagues humans, leading them to make inconsistent judgments about their own and others' failures.
Uncover misleading half-truths by pitting a Politician agent against a Scientist agent in a debate moderated by a Judge, revealing what's left unsaid.
Multimodal LLMs struggle with multi-digit multiplication, with accuracy plummeting as arithmetic complexity increases, revealing a critical gap in computational capabilities.
Multi-agent LLM systems for idea generation can backfire, with smarter models and more communication leading to *less* diverse ideas due to structural coupling.
QuantumQA reveals that integrating verifiable, rule-based feedback can dramatically enhance LLM performance in scientific reasoning, achieving results on par with larger proprietary models.
FLASH enables robots to master complex deformable manipulation tasks in minutes using only synthetic data, eliminating the need for labor-intensive real-world training.
LLMs that ace shortest-path planning on small maps completely fall apart when asked to plan routes just a little bit longer.
Forget retraining: this anomaly detection framework adapts to evolving data streams on the fly, using a hypernetwork to shift parameters and achieving state-of-the-art performance.
Forget brute-force retrieval: hierarchical navigation lets LLMs outperform RAG on enterprise QA by explicitly reasoning about the structure of knowledge.
LLMs can bridge the gap between heterogeneous blockchain data to detect fraud with significantly improved accuracy, even in zero-shot cross-chain scenarios.
LLMs can now predict project-wide code edits with significantly improved accuracy and efficiency by intelligently interleaving neural prediction with existing IDE tools.
Multi-object tracking gets a boost: HyperSSM leverages collaborative reasoning to maintain robust object trajectories, even when visual cues disappear.
The landscape of deep learning optimizers is vast, but this paper cuts through the noise to reveal the fundamental trade-offs and promising future directions for efficient, robust, and trustworthy training.
Forget hand-coded strategies: METRO uses LLMs to automatically learn dialogue strategies from expert transcripts, achieving state-of-the-art results in non-collaborative dialogue.
Reduce testing costs without compromising predictive accuracy by learning cost-optimal sequential decision policies from retrospective data, even with informative missingness.
LLMs often fail to reconcile conflicting information from text and knowledge graphs, instead latching onto a single source based on prompting, highlighting a critical vulnerability in RAG systems.
Fine-tuning VLMs for regional relevance doesn't have to sacrifice global performance: a simple data filtering and model merging technique boosts cultural relevance by 5-15% while barely impacting overall accuracy.
LLMs struggle to maintain context and avoid distraction when reasoning about causality, leading to a significant performance drop as tasks increase in complexity.
Customer service chatbots can be transformed from reactive support tools into proactive business intelligence engines by strategically probing users for information.
Forget complex memory architectures: simple retrieval and generation, when carefully tuned for signal density, can outperform sophisticated methods in conversational agents.
VLMs can regain lost temporal reasoning abilities without retraining, simply by strategically merging the right layers from their text-only LLM backbone.
Text-centric agentic search is out: Deep-Reporter shows how to build multimodal agents that leverage both text and visuals for grounded long-form generation.
DMax unlocks faster diffusion language model decoding by reframing the process as iterative self-correction in embedding space, achieving up to 2x speedup without sacrificing accuracy.