Medical multi-agent systems can reason deeply, but fall apart when switching between medical specialties, highlighting a critical need for more robust architectures.
Tangible interaction with robots backfires for users with negative attitudes, who prefer a digitally mediated interface as a social buffer.
By closing the loop with explicit planning and feedback, SPIRAL overcomes the temporal drift and weak semantic grounding plaguing one-shot video generation models.
LLMs can generate better recommendations if they pause to verify their reasoning steps, rather than reasoning in one long chain.
Accelerate video generation by 45% without retraining, simply by pruning redundant latent patches and cleverly recovering attention scores.
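The paper's exact pruning criterion isn't given here; as a loudly hypothetical illustration, pruning "redundant" latents by cosine similarity while recording which kept patch each pruned one maps to (so its attention scores can be recovered by copying) might look like:

```python
import numpy as np

def prune_redundant(patches: np.ndarray, threshold: float = 0.95):
    """Drop patches whose cosine similarity to an already-kept patch
    exceeds the threshold. `mapping[i]` records the kept patch that
    stands in for patch i, so pruned patches can later reuse (recover)
    that patch's attention scores instead of being recomputed."""
    kept, mapping = [], {}
    normed = patches / np.linalg.norm(patches, axis=1, keepdims=True)
    for i, v in enumerate(normed):
        match = next((j for j in kept if normed[j] @ v > threshold), None)
        if match is None:
            kept.append(i)       # novel patch: keep and attend to it
            mapping[i] = i
        else:
            mapping[i] = match   # redundant: recover from this kept patch
    return kept, mapping
```

Because pruning happens on the latents, no retraining is needed; only the attention computation over the kept set changes.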
Forget unimodal tasks—UniM throws down the gauntlet for truly unified multimodal AI, demanding models juggle any combination of text, image, audio, video, code, documents, and 3D inputs and outputs in a single, interleaved stream.
Even after removing names and other PII, LLMs still exhibit significant demographic biases in resume screening, favoring candidates based on subtle sociocultural markers like language and hobbies.
Unlock combinatorial generalization in dual-arm robots by disentangling single-arm skills, enabling reuse and boosting success rates from 0% to 51%.
Visual attention, including eye movements and attention switching, can be modeled as the result of rational decision-making under perceptual, memory, and time constraints, offering a unified computational account.
Finally, interpretable medical text embeddings that rival black-box models in performance, thanks to ontology-grounded question generation and a training-free approach.
Forget A/B testing: simulated readers can now optimize augmented reading interfaces in real-time, offering adaptive and explainable designs.
Slash gas costs for decentralized federated learning by using optimistic execution and validity proofs, scaling to 800 participants without compromising trust.
Code-generating LLMs may ace static benchmarks, but developers are actually *slower* when using them because they disrupt mental flow, highlighting the need for benchmarks that capture the temporal dynamics of coding.
A confidence-based gating mechanism lets a 14B parameter reward model outperform 70B parameter models, achieving a new accuracy-efficiency Pareto frontier.
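The models and threshold below are toy stand-ins, not the paper's setup; the general shape of confidence-based gating is simply "use the small reward model when it is confident, defer to the large one otherwise":

```python
def small_model(x: float):
    # Toy "14B" reward model: a hard score plus a confidence that is
    # low near the decision boundary at x = 0.
    score = 1.0 if x > 0 else 0.0
    confidence = min(abs(x), 1.0)
    return score, confidence

def large_model(x: float) -> float:
    # Toy "70B" fallback: a smoother, more reliable score.
    return 0.5 + 0.5 * (x / (1.0 + abs(x)))

def gated_reward(x: float, threshold: float = 0.3) -> float:
    """Pay the large-model cost only on low-confidence inputs."""
    score, conf = small_model(x)
    if conf >= threshold:
        return score          # cheap path: small model is confident
    return large_model(x)     # expensive path: defer to the large model
```

Sweeping the threshold trades accuracy against the fraction of queries that hit the large model, which is what traces out the Pareto frontier.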
The trustworthiness of LLM-enabled applications hinges not on further model improvements, but on establishing system-level threat monitoring to detect post-deployment anomalies.
Forget benchmarks: CoXAM offers a cognitive model that finally explains *why* some XAI techniques resonate with users better than others.
Agentic AI can automate complex optical systems control with near-perfect success rates, leaving code-generation approaches in the dust.
LLMs can be made significantly safer by steering their latent space trajectories with Control Barrier Functions, preventing unsafe outputs without retraining.
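The paper's barrier function and latent dynamics aren't specified here; as an assumed, minimal sketch, a discrete-time control barrier function enforces h(z_next) >= (1 - alpha) * h(z), and a simple filter can shrink a proposed latent step until that condition holds:

```python
import numpy as np

def barrier(z: np.ndarray) -> float:
    # Toy safety function: h(z) >= 0 means "safe".
    # Here, safe = the latent stays inside the unit ball.
    return 1.0 - float(np.linalg.norm(z))

def cbf_filter(z: np.ndarray, z_next: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Enforce the discrete-time CBF condition h(z_next) >= (1 - alpha) * h(z).
    If the proposed update violates it, back off along the step direction."""
    target = (1.0 - alpha) * barrier(z)
    step = z_next - z
    scale = 1.0
    while barrier(z + scale * step) < target and scale > 1e-6:
        scale *= 0.5  # halve the step until the barrier condition holds
    return z + scale * step
```

Because the filter acts on the trajectory at inference time, the underlying model weights are untouched, which is what makes the approach retraining-free.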
Forget collecting real L2 speech data: this accent normalization method trains on synthetic L2 speech generated from text, achieving better content preservation and naturalness than models trained on real data.
Achieve SOTA joint audio-video generation with JavisDiT++ using just 1M public training examples, rivaling performance of models trained on proprietary datasets.
Control hybrid rigid-soft robots with the ease of AR teleoperation, thanks to a new pipeline that accurately models the soft robot's real-world behavior in simulation.
Multi-expert systems can suffer from *worse* performance than single-expert systems due to an inherent underfitting problem that arises from the difficulty of identifying the correct expert to defer to.
Self-evolving LLM agents can be persistently compromised by injecting malicious payloads into their long-term memory, turning them into "zombie agents" that execute unauthorized actions across sessions.
dVoting unlocks significant reasoning gains for diffusion LMs at test time by iteratively refining only the most uncertain tokens, sidestepping the computational bottleneck of full re-sampling.
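The resampler below is a toy stand-in (a real diffusion LM would denoise the selected positions), but it shows the core loop the teaser describes: each round touches only the k least-confident tokens instead of re-sampling the full sequence:

```python
import numpy as np

def refine(tokens, confidences, resample_fn, k=2, rounds=3):
    """Iteratively re-sample only the k least-confident positions,
    sidestepping the cost of re-sampling the whole sequence."""
    tokens = tokens.copy()
    confidences = confidences.copy()
    for _ in range(rounds):
        worst = np.argsort(confidences)[:k]   # most uncertain positions
        new_toks, new_conf = resample_fn(tokens, worst)
        tokens[worst] = new_toks
        confidences[worst] = new_conf
    return tokens, confidences

def toy_resample(tokens, positions):
    # Hypothetical "model": always corrects a token to 1 with
    # confidence 0.9; a real denoiser would predict here.
    n = len(positions)
    return np.ones(n, dtype=tokens.dtype), np.full(n, 0.9)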
Pinpointing mismatches between architectural simulators and RTL implementations is now far easier, thanks to a new benchmark generation methodology that isolates single microarchitectural features.
LLM agents can now achieve near-perfect accuracy in end-to-end web testing by symbolizing GUI elements and inferring pre/post-condition oracles, blowing away previous approaches.
As AI research concentrates in private labs, universities must shift from maximizing discovery to ensuring knowledge trustworthiness to maintain academic authority.
A drone can now autonomously replan its path in response to detected environmental changes, using a UNet+CBAM change detection model and DQN-based path planning.
LLM-powered program repair tools can memorize bug fixes, but struggle with minor syntactic changes and applying fixes in real-world projects, revealing a critical gap in their reasoning abilities.
LLMs can now navigate complex multi-agent pathfinding scenarios with superhuman efficiency, thanks to a neural algorithmic reasoning module that injects graph-aware intelligence.
Current LLMs and VLMs struggle with multi-step reasoning in long videos, often failing to maintain temporal coherence and procedural validity, as revealed by a new benchmark of hour-long narratives.
Open-source multimodal models just leveled up: InternVL3 rivals closed-source titans like GPT-4o by pre-training vision and language together from the start.