
Alibaba's global research initiative. Publishes actively on NLP, multimodal models, and AI systems.
Transform unstructured audio-visual signals into machine-readable structured knowledge with the Logics-Parsing-Omni model, which enforces strict alignment between high-level semantics and low-level facts.
Foley-Flow achieves state-of-the-art video-to-audio generation by aligning audio-visual representations with masked modeling, enabling rhythmic synchronization that was previously lacking.
LLMs can generate better recommendations if they pause to verify their reasoning steps, rather than reasoning in one long chain.
Achieve stable and competitive quantization for multimodal LLMs by explicitly accounting for modality-specific characteristics and cross-modal computational differences.
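A minimal NumPy sketch of the modality-aware idea (the grouping, bit width, and scale rule here are assumptions, not the paper's method): keep a separate quantization scale per modality so heavy-tailed vision activations don't force clipping, or wasted precision, on text tokens.

```python
import numpy as np

def quantize_per_group(x: np.ndarray, groups: np.ndarray, bits: int = 8):
    """Symmetric integer quantization with a separate scale per modality.

    x:      (n_tokens, d) activations
    groups: (n_tokens,) modality id per token, e.g. 0 = text, 1 = vision
    """
    qmax = 2 ** (bits - 1) - 1
    x_q = np.empty_like(x, dtype=np.int8)
    scales = {}
    for g in np.unique(groups):
        mask = groups == g
        scale = max(np.abs(x[mask]).max() / qmax, 1e-8)  # per-modality scale
        scales[int(g)] = scale
        x_q[mask] = np.clip(np.round(x[mask] / scale), -qmax - 1, qmax).astype(np.int8)
    return x_q, scales

# Vision activations are often heavier-tailed than text activations; one
# shared scale either clips vision outliers or wastes text precision.
x = np.concatenate([np.random.randn(16, 4), 5.0 * np.random.randn(16, 4)])
groups = np.array([0] * 16 + [1] * 16)
x_q, scales = quantize_per_group(x, groups)
```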
Datacenter networks are haunted by "ghosts"—topology knowledge failures due to link flaps that occur every 48 seconds at 2025 cluster scale—and existing mitigations are insufficient, but Open Atomic Ethernet offers a potential exorcism.
Finally, a CVR prediction dataset with labels from multiple attribution mechanisms, revealing that multi-attribution learning consistently boosts performance, but only with careful architecture and objective selection.
Despite achieving comparable overall scores, top-performing medical LLMs exhibit surprising differences in reasoning, evidence use, and longitudinal follow-up when evaluated on a new Chinese medical benchmark, revealing critical gaps in clinically actionable treatment planning.
LLMs still struggle with PhD-level scanning probe microscopy tasks, but SPM-Bench offers a new automated pipeline to generate challenging scientific benchmarks and quantify model "personalities" like "Conservative" or "Gambler."
Alibaba's FuxiShuffle dynamically adapts to workload and resource fluctuations in ultra-large distributed data processing, slashing job completion times and resource consumption where prior systems falter.
Forget interaction-driven next-item prediction: SIGMA uses instruction-following and semantic grounding to create a generative recommender that adapts to evolving trends and diverse tasks on AliExpress.
Achieve both long-term scene consistency and precise camera control in world models with UCM, a novel framework sidestepping explicit 3D reconstruction.
LLMs can handle basic route planning, but fall apart when user preferences enter the mix, as shown by a new benchmark based on real-world queries.
LLMs can generate unbiased pseudo-labels for unexposed items in pre-ranking, boosting click-through rate by 3.07% in production while improving diversity.
Taobao's recommender system just got a 1.65% CTR boost by compressing ultra-long user behavior sequences with a hierarchical codebook and sparse attention, proving that personalized interest centers can be learned efficiently.
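A toy NumPy sketch of the compression step (codebook sizes, dimensions, and the k-means fitting are stand-ins for the paper's learned hierarchical codebook): quantize each behavior event against a coarse codebook of "interest centers", then quantize the residual, so attention can run over a handful of centers instead of the full sequence.

```python
import numpy as np

def fit_codebook(x: np.ndarray, k: int, iters: int = 10) -> np.ndarray:
    """Tiny k-means stand-in; real systems train codebooks end to end."""
    centers = x[np.random.choice(len(x), k, replace=False)].copy()
    for _ in range(iters):
        assign = np.argmin(((x[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if (assign == j).any():
                centers[j] = x[assign == j].mean(0)
    return centers

def encode(x: np.ndarray, centers: np.ndarray) -> np.ndarray:
    return np.argmin(((x[:, None] - centers[None]) ** 2).sum(-1), axis=1)

# Compress a 10k-event behavior sequence into two levels of codes: a coarse
# codebook of "interest centers" plus a fine codebook over the residuals.
seq = np.random.randn(10_000, 16).astype(np.float32)
coarse = fit_codebook(seq, k=64)
c_ids = encode(seq, coarse)
fine = fit_codebook(seq - coarse[c_ids], k=64)
f_ids = encode(seq - coarse[c_ids], fine)
# Downstream attention now runs over 64 coarse centers (plus compact codes)
# instead of all 10,000 raw events.
```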
LLM knowledge distillation and cross-user preference mining can significantly boost search relevance and CTR prediction, even for cold-start users.
LLMs can uncover previously hidden vulnerabilities in database management systems by intelligently fuzzing obscure, system-level features that traditional fuzzers miss.
Taobao's new LTV ranking framework boosts long-term user engagement by learning nuanced video influence and creator-driven re-engagement, all while fitting within existing industrial constraints.
CoT reasoning can hurt recommender performance by drowning out important ID signals – unless you compress reasoning chains and use bias-subtracted contrastive decoding to realign the inference subspace.
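One plausible reading of the decoding fix, sketched in NumPy; the paper's reasoning-chain compression and its exact bias estimate are not reproduced, and `logits_bias` is a hypothetical stand-in for logits computed without ID signals.

```python
import numpy as np

def bias_subtracted_decode(logits_full: np.ndarray,
                           logits_bias: np.ndarray,
                           alpha: float = 1.0) -> int:
    """One greedy step of generic contrastive decoding: subtract a scaled
    estimate of the unwanted component (here, logits driven by the reasoning
    chain with ID features masked out) before choosing the next token."""
    return int(np.argmax(logits_full - alpha * logits_bias))
```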
LLM code copilots are put to the test with SecCodeBench-V2, a new benchmark revealing their security vulnerabilities across 22 CWE categories and five programming languages.
Achieve diverse and stylistically consistent long-form piano accompaniments by explicitly planning style at the measure level and retrieving suitable patterns from a corpus.
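A minimal NumPy sketch of plan-then-retrieve (random vectors stand in for learned style embeddings, and the corpus is synthetic): plan one style vector per measure, keep neighbouring measures close for consistency, then retrieve the nearest corpus pattern for each.

```python
import numpy as np

# Synthetic corpus: (style_vector, pattern_id) pairs; in the paper both the
# measure-level style planner and the retrieval features are learned.
rng = np.random.default_rng(0)
corpus = [(rng.standard_normal(8), f"pattern_{i}") for i in range(100)]

def plan_styles(n_measures: int) -> list:
    """Measure-level style plan; drift slowly so neighbours stay coherent."""
    anchor = rng.standard_normal(8)
    return [anchor + 0.1 * rng.standard_normal(8) for _ in range(n_measures)]

def retrieve(style: np.ndarray) -> str:
    """Return the corpus pattern whose style vector is closest by cosine."""
    sims = [style @ s / (np.linalg.norm(style) * np.linalg.norm(s))
            for s, _ in corpus]
    return corpus[int(np.argmax(sims))][1]

accompaniment = [retrieve(s) for s in plan_styles(16)]  # 16-measure plan
```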
By unifying contrastive learning with pose-conditioned generative modeling, BindCLIP produces interaction-aware embeddings that substantially improve virtual screening, especially in challenging out-of-distribution scenarios.
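A sketch of what a unified objective could look like in NumPy, assuming paired ligand/pocket embeddings and treating the pose-conditioned generative term as an opaque scalar; none of this is BindCLIP's actual code.

```python
import numpy as np

def info_nce(z_ligand: np.ndarray, z_pocket: np.ndarray, tau: float = 0.07) -> float:
    """InfoNCE over paired embeddings; row i of each matrix is a true pair."""
    z_l = z_ligand / np.linalg.norm(z_ligand, axis=1, keepdims=True)
    z_p = z_pocket / np.linalg.norm(z_pocket, axis=1, keepdims=True)
    logits = z_l @ z_p.T / tau                      # (n, n) similarity matrix
    m = logits.max(axis=1, keepdims=True)           # for numerical stability
    log_sm = logits - m - np.log(np.exp(logits - m).sum(axis=1, keepdims=True))
    idx = np.arange(len(logits))
    return float(-log_sm[idx, idx].mean())          # diagonal = true pairs

def joint_loss(z_l, z_p, generative_loss: float, lam: float = 0.5) -> float:
    # Contrastive alignment plus a pose-conditioned generative term, stubbed
    # here as a precomputed scalar; the decoder itself is not reproduced.
    return info_nce(z_l, z_p) + lam * generative_loss
```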
LLM benchmark accuracy jumps 10% when evaluated on a cleaned-up version of Humanity's Last Exam, highlighting the significant impact of dataset noise on performance metrics.
Overcome "intent myopia" in trigger-based recommendations with DAIAN, a network that adaptively learns user intent from click correlations and hybrid ID/semantic similarity, boosting CTR in e-commerce.
Ditch the black-box reward function: this new rubric-based RL framework uses LLMs to judge responses against interpretable criteria, offering a more robust and transparent approach to alignment.
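The reward itself is easy to sketch: assuming some `judge` callable (an LLM prompted per criterion) and a made-up three-item rubric, the scalar reward is just the mean of interpretable pass/fail judgments.

```python
# Three made-up rubric items; `judge` is a hypothetical callable wrapping an
# LLM prompted to return 1 (pass) or 0 (fail) for one criterion at a time.
RUBRIC = [
    "Does the response answer the question that was actually asked?",
    "Is every factual claim supported or explicitly hedged?",
    "Does the response avoid unsafe or policy-violating content?",
]

def rubric_reward(prompt: str, response: str, judge) -> float:
    """Average per-criterion judgments into a scalar reward in [0, 1]; a low
    reward is traceable to the specific rubric item that failed."""
    return sum(judge(prompt, response, c) for c in RUBRIC) / len(RUBRIC)
```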
RynnBrain leapfrogs existing embodied foundation models, offering a unified, open-source spatiotemporal model that excels at physically grounded reasoning and planning across a wide range of benchmarks.
LLMs can overcome "tunnel vision" in multi-turn search scenarios by using information gain to guide dynamic prompting interventions, leading to more efficient and accurate reasoning.
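A minimal sketch of an information-gain trigger in NumPy (the belief distributions and threshold are assumptions; the paper's intervention policy is surely richer): intervene when a search turn barely reduces entropy over candidate answers.

```python
import numpy as np

def entropy(p) -> float:
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def should_intervene(belief_before, belief_after, min_gain: float = 0.1) -> bool:
    """Trigger a prompting intervention when the last search turn barely
    reduced uncertainty over candidate answers, i.e. the model is likely
    tunneling on one hypothesis instead of gathering new information."""
    return entropy(belief_before) - entropy(belief_after) < min_gain

# This turn changed the belief almost not at all -> intervene.
print(should_intervene([0.25, 0.25, 0.25, 0.25], [0.28, 0.24, 0.24, 0.24]))  # True
```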
Forget huge models: parameter-efficient fine-tuning turns tiny language models into code-generating powerhouses that outperform larger, untuned counterparts.
Failure-driven post-training, combined with a meticulously curated 10M token STEM dataset, unlocks a 4.68% performance boost in LLM reasoning, proving that strategic data synthesis around model weaknesses is a powerful path to improvement.
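A sketch of the failure-driven loop, with hypothetical `model_answer` and `make_variant` callables standing in for the paper's evaluation and synthesis pipeline:

```python
def synthesize_from_failures(eval_set, model_answer, make_variant, n_variants=4):
    """Mine wrong answers, then synthesize variants of exactly those items."""
    failures = [ex for ex in eval_set if model_answer(ex.question) != ex.answer]
    new_data = []
    for ex in failures:
        # Same underlying concept, perturbed surface form and values, so the
        # post-training data concentrates on demonstrated weaknesses.
        new_data.extend(make_variant(ex) for _ in range(n_variants))
    return new_data
```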
LLM safety guardrails are far less robust than benchmarks suggest, with accuracy dropping by as much as 57% on novel adversarial attacks, and some even generating harmful content in a "helpful mode" jailbreak.