67 papers published by 1 lab.
Training LLMs on temporally partitioned data offers a practical method for mitigating lookahead bias, enabling more reliable financial forecasting.
Achieve controllable and scalable speech generation with MOSS-TTS, enabling zero-shot voice cloning and long-form synthesis.
Unlock the power of your favorite classifier for ordinal data: Classifier Pooling consistently beats standard methods, especially when data is scarce or categories are numerous.
YouTube's platform defenses are a house of cards: circumventing one control often triggers a cascade of failures, demanding constant architectural adaptation for large-scale content replication.
LLMs can get a massive multilingual boost, especially in low-resource languages, by offloading translation to specialized models and carefully aligning their representations.
LLMs encode hierarchical semantic relations asymmetrically, with hypernymy being far more robust and redundantly represented than hyponymy.
Ruyi2.5 achieves comparable performance to Qwen3-VL on general multimodal benchmarks while significantly outperforming it in privacy-constrained surveillance, demonstrating the effectiveness of its edge-cloud architecture.
Current CRL benchmarks often fail to provide a holistic view of model performance, hindering progress, but a new aggregate metric could change that.
ManiDreams lets robots handle real-world uncertainty in manipulation tasks without retraining, outperforming standard RL baselines under various perturbations.
Tackle previously intractable open quantum systems simulations with TENSO, a new open-source package that efficiently handles complex environments via tree tensor networks.
LLMs can be drastically compressed without retraining because the relative ordering of weights matters far more than their exact values, opening the door to efficient, training-free compression techniques.
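A minimal sketch of the underlying intuition, not the paper's actual algorithm: if only the relative ordering of weights carries the signal, a weight matrix can be re-coded by rank against a tiny shared codebook, shrinking storage while keeping the ordering intact.

```python
import numpy as np

# Toy rank-preserving re-coding of a weight matrix. Each weight is replaced
# by one of n_levels codebook values chosen by its rank, so the ordering is
# (coarsely) preserved while storage drops to log2(n_levels) bits per weight.
# Illustrative assumption only -- not the paper's compression technique.

def rank_recode(w: np.ndarray, n_levels: int = 16) -> np.ndarray:
    flat = w.ravel()
    ranks = np.empty(flat.size, dtype=np.int64)
    ranks[flat.argsort()] = np.arange(flat.size)        # rank of each weight
    buckets = (ranks * n_levels) // flat.size           # bucket in [0, n_levels)
    codebook = np.quantile(flat, (np.arange(n_levels) + 0.5) / n_levels)
    return codebook[buckets].reshape(w.shape)

w = np.random.randn(64, 64).astype(np.float32)
w_hat = rank_recode(w)                                  # ~4 bits/weight + codebook
print(np.corrcoef(w.ravel(), w_hat.ravel())[0, 1])      # ordering largely intact
```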
LLMs can mimic human lexical patterns, but larger models act like stereotypical humans, sacrificing diversity for typicality in word associations, a trade-off tunable by temperature.
A 4B parameter model can nearly match the privilege escalation performance of a state-of-the-art closed LLM like Claude Opus, while being fully local and 100x cheaper to run.
Standardized, modular GenAI teaching units in GUIDE offer a practical path to integrating cutting-edge AI tools into digital design education.
Security patch detectors trained on standard vulnerability databases are practically useless in the real world, losing up to 90% F1-score when deployed on in-the-wild data.
This Italian LLM punches way above its weight, matching the performance of models trained on 6-10x more data while using only 3B active parameters during inference.
A small, synthetically generated dataset can dramatically improve LLM performance on low-resource languages, even when the data is noisy and imperfect.
Early-career researchers in experimental physics report significant gaps in training for software and machine learning tools crucial to their work, highlighting a critical need for improved educational resources.
Achieve sub-microsecond decoding-feedback latency in a scalable, open-source QEC system, bringing fault-tolerant quantum computation closer to reality.
A new 1.25B-word Pashto corpus boosts NER performance by 10% and slashes training variance nearly 7x, highlighting the disproportionate value of Wikipedia data.
CRAG's retrieval evaluator surprisingly relies on named entity alignment, not semantic similarity, to judge document quality.
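A toy probe in the spirit of that finding (the spaCy pipeline name is an assumption; any NER tagger works): score query-document pairs purely by named-entity overlap and check how well that alone tracks the evaluator's verdicts.

```python
import spacy

# Score a (query, document) pair by named-entity overlap alone -- a crude
# proxy for the alignment signal the evaluator appears to rely on instead
# of semantic similarity. Pipeline name is an assumption.
nlp = spacy.load("en_core_web_sm")

def entity_overlap(query: str, doc: str) -> float:
    q_ents = {e.text.lower() for e in nlp(query).ents}
    d_ents = {e.text.lower() for e in nlp(doc).ents}
    return len(q_ents & d_ents) / max(len(q_ents), 1)

print(entity_overlap("Who founded SpaceX?",
                     "SpaceX was founded in 2002 by Elon Musk."))
```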
Current time series foundation models struggle with millisecond-resolution 5G network data, revealing a critical gap in their ability to generalize to high-frequency real-world applications.
Code LLMs can achieve SOTA performance in agentic tasks by explicitly modeling the dynamic evolution of software logic across different training stages.
Open-source LLMs can grade UML diagrams with near-human accuracy on individual criteria, paving the way for AI-assisted teaching without relying on proprietary models.
Get competitive multilingual ASR performance with 6x smaller models and 200x less training cost by using balanced fine-tuning and implicit language learning.
A new 32B code LLM trained specifically for industrial tasks crushes existing models on specialized domains like chip design and GPU kernel optimization, while remaining competitive on general coding benchmarks.
A new dataset of 2.56 million verses of Arabic lyrics and poetry opens the door for large-scale computational analysis of Arabic language evolution, cultural trends, and artistic expression.
Identity-based software signing may reduce key management burdens, but it relocates complexity to verification, configuration, and deployment, creating new usability challenges.
A graph neural network can learn accurate force field parameters from scratch, rivaling manually developed force fields and opening avenues for automated force field discovery.
LoRA fine-tuning beats prompting and RAG for adapting smaller language models to domain-specific code generation tasks, offering a path to higher accuracy and domain alignment.
Say goodbye to ad-hoc scripts: this automated workflow slashes manual intervention in NEB calculations, ensuring reproducible reaction path optimization across platforms.
TinyML for agriculture is trending towards localized inference on microcontrollers, but inconsistent resource reporting is slowing down real-world deployment.
A fine-tuned RoBERTa model with only 125M parameters can match the CVE-to-CWE classification accuracy of models 64x larger, proving that strategic fine-tuning and data curation can close the gap between small and large language models.
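The setup itself is plain sequence classification; the curation is the real ingredient. A minimal sketch under assumed details (checkpoint, label inventory, and example text are placeholders, not the paper's exact configuration):

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Treat CVE description -> CWE ID as ordinary sequence classification with a
# small encoder. num_labels is an assumed label inventory; the paper's data
# curation and fine-tuning recipe are what close the gap to larger models.
tok = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "roberta-base", num_labels=400)

batch = tok(["Buffer overflow in ... allows remote attackers ..."],  # placeholder CVE text
            return_tensors="pt", truncation=True)
pred = model(**batch).logits.argmax(dim=-1)   # predicted CWE class index
```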
Stockfish's chess heuristics stumble in the 3D world of Dragonchess, but evolutionary adaptation can bridge the gap, opening new avenues for transferring AI knowledge across structurally different domains.
LM Arena's model anonymity is more vulnerable than previously thought: a new attack, INTERPOL, leverages interpolated preference learning to expose deep stylistic patterns and manipulate rankings.
Latvian NLP gets a boost: a new 111M parameter model outperforms larger multilingual baselines, proving that targeted pretraining still matters for low-resource languages.
Stylometric features, combined with modern multilingual language models, significantly boost the performance of machine-generated text detection, often surpassing language-specific models.
Engineering design research lacks benchmark datasets, but this framework and prototype promise to change that by mapping the data landscape and revealing critical gaps.
Control LLM personality on a continuous spectrum, not just discrete categories, by dynamically fusing LoRA adapters with a reinforcement learning policy.
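A minimal sketch of continuous adapter fusion (illustrative shapes and names, not the paper's architecture): blend the weight deltas of two LoRA adapters trained for opposite poles of a trait with a scalar alpha in [0, 1], which a learned policy could output.

```python
import torch

# Interpolate between two LoRA adapters trained for opposite poles of a
# personality trait (e.g. introvert vs. extravert). alpha picks a point on
# the continuous trait axis; an RL policy could choose it per request.
# Purely illustrative assumptions, not the paper's exact method.

def fused_lora_delta(A_lo, B_lo, A_hi, B_hi, alpha: float) -> torch.Tensor:
    delta_lo = B_lo @ A_lo          # low-pole rank-r update, (d_out, d_in)
    delta_hi = B_hi @ A_hi          # high-pole rank-r update
    return (1.0 - alpha) * delta_lo + alpha * delta_hi

d_in, d_out, r = 512, 512, 8
A_lo, A_hi = torch.randn(r, d_in), torch.randn(r, d_in)
B_lo, B_hi = torch.randn(d_out, r), torch.randn(d_out, r)

W = torch.randn(d_out, d_in)        # frozen base weight stays untouched
W_adapted = W + fused_lora_delta(A_lo, B_lo, A_hi, B_hi, alpha=0.3)
```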
NLLB-200 can be effectively fine-tuned for low-resource languages like Efik, even with a relatively small, community-curated dataset, achieving surprisingly strong translation performance.
OpenSeeker proves that frontier-level search agents can be achieved with surprisingly little data, outperforming even heavily optimized industrial systems.
Unlock robust feature importance analysis with `xplainfi`, an R package that fills critical gaps by offering conditional importance methods and statistical inference for diverse ML models.
The RIGHT framework offers a new lens for evaluating the validity of human-facing research software, moving beyond just reliability and FAIR principles.
Ditch the tokenizer: this new LLM architecture processes text at the byte level, offering better compression, spelling robustness, and multilingual performance.
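The core idea is easiest to see at the input layer: instead of a learned subword vocabulary, every UTF-8 byte is an ID in a fixed 256-entry vocabulary (plus a few specials). A minimal sketch, with the special-token IDs assumed:

```python
# Byte-level "tokenization" sketch: the vocabulary is just the 256 possible
# byte values plus a few special IDs. No tokenizer training, no OOV, and any
# language or misspelling maps into the same tiny vocabulary.
# (Illustrative only; the paper's model architecture is not shown.)

PAD, BOS, EOS = 256, 257, 258   # assumed special-token IDs

def bytes_encode(text: str) -> list[int]:
    return [BOS] + list(text.encode("utf-8")) + [EOS]

def bytes_decode(ids: list[int]) -> str:
    payload = bytes(i for i in ids if i < 256)
    return payload.decode("utf-8", errors="replace")

ids = bytes_encode("naïve café ☕")       # non-ASCII just becomes more bytes
assert bytes_decode(ids) == "naïve café ☕"
print(len(ids))                           # longer sequences than subwords
```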
Despite high static quality scores, YARA rules in the wild suffer from significant noise, low recall, and a bias towards legacy threats, exposing a "double penalty" for defenders.
ITKIT offers a streamlined CT image analysis pipeline that democratizes access to deep learning-based segmentation, even for researchers with limited computational resources.
Training SLMs for low-resource Indic languages just got easier: a new synthetic dataset of children's stories offers a large, localized, and simple corpus.
The pursuit of "open search" risks being co-opted by powerful corporations unless it shifts focus from technical openness to the actual capabilities afforded to users.
Stop silent capability escalation: this framework uses cryptographic binding and reproducibility verification to ensure AI agents only do what they're authorized to do.
LLMs can classify biomedical articles surprisingly well, rivaling traditional methods like Naive Bayes and Random Forests, especially when using output token probabilities.
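One common way to read class probabilities out of a causal LLM, consistent in spirit with (though not necessarily identical to) the paper's setup: compare the next-token logits of the candidate label words. Model name, labels, and prompt below are placeholders.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Score an article against candidate labels via the model's next-token
# logits for each label word, renormalised over the label set.
model_name = "gpt2"  # placeholder; substitute any causal LM
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

labels = [" cancer", " cardiology", " neurology"]  # leading space matters for BPE
prompt = "Abstract: ...\nThe topic of this article is"  # "..." = article text

with torch.no_grad():
    logits = model(**tok(prompt, return_tensors="pt")).logits[0, -1]

label_ids = [tok.encode(l)[0] for l in labels]     # first token of each label
probs = torch.softmax(logits[label_ids], dim=0)
for label, p in zip(labels, probs):
    print(f"{label.strip()}: {p:.3f}")
```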
RISC-V's memory model can tank SD card performance by 6x, but clever driver tweaks can recover it.
Pinpoint exactly which client leaked your federated model with a black-box watermark that's robust to fine-tuning, pruning, and quantization.
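A minimal trigger-set sketch of black-box attribution (assumed mechanics, not the paper's embedding scheme): each client holds a secret set of out-of-distribution inputs whose labels encode its identity, and a leaked model is attributed by querying it on each client's trigger set.

```python
import torch

# Toy black-box watermark check: query a suspect model on a client's secret
# trigger inputs and measure how often it reproduces that client's
# ID-encoding labels. Purely illustrative of the verification interface.

def verify_client(model, trigger_x, trigger_y, threshold: float = 0.9) -> bool:
    with torch.no_grad():
        preds = model(trigger_x).argmax(dim=-1)
    return (preds == trigger_y).float().mean().item() >= threshold

# Hypothetical usage, with trigger_sets mapping client IDs to (inputs, labels):
# for cid, (tx, ty) in trigger_sets.items():
#     if verify_client(suspect_model, tx, ty):
#         print(f"leak attributed to client {cid}")
```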
Achieve a 50% inference speedup on a large language model for European languages by compressing it to 7.35B parameters, while retaining 90% of the original 11B parameter model's performance.
Polish language understanding gets a long-context boost: a new encoder model handles sequences up to 8192 tokens, outperforming existing models on long documents while remaining competitive on shorter texts.
Forget scaling laws: this humanoid robot model crushes benchmarks using 10x less data by cleverly pre-training on human videos and then fine-tuning on robot-specific movements.
Open-source LLMs can help write Japanese pathology reports, but pathologists strongly disagree on which model provides the best explanations.
LLMs can gain 40% in knowledge transfer efficiency by mining skills from open-source agent repositories, without needing retraining.
RAG with small language models (<8B parameters) can be a net negative, as they often ignore retrieved context and even "forget" existing knowledge.
Forget brute-force scaling: Tiny Aya proves a 3B parameter model can achieve state-of-the-art multilingual performance with clever training and region-aware specialization.
Ditch the video: InSpatio-WorldFM achieves real-time spatial intelligence by generating frames independently, offering a low-latency alternative to video-based world models.
Turn your Jupyter notebooks into one-click installable desktop apps with LabConstrictor, democratizing access to computational methods for researchers without DevOps expertise.
Despite their general prowess, open-source LLMs still lag behind proprietary models in the nuanced task of dating texts, even after fine-tuning.
Can a dedicated research program keep a smaller, local LLM competitive against global giants in the rapidly evolving AI landscape?
An AI-integrated agile education platform accelerates practice-relevant AI research by closing the theory-practice gap in software development.
Single-domain watermarks are fundamentally insufficient against modern adversarial toolsets, as spatial and latent watermarks exhibit orthogonal vulnerabilities to generative and geometric attacks, respectively.
A fully open-source speech understanding model, OSUM-Pangu, proves that competitive performance is achievable on non-CUDA hardware, challenging the dominance of GPU-centric ecosystems.
Speech-aware LLMs are surprisingly bad at speaker verification, but a simple embedding injection trick closes the gap with dedicated systems while preserving the LLM's language abilities.
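A minimal sketch of the injection idea (shapes, dimensions, and the single linear projection are assumptions): map a speaker embedding from a dedicated verification model into the LLM's token-embedding space and prepend it to the input sequence, leaving the LLM's own weights untouched.

```python
import torch
import torch.nn as nn

# Prepend a projected speaker embedding (e.g. from an ECAPA-style verifier)
# to the LLM's input embedding sequence. Only the projection is trained, so
# the LLM's language abilities are preserved. Illustrative assumptions only.

class SpeakerInjector(nn.Module):
    def __init__(self, spk_dim: int = 192, llm_dim: int = 4096):
        super().__init__()
        self.proj = nn.Linear(spk_dim, llm_dim)  # the only new parameters

    def forward(self, spk_emb: torch.Tensor, token_embs: torch.Tensor):
        # spk_emb: (batch, spk_dim); token_embs: (batch, seq, llm_dim)
        injected = self.proj(spk_emb).unsqueeze(1)       # (batch, 1, llm_dim)
        return torch.cat([injected, token_embs], dim=1)  # one extra position

inj = SpeakerInjector()
fused = inj(torch.randn(2, 192), torch.randn(2, 10, 4096))
print(fused.shape)  # torch.Size([2, 11, 4096])
```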