100 papers published across 4 labs.
Requirements volatility doesn't just delay projects; it directly undermines software architecture, leading to technical debt and scheduling nightmares.
Unlock geometric algebra's performance potential in neural networks and spatial computing by compiling directly from multi-way relationships, eliminating manual specialization and ensuring geometric correctness.
Semantic sorting in LLMs can be twice as fast with no loss in accuracy by strategically combining listwise ranking algorithms.
Current LMMs can't reliably turn complex images into code, failing to preserve structural integrity even in relatively simple scenarios, as shown by the new Omni-I2C benchmark.
Software architecture, a critical but underspecified domain, finally gets a unified benchmarking platform with ArchBench, enabling standardized evaluation of LLMs on complex system design tasks.
LLMs can now automatically generate bug-detection patterns for scientific code, offering a scalable solution to the growing problem of methodology errors in AI-driven research.
Achieve up to 2.4x speedup over OpenBLAS on RISC-V by using MLIR and xDSL to generate optimized RVV code, finally unlocking the potential of RISC-V vector extensions.
Simply prompting for test-driven development can *increase* regressions in AI coding agents; instead, focus on surfacing contextual information about which tests are most relevant.
LLMs can read datasheets, but still can't design circuits, failing at basic physical intuition despite showing promise in documentation understanding.
Random walks and equitable partitions offer a fresh perspective on bounding the smoothing parameter in code-based cryptography, potentially surpassing Fourier transform-based methods.
Automated injection of realistic vulnerabilities and synthesis of PoV exploits finally makes scalable, precisely labeled, repository-level vulnerability datasets a reality.
Forget complex multi-agent systems: Skele-Code's no-code interface slashes token costs by shifting agent involvement to code generation only, enabling subject matter experts to build agentic workflows directly.
Despite the ease of integrating ML cloud services, developers are widely misusing them, leading to quality and maintainability issues that MLmisFinder can now automatically detect with high accuracy.
Forget about chasing the perfect model architecture – this work suggests the real key to better AI agents lies in crafting more precise and complete specifications, since the implementation can always be re-generated.
LLMs can't reason their way through Rust verification, struggling to complete proofs even with substantial hints, revealing a critical gap in their ability to handle the rigorous demands of secure software development.
A 4B parameter model can nearly match the privilege escalation performance of a state-of-the-art closed LLM like Claude Opus, while being fully local and 100x cheaper to run.
Standardized, modular GenAI teaching units in GUIDE offer a practical path to integrating cutting-edge AI tools into digital design education.
Secure enclave updates and migrations, previously missing from RISC-V TEEs, are now practical thanks to a novel toolkit that adds minimal overhead.
LLMs struggle with code comprehension, but a simple RNN pass over their embeddings can boost accuracy by over 5%.
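The idea above — running a lightweight recurrent pass over frozen LLM embeddings instead of using them directly — can be sketched as follows. This is an illustrative toy, not the paper's architecture: the vanilla-RNN update, the hidden size, and the random stand-in embeddings are all assumptions for demonstration.

```python
import numpy as np

def rnn_pass(embeddings, W_h, W_x, w_out, b):
    """One vanilla-RNN pass over a sequence of frozen token embeddings.

    embeddings: (seq_len, d) array of per-token LLM embeddings.
    Returns a single scalar score for a downstream comprehension label.
    """
    h = np.zeros(W_h.shape[0])
    for x in embeddings:                    # recur along the token axis
        h = np.tanh(W_h @ h + W_x @ x + b)  # hidden-state update
    return float(w_out @ h)                 # read out one logit

rng = np.random.default_rng(0)
d, hidden = 16, 8
emb = rng.normal(size=(5, d))               # stand-in for real LLM embeddings
score = rnn_pass(emb,
                 rng.normal(size=(hidden, hidden)) * 0.1,
                 rng.normal(size=(hidden, d)) * 0.1,
                 rng.normal(size=hidden),
                 np.zeros(hidden))
print(score)
```

In practice the recurrent weights would be trained on labeled comprehension examples while the LLM stays frozen; the appeal is that the extra pass is tiny compared to the base model.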
Finally, a software energy profiler achieves both high accuracy and cross-platform portability, enabling practical algorithmic energy optimization across diverse languages and hardware.
Forget prompt engineering: AgentFactory lets LLM agents self-evolve by accumulating and refining executable Python subagents, making task re-execution more reliable and efficient.
Turning past programming failures into reusable knowledge boosts automated repair performance by 3.7% on a multimodal benchmark.
Security patch detectors trained on standard vulnerability databases are practically useless in the real world, losing up to 90% F1-score when deployed on in-the-wild data.
Genetic programming can discover unconventional multigrid cycles that outperform hand-tuned methods, suggesting automated algorithm design can unlock untapped performance in classical numerical solvers.
Federated Computing as Code lets you enforce data sovereignty in federated systems with cryptographic guarantees, moving beyond runtime policies and trust assumptions.
Forget specialized tools: a standard Unix terminal and clever RL are all you need to beat much larger LLMs at code search.
LLMs can now generate Verilog code that's not just correct, but also optimized for real-world hardware constraints like power, performance, and area, thanks to a novel multi-agent system with evolving memory.
Even when given identical data and research questions, autonomous AI coding agents exhibit surprisingly high variability in their empirical findings, raising concerns about the reliability of AI-driven research.
SseRex finds hundreds of potential bugs in Solana smart contracts that existing tools miss, revealing that subtle, easily overlooked issues are often the root cause of severe exploits.
Even without pre-loaded database schemas, a new RL agent matches or beats state-of-the-art text-to-SQL models that have full schema access.
Feature models, often treated as static configuration spaces, reveal hidden structural patterns and domain-specific deviations when viewed through the lens of network analysis.
LLMs can now reliably translate natural language into executable option trading strategies, thanks to a new domain-specific language that constrains their output to verifiable semantic parses.
Finally, a unified software framework promises to tame the wild west of quantum dot device tuning, enabling researchers to share and adapt characterization routines across labs.
Software traceability research is severely imbalanced, with code-related links dominating and 95% of tools stuck in academia.
RepoReviewer tackles the complexity of repository-level code review with a multi-agent architecture, breaking down the monolithic process into manageable stages for more relevant and efficient feedback.
Software energy consumption isn't just an aggregate number – it's a path-dependent journey, and this new model reveals hidden optimization opportunities that can slash energy use by up to 705x.
Early-career researchers in experimental physics report significant gaps in training for software and machine learning tools crucial to their work, highlighting a critical need for improved educational resources.
Symbolic planning unlocks significant gains in RTL synthesis and summarization, boosting LLM performance by 20% without fine-tuning.
Forget generic code generation – this work shows that structure-aware retrieval of domain-specific examples slashes the debugging needed to get LLMs to produce working scientific visualization pipelines.
A multi-agent system that mimics rubber-duck debugging slashes critical path delay by 25% and power consumption by 22% in RTL code, outperforming LLM-based baselines.
Prompts aren't just instructions; they're evolving requirement artifacts that blend intent and implementation, demanding a new approach to software engineering.
Coding agents struggle to maintain faithfulness to specifications that emerge gradually over long interactions, losing significant implementation fidelity compared to single-shot specifications.
GitOps can transform CTF management, enabling automated deployments, enhanced collaboration, and cost-effective scaling.
CodeScan achieves 97%+ accuracy in detecting data poisoning attacks in code generation LLMs by identifying structural similarities across generations, even when semantics are expressed in diverse syntactic forms.
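Detecting shared structure beneath diverse syntax, as described above, can be approximated by normalizing identifiers out of a parse tree and comparing what remains. This is a minimal sketch of the general idea using Python's `ast` module, not CodeScan's actual detector; the renaming scheme is an assumption.

```python
import ast

class Normalize(ast.NodeTransformer):
    """Rename all identifiers so only program *structure* remains."""
    def visit_Name(self, node):
        return ast.copy_location(ast.Name(id="_v", ctx=node.ctx), node)
    def visit_arg(self, node):
        node.arg = "_v"
        return node
    def visit_FunctionDef(self, node):
        self.generic_visit(node)
        node.name = "_f"
        return node

def skeleton(src):
    """A structural fingerprint: identical for renamings of the same code."""
    return ast.dump(Normalize().visit(ast.parse(src)))

a = "def add(x, y):\n    return x + y"
b = "def plus(left, right):\n    return left + right"  # new names, same shape
c = "def mul(x, y):\n    return x * y"                  # different operator

print(skeleton(a) == skeleton(b))  # True: structurally identical
print(skeleton(a) == skeleton(c))  # False
```

A poisoning detector could then flag clusters of generations whose fingerprints collide far more often than clean models would produce.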
AI-generated code's fluency masks a critical flaw: it often fails to deliver what users actually intend, highlighting the urgent need for "intent formalization" to bridge the gap between informal requirements and precise program behavior.
Software engineering students are most likely to misuse LLMs on programming assignments and documentation, especially when they feel squeezed for time or lack clear guidance.
Novice programmers can boost code comprehension by 31% thanks to automated refactoring that minimizes cognitive load.
By explicitly exposing the model's reasoning process during SVG generation, CTRL-S achieves higher task success rates, superior SVG code quality, and exceptional visual fidelity compared to existing methods.
Code LLMs can achieve SOTA performance in agentic tasks by explicitly modeling the dynamic evolution of software logic across different training stages.
GenAI is already halving the time developers spend on boilerplate and documentation, but its real potential lies in shifting focus from routine coding to higher-level tasks like specification quality and architectural reasoning.
Open-source LLMs can grade UML diagrams with near-human accuracy on individual criteria, paving the way for AI-assisted teaching without relying on proprietary models.
A new rank-based phenotypic characterization scheme slashes the computational cost of genetic programming for dynamic project scheduling, enabling faster discovery of high-quality heuristic rules.
LLMs can now write better quantitative trading algorithms than humans, thanks to a new framework that turns unstructured financial reports into executable code.
Even when LLMs translate code correctly, over 20% of the time it's surprisingly inefficient due to algorithmic flaws, poor language choices, or resource mismanagement.
LLMs can parse almost any SQL dialect by segmenting queries into clauses and expressions, then validating with grammar-based methods.
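The clause-segmentation step described above can be sketched with a simple keyword split. This toy cut ignores nesting, quoting, and dialect-specific clauses, all of which a real segmenter must handle before validating each piece against a grammar; the clause list here is an assumption.

```python
import re

CLAUSES = ["SELECT", "FROM", "WHERE", "GROUP BY", "HAVING", "ORDER BY", "LIMIT"]

def segment(sql):
    """Split a query into (clause, body) pairs at top-level keywords."""
    pattern = r"\b(" + "|".join(k.replace(" ", r"\s+") for k in CLAUSES) + r")\b"
    parts = re.split(pattern, sql, flags=re.IGNORECASE)
    pairs, i = [], 1                       # parts[0] is any leading text
    while i < len(parts):
        kw = re.sub(r"\s+", " ", parts[i]).upper()
        pairs.append((kw, parts[i + 1].strip()))
        i += 2
    return pairs

q = "SELECT name, age FROM users WHERE age > 30 ORDER BY age LIMIT 10"
for clause, body in segment(q):
    print(clause, "->", body)
```

Once segmented, each fragment is small enough to validate independently, which is what makes grammar-based checking tractable across dialects.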
A new 32B code LLM trained specifically for industrial tasks crushes existing models on specialized domains like chip design and GPU kernel optimization, while remaining competitive on general coding benchmarks.
Current telemetry falls woefully short in detecting advanced software supply chain attacks, with even the best single source capturing less than 40% of the attack chain, underscoring the critical need for multi-source data fusion.
LLMs struggle to formalize program post-conditions from natural language, with even the best models failing to correctly formalize all tasks, highlighting a critical gap in their ability to bridge natural language understanding and formal verification.
Identity-based software signing may reduce key management burdens, but it relocates complexity to verification, configuration, and deployment, creating new usability challenges.
Imagine a compiler that understands the size and lifetime of your data so well it can automatically optimize memory allocation and representation, giving you design-time insights into performance.
Security scanners flag nearly half of AI agent skills as malicious, but adding GitHub repository context reveals that the true number is closer to 0.5%.
A Qwen3-8B model, trained with a new SFT+RLAIF recipe on a challenging new benchmark, SWE-QA-Pro, beats GPT-4o in repository-level code understanding.
Vectorizing Verilog designs slashes memory consumption by over 50% in formal verification, even without changing the underlying hardware.
Pinpointing the exact line of code causing a test failure boosts code generation performance by 3%, without needing a critic or extra training.
LoRA fine-tuning beats prompting and RAG for adapting smaller language models to domain-specific code generation tasks, offering a path to higher accuracy and domain alignment.
LLM-generated code, while fast, is often subtly wrong, and VibeContract offers a way to make "vibe coding" more predictable and trustworthy by adding explicit, verifiable contracts.
Scaling LLM-based multi-agent systems doesn't just need better prompts or models, but a whole new software engineering approach focused on managing runtime entropy.
Algorithms used in product line tools are only half as fast as the best alternative for finding backbones in configurable software systems, but picking the right algorithm requires predicting the optimal chunk size, which remains an open problem.
Hybrid ClojureScript lets you visually code geometric ideas directly within your text, opening up new possibilities for domain-specific language design.
Say goodbye to ad-hoc scripts: this automated workflow slashes manual intervention in NEB calculations, ensuring reproducible reaction path optimization across platforms.
AI can now semi-autonomously formalize complex mathematical theorems like the Vlasov-Maxwell-Landau equilibrium, even outpacing traditional mathematical research.
LLMs can now assist in the challenging task of writing temporal logic properties for model-based development, potentially streamlining the formal specification process.
LLMs struggle to effectively use private library APIs even when provided with the correct documentation, but PriCoder can boost their performance by over 20% through targeted training data synthesis.
LLMs can boost code clone detection accuracy by selectively arbitrating only 0.2% of uncertain cases flagged by a multimodal fusion model, achieving a 0.3% absolute Macro-F1 gain.
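The selective-arbitration pattern above — a cheap model decides everything except a thin uncertainty band, which goes to the LLM — can be sketched as a threshold router. The band width and the toy probabilities are illustrative assumptions, not the paper's calibration.

```python
import numpy as np

def route(fusion_probs, band=0.02):
    """Send only near-threshold cases to the LLM arbiter.

    fusion_probs: clone probabilities from a cheap fusion model.
    band: half-width of the uncertainty band around the 0.5 decision line.
    Returns (cheap_decisions, indices_to_escalate).
    """
    probs = np.asarray(fusion_probs)
    uncertain = np.flatnonzero(np.abs(probs - 0.5) < band)
    decisions = probs >= 0.5          # cheap verdict for everything else
    return decisions, uncertain

probs = np.array([0.97, 0.02, 0.51, 0.49, 0.88])
decisions, to_llm = route(probs)
print(to_llm)    # only the borderline pairs get escalated
```

With a well-calibrated fusion model, the band can be tuned so only a fraction of a percent of pairs ever incur LLM cost.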
LLM coding agents still fall short when optimizing real-world codebases, especially when balancing multiple objectives like performance and correctness, as revealed by the new FormulaCode benchmark.
AI code review agents may scale defect screening, but their suggestions are adopted less often and, when adopted, can actually *worsen* code quality, underscoring the critical need for human oversight.
Unlock agentic software development by transforming institutional knowledge into actionable, AI-consumable Atomic Knowledge Units, enabling agents to perform tasks correctly without needing to reconstruct organizational context.
By strategically guiding self-play with challenging real-world examples, GASP unlocks a 2.5% performance boost in coding LLMs and conquers previously unsolvable problems.
LLMs can generate syntactically correct tests, but their ability to *reason* about code faults is surprisingly poor, hindering autonomous debugging.
Stop throwing away valuable context: Lore lets you turn ordinary Git commits into structured knowledge troves for AI coding agents, capturing the "Decision Shadow" behind every code change.
Forget massive models: fine-tuning a small 4B parameter model on carefully curated data can match or even approach the performance of 100B+ parameter LLMs in program verification tasks.
Forget tedious exam creation: ChatGPT, guided by prompt engineering, can generate programming questions as good as (or better than) human-crafted ones.
Most "agent skills" hyped for boosting LLMs in software engineering provide almost no benefit in real-world tasks, with 80% yielding zero pass-rate improvement.
Pythonistas can now easily formalize legal reasoning thanks to PYTHEN, a new framework that brings the power of PROLEG-style defeasible logic to the Python ecosystem.
Achieve near-perfect decompilation-to-compilation by hot-swapping LLM-repaired functions into original binaries and using runtime feedback to eliminate semantic hallucinations.
Despite the promise of AI-powered tools, developer experience still trumps AI assistance when it comes to writing secure code.
LLMs can now orchestrate system-wide optimizations across microservices, boosting throughput by 36% and slashing response times by 27%.
End-to-end MLLMs struggle with visual reasoning, but a program synthesis approach that explicitly represents compositional logic dramatically improves accuracy and transparency.
By adversarially co-evolving code and test LLMs, Code-A1 achieves code generation performance on par with human-annotated training, while simultaneously boosting the LLM's ability to find bugs.
Forget fixed workflows: SEMAG's self-evolving agents dynamically adapt their coding process and even upgrade their backbone LLM, leading to state-of-the-art code generation performance.
Scientific software projects struggle to resolve high-priority technical debt that propagates across multiple code artifacts, suggesting a need for better tooling and monitoring.
The rise of GitHub Actions is inadvertently marginalizing test code review, with PRs involving tests often receiving no reviews or comments post-GHA adoption.
Software metrics are failing engineers because they're built on shaky measurement science, not real-world decisions.
LLMs can be guided to discover better solutions in open-ended scientific tasks by identifying and reasoning about causal factors that influence the evolutionary process.
Despite high static quality scores, YARA rules in the wild suffer from significant noise, low recall, and a bias towards legacy threats, exposing a "double penalty" for defenders.
LLMs can now be tested on their ability to formally verify real-world cryptographic assembly code, not just competition math problems.
Hands-on robotics education gets a boost from a project-based learning framework that teaches students ROS through the automation of building-block disassembly.

Quantum programs can be optimized by automatically uncomputing intermediate values based on their semantic lifetime, leading to reduced circuit depth and qubit usage.
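Lifetime-driven uncomputation boils down to a liveness analysis: free each intermediate as soon as its last use has passed. A classical sketch of that scheduling step, assuming intermediates are named `t*` and operations are (target, sources) tuples (the quantum details of how uncomputation is realized are out of scope here):

```python
def schedule_uncompute(ops):
    """Insert 'uncompute v' right after the last use of each intermediate.

    ops: list of (target, sources) tuples in program order, e.g.
    ('t1', ['a', 'b']) means t1 is computed from a and b.
    """
    last_use = {}
    for i, (_, srcs) in enumerate(ops):
        for s in srcs:
            if s.startswith("t"):         # track only intermediates
                last_use[s] = i
    out = []
    for i, op in enumerate(ops):
        out.append(("compute",) + op)
        for v, last in last_use.items():  # lifetime ended here: reclaim
            if last == i:
                out.append(("uncompute", v))
    return out

prog = [("t1", ["a", "b"]), ("t2", ["t1", "c"]), ("r", ["t2"])]
for step in schedule_uncompute(prog):
    print(step)
```

Reclaiming `t1` immediately after `t2` no longer needs it is what shortens circuit depth and lets qubits be reused.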
Turns out, AI agents debug better when you ask nicely: trust-based prompts boost debugging depth by 59% compared to fear-based approaches, which show no improvement over baseline.
Despite being the most widely recognized testing qualifications, ISTQB certifications are under fire for being too theoretical and not keeping pace with agile and automation, raising questions about their real-world value.