AI-driven code generation, program synthesis, automated debugging, and software engineering with LLMs.
LLMs can now generate high-performance CUDA attention kernels that outperform hand-optimized code, thanks to a novel lift-transfer-lower approach that leverages expert knowledge.
An agentic pipeline can autonomously discover and verify real-world privilege escalation vulnerabilities in Windows COM binaries, outperforming both static analysis and existing coding agents.
LLMs can generate GPU kernels, but they're surprisingly bad at it: 72% of fusion tasks fail across all methods, and nearly half of the "correct" kernels are actually slower than PyTorch.
LLMs can now generate neural architectures with 75% less code and higher accuracy by learning to write code "diffs" instead of building from scratch.
Attention-based models for programming knowledge tracing might not be as effective as previously thought; careful experimental design reveals that their gains over simpler models are often overstated.
LLM agents can now autonomously design complex hardware, such as an LLM inference accelerator with hard-wired TurboQuant support, in just 80 hours.
Verifier-driven executable world models can solve complex reasoning tasks like ARC-AGI-3 without game-specific code, hinting at a path towards more generalizable AI agents.
Stop settling for brittle, undeployable AI-generated code: this retrieval-augmented scaffolding method bakes in architectural constraints from the start.
LLM-guided code evolution, when combined with runtime feedback and MCTS, can reliably achieve 15x speedups on real-world Java code, surpassing naive LLM-based optimization.
Agent-repair leaderboards are more fragile than we thought: methods that peek at the evaluator's signals to guide their internal repair choices can drastically reshuffle the rankings when the evaluator changes.
E-graphs can help AI learn the unwritten rules of jazz harmony, mirroring how human musicians internalize complex musical patterns.
Developer-style keyword searches completely nullify the advantage of even the best code embedding models, highlighting a critical gap in current code search techniques.
AI coding assistants' Terms of Service overwhelmingly place responsibility for code correctness, safety, and legal compliance on the user, creating a potential accountability gap as these tools become more autonomous.
Tool-using SQL agents can learn to be more efficient and accurate by getting feedback on *how* they reason, not just *what* they output.
Mixing tasks with different safety levels in automotive ECUs can compromise critical functions, highlighting the need for careful task allocation strategies.
A clever routing strategy lets a tiny 3B-parameter code model outperform a massive 480B-parameter model on routine code completion tasks, slashing accelerator usage by 58%.
Pinpointing minimal "conflict essences" reveals precisely how graph transformation rules interfere, even with complex nested conditions.
LLM agents that autonomously explore code repositories can match the classification accuracy of simpler LLM pipelines given hand-crafted context, hinting at a future in which agents could surpass human-labeled data in complex software-understanding tasks.
Developers overwhelmingly trust and directly apply LLM-generated code refactoring suggestions, but when they don't apply them as-is, their modifications are surprisingly drastic and predictable.
Bug localization tool adoption hinges on more than just accuracy: developers need tools that mesh with their workflows and leverage contextual information.
GenAI coding assistants boost developer productivity, but the gains shrink outside the lab and don't translate to better learning.
Turns out, chunking code by function is the *worst* way to do retrieval-augmented code completion.
Automating UVM testbench generation with LLMs slashes verification time from days to hours, achieving near-complete code coverage.
"Vibe coding" platforms promise effortless app creation, but SWE-WebDevBench reveals they often deliver visually appealing frontends with broken backends, struggle with security, and require significant human effort to reach production readiness.
Proving semantic equivalence between LLVM IR and RISC-V code is now possible within a single framework, thanks to a new formal RISC-V semantics built on Interaction Trees.
Guaranteeing safety in autonomous systems gets a boost: this work enables formal verification of hybrid system code that directly controls physical processes.
Discrete diffusion, with carefully designed transition matrices for commands and parameters, unlocks superior CAD generation compared to continuous diffusion baselines.
Stop squinting at Nsight Compute profiles: KEET uses LLMs to automatically diagnose GPU kernel bottlenecks and suggest optimizations in plain English.
Forget heavyweight transformers: lightweight stylometric features and decision trees detect LLM-generated code surprisingly well, with near-instant inference.
Rose-SQL achieves state-of-the-art multi-turn Text-to-SQL performance with small models, outperforming larger fine-tuned models without any training.
Stack Overflow code quality varies significantly across US states, and, surprisingly, major tech hubs do not produce the highest-quality code.
Forget heavyweight deep learning approaches: simple TF-IDF token features and basic code metrics detect vulnerabilities surprisingly well, offering a fast and transparent baseline for human triage.
LLM-based vulnerability repair can be significantly improved by focusing on root cause analysis, leading to more robust and less superficial patches than current methods.
Securely onboarding third-party apps in Open RAN just got easier: a new zero-trust rubric offers explicit Accept/Escalate/Block decisions.
LLMs can now automatically generate effective proof-of-vulnerability tests for complex software, uncovering real-world attack vectors with minimal human intervention.
Innocuous-looking coding tasks, when chained together, trick even the best coding agents into creating exploitable code with alarming frequency.
LLMs struggle to formally verify real-world code, but KVerus's self-adaptive approach closes the gap, enabling verification of complex, evolving Rust systems with significantly improved success rates.
LLMs can cheaply generate malware variants that are structurally diverse yet functionally identical, posing a significant challenge to signature-based detection methods.
LLMs can achieve surprisingly high precision in smart contract vulnerability detection, but only with vulnerability-specific prompts and AST-based context.
Zorya can now automatically find previously undetected vulnerabilities in compiled Go binaries, even silent integer overflows that other tools miss.
Current package managers are surprisingly vulnerable: a single misconfiguration can silently allow attackers to inject malicious dependencies, a problem solved by this paper's cryptographically enforced provenance system.
Upskilling internal "AI Advocates" can be a surprisingly effective catalyst for driving cultural and technical transformation in software development squads.
LLM agent skills are needlessly brittle and insecure: SkCC compiles them into a portable, hardened format that boosts performance by 50% and proactively blocks attacks.
Sometimes, giving an agent more information actually *hurts* its ability to solve a problem, especially when its default behavior is already pretty good.
Java developers drowning in unfixed bugs, rejoice: automated reproduction test generation is now a viable option, thanks to a new benchmark and an adapted test generator.
Software testing tools share surprisingly consistent visual patterns, offering a blueprint for designing more intuitive and informative testing interfaces.
Formal reasoning about programmable memory hierarchies is now possible, thanks to a new ISA-level memory consistency model that tames the complexity of architectures like täkō.
Stop wrestling with disparate tools and languages for music performance analysis: Cosmodoit offers a unified Python pipeline for efficient, large-scale feature extraction.
LLMs can't reliably orchestrate multi-step manufacturing workflows, but this physics-grounded multi-agent system can, boosting tool execution success by 87.5% while ensuring traceable, risk-aware decisions.
RAG's reputation for being ineffective on reasoning tasks is overturned: retrieving the right data (intermediate "thinking traces") unlocks substantial performance gains, even for state-of-the-art models.