69 papers published across 2 labs.
Current package managers are surprisingly vulnerable: a single misconfiguration can silently allow attackers to inject malicious dependencies, a problem solved by this paper's cryptographically enforced provenance system.
LLMs can now generate high-performance CUDA attention kernels that outperform hand-optimized code, thanks to a novel lift-transfer-lower approach that leverages expert knowledge.
An agentic pipeline can autonomously discover and verify real-world privilege escalation vulnerabilities in Windows COM binaries, outperforming both static analysis and existing coding agents.
LLMs can generate GPU kernels, but they're surprisingly bad at it: 72% of fusion tasks fail across all methods, and nearly half of the "correct" kernels are actually slower than PyTorch.
LLMs can now generate neural architectures with 75% less code and higher accuracy by learning to write code "diffs" instead of building from scratch.
Attention-based models for programming knowledge tracing might not be as effective as previously thought; careful experimental design reveals that their gains over simpler models are often overstated.
LLM agents can now autonomously design complex hardware like an LLM inference accelerator with hard-wired TurboQuant support in just 80 hours.
Verifier-driven executable world models can solve complex reasoning tasks like ARC-AGI-3 without game-specific code, hinting at a path towards more generalizable AI agents.
Stop brittle, undeployable AI-generated code: this retrieval-augmented scaffolding method bakes in architectural constraints from the start.
LLM-guided code evolution, when combined with runtime feedback and MCTS, can reliably achieve 15x speedups on real-world Java code, surpassing naive LLM-based optimization.
Agent-repair leaderboards are more fragile than we thought: methods that peek at the evaluator's signals to guide internal repair choices can cause drastic reordering when the evaluator changes.
E-graphs can help AI learn the unwritten rules of jazz harmony, mirroring how human musicians internalize complex musical patterns.
Developer-style keyword searches completely nullify the advantage of even the best code embedding models, highlighting a critical gap in current code search techniques.
AI coding assistants' Terms of Service overwhelmingly place responsibility for code correctness, safety, and legal compliance on the user, creating a potential accountability gap as these tools become more autonomous.
Tool-using SQL agents can learn to be more efficient and accurate by getting feedback on *how* they reason, not just *what* they output.
Mixing tasks with different safety levels in automotive ECUs can compromise critical functions, highlighting the need for careful task allocation strategies.
A clever routing strategy lets a tiny 3B code model outperform a massive 480B model on routine code completion tasks, slashing accelerator usage by 58%.
Pinpointing minimal "conflict essences" reveals precisely how graph transformation rules interfere, even with complex nested conditions.
LLM agents that autonomously explore code repositories can match the classification accuracy of simpler LLMs with hand-crafted context, hinting at a future where agents surpass human-labeled data in complex software understanding tasks.
Developers overwhelmingly trust and directly apply LLM-generated code refactoring suggestions, but when they don't, the changes are surprisingly drastic and predictable.
Bug localization tool adoption hinges on more than just accuracy: developers need tools that mesh with their workflows and leverage contextual information.
GenAI coding assistants boost developer productivity, but the gains shrink outside the lab and don't translate to better learning.
Turns out, chunking code by function is the *worst* way to do retrieval-augmented code completion.
Automating UVM testbench generation with LLMs slashes verification time from days to hours, achieving near-complete code coverage.
"Vibe coding" platforms promise effortless app creation, but SWE-WebDevBench reveals they often deliver visually appealing frontends with broken backends, struggle with security, and require significant human effort to reach production readiness.
Proving semantic equivalence between LLVM IR and RISC-V code is now possible within a single framework, thanks to a new formal RISC-V semantics built on Interaction Trees.
Guaranteeing safety in autonomous systems gets a boost: this work enables formal verification of hybrid system code that directly controls physical processes.
Discrete diffusion, with carefully designed transition matrices for commands and parameters, unlocks superior CAD generation compared to continuous diffusion baselines.
Stop squinting at Nsight Compute profiles: KEET uses LLMs to automatically diagnose GPU kernel bottlenecks and suggest optimizations in plain English.
Forget the heavy transformers: surprisingly effective LLM-generated code detection can be achieved with lightweight stylometric features and decision trees, offering near-instant inference.
Rose-SQL achieves state-of-the-art multi-turn Text-to-SQL performance with small models, outperforming larger fine-tuned models without any training.
Stack Overflow code quality varies significantly across US states, with major tech hubs surprisingly not producing the highest quality code.
Forget the heavyweight deep learning approaches – surprisingly effective vulnerability detection can be achieved with simple TF-IDF token features and basic code metrics, offering a fast and transparent baseline for human triage.
LLM-based vulnerability repair can be significantly improved by focusing on root cause analysis, leading to more robust and less superficial patches than current methods.
Securely onboarding third-party apps in Open RAN just got easier: a new zero-trust rubric offers explicit Accept/Escalate/Block decisions.
LLMs can now automatically generate effective proof-of-vulnerability tests for complex software, uncovering real-world attack vectors with minimal human intervention.
Innocuous-looking coding tasks, when chained together, trick even the best coding agents into creating exploitable code with alarming frequency.
LLMs struggle to formally verify real-world code, but KVerus's self-adaptive approach closes the gap, enabling verification of complex, evolving Rust systems with significantly improved success rates.
LLMs can cheaply generate malware variants that are structurally diverse yet functionally identical, posing a significant challenge to signature-based detection methods.
LLMs can achieve surprisingly high precision in smart contract vulnerability detection, but only with vulnerability-specific prompts and AST-based context.
Zorya can now automatically find previously undetected vulnerabilities in compiled Go binaries, even silent integer overflows that other tools miss.
Upskilling internal "AI Advocates" can be a surprisingly effective catalyst for driving cultural and technical transformation in software development squads.
LLM agent skills are needlessly brittle and insecure: SkCC compiles them into a portable, hardened format that boosts performance by 50% and proactively blocks attacks.
Sometimes, giving an agent more information actually *hurts* its ability to solve a problem, especially when its default behavior is already pretty good.
Java developers drowning in unfixed bugs, rejoice: automated reproduction test generation is now a viable option, thanks to a new benchmark and adapted generator.
Software testing tools share surprisingly consistent visual patterns, offering a blueprint for designing more intuitive and informative testing interfaces.
Formal reasoning about programmable memory hierarchies is now possible, thanks to a new ISA-level memory consistency model that tames the complexity of architectures like täkō.
Stop wrestling with disparate tools and languages for music performance analysis: Cosmodoit offers a unified Python pipeline for efficient, large-scale feature extraction.
LLMs can't reliably orchestrate multi-step manufacturing workflows, but this physics-grounded multi-agent system can, boosting tool execution success by 87.5% while ensuring traceable, risk-aware decisions.
RAG's reputation for being ineffective on reasoning tasks is shattered: retrieving the right data (intermediate "thinking traces") unlocks substantial performance gains, even for state-of-the-art models.
Grounding software engineering theories in empirical evidence just got easier: this paper offers a systematic, replicable procedure for translating abstract concepts into testable hypotheses.
Sustainable scientific software isn't just about the code; it's about consistent testing and clear links between code quality and tests, a pattern often missing in unsustainable projects.
LLMs can now collaboratively pinpoint root causes in microservices using a tree-structured search, but production environments reveal the limitations of this approach when faced with polyglot stacks and inconsistent logging.
LLMs can generate formally correct postconditions for code, but they often miss crucial details, especially in complex, real-world scenarios.
Injecting graph representations of code directly into LLM internals unlocks a 16% BLEU boost in code generation, suggesting that structural awareness is key to next-gen code models.
LLMs can partially reverse engineer legacy code, but don't expect them to fully understand your spaghetti code just yet.
LLMs can't rebuild software from scratch, even for widely used programs like FFmpeg and SQLite, revealing a critical gap in their ability to make high-level software architecture decisions.
Software stability isn't about how much code you commit, but how far ahead you're thinking: fractal analysis reveals that long-range planning in commit patterns predicts stability better than commit volume alone.
LLMs can catch more bugs in quantum code than traditional rule-based linters, suggesting a new path to more reliable quantum software.
A new Brick-Circuit generator achieves higher expressibility and entanglement in quantum program testing, outperforming existing methods with shallower circuits.
Rust developers can slash the noise in static analysis alerts by over 50% using an RL agent that learns to suppress false positives, outperforming even LLM-based methods.
Guaranteeing software stability during remodularization doesn't require sacrificing performance; a multi-agent consensus protocol can match state-of-the-art optimizers while acting as a "circuit breaker" for strict stability constraints.
Microsoft's EngThrive framework reveals how aligning developer productivity metrics with genuine improvement can drive sustained, system-level gains in Speed, Ease, and Quality.
Today's best AI agents can only solve 55% of real-world academic tasks that university students find challenging, revealing a significant gap between current AI capabilities and the demands of academic workflows.
Hands-on experience with Raspberry Pi clusters and student-driven learning can effectively bridge the HPC skills gap in undergraduate engineering education.
Treating agentic AI systems as token economies reveals that current designs, which optimize token usage locally, lead to predictable global misallocations and inefficiencies.
Meta's risk assessment of its Code World Model (CWM) gives it a clean bill of health, concluding it poses no *new* catastrophic risks beyond those already present in the AI landscape.
Current code reward models are myopic, mostly rewarding functional correctness, but Themis-RM learns to score code across multiple criteria and languages, opening the door to more nuanced and useful code generation.
LLMs can now generate 70% syntactically correct and geometrically consistent 3D objects from text, thanks to retrieval-augmented code synthesis.