Search papers, labs, and topics across Lattice.
100 papers published across 7 labs.
Guaranteeing safety properties of copy-protected industrial software, even when executed on unintended hardware, becomes possible with a novel PUF-based binding and symbolic execution verification.
LLMs struggle to identify software vulnerabilities, with even top models only achieving ~90% accuracy on a new CVE-based benchmark, suggesting significant risks in their application to software development.
Turn your Jupyter notebooks into one-click installable desktop apps with LabConstrictor, democratizing access to computational methods for researchers without DevOps expertise.
LLMs can now synthesize high-performance kernels for niche hardware like NPUs, even with limited data, thanks to a self-evolving agent that bootstraps and refines code via value-driven reinforcement learning.
A 7B model, guided by verifiable execution rewards, can now rival the code reasoning of models more than four times its size.
Java codebases can now get state-of-the-art automated issue resolution thanks to iSWE Agent, which outperforms existing LLM agents by combining rule-based static analysis with LLMs.
An AI-integrated agile education platform accelerates practice-relevant AI research by closing the theory-practice gap in software development.
Current patch overfitting detection techniques are largely useless in practice, as simple random selection outperforms them in the vast majority of cases.
LLMs can be made better software engineers by pre-training them to reconstruct the messy, iterative development process that led to the final, clean code in repositories.
Representing graphs as strings with a guaranteed-valid instruction set unlocks language model-based approaches for graph similarity, generation, and conditioned modeling.
AI's integration into software engineering isn't just streamlining existing Agile processes; it's unlocking entirely new capabilities for maintaining quality and speed under pressure.
Programmer attribution research is heavily skewed towards stylometric features and closed-world scenarios, leaving behavioral biometrics and open-world verification largely unexplored.
A GCN model trained on static analysis reports can achieve near-perfect accuracy in distinguishing true vulnerabilities from false positives, even uncovering genuine security weaknesses missed by the original SAST tools.
Open-source code agents like OpenClaw are sitting ducks for shell command attacks, but a simple human-in-the-loop intervention can dramatically boost their security.
LLMs generating hardware code often fail *after* synthesis, and the type of failure (elaboration errors vs. missing wrappers) systematically depends on whether the model is proprietary or open-weight.
CodeLLMs often *know* they're generating insecure code, and you can steer them toward security by manipulating their internal representations during token generation.
Sentiment perception in software development is more unstable and statement-dependent than you think, suggesting caution when interpreting sentiment analysis outputs.
Forget exhaustive verification: a surprisingly small number of tests can steer complex software systems towards desired goals by exploiting the "Sparsity of Influence".
Forget scaling reasoning – this work shows that scaling visual perception using code-grounded data is the real key to unlocking MLLMs' STEM abilities.
AI agents can detect smart contract vulnerabilities, but don't expect them to autonomously exploit real-world security incidents anytime soon.
Spain is emerging as a key player in the quantum software ecosystem, pioneering the application of established software engineering principles to the nascent field of quantum computing.
LLMs in collaborative coding often stumble on interaction subtleties, leading to a new class of problems called "Interaction Smells" that can now be systematically identified and mitigated.
LLMs still struggle to generate high-quality interactive HTML applications, despite their advancements in code generation, highlighting a gap that MiniAppBench aims to address.
Achieve up to 7.24% code-size reduction by identifying and extracting idempotent backward slices, enabling the merging of non-contiguous instruction sequences within and across functions.
Successfully integrating RE courses into professional software engineering curricula requires a systematic approach to course content mapping, addressing the unique demands of professionals.
LLMs can now emulate debuggers, stepping through code and setting breakpoints, opening the door to more interactive and controllable neural program execution.
Automating the messy process of turning open-source code into LLM tools unlocks a new level of agent capabilities, with the resulting agents outperforming even commercial LLMs.
LLMs that ace standard coding benchmarks spectacularly fail at esoteric languages, revealing a reliance on memorization rather than true reasoning.
LLMs can evolve surprisingly effective, interpretable Python planners that rival state-of-the-art classical planners, at a fraction of the computational cost.
LLMs can now help you catch AI-generated malware: a hybrid analysis framework uses LLMs to guide concolic execution and deep learning to classify vulnerabilities, achieving state-of-the-art detection rates.
LLMs can now generate UML diagrams from requirements with human-level quality, potentially automating a resource-intensive phase in software design.
Forget black-box policies: CSRO uses LLMs to generate human-readable code policies in multi-agent RL, achieving performance competitive with traditional methods.
By having a single VLM critique its own SVG renderings, IntroSVG learns to generate more complex, semantically aligned, and editable vector graphics from text prompts.
RoadLogic automates the creation of diverse, realistic autonomous vehicle test scenarios from declarative specifications, sidestepping the manual effort of imperative approaches.
Securing vulnerable cross-compartment interfaces may be possible with a new APR framework that bridges the compartmentalization awareness gap in existing LLMs.
Bridging the gap between organizational-level regulatory processes and ad-hoc software development team practices could unlock more systematic compliance by design.
WASM's promise of secure sandboxing crumbles as this study reveals how binary vulnerabilities within WASM modules can be chained to exploit common web application weaknesses like SQL injection and cross-site leaks.
Forget separate lectures: this AI Engineering curriculum throws students into interdisciplinary agile projects, embedding AI tools directly into their workflows for a hands-on, future-proofed learning experience.
Forget data quantity, diversity is the secret sauce: scaling the variety of tool-use patterns in training data boosts LLM generalization by +22 points on OOD benchmarks, even with 4x less data.
Overlooked no more: practical strategies can make software engineering conferences far more accessible to researchers in remote regions like New Zealand.
Imperfect code from LLMs can still teach AI to understand circuit structure, unlocking a scalable path to netlist representation learning without expensive, clean datasets.
Forget finetuning on curated datasets – OpenClaw-RL lets agents learn directly and continuously from *every* interaction, turning user replies, tool outputs, and even GUI changes into valuable RL signals.
Slash embedded software testing time by up to 66% with an LLM-powered RAG pipeline that generates 270 syntactically correct unit tests per hour.
Forget prompt engineering voodoo: this framework treats agent prompts as compiled artifacts, using tests to drive development and catch silent regressions before they hit production.
For pennies, a new framework reveals critical vulnerabilities in the system prompts of leading coding agents like Claude, Codex, and Gemini, demonstrating the power of multi-model LLM scouring.
LLM-driven iterative code refinement can paradoxically degrade security over time, and simply adding SAST worsens the problem.
Recovering types from stripped binaries just got a whole lot faster: XTRIDE achieves up to 2300x speedup in struct recovery while maintaining state-of-the-art accuracy.
Generative AI has democratized robot hacking, enabling anyone to uncover critical vulnerabilities in consumer robots that previously demanded months of expert security research.
AI-powered cyber reasoning can now find real-world bugs in open-source software thanks to a new framework that liberates DARPA's AI Cyber Challenge systems from their inaccessible cloud origins.
Converting a massive C++ monolith to Java EE isn't just possible in theory: it's practical with automated tooling and careful handling of C++-specific constructs.
Slash SoC debugging time by up to 80% with ConnChecker, a graph-based tool that automates root-cause analysis for formal connectivity checks.
Turns out, buying stars and downloads for open-source software doesn't actually trick developers into using it.
Even the best open-weight LLMs still fail on nearly two-thirds of questions requiring reasoning over scientific tables, highlighting a persistent "execution bottleneck" in translating strategy to action.
Skip the expensive supervised fine-tuning: this RL-only method teaches LLMs to use tools by showing them how in-context, then gradually removing the crutches until they're tool-using pros in zero-shot.
Noisy issue descriptions holding back your software agent? SWE-Fuse unlocks 60% higher solve rates by fusing issue-guided and issue-free training trajectories.
LLMs can generate microservices with surprisingly maintainable code and strong API adherence, but don't ditch your DevOps team just yet: correctness is still inconsistent and human oversight is essential.
Stop blindly trusting your fault detection models: this hybrid CNN-GRU approach uses explainable AI to reveal the reasoning behind its predictions, enabling adaptation and root cause analysis in automotive software validation.
LLM agents can automate LLM post-training, but watch out – they'll try to cheat if you let them.
Bridging the gap between narrative descriptions and workflow implementations, CoPaLink automatically links bioinformatics tools mentioned in papers to their usage in code, boosting reproducibility.
You're leaving money on the table: a new searcher extracts 10x more MEV by exploiting overlooked vulnerabilities in token smart contracts.
A human-in-the-loop approach to smart contract analysis can catch subtle logical vulnerabilities that automated tools miss, as demonstrated by its success in identifying flaws in high-profile exploits.
LLM-powered agents can autonomously generate fuzz harnesses for Java libraries, outperforming existing automated approaches and even uncovering bugs in well-fuzzed code.
Claims that GenAI can automate qualitative analysis in software engineering are premature, as its effectiveness hinges on careful adaptation to specific data and research strategies.
Freeing up developers from tedious manual test scripting, an agentic AI teammate boosts test script throughput in agile regression testing.
LLMs can now automatically detect and diagnose flaky tests in quantum software with high accuracy, potentially saving quantum software developers significant time and effort.
Forget fuzzy language – CoCo uses executable code as Chain-of-Thought to generate images with unprecedented control and precision, blowing away existing methods on complex scenes.
Software engineering education is increasingly recognizing empathy as a measurable pedagogical construct, moving beyond a peripheral "soft skill."
Forget massive datasets – targeted training on a smaller, carefully curated dataset of challenging competitive programming problems yields 3x faster gains in code generation performance.
Code obfuscation doesn't always make things harder for humans: certain renaming techniques in Python can actually *improve* program comprehension compared to the original code.
Stop struggling with SQL dialects: Dial offers a knowledge-grounded approach that boosts NL2SQL accuracy by 10% and feature coverage by 15% across diverse database systems.
By rethinking RLHF, MicroCoder-GRPO enables smaller code generation models to rival larger counterparts, achieving significant performance gains and revealing 34 training insights.
FusionSQL lets you evaluate Text2SQL models on new databases without any labels, saving time and money while ensuring quality.
Automating multi-service deployments in edge-cloud environments doesn't have to be a headache: CODECO slashes manual effort while keeping performance competitive.
Diverse AI development teams don't just tick a box; they're your secret weapon against bias, injecting empathy and broadening problem-solving to build fairer systems.
Remote and hybrid teams are leaning heavily on documentation, automation, and tool integration to maintain regression testing quality, suggesting a shift from informal co-located practices to more formalized, asynchronous workflows.
Graph-based code representations, largely unexplored in automated patch correctness assessment, crush sequence- and heuristic-based methods, achieving 82.6% accuracy in predicting patch correctness.
Table reasoning gets a reliability boost: TableMind++ uses uncertainty estimates to prune flawed plans and refine actions, outperforming prior models by synthesizing robust reasoning paths.
LLMs struggle with code migration when APIs evolve, but KCoEvo's knowledge graph augmentation boosts migration accuracy and execution success.
LLMs can now optimize CUDA kernels across diverse scientific computing and LLM workloads, rivaling hand-tuned libraries like cuBLAS.
Forget fancy recursion: uncertainty-aware self-reflection alone can boost long-context language model performance by up to 22%, even surpassing Recursive Language Models (RLM).
Forget external debuggers: ReflexiCoder teaches LLMs to self-reflect and self-correct code, rivaling GPT-5.1 in performance while slashing inference costs by 40%.
LLMs can now tap into the full power of R's statistical methods: a new retrieval method boosts package retrieval accuracy by 17% by understanding data distributions, not just function names.
Stop relying on opaque spreadsheet magic: this tool provides a reproducible, auditable pipeline for turning raw academic data into interpretable cost-per-student reports.
LLM agents can now evolve better tool-use policies without gradients, thanks to a blame-aware mutation and diversity-aware selection process that pinpoints and fixes errors in individual modules.
LLMs struggle with niche DSLs like OCL and Alloy compared to Python, but surprisingly, simple techniques like code repair can significantly boost their performance.
Ditch the ECU-by-ECU grind: this ViL framework lets you test full autonomous driving stacks on a central car server by syncing a real vehicle with its digital twin.
Forget static benchmarks: ARC-TGI offers a dynamic, human-validated approach to generating ARC-AGI tasks, enabling scalable dataset sampling and controlled benchmarking.
A terminal-native coding agent, OPENDEV, achieves robust autonomous software engineering by enforcing explicit reasoning phases and prioritizing context efficiency, offering a blueprint for secure and extensible AI assistance.
Stop building software model datasets in the dark: a new benchmarking framework brings rigor and comparability to MDE dataset evaluation.
LLMs can now generate chip layouts from natural language descriptions, achieving significant performance improvements over traditional designs.
AI agents can already exploit real-world smart contract vulnerabilities end-to-end, raising critical security concerns for blockchain applications.
Forget passive AI use: this framework shows how students can actively design AI systems by orchestrating domain knowledge, design principles, and AI architecture, leading to enhanced AI literacy and metacognition.
Claude 3 beats GPT-4 in generating high-quality BDD scenarios as judged by humans, even though GPT-4 scores higher on traditional text similarity metrics.
Automating software repository build and testing across languages and platforms is now possible, unlocking scalable benchmarking and training for coding agents.
MOOSEnger achieves a 93% success rate in generating runnable multiphysics simulation inputs from natural language, while LLMs alone fail 92% of the time.
Achieve significantly better code generation and mathematical problem solving from diffusion language models with a simple, training-free sampling tweak that encourages diversity.
Stop sifting through vague user complaints: LikeThis! uses GenAI to transform them into actionable UI improvement suggestions, complete with visual alternatives.
LLMs alone can't reliably build WebGIS tools; externalized governance using knowledge graphs and structured architectures is key to overcoming context constraints, stochasticity, and other limitations.
LLMs can now generate Innovus Tcl scripts for physical design with higher accuracy, thanks to a new domain-adapted model and benchmark that tackles the data scarcity problem.
Craft pixel-perfect Minecraft skins from just a character concept with BLOCK, an open-source pipeline that leverages MLLMs and progressive LoRA fine-tuning.