AI-driven code generation, program synthesis, automated debugging, and software engineering with LLMs.
LLMs can achieve state-of-the-art code generation by learning to interleave reasoning steps with code generation, adaptively allocating effort where it's most needed.
Forget hand-crafted features: DistilBERT can automatically identify parallelizable loops in code with >99% accuracy, opening the door to more efficient automatic parallelization.
Pythonistas rejoice: aggregate programming, a powerful paradigm for distributed systems, finally gets a first-class, easy-to-use library in your favorite language.
LLMs can semi-autonomously solve complex, unpublished problems in mathematical physics, even discovering unique structures in integrable models.
Automated medical coding finally gets explainable: Symphony's agentic approach provides span-level evidence, linking each predicted code to the supporting text.
AI agents are far better at automating data engineering tasks than previously thought, but flawed benchmarks are obscuring their true potential.
LLMs can bootstrap their code generation abilities by focusing on problems where they show diverse solution attempts and then reinforcing solutions that exhibit behavioral consensus.
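The "behavioral consensus" idea above can be sketched generically: execute every candidate solution on the same inputs and keep (or reinforce) the candidates whose outputs agree with the largest behavioral cluster. This is a minimal sketch of the general technique, not the paper's implementation; every name below is illustrative.

```python
# Minimal sketch of behavioral-consensus selection among candidate programs.
# Candidates are grouped by their output "signature" on shared inputs; the
# largest agreeing cluster is treated as the consensus to reinforce.
from collections import Counter

def behavioral_consensus(candidates, inputs):
    """candidates: list of callables; inputs: list of test inputs.
    Returns the candidates in the largest output-agreement cluster."""
    signatures = []
    for fn in candidates:
        sig = []
        for x in inputs:
            try:
                sig.append(repr(fn(x)))
            except Exception:
                sig.append("<error>")  # crashing counts as a behavior too
        signatures.append(tuple(sig))
    majority_sig, _ = Counter(signatures).most_common(1)[0]
    return [fn for fn, sig in zip(candidates, signatures) if sig == majority_sig]

# Example: three candidate "absolute value" implementations, one buggy.
c1 = lambda x: abs(x)
c2 = lambda x: x if x >= 0 else -x
c3 = lambda x: x  # buggy: wrong for negative inputs
winners = behavioral_consensus([c1, c2, c3], [-2, 0, 3])
```

Here the two behaviorally equivalent candidates form the majority cluster and the buggy one is filtered out, without any ground-truth tests.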
GPT-4 can automatically generate FSMs from textual requirements, but expert-guided mutation and testing are crucial for repairing imperfections.
A human-in-the-loop AI assistant can provide scalable, high-quality coding education support in resource-constrained African contexts, even with limited infrastructure.
Instructors and students are often on different planets when it comes to understanding why cheating happens in CS courses.
Forget killer robots: GenAI's impact on cybercrime is currently more "vibe coding" than world-ending, mainly assisting skilled actors in existing scams rather than unleashing a wave of autonomous cyberattacks.
LLMs aren't the only path to vulnerability detection: a GNN-based model achieves near-parity with 100x less overhead.
Uncover hidden bottlenecks in your software development pipeline: Bloomberg's BayesInsights uses Bayesian Networks to reveal causal dependencies in engineering data, helping teams pinpoint root causes and anticipate the impact of changes.
LLM agents actually perform *better* when you strip away the majority of the boilerplate in their skill descriptions, suggesting current context windows are overloaded with irrelevant information.
Run code LLMs 10x faster and with 6x less memory on your laptop: Ditto compiles them into lean, mean, local executables.
Multimodal repair isn't always better: selectively escalating to multimodal prompting based on runtime signals in Scratch yields a superior success-cost-energy tradeoff compared to uniformly applied multimodal approaches.
LLMs can now reproduce Android app bugs with 87% accuracy, thanks to pre-assessing the visual effects of UI actions.
LLM agents leapfrog traditional methods for identifying bug-introducing commits, boosting F1-score by 17 points by intelligently searching for patterns in code changes.
Stop optimizing LLM logs for human readability – runtime-guided, task-oriented logs dramatically improve downstream debugging performance.
Proving that erasing "erasable" function arguments preserves program behavior opens the door to more efficient and verifiable code optimization.
GPT-5 can only solve 37% of PhD-level 3D geometry coding problems, suggesting AI can't reliably automate complex scientific coding tasks yet.
Quantum circuit compilation, a major bottleneck, can be sped up by over 15x with minimal overhead using a new parallelization technique validated on 8000 large-scale, configurable random circuits.
LLMs can now automatically verify imperative code during generation, achieving state-of-the-art results on complex algorithms and opening the door to large-scale datasets of verified code.
LLMs can pinpoint semantic bugs with surprising accuracy when their reasoning is structured and grounded, outperforming traditional coverage-based methods by a significant margin.
Unlock new insights into rapid software development and collaboration with a massive dataset of over 100,000 hackathon projects.
Sparse autoencoders' failure to generalize compositionally isn't due to amortized inference, but because they learn lousy dictionaries in the first place.
Forget hand-designed RL algorithms – LLMs can evolve competitive learners from scratch, even when forced to invent completely new update rules.
Forget hand-crafted environments: COvolve uses LLMs to automatically co-evolve challenging environments and robust policies, paving the way for open-ended learning.
Gemini 3 Flash can answer introductory programming questions better than typical educators, suggesting a path to scalable, personalized feedback in CS1 courses.
Stop hand-coding your LLM harnesses: Meta-Harness can automatically discover harnesses that outperform state-of-the-art systems while using fewer context tokens and generalizing across models.
Instead of forcing a single interpretation, this work embraces the inherent ambiguity of natural language to generate multiple plausible STL formulas from a single NL task description.
A task-specific, lightweight transformer can outperform state-of-the-art reasoning LLMs and commercial tools in C code vulnerability detection, at a fraction of the inference cost.
LLMs can generate better code by treating tests as noisy signals to be refined, rather than ground truth, unlocking performance gains even with smaller models.
REST API fuzzing, a critical component of modern software development, suffers from significant flakiness issues that can now be reliably detected and mitigated.
LLMs fix more bugs when you feed them *less* code, thanks to a new compression technique that distills context to the minimal, crucial snippets.
Voice control, previously insufficient for block-based programming, can now enable children with motor disabilities to effectively use Scratch, thanks to a novel multi-stage speech recognition pipeline.
AI coding assistants are racking up technical debt in real-world projects, with nearly a quarter of the code quality issues they introduce sticking around long-term.
Sentence embeddings beat prompted LLMs at extracting API semantics from documentation, achieving >82% recall and >79% precision in data-flow and alias relation inference.
LLMs can now translate C to Rust with near-perfect syntactic correctness and significantly improved semantic correctness by incorporating program structure information into the translation process.
A lightweight 6B model, when harnessed within the GEMS agent framework, leapfrogs state-of-the-art models in multimodal generation, suggesting architectural innovations in agents can compensate for raw parameter count.
LLMs can now automatically evolve and optimize GPU kernels that beat both hand-tuned implementations and kernels generated by proprietary models like Gemini and Claude.
Stop AI-driven malware and data leaks by embedding hidden, verifiable "canaries" in your documents that expose unauthorized LLM processing, even after adversarial attacks.
Smart contract vulnerability detection gets a 39% accuracy boost and adversarial robustness with ORACAL, a framework that uses RAG-enhanced LLMs to inject expert security context into heterogeneous graphs.
Even with perfect bug localization, repository-level program repair fails more than half the time, revealing that better context and interface design are the next big levers to pull.
Pinpointing root causes in distributed systems just got easier: Lumos automatically exposes the computational history of bugs with low overhead, even with limited bug occurrences.
Current architecture documentation frameworks leave AI-augmented systems dangerously undocumented, but RAD-AI closes the gap and boosts EU AI Act compliance from 36% to 93%.
LLMs like Copilot can outperform experienced architects in identifying risks and tradeoffs in software architecture scenarios, hinting at a future where AI significantly streamlines design evaluation.
Forget hand-coding adapters: this middleware uses LLMs to automatically bridge REST APIs, GraphQL endpoints, and IoT devices with a 90% success rate.
Stop treating software requirements as independent entities: modeling their interconnectedness via user feedback boosts prioritization performance.
LLM API calls are breaking your program analysis tools, but this new taxonomy of information flow across the NL/PL boundary offers a way to fix them.