57 papers published across 3 labs.
Unlock geometric algebra's performance potential in neural networks and spatial computing by compiling directly from multi-way relationships, eliminating manual specialization and ensuring geometric correctness.
Multi-party function secret sharing just got a whole lot more practical: a new DDH-based scheme slashes key sizes by up to 10x.
DPWFL's privacy loss doesn't have to diverge: this work proves it can converge to a constant even with non-convex objectives and gradient clipping.
Cut inference verification costs by 1000x with a sampling-based cryptographic approach that catches adversarial attacks on Llama-2-7B in milliseconds.
Active probing reveals backdoors that passive defenses miss in decentralized federated learning.
Ditch manual huge page configuration: TurboMem's lock-free design and transparent huge page auto-merging can boost packet throughput by up to 28% in DPDK.
Current methods of protecting satellites from radiation drain batteries and interrupt service, but a new routing protocol minimizes both battery drain and downtime.
Securing legacy industrial protocols with modern encryption like ChaCha20-Poly1305 is far more practical than previously thought, adding single-digit percentage overhead to latency-sensitive applications.
Accurately simulate LLM inference power consumption at scale – from individual GPUs to entire datacenters – with a framework that learns from real-world traces and generalizes to unseen configurations.
Forget massive SRAMs: this work shows that clever data streaming and compute/transfer overlap can yield 22x speedups for transformer inference, even with standard PCIe interconnects.
Training multi-turn LLM agents just got easier: ProRL Agent offers a scalable, API-driven rollout service that streamlines RL training across diverse tasks.
LLMs can now write the code to solve your combinatorial optimization problems, thanks to a new GPU-accelerated framework accessible through a pure-Python API.
By jointly optimizing onboard computing and data routing, iSatCR slashes data transmission needs in LEO satellite networks, outperforming traditional routing-only approaches, especially under heavy load.
Confidential databases can be 78x faster by ditching crypto in the query path.
DRAM's vulnerability to bit flips isn't uniform; it's a complex, context-dependent landscape that attackers can exploit to predict memory contents and break security systems.
Achieve nearly 3x faster LLM inference by intelligently splitting the workload between edge devices and the cloud, without any training.
Existing operational data analytics frameworks leave significant gaps when applied to the complexities of modern, large-scale graph processing ecosystems, motivating a new holistic approach.
Even with malicious clients flipping labels, FedTrident recovers federated learning performance to near attack-free levels, outperforming existing defenses by up to 9.49% in critical metrics.
Existing hardware masking verification tools can produce false positives when applied to HLS-generated designs, but MaskedHLSVerif avoids this by performing state-wise formal verification of the controller/datapath RTL.
Decentralized MPC with control barrier functions lets multi-robot quadrupeds safely navigate complex environments in real-time, achieving performance on par with centralized approaches but with significantly reduced computation.
Gradient misalignment across devices in parallel split learning can be tamed with a novel gradient alignment strategy, leading to faster convergence and higher accuracy in heterogeneous federated learning.
Secure Wi-Fi ranging, despite standardization efforts, remains riddled with vulnerabilities due to configuration pitfalls and hardware limitations, making it far from a drop-in replacement for UWB.
Julia can now hang with the big dogs: KernelForge.jl proves that portable, JIT-compiled GPU primitives can achieve vendor-level performance (matching or exceeding CUB and cuBLAS) without sacrificing generality.
Quantum chemistry's killer app isn't just about solving the unsolvable; it's about making routine calculations faster and more accessible.
Agentic AI systems are still far from maximizing hardware potential: SOL-ExecBench reveals a significant gap between current GPU kernel performance and analytically derived Speed-of-Light bounds across a wide range of AI models.
Multi-modal federated learning can be made communication-efficient and robust to outliers by learning a shared latent space, even with heterogeneous client architectures.
Federated learning can adapt to asynchronous data drift with up to 83% less retraining cost by using a Mixture-of-Experts architecture to selectively update local parameters.
LLM endpoints can appear "healthy" according to traditional metrics while undergoing subtle behavioral shifts detectable by monitoring output distributions, highlighting a critical gap in current reliability practices.
Pre-trained models unlock surprisingly aggressive quantization in federated learning, slashing communication costs by 40% without sacrificing accuracy on MNIST and CIFAR-100.
Sedna, a promising consensus protocol, is surprisingly vulnerable to cartel attacks that can stall block production and extract MEV, but a clever bounty mechanism can restore its security.
Network coding, often overlooked in robotics, can drastically improve the reliability and timeliness of multi-robot communication, outperforming traditional retransmission methods in safety-critical scenarios.
Quantum computers could finally unlock the full potential of machine learning for drug discovery by directly generating the quantum chemistry data that classical computers struggle to produce.
Federated recommendation systems can now better adapt to evolving user preferences without sacrificing privacy, thanks to a novel approach that retains historical knowledge and transfers insights between similar users.
YouTube's platform defenses are a house of cards: circumventing one control often triggers a cascade of failures, demanding constant architectural adaptation for large-scale content replication.
Ergodic control lets swarms of robots cooperatively manufacture micro-patterned surfaces, unlocking scalable production of materials with enhanced physical properties.
Forget buying new GPUs – clever context-length routing can boost your LLM inference energy efficiency by 2.5x, dwarfing the 1.7x gain from upgrading to a B200.
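The routing idea in the teaser above can be illustrated with a toy dispatcher: short prompts go to an energy-efficient pool, long prompts to high-memory GPUs. This is a minimal sketch; the threshold, pool names, and single-threshold policy are illustrative assumptions, not the paper's actual scheme.

```python
def route_request(context_len: int, threshold: int = 4096) -> str:
    """Toy context-length router.

    Requests at or below `threshold` tokens are served by an
    energy-efficient pool; longer contexts go to a long-context pool.
    All names and the threshold value are illustrative assumptions.
    """
    return "efficient-pool" if context_len <= threshold else "long-context-pool"


# Example dispatch decisions:
print(route_request(512))    # short prompt -> efficient-pool
print(route_request(8192))   # long prompt  -> long-context-pool
```

A real system would presumably also account for queue depth and batch composition, but even this one-dimensional split shows where the energy savings would come from: long-context requests stop monopolizing hardware sized for short ones.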
Automatically tracking causality across actors exposes hidden behavioral violations in real-world Erlang systems, without requiring manual code modifications.
NNVMC's promise for solving quantum many-body problems is currently bottlenecked by surprisingly mundane issues: low-intensity elementwise operations and data movement on GPUs.
Achieve up to 2.4x speedup over OpenBLAS on RISC-V by using MLIR and xDSL to generate optimized RVV code, finally unlocking the potential of RISC-V vector extensions.
Forget painstakingly tuning quantization for each LLM – RAMP learns a quantization policy that generalizes across architectures, often outperforming target-specific training.
Lossless compression can actually *speed up* LLM inference on GPUs, not just shrink model size, thanks to ZipServ's hardware-aware design.
Forget centralized control: this algorithm lets swarms of robots build complex shapes with only local communication and no global positioning.
Achieve significant latency and energy savings in memory systems with an RL-based controller that also provides insights into *why* its decisions are optimal.
Ditch backprop's limitations: this synthesizable RTL implementation brings predictive coding networks to life in fully distributed hardware.
Running robotic manipulation workloads entirely onboard kills robot batteries, but offloading to the cloud tanks accuracy due to network latency, revealing a critical compute placement trade-off.
SpiderCam shatters power consumption barriers for FPGA-based 3D cameras, achieving sub-Watt operation while maintaining real-time performance.
By federating distributional critics and using a Wasserstein barycenter trust region, TR-FedDistRL avoids the dangerous "mean-smearing" that can make federated RL unsafe in critical applications.
Independent sampling of graph partitions is now a practical alternative to MCMC, offering a new path for generating diverse redistricting plans.
Secure enclave updates and migrations, previously missing from RISC-V TEEs, are now practical thanks to a novel toolkit that adds minimal overhead.
Finally, a software energy profiler achieves both high accuracy and cross-platform portability, enabling practical algorithmic energy optimization across diverse languages and hardware.
Ditch the polar decomposition: MUD offers a surprisingly simple and efficient alternative for momentum whitening, speeding up transformer training by up to 50% compared to AdamW and Muon.
Even without architectural modifications, a new gradient inversion attack, ARES, can reconstruct high-fidelity training samples in federated learning, exposing a significant privacy risk.
Reproducibility in hardware reverse engineering is shockingly low, with only 4% of evaluated artifacts from 187 papers yielding reproducible results.
Federated Computing as Code lets you enforce data sovereignty in federated systems with cryptographic guarantees, moving beyond runtime policies and trust assumptions.
LLM serving systems can boost Time-To-First-Token (TTFT) attainment by up to 2.4x simply by prioritizing network flows based on a novel approximation of Least-Laxity-First scheduling.
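The Least-Laxity-First idea named above can be sketched in a few lines: a flow's laxity is its time to deadline minus the work it still needs, and the flow with the least laxity is served first. This is a generic LLF illustration under assumed field names (`deadline`, `remaining`), not the paper's actual approximation.

```python
def laxity(deadline: float, now: float, remaining: float) -> float:
    """Laxity = slack before the deadline after accounting for the
    work still required. Smaller laxity means more urgent."""
    return (deadline - now) - remaining


def pick_next_flow(flows: list[dict], now: float) -> dict:
    """Serve the flow with the least laxity (classic LLF).
    For TTFT attainment, a flow's 'deadline' would be its TTFT target."""
    return min(flows, key=lambda f: laxity(f["deadline"], now, f["remaining"]))


flows = [
    {"id": "a", "deadline": 10.0, "remaining": 2.0},  # laxity = 8
    {"id": "b", "deadline": 5.0, "remaining": 4.0},   # laxity = 1 -> most urgent
]
print(pick_next_flow(flows, now=0.0)["id"])  # prints "b"
```

True LLF needs each flow's remaining service time, which is hard to know exactly for network flows; the paper's contribution, per the teaser, is an approximation of that quantity cheap enough to drive flow prioritization.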
Forget slow, single-SSD paging: Swarm unlocks 2.7x higher bandwidth for LLM KV-cache offloading by exploiting stable co-activation patterns to parallelize I/O across multiple SSDs.
ROS 2's real-time performance gets a major boost with ReDAG-RT, a user-space scheduler that cuts deadline misses by up to 30% without touching the core ROS 2 API.