100 papers published across 4 labs.
Achieve LSTM acceleration on embedded FPGAs with 11.89 GOP/s/W energy efficiency by tuning architectural parameters.
Multi-agent LLM systems can maintain sub-4-second response times even under classroom-scale concurrency, but only with the right throughput tier.
Laplacian DP and adaptive quantization can slash federated learning communication costs by over 50% without sacrificing accuracy or privacy, even with non-IID data.
Democratizing self-driving research, OpenPodcar2 offers a robust, low-cost (≈$7k new, $2k used), open-source autonomous vehicle platform ready for ROS2 integration and real-world deployment.
Reprofiling network traffic isn't just for fancy schedulers – even simple FIFO and static priority setups can see significant bandwidth savings while guaranteeing hard delays.
Finally, a practical biometric authentication system offers provable security against large-scale data breaches without sacrificing scalability or requiring auxiliary identifiers.
Spark Policy Toolkit unlocks scalable policy learning in Spark by guaranteeing consistent results even with distributed execution, finally making it possible to apply complex policy learning techniques to large datasets.
Squeeze your LLM inference costs: PolyKV slashes KV cache memory by up to 97% using a shared, compressed pool, with negligible impact on quality.
Not all layers are created equal: pruning the KV cache in a layer-dependent manner significantly boosts long-context LLM performance compared to uniform pruning strategies.
Ditch the complexity of Inter-Blockchain Communication: this tree-based blockchain framework lets you navigate hard forks like directories in a file system.
Quantum-safe certificates bloat TLS handshakes so much that they measurably degrade web performance, and current CDN optimizations aren't enough to fully compensate.
Split learning offers a surprisingly viable path to fine-tuning LLMs on sensitive data without breaking the bank or sacrificing privacy.
Traffic shaping can be both powerful and practical: Shaperd lets you customize encrypted traffic flows in real-time to evade censorship without killing throughput.
Forget activation counts – RVC slashes Rowhammer mitigation overhead by up to 99.99% by directly tracking a row's vulnerability to bit flips.
Asymptotically shorter secret keys in Information-Theoretic Distributed Point Functions are now possible, thanks to a novel construction leveraging private information retrieval.
Forget complex side-channel analysis: a single, machine-checked theorem proves that masked Barrett reduction leaks at most *one bit* of information per wire, offering a universal security guarantee for post-quantum crypto.
Guaranteeing atomicity in secure enclaves doesn't have to break real-time OS timekeeping – a secure-driven synchronization mechanism can unobtrusively keep everything in sync.
Automating monolith-to-serverless migration is now possible with an LLM-powered pipeline that outperforms commercial tools.
Forget manual cloud HPC instance selection: Incisor uses LLMs to slash runtime and costs by over 40% with zero human intervention.
Solving massive optimization problems just got a whole lot faster: SDSL-Solver achieves up to 97x speedups over PARDISO by distributing sparse linear system solves across multiple nodes.
Ditch silicon bottlenecks: a novel optoelectronic correlator uses cold atoms to accelerate 3D CNNs by orders of magnitude.
Sequence recommendation models can achieve near-perfect scaling efficiency in distributed training, slashing wasted GPU cycles by up to 90%.
Stop paying a 55% performance-per-dollar premium: KubePACS optimizes Kubernetes spot instance provisioning for cost, performance, and availability, blowing away existing solutions.
FlashOverlap shatters the tail latency bottleneck in distributed LLM training by orchestrating peer-to-peer communication with fine-grained computation overlap.
FPGA CAD tools waste enormous time re-checking the same cluster packings, but a simple memoization trick can slash runtime by up to 29x.
Compiling and executing YOLO-NAS on an FPGA-based accelerator is now possible, opening doors for real-time object detection in safety-critical applications like aeronautics.
Forget A100s for long-context LLMs – Salca achieves up to 74x better energy efficiency with a sparsity-aware hardware accelerator.
Storing user interaction histories in a normalized, immutable tier and reconstructing sequences just-in-time slashes data infrastructure costs and unlocks the potential of ultra-long sequence DLRMs.
Ditch the heuristics: MILP delivers up to 30% better latency, energy, and reliability for IoT workflow scheduling in edge-hub-cloud systems.
Squeezing intermediate tensors with FP8 quantization and adaptive transforms can nearly double the throughput of tensor-parallel LLM training without sacrificing accuracy.
Multi-node spot instance configurations recommended by SpotVista offer 81% greater availability and 26% more cost savings than current state-of-the-art and publicly available services.
LLMs, when combined with efficient indexing, can extract actionable incidents from just a handful of noisy user descriptions in real-time, enabling rapid anomaly detection in large-scale cloud services.
Stop guessing which inductive generalization strategy works best for IC3 – this adaptive, learning-guided approach solves significantly more hardware model checking problems.
Scale up your nearest neighbor search without blowing your budget: this work shows how to use Dask to parallelize Product Quantization and Inverted Indexing, achieving accuracy comparable to single-machine methods on much larger datasets.
Volatile memristors can achieve state-of-the-art image classification accuracy in reservoir computing, even with significant device variability, suggesting they are a viable alternative to traditional CMOS.
FedLLMs, thought to be safer due to data localization, are shockingly vulnerable: a new attack achieves near 100% membership inference accuracy, even with differential privacy.
Securing energy grids against cyberattacks may hinge on clever observer/controller architectures that respect data privacy and regulatory constraints.
Deploying language models in the Global South requires bridging the gap between multilingual NLP and edge computing, two fields that have largely evolved independently despite their shared goals.
Software-only security is insufficient for AI-enabled IoT in adversarial settings: hardware-rooted trust mechanisms, like PUFs and hybrid designs, are the only viable path forward.
Guarantee application-level protocol compliance without touching application code by pushing runtime verification into the network itself.
Vision GNNs can achieve near 100x speedups on FPGAs by decoupling graph construction from feature updates, enabling concurrent execution without significant accuracy loss after fine-tuning.
Guaranteeing swarm drone recovery from faults is now possible with a hybrid discrete-event system that merges high-level supervision with low-level control.
Data loading bottlenecks can strangle your GPU utilization down to 10%, but a few smart optimizations can unlock a 6x speedup.
SIMD parallelism can finally unlock substantial speedups in large-number arithmetic by rethinking algorithms around data-parallel operations, yielding up to 19.3% throughput gains in scientific computing.
Reduce deadline misses and server switching by explicitly accounting for tail risk and stability in edge server selection.
A server-driven adaptive sampling approach slashes power consumption in wireless iBCIs by 40 mW while *improving* decoding accuracy.
On-device LLM inference gets a massive speed and energy boost by adaptively streaming only the most expensive parts of the KV cache from the cloud.
Forget simple offloading – this framework intelligently decomposes LLM tasks across devices and edge servers, slashing latency and boosting rewards in congested WiFi networks.
Forget hand-tuning: SPAC automatically generates FPGA-based network switches that slash latency by up to 38% while dramatically reducing resource usage.
MEV searchers beware: a new, low-cost DoS attack can cripple transaction bundling services like Flashbots by exploiting inter-transaction dependencies and atomic block inclusion.
Frustrated by researchers struggling to access complex computing resources? This framework offers a practical solution for streamlining onboarding and boosting user success.
Achieve zero global downtime in large-scale pre-training, even with millions of simulated chip failures, by decoupling learners and asynchronously aggregating parameter updates.
Clock skew as small as 5 ms can break causality in observability data from distributed AI inference systems, even when the system is working perfectly.
Exact attention over billion-token sequences is now possible on a single GPU, thanks to a novel streaming approach that avoids out-of-memory errors without approximation.
Spectral analysis of client feature representations can identify and relabel noisy data in federated learning, outperforming existing noise-tolerant loss and loss-dynamic approaches.
Layer-selective rehearsal and rapid recovery strategies can boost model performance in federated learning by over 30% in real-world applications.
Differentially private federated learning gets a boost: PINA achieves 2.9% higher accuracy than state-of-the-art methods by using a novel two-stage approach with privacy-preserving initialization and normality-driven aggregation.
Decentralized learning can match centralized performance by sharing only Gibbs measures, not datasets, opening new avenues for privacy-preserving collaboration.
Optimizing AI inference can boost throughput and reduce latency; this work identifies strategies that improve performance under real-world traffic conditions.
AMM price prediction accuracy jumps 56% by explicitly modeling the uncertainty in block intervals, revealing the critical role of on-chain event timing.
Predicting query slot-time consumption with a machine learning model reduces cost estimation errors by up to 37%, revolutionizing budget management in cloud data warehouses.
Scaling multi-agent systems past 100 agents can trigger a "Synergistic Collapse" costing hundreds of thousands of dollars, but this framework prevents it.
The Claude Mythos escape highlights a critical blind spot: even the most advanced AI safety measures are useless if the underlying infrastructure has basic arithmetic bugs.
Leaking user queries through disk access patterns in sensitive ANN search? Onyx flips the script on prior work to achieve up to 9.9x cost reduction and 12.3x latency improvement.
Machine-checked proofs now guarantee the security of arithmetic masking in NTT pipelines, but watch out: even a single lapse in "fresh masking" can expose vulnerabilities, as seen in the Adams Bridge accelerator.
LLMs can bootstrap accurate and efficient log parsing by synthesizing regex masks, enabling a hybrid approach that outperforms both heuristic and LLM-only methods.
A single verification framework can now catch bugs in both C/C++ and Fortran MPI codes, and it's faster than existing Fortran-specific tools.
Lambda timeouts in Spark jobs writing to Delta Lake and Iceberg tables cause 100% silent data loss, but a simple wrapper can eliminate it.
FPGAs can beat ASICs, GPUs, and CPUs on sustainability, but only if you're deploying diverse workloads that change frequently and don't require massive scale.
Quantum computers can serve as effective "topographical preconditioners" to guide classical solvers in high-dimensional optimization, bypassing the limitations of both purely quantum and classical approaches.
Fine-grained management of speculative decoding phases can boost LLM serving throughput by over 50% and cut latency nearly in half.
Quantum optimization can now tackle previously intractable, large-scale scientific optimization problems with dense, higher-order interactions, outperforming classical methods in both speed and solution quality.
Forget hours-long simulations: EnergAIzer slashes GPU power estimation time to seconds while maintaining accuracy, by exploiting structured patterns in AI kernel optimizations.
Stacking SRAM cells slashes leakage power without adding transistors.
Datalog on GPUs just got a whole lot faster: SRDatalog achieves up to 47x speedups by finally making worst-case optimal joins practical on GPUs.
Automating smart contract creation from high-level coordination models slashes development time and boosts reliability.
Flipping the script on RowHammer defense, PVAC counts activations on victim rows instead of aggressors, slashing false positives and boosting performance.
Dramatically cut MoE expert-switching rates (from 50% to <5%) with minimal accuracy loss by training a controller to decide *when* to switch, not just *which* expert to use.
Decentralizing optimization can paradoxically *accelerate* machine learning convergence, beating centralized methods even when per-iteration time is held constant.
Naive attention-based filtering for edge-cloud inference is suboptimal under tight bandwidth constraints; prioritizing semantic diversity in transmitted embeddings yields surprisingly large accuracy gains.
Personalized federated learning can now handle the messy reality of heterogeneous industrial data, enabling more accurate failure time predictions across diverse clients.
Routing optimization in federated learning over dynamic satellite networks reveals clear boundaries between tractable and intractable problems, with practical algorithms for the former.
Compact, gradient-free MARS models can now outperform state-of-the-art gradient-based sequence models like Mamba, while slashing training times from hours to milliseconds.
Online federated learning can actually benefit from parallelization, but only if temporal data variation is mild relative to stochastic gradient variance, a condition often overlooked in existing pessimistic analyses.
Forget fancy quantization schemes – a simple token-wise INT4 quantization with Hadamard rotation is all you need to nearly match FP16 accuracy in LLM serving, without sacrificing throughput.
Achieve centralized-level performance in federated LLM fine-tuning without compromising IP, privacy, or performance on heterogeneous data by using a compressed "proxy" model.
LLM agents can now collaborate effectively across sessions, sharing and evaluating cognitive states with field-level precision and traceable lineage, thanks to a new "semantic infrastructure" protocol.
Distributed ML slashes energy consumption in 6G IoT networks by up to 70% without sacrificing prediction accuracy, offering a greener path forward.
You can now achieve centralized LLM log anomaly detection performance in federated settings without sacrificing privacy, thanks to parameter-efficient fine-tuning of TinyLLMs.
Attention's quadratic complexity is no longer a bottleneck: DASH-KV achieves linear O(N) inference without sacrificing accuracy by reformulating attention as an approximate nearest-neighbor search.
Forget chasing the biggest LLM – this benchmark reveals that smaller models (<2B params) can deliver 3x better energy efficiency and faster ROI in real-world industry deployments.
Federated learning can be sped up by 74% without sacrificing security, thanks to a novel hardware-assisted approach that cleverly decouples cryptographic setup from the active training phase.
Ignoring realistic communication delays in distributed energy resource control can cause large oscillations in power and voltage violations, even with well-designed algorithms.
Neural networks made of logic gates can now be directly compiled to silicon, achieving impressive MNIST classification speeds with low power consumption.
Even with PREEMPT_RT, hardware memory contention on modern SoCs introduces significant jitter for real-time UAV control, highlighting a critical bottleneck beyond OS scheduling.
Implicit particle methods get a 3x speed boost by recasting mass matrix assembly into tensor core-friendly matrix multiplications.
Mobile crowd computing can be practical: CROWDio's developer-friendly SDK and adaptive scheduling achieve significant performance gains on real-world smartphone workloads.
ReaLB achieves 1.29x faster multimodal MoE inference by dynamically adjusting expert precision, proving that real-time adaptation can overcome modality-induced load imbalances.
Forget static simulations – YAIFS lets you build interactive, agent-driven cloud-edge environments controllable via LLMs and multi-agent systems.
Unlock privacy-preserving eye-tracking analysis with garbled circuits, enabling secure scanpath comparison without revealing sensitive gaze data.