DisNet Lab
Achieve near-instantaneous LLM pipeline parallelism reconfiguration – going from seconds of downtime to under 10ms – by borrowing techniques from live virtual machine migration.
Resource allocation is the unsung hero of multi-model LLM routing: get it wrong, and you could be leaving up to 87% of your output quality on the table.
Achieve up to 50% energy savings and 80% latency reduction in edge-based object detection by intelligently balancing load across heterogeneous devices, at the cost of a minor accuracy trade-off.
Keep service downtime under 50 ms when dynamically reconfiguring LLM inference pipelines across heterogeneous GPUs in serverless environments.
Circuit cutting introduces substantial end-to-end overheads in quantum neural network training, with reconstruction dominating per-query time, but surprisingly, test accuracy and robustness can be preserved or even improved.
Current service orchestration solutions fall short of achieving autonomous, resilient, and scalable performance in the Computing Continuum, highlighting the urgent need for standardized evaluation environments.