DisNet Lab
Achieve near-instantaneous reconfiguration of LLM pipeline parallelism, cutting downtime from seconds to under 10 ms, by borrowing techniques from live virtual machine migration.
Resource allocation is the unsung hero of multi-model LLM routing: get it wrong, and you could be leaving up to 87% of your output quality on the table.
Achieve up to 50% energy savings and 80% latency reduction in edge-based object detection by intelligently balancing load across heterogeneous devices, at the cost of only a minor loss in accuracy.