Z-Reward achieves nearly 90% human preference accuracy by transforming subjective visual preferences into nuanced score distributions, outperforming traditional reward models.
Resource-constrained edge devices can achieve Pareto-optimal trade-offs between inference accuracy, latency, and energy consumption in federated learning by using a constrained multi-objective reinforcement learning approach.
Fine-tuning efficient few-step diffusion models no longer requires sacrificing their speed, thanks to a self-distillation approach that preserves inference capabilities.
Forget reward function dependencies: this new approach to contextual bandits with latent state dynamics achieves stronger regret bounds by directly modeling hidden state dependencies and adaptively estimating HMM parameters.
Pre-training with Dual Latent World Models unlocks significant performance gains in autonomous driving tasks by learning holistic Gaussian-centric representations.