Concave multi-objective RL suffers from a previously unaddressed gradient bias that doubles the sample complexity; the bias can be fixed with multi-level Monte Carlo and, surprisingly, vanishes entirely with smooth scalarization functions.
Stop leaking your face to generative AI apps: PRIVATEEDIT lets you edit images while keeping biometric data on your device.
Even with noisy or misspecified preference feedback, LLMs can be robustly aligned online by penalizing sensitivity to oracle uncertainty.
Unlock $O(T^{1/2})$ regret for non-monotone DR-submodular maximization with just one gradient query per round by cleverly linearizing the problem.
A novel multi-agent combinatorial bandit framework achieves the first regret bound for partition-based submodular welfare under bandit feedback.