Sharpness-Aware Minimization (SAM) has an implicit bias in deep linear networks that flips the script on feature learning: it prioritizes minor data coordinates early in training before amplifying major ones, a behavior unseen in gradient descent.
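To make the update rule behind this concrete, here is a minimal sketch of one SAM step on a two-layer deep linear network with squared loss. The network shape, data, and step sizes are illustrative assumptions, not taken from the paper; the only load-bearing part is the two-step SAM update versus plain gradient descent.

```python
import numpy as np

# Sketch: one SAM step on a deep linear network f(x) = W2 @ W1 @ x with
# squared loss. Dimensions and hyperparameters below are illustrative.
rng = np.random.default_rng(0)
d, h, n = 5, 5, 100
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true

W1 = 0.1 * rng.normal(size=(h, d))
W2 = 0.1 * rng.normal(size=(1, h))

def loss_and_grads(W1, W2):
    residual = X @ W1.T @ W2.T - y[:, None]   # predictions minus targets, (n, 1)
    loss = 0.5 * np.mean(residual ** 2)
    g_out = residual / n                      # dL/d(prediction)
    g_W2 = g_out.T @ (X @ W1.T)               # (1, h)
    g_W1 = (g_out @ W2).T @ X                 # (h, d)
    return loss, g_W1, g_W2

rho, lr = 0.05, 0.1
_, g_W1, g_W2 = loss_and_grads(W1, W2)

# SAM: move to the approximate worst-case weights within an L2 ball of
# radius rho, then take the gradient there instead of at the current point.
norm = np.sqrt(np.sum(g_W1 ** 2) + np.sum(g_W2 ** 2)) + 1e-12
W1_adv = W1 + rho * g_W1 / norm
W2_adv = W2 + rho * g_W2 / norm
_, g_W1_adv, g_W2_adv = loss_and_grads(W1_adv, W2_adv)

# Descend using the perturbed gradient; plain GD would use g_W1, g_W2.
W1 -= lr * g_W1_adv
W2 -= lr * g_W2_adv
```

The summary's contrast with gradient descent lives entirely in that last step: GD follows the local gradient, while SAM follows the gradient at the perturbed point, which is what reorders how coordinates get learned.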
SignSGD can outperform SGD in linear regression when noise dominates, thanks to a unique "noise-reshaping" effect that steepens its compute-optimal scaling law.
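The two optimizers differ only in whether the update keeps the gradient's magnitude or just its coordinate-wise sign. The toy comparison below, on noisy linear regression, illustrates the update rules side by side; the "noise-reshaping" effect itself is an asymptotic scaling claim that a toy run does not establish, and the problem sizes and learning rates here are illustrative assumptions.

```python
import numpy as np

# Toy sketch: SGD vs signSGD update rules on noisy linear regression.
rng = np.random.default_rng(1)
d, steps, lr_sgd, lr_sign = 20, 2000, 0.01, 0.001
w_true = rng.normal(size=d)

w_sgd = np.zeros(d)
w_sign = np.zeros(d)
for _ in range(steps):
    x = rng.normal(size=d)
    y = x @ w_true + 2.0 * rng.normal()    # large label noise
    g_sgd = (x @ w_sgd - y) * x            # per-sample gradient
    g_sign = (x @ w_sign - y) * x
    w_sgd -= lr_sgd * g_sgd                # SGD: full gradient magnitude
    w_sign -= lr_sign * np.sign(g_sign)    # signSGD: coordinate signs only

print("SGD error:    ", np.linalg.norm(w_sgd - w_true))
print("signSGD error:", np.linalg.norm(w_sign - w_true))
```

Discarding magnitudes caps the influence any single noisy sample can have on a coordinate, which is the mechanism the "noise-reshaping" story builds on.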
Statistically efficient online RLHF is now possible in high-dimensional settings, thanks to a novel analysis leveraging strong convexity and skew-symmetry in generalized bilinear preference models.
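As a rough intuition for the model class, here is a sketch of online preference learning under a Bradley-Terry model with a linear reward, which can be read as the simplest member of the bilinear family the summary refers to. The skew-symmetry in this special case is just P(a1 ≻ a2) = 1 − P(a2 ≻ a1); the feature dimension, step size, and update rule are illustrative assumptions, not the paper's estimator.

```python
import numpy as np

# Sketch: online MLE for a Bradley-Terry preference model with a linear
# reward r(a) = theta @ phi(a). All names and dimensions are illustrative.
rng = np.random.default_rng(2)
d, steps, lr = 10, 5000, 0.5
theta_true = rng.normal(size=d)
theta = np.zeros(d)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for _ in range(steps):
    phi1, phi2 = rng.normal(size=d), rng.normal(size=d)
    # Sample a preference label from the true model; the sigmoid of a
    # feature difference automatically satisfies skew-symmetry.
    p_true = sigmoid((phi1 - phi2) @ theta_true)
    y = float(rng.random() < p_true)          # 1 if a1 preferred
    # Online logistic gradient ascent on the log-likelihood.
    p_hat = sigmoid((phi1 - phi2) @ theta)
    theta += lr * (y - p_hat) * (phi1 - phi2)

print("parameter error:", np.linalg.norm(theta - theta_true))
```

The logistic log-likelihood here is concave in theta; strong convexity of the negative log-likelihood over the region visited is the kind of structure the summary's analysis exploits to get dimension-efficient guarantees.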