Through a novel distillation and merging pipeline, xLSTM models can now learn effectively from large attention-based models, even outperforming their teachers on some tasks.