Search papers, labs, and topics across Lattice.
1
0
3
2
Transformers struggle with state tracking even in-distribution, requiring far more data than RNNs as sequence length grows, and failing to generalize learned mechanisms across sequence lengths.