Linear RNNs achieve transformer-like parallelization because they are essentially log-depth arithmetic circuits, while nonlinear RNNs resist parallelization precisely because they can express computationally harder, inherently sequential problems.
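A minimal sketch of why linear recurrences parallelize: the update h_t = a_t·h_{t-1} + b_t is an affine map, and composing affine maps is associative, so the prefix compositions can be computed by a parallel scan in O(log T) depth. The function name `scan_linear_rnn` and the scalar setting are illustrative assumptions, not from the source; the loop below runs sequentially but uses only the associative combine a real parallel scan would.

```python
def scan_linear_rnn(a, b, h0):
    """Evaluate h_t = a_t*h_{t-1} + b_t via prefix-composed affine maps.

    Each step is the map h -> a_t*h + b_t. combine() is associative,
    so on parallel hardware these prefixes can be built with a scan
    in O(log T) depth; here we fold left-to-right for clarity.
    """
    def combine(p, q):
        # Apply map p first, then q:
        # q_a*(p_a*h + p_b) + q_b = (p_a*q_a)*h + (p_b*q_a + q_b)
        pa, pb = p
        qa, qb = q
        return (pa * qa, pb * qa + qb)

    out, acc = [], None
    for step in zip(a, b):
        acc = step if acc is None else combine(acc, step)
        out.append(acc[0] * h0 + acc[1])  # apply composed map to h0
    return out
```

A nonlinear recurrence h_t = f(a_t·h_{t-1} + b_t) has no such associative decomposition in general, which is the sequential bottleneck the sentence above refers to.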
Forget massive transformers: tiny hybrid models can achieve state-of-the-art zero-shot time series forecasting with 100x fewer parameters.