12 papers published across 2 labs.
Prompt-based jailbreak attacks aren't just effective; they're strikingly efficient, outperforming optimization-based methods because they navigate the prompt space far better.
AI electricity demand won't necessarily explode as AI scales: the outcome hinges on whether sustained efficiency improvements can outpace income-driven demand.
Row-normalized optimizers can match Muon's performance on large language models while being faster in large-token and low-loss regimes, offering a practical alternative for pre-training.
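A minimal sketch of what a row-normalized update could look like, assuming the simplest variant: each row of a weight matrix's gradient is rescaled to unit L2 norm before the step. The update rule, `lr`, and `eps` below are illustrative assumptions, not the paper's exact recipe.

```python
import torch

def row_normalized_step(param: torch.Tensor, lr: float = 0.02, eps: float = 1e-8) -> None:
    """Illustrative update: normalize each row of the gradient to unit L2 norm.

    Unlike Muon, which orthogonalizes the whole gradient matrix (e.g. via
    Newton-Schulz iterations), this only rescales rows -- one norm and one
    divide per row instead of repeated matrix multiplies.
    """
    grad = param.grad  # shape (out_features, in_features); assumes .backward() was called
    row_norms = grad.norm(dim=1, keepdim=True)  # L2 norm of each row
    param.data.add_(grad / (row_norms + eps), alpha=-lr)
```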
Forget parameter counts: the true memorization capacity of deep ReLU networks is governed by the product of squared width and squared depth, $W^2L^2$, and the number of samples they can fit scales linearly with that quantity.
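Stated as a bound (with my own hedging on constants and log factors, which the one-line summary doesn't give):

```latex
% Reading the claim as a capacity bound: a ReLU network of width W and
% depth L can memorize N points in general position when, up to constants
% and log factors (my hedging, not the paper's statement),
\[
  N = \Theta\!\left(W^2 L^2\right),
\]
% i.e. capacity grows linearly in W^2 L^2 rather than in the raw
% parameter count, which is only \Theta(W^2 L) for such a network.
```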
Language models often disregard provided context, choosing instead to rely on potentially outdated or conflicting information learned during pre-training, revealing a critical flaw in their knowledge integration.
Chasing marginal MSE/MAE improvements on leaderboards may be blinding researchers to the real goal of time series forecasting: capturing temporal structure and supporting downstream decisions.
Forget elegant compression and unifying principles: AGI might just be a vast, brittle archipelago of specialized modules, mirroring how human experts actually operate.
FineRMoE achieves 6x higher parameter efficiency, 281x lower prefill latency, and 136x higher decoding throughput compared to strong baselines, demonstrating a significant leap in MoE performance.
Protein language models finally scale predictably: Reverse Distillation unlocks consistent gains by distilling large models into nested, Matryoshka-style embeddings guided by smaller, capacity-constrained models.
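A hedged sketch of what nested, Matryoshka-style distillation might look like: the student is trained so that every prefix of its embedding matches the corresponding teacher prefix. The prefix lengths, cosine objective, and function name are my assumptions, not the paper's recipe.

```python
import torch
import torch.nn.functional as F

def matryoshka_distill_loss(student_emb: torch.Tensor,
                            teacher_emb: torch.Tensor,
                            dims=(64, 128, 256, 512)) -> torch.Tensor:
    """Average distillation loss over nested embedding prefixes.

    student_emb, teacher_emb: (batch, max(dims)). Each student prefix is
    pushed toward the matching teacher prefix, so truncated embeddings
    remain usable on their own (the Matryoshka property).
    """
    loss = 0.0
    for d in dims:
        s = F.normalize(student_emb[:, :d], dim=-1)
        t = F.normalize(teacher_emb[:, :d], dim=-1)
        loss = loss + (1.0 - (s * t).sum(dim=-1)).mean()  # cosine distance
    return loss / len(dims)
```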
Multi-task learning's generalization boost comes from implicit regularization, effectively postponing the dreaded double descent.
You can accurately predict the NDCG of a 1B-parameter reranking model by training only models up to 400M parameters, unlocking massive compute savings.
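One way such an extrapolation could work, as a sketch: fit a saturating power law $\mathrm{NDCG}(P) = a - b\,P^{-c}$ to small-model results and evaluate it at 1B parameters. The functional form and all numbers below are illustrative, not the paper's measurements.

```python
import numpy as np
from scipy.optimize import curve_fit

# Illustrative (made-up) measurements: parameter count -> NDCG@10.
params = np.array([25e6, 50e6, 100e6, 200e6, 400e6])
ndcg   = np.array([0.312, 0.349, 0.376, 0.396, 0.410])

def saturating_power_law(p, a, b, c):
    # NDCG(P) = a - b * P^{-c}: approaches the ceiling `a` as P grows.
    return a - b * p**(-c)

(a, b, c), _ = curve_fit(saturating_power_law, params, ndcg,
                         p0=(0.5, 100.0, 0.4), maxfev=20000)
print(f"Predicted NDCG@10 at 1B params: {saturating_power_law(1e9, a, b, c):.3f}")
```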
By strategically warming up residual connections layer-by-layer, ProRes unlocks faster and more stable pretraining for language models.
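A minimal sketch of one plausible reading of layer-wise residual warmup: each layer's residual branch is gated by a coefficient $\alpha_\ell(t)$ ramped from 0 to 1 on a staggered schedule. The gating form, schedule, and names (`ResidualWarmupBlock`, `warmup_steps`, `stagger`) are illustrative assumptions, not ProRes's actual method.

```python
import torch
import torch.nn as nn

class ResidualWarmupBlock(nn.Module):
    """Residual block whose f(x) contribution is gated by a per-layer
    warmup coefficient alpha in [0, 1] (illustrative reading)."""

    def __init__(self, f: nn.Module, layer_idx: int,
                 warmup_steps: int = 1000, stagger: int = 200):
        super().__init__()
        self.f = f
        # Deeper layers start warming up later (staggered schedule).
        self.start = layer_idx * stagger
        self.warmup_steps = warmup_steps

    def alpha(self, step: int) -> float:
        # Linear ramp from 0 to 1 once this layer's warmup window begins.
        return min(max((step - self.start) / self.warmup_steps, 0.0), 1.0)

    def forward(self, x: torch.Tensor, step: int) -> torch.Tensor:
        return x + self.alpha(step) * self.f(x)
```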