The authors use an autonomous agent to search for transformer architectures across three sequence types: SMILES, protein sequences, and English text. For SMILES, architecture search is counterproductive: tuning learning rates and schedules alone outperforms the full search. Although the agent discovers distinct architectures for each domain, every innovation transfers across all three with less than 1% degradation, which suggests that the architectural differences reflect search-path dependence rather than inherent domain-specific requirements.
Autonomous architecture search for SMILES transformers is surprisingly counterproductive: you're better off just tuning learning rates and schedules.
Deep learning models for drug-like molecules and proteins overwhelmingly reuse transformer architectures designed for natural language, yet whether molecular sequences benefit from different designs has not been systematically tested. We deploy autonomous architecture search via an agent across three sequence types (SMILES, protein, and English text as control), running 3,106 experiments on a single GPU. For SMILES, architecture search is counterproductive: tuning learning rates and schedules alone outperforms the full search (p = 0.001). For natural language, architecture changes drive 81% of improvement (p = 0.009). Proteins fall between the two. Surprisingly, although the agent discovers distinct architectures per domain (p = 0.004), every innovation transfers across all three domains with <1% degradation, indicating that the differences reflect search-path dependence rather than fundamental biological requirements. We release a decision framework and open-source toolkit for molecular modeling teams to choose between autonomous architecture search and simple hyperparameter tuning.
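The abstract reports p-values for comparisons between search strategies but does not say which statistical test produced them. As a minimal sketch of one common way to compare two sets of runs, the following implements a two-sided permutation test on the difference in mean scores. The function name `permutation_test`, its parameters, and the numbers in the usage example are all hypothetical illustrations, not the authors' method or data.

```python
import random
from statistics import mean


def permutation_test(a, b, n_resamples=10_000, seed=0):
    """Two-sided Monte Carlo permutation test on the difference in means.

    a, b: sequences of per-run scores from two search strategies.
    Returns an estimated p-value in (0, 1].
    NOTE: a generic test chosen for illustration; the source does not
    state which test was actually used.
    """
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    n_a = len(a)
    hits = 0
    for _ in range(n_resamples):
        # Randomly reassign runs to the two groups.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_resamples + 1)


# Hypothetical scores (made up for illustration, not from the paper).
tuning_only = [0.912, 0.908, 0.915, 0.910, 0.909]
full_search = [0.931, 0.928, 0.935, 0.930, 0.933]
p_value = permutation_test(tuning_only, full_search)
print(f"p = {p_value:.4f}")
```

A permutation test makes no distributional assumption about the scores, which suits small numbers of runs per condition; a team with many runs per condition might reasonably choose a different test.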