The authors introduce ESMDisPred, a novel deep learning architecture for predicting intrinsically disordered proteins (IDPs) that leverages ESM2 embeddings and structural information from the PDB. By integrating sequence embeddings with structural features and employing feature engineering techniques like terminal residue encoding, the model achieves state-of-the-art performance on CAID3 benchmarks. The core of ESMDisPred is a hybrid CNN-Transformer architecture designed to capture both local sequence motifs and long-range dependencies.
By fusing protein language model embeddings with structural data, ESMDisPred achieves state-of-the-art accuracy in predicting intrinsically disordered proteins, a feat that could accelerate drug discovery and structural biology.
Intrinsically disordered proteins (IDPs) lack stable three-dimensional structures, yet play vital roles in key biological processes, including signaling, transcription regulation, and molecular scaffolding. Their structural flexibility presents significant challenges for experimental characterization and contributes to diseases such as cancer and neurodegenerative disorders. Accurate computational prediction of IDPs is therefore important for drug discovery, structural biology, and protein engineering. In this study, we introduce ESMDisPred, a novel structure-aware disorder predictor that builds on the representational power of Evolutionary Scale Modeling-2 (ESM2) protein language models. ESMDisPred integrates sequence embeddings with structural information from the Protein Data Bank (PDB) to deliver state-of-the-art prediction accuracy. Model performance is further enhanced through feature engineering strategies, including terminal residue encoding, statistical summarization, and sliding-window analysis. To capture both local sequence motifs and long-range dependencies, we designed a hybrid CNN-Transformer architecture that balances convolutional efficiency with the representational power of self-attention. On CAID3 benchmarks, ESMDisPred achieves a ROC-AUC of 0.895, an average precision of 0.778, and a maximum F1 of 0.759, outperforming recent methods. Our results highlight the importance of integrating protein language model embeddings with explicit structural information for improved disorder prediction.
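To make the feature engineering steps concrete, the sketch below illustrates one plausible reading of two of the strategies named in the abstract: terminal residue encoding and sliding-window statistical summarization of per-residue embeddings. The window size, terminal cutoff, and embedding dimension are illustrative assumptions, not the authors' exact implementation (ESM2 embeddings are typically 1280-dimensional; a small dimension is used here for readability).

```python
import numpy as np

def terminal_encoding(length, n_terminal=5):
    """One simple form of terminal residue encoding (an assumption here):
    flag residues within n_terminal positions of either chain end,
    where disorder is empirically more common."""
    pos = np.arange(length)
    near_n = (pos < n_terminal).astype(float)           # N-terminus flag
    near_c = (pos >= length - n_terminal).astype(float) # C-terminus flag
    return np.stack([near_n, near_c], axis=1)           # shape (L, 2)

def sliding_window_stats(embeddings, window=15):
    """Summarize per-residue embeddings over a centered window with
    mean and std -- one plausible 'statistical summarization'."""
    L, d = embeddings.shape
    half = window // 2
    padded = np.pad(embeddings, ((half, half), (0, 0)), mode="edge")
    # Stack the window offsets so windows[k, i] is residue i's (i+k)-th neighbor.
    windows = np.stack([padded[i:i + L] for i in range(window)], axis=0)
    return np.concatenate(
        [windows.mean(axis=0), windows.std(axis=0)], axis=1
    )  # shape (L, 2*d)

# Toy example: a 50-residue protein with 8-dim embeddings standing in
# for ESM2 representations.
rng = np.random.default_rng(0)
emb = rng.standard_normal((50, 8))
features = np.concatenate(
    [emb, sliding_window_stats(emb), terminal_encoding(50)], axis=1
)
print(features.shape)  # (50, 26): raw + windowed stats + terminal flags
```

In a pipeline like the one described, a per-residue feature matrix of this kind would then be fed to the CNN-Transformer head, which outputs one disorder probability per residue.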