This paper addresses the challenge of predicting disease outcomes in autosomal recessive polycystic kidney disease (ARPKD) by leveraging machine learning and protein structure prediction. The authors created a database of patient variant combinations and disease outcomes from literature and used AlphaFold to predict the 3D structure of the PKHD1 protein, Fibrocystin (FPC). By comparing the 3D structure to known protein structures, the study identifies potential functional activities of FPC and highlights structural similarities to proteins involved in cell migration and organization.
AlphaFold reveals structural similarities between the ARPKD-associated protein Fibrocystin and proteins involved in cell migration and organization, hinting at previously unknown functional activities.
Autosomal recessive polycystic kidney disease (ARPKD) is a rare inherited disease that affects 1 in 20,000 children globally. The disease is characterised by progressive cystic kidney disease and liver fibrosis, with variable presentation even between related individuals. ARPKD is most commonly caused by mutations in polycystic kidney and hepatic disease 1 (PKHD1), with only a few familial cases being linked to other genes. Although most cases of ARPKD feature mutations in PKHD1, there is considerable clinical variability in the presentation of the disease. Many patients present with severe kidney and liver disease, while others present with severe disease in only one of the two organs or with mild disease in both. Efforts to address relationships between genotype and phenotype have so far only highlighted a relationship between dual stop-gain variants and severe presentations of ARPKD, linked to perinatal/neonatal death. A complication of linking mutations to disease severity in ARPKD is that, outside of a few founder and hotspot mutations, most families have a unique combination of PKHD1 mutations. To elucidate relationships between genotype and phenotype, this project has sought to use machine learning to highlight relationships between variant position and disease presentation. It aims to predict disease outcomes in ARPKD patients, allowing high-risk candidates to be prioritised for limited resources, such as organ transplants. A database of patient variant combinations (genotype) and disease outcomes (phenotype) was created from existing literature on PKHD1 mutations, using available resources and data from journals published in the period 2003–2022. Machine learning was applied to the disease outcomes to identify relationships between genotype and phenotype. Data from published resources (2010–2020) and the ARegPKD database were used as test data.
The program AlphaFold, developed by Google's DeepMind, was run on the UseGalaxy.eu servers to estimate the 3D structure of the PKHD1 protein Fibrocystin (FPC). Domain predictions were performed using the Foldseek webserver. Variant combinations from the genotype–phenotype database were compared to the 3D structure to highlight relationships between protein structure and disease outcomes. AlphaFold predicted the structure of the complete 4,074-amino-acid Fibrocystin protein. Regions corresponding to the domains previously predicted by homology modelling (IPT, PA14, PBH and G8 domains) were predicted with high confidence (pLDDT >70). The region encompassing the most prolific ARPKD mutation (T36M) and the intracellular tail were predicted with low confidence (pLDDT <50). Comparing the 3D structure to known protein structures suggests similarities to human Fibrillin-2 (E-value = 0.00e+0), Laminin subunit alpha-1 (E = 0.00e+0) and Nesprin-2 (E = 3.72e−76), two of which are proteins involved in either migration, cell organisation or the actin cytoskeleton. Other structural similarities involved the human TMEM2 ectodomains (E-value = 2.97e−21), calmodulin-binding transcription activator 1 (E = 2.79e−2) and mouse Plexin A1 extracellular domains (E = 5.27e−9). AlphaFold has been useful in predicting the tertiary structure of FPC, confirming the structural domains previously highlighted by homology modelling. Additionally, AlphaFold has been useful in highlighting potential functional activities of FPC by comparing its tertiary structure to those of other known proteins. Future work will involve mapping previously reported disease-causing variants in PKHD1 onto the predicted structure to highlight relationships between the tertiary structure of FPC and disease outcomes. Additionally, a categorical model will be trained on previously reported mutation combinations to further define genotype–phenotype relationships.
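The confidence thresholds quoted above (pLDDT >70 as high, pLDDT <50 as low) can be checked per residue, since AlphaFold writes pLDDT into the B-factor column of its PDB output. The following is a minimal sketch, not the authors' pipeline: the function names and the "intermediate" label for scores between 50 and 70 are assumptions made for illustration, and the parser assumes a single-chain, fixed-column PDB file.

```python
from collections import Counter


def plddt_per_residue(pdb_text):
    """Return {residue_number: pLDDT}, read from C-alpha ATOM records.

    AlphaFold PDB files store pLDDT in the B-factor field
    (fixed columns 61-66 of each ATOM record).
    """
    scores = {}
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            residue_number = int(line[22:26])
            scores[residue_number] = float(line[60:66])
    return scores


def confidence_band(plddt):
    """Bin a pLDDT score using the thresholds quoted in the abstract.

    The 'intermediate' label for 50-70 is an assumption of this sketch.
    """
    if plddt > 70:
        return "high"
    if plddt < 50:
        return "low"
    return "intermediate"


def summarise(scores):
    """Count residues in each confidence band."""
    return Counter(confidence_band(value) for value in scores.values())


def band_for_residue(scores, residue_number):
    """Look up the band for one position, e.g. 36 for the T36M variant."""
    return confidence_band(scores[residue_number])
```

With a model file loaded as text, `band_for_residue(plddt_per_residue(text), 36)` would report how confidently the position of the T36M variant was modelled, which is the kind of check that mapping variants onto the structure would rely on.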