A Novel Algorithm for Rare Disease Gene Prediction Based on Phenotypic Similarity
Genetic studies have yielded only a limited number of genes clearly implicated in endocrine disorders, in large part due to two current knowledge gaps. First, genome wide association studies (GWAS) of common diseases have yielded many associations that are hard to translate to causal genes and pathw...
Main Authors: | Fan, Yibo; Flannick, Jason |
---|---|
Format: | Online Article Text |
Language: | English |
Published: |
Oxford University Press
2021
|
Subjects: | Genetics and Development (including Gene Regulation) |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8090663/ http://dx.doi.org/10.1210/jendso/bvab048.1017 |
_version_ | 1783687337590390784 |
---|---|
author | Fan, Yibo Flannick, Jason |
author_facet | Fan, Yibo Flannick, Jason |
author_sort | Fan, Yibo |
collection | PubMed |
description | Genetic studies have yielded only a limited number of genes clearly implicated in endocrine disorders, in large part due to two current knowledge gaps. First, genome-wide association studies (GWAS) of common diseases have yielded many associations that are hard to translate to causal genes and pathways. Second, whole-exome sequencing (WES) studies have transformed diagnosis of rare diseases but often yield many variants of unknown significance that cannot yet be reliably prioritized for disease causality. We hypothesized that phenotypically similar diseases are more likely to share causal genes and pathways. Thus, genes implicated in a (rare or common) disease should be strong candidates to also contribute to a phenotypically similar disease. To test this hypothesis, we aggregated genes (a) for 3,209 rare diseases from OMIM and (b) nearby GWAS signals for 2,316 common diseases from the NHGRI/EBI GWAS catalog. We measured phenotypic similarity based on proximity in the Experimental Factor Ontology (EFO). Across ~2.7M common disease pairs, the number of genes shared increased with phenotypic similarity (Spearman p < 0.1). Similarly, across ~7.4M common and rare disease pairs and ~5.1M rare disease pairs, phenotypic similarity was significantly higher for disease pairs with at least one shared gene compared to those with no shared genes (t-test p < 0.05). We next developed an algorithm to predict genes for a rare disease based on its phenotypic similarity to other diseases and their known genes. Given a rare disease, the algorithm (a) identifies nearby diseases in the EFO; (b) collates their known genes and groups them into gene ontology (GO) terms; and (c) predicts the genes that occur in the most frequently observed GO term as potentially novel disease genes. We evaluated algorithm performance via cross-validation on rare diseases in OMIM. Across 140 rare endocrine diseases, the algorithm predicted on average 4.84 candidate genes with the correct (known but hidden by cross-validation) disease gene within the candidates 23.6% of the time; performance (5.11 candidates, 13.1% success rate) was similar for the other 3,069 rare diseases in OMIM. Examples include Leprechaunism (known gene INSR), for which genes INSR and TWIST2 were predicted based on phenotypic similarity to diseases Barber-Say syndrome, Rabson-Mendenhall syndrome and Gingival fibromatosis-hypertrichosis syndrome; and Lubinsky syndrome (no known genes), for which genes ABCD1, LMNA, and CNBP were predicted based on phenotypic similarity to diseases Ricker syndrome, X-ALD, DM1, Malouf syndrome, and Noonan syndrome. These data suggest that known phenotypic relationships and disease-gene databases can increase our ability to predict novel genes for less well-studied diseases, potentially speeding the biological translation of GWAS associations for common diseases and increasing the diagnostic yield of WES for rare diseases. |
format | Online Article Text |
id | pubmed-8090663 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-80906632021-05-12 A Novel Algorithm for Rare Disease Gene Prediction Based on Phenotypic Similarity Fan, Yibo Flannick, Jason J Endocr Soc Genetics and Development (including Gene Regulation) Genetic studies have yielded only a limited number of genes clearly implicated in endocrine disorders, in large part due to two current knowledge gaps. First, genome wide association studies (GWAS) of common diseases have yielded many associations that are hard to translate to causal genes and pathways. Second, whole exome sequencing (WES) studies have transformed diagnosis of rare diseases but often yield many variants of unknown significance that cannot yet be reliably prioritized for disease causality. We hypothesized that phenotypically similar diseases are more likely to share causal genes and pathways. Thus, genes implicated in a (rare or common) disease should be strong candidates to also contribute to a phenotypically similar disease. To test this hypothesis, we aggregated genes (a) for 3,209 rare diseases from OMIM and (b) nearby GWAS signals for 2,316 common diseases from the NHGRI/EBI GWAS catalog. We measured phenotypic similarity based on proximity in the Experimental Factor Ontology (EFO). Across ~2.7M common disease pairs, the number of genes shared increased with phenotypic similarity (Spearman p < 0.1). Similarly, across ~7.4M common and rare disease pairs and ~5.1M rare disease pairs, phenotypic similarity was significantly higher for disease pairs with at least one shared gene compared to those with no shared genes (T-test p < 0.05). We next developed an algorithm to predict genes for a rare disease based on its phenotypic similarity to other diseases and their known genes. Given a rare disease, the algorithm (a) identifies nearby diseases in the EFO; (b) collates their known genes and groups them into gene ontology (GO) terms; and (c) predicts the genes that occur in the most frequently observed GO term as potentially novel disease genes. 
We evaluated algorithm performance via cross-validation on rare diseases in OMIM. Across 140 rare endocrine diseases, the algorithm predicted on average 4.84 candidate genes with the correct (known but hidden by cross-validation) disease gene within the candidates 23.6% of the time; performance (5.11 candidates, 13.1% success rate) was similar for the other 3,069 rare diseases in OMIM. Examples include Leprechaunism (known gene INSR), for which genes INSR and TWIST2 were predicted based on phenotypic similarity to diseases Barber-Say syndrome, Rabson-Mendenhall syndrome and Gingival fibromatosis-hypertrichosis syndrome. Lubinsky syndrome (no known genes), for which genes ABCD1, LMNA, CNBP were predicted based on phenotypic similarity to diseases Ricker syndrome, X-ALD, DM1, Malouf syndrome, and Noonan syndrome. These data suggest that known phenotypic relationships and disease-gene databases can increase our ability to predict novel genes for less well-studied diseases, potentially speeding the biological translation of GWAS associations for common diseases and increasing the diagnostic yield of WES for rare diseases. Oxford University Press 2021-05-03 /pmc/articles/PMC8090663/ http://dx.doi.org/10.1210/jendso/bvab048.1017 Text en © The Author(s) 2021. Published by Oxford University Press on behalf of the Endocrine Society. https://creativecommons.org/licenses/by-nc-nd/4.0/This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivs licence (http://creativecommons.org/licenses/by-nc-nd/4.0/ (https://creativecommons.org/licenses/by-nc-nd/4.0/) ), which permits non-commercial reproduction and distribution of the work, in any medium, provided the original work is not altered or transformed in any way, and that the work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
spellingShingle | Genetics and Development (including Gene Regulation) Fan, Yibo Flannick, Jason A Novel Algorithm for Rare Disease Gene Prediction Based on Phenotypic Similarity |
title | A Novel Algorithm for Rare Disease Gene Prediction Based on Phenotypic Similarity |
title_full | A Novel Algorithm for Rare Disease Gene Prediction Based on Phenotypic Similarity |
title_fullStr | A Novel Algorithm for Rare Disease Gene Prediction Based on Phenotypic Similarity |
title_full_unstemmed | A Novel Algorithm for Rare Disease Gene Prediction Based on Phenotypic Similarity |
title_short | A Novel Algorithm for Rare Disease Gene Prediction Based on Phenotypic Similarity |
title_sort | novel algorithm for rare disease gene prediction based on phenotypic similarity |
topic | Genetics and Development (including Gene Regulation) |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8090663/ http://dx.doi.org/10.1210/jendso/bvab048.1017 |
work_keys_str_mv | AT fanyibo anovelalgorithmforrarediseasegenepredictionbasedonphenotypicsimilarity AT flannickjason anovelalgorithmforrarediseasegenepredictionbasedonphenotypicsimilarity AT fanyibo novelalgorithmforrarediseasegenepredictionbasedonphenotypicsimilarity AT flannickjason novelalgorithmforrarediseasegenepredictionbasedonphenotypicsimilarity |
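
The three prediction steps named in the description, (a) find nearby diseases in the ontology, (b) collate their known genes and group them by GO term, (c) return the genes in the most frequently observed GO term, can be illustrated with a minimal Python sketch. This is not the authors' implementation. Everything here is an assumption made for illustration: the function names, the `max_hops` cutoff as a stand-in for "proximity in the EFO", counting term frequency as the number of distinct candidate genes annotated to a term, the alphabetical tie-break, and the toy identifiers.

```python
from collections import defaultdict, deque


def nearby_diseases(target, edges, max_hops=2):
    """Step (a), assumed form: ontology nodes within max_hops links of target.

    edges maps each node to its adjacent nodes (parent/child links treated
    as undirected). The hop cutoff is a hypothetical proximity measure.
    """
    seen = {target}
    queue = deque([(target, 0)])
    while queue:
        node, dist = queue.popleft()
        if dist == max_hops:
            continue  # do not expand beyond the cutoff
        for nxt in edges.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return seen - {target}


def predict_genes(target, edges, disease_genes, gene_go_terms, max_hops=2):
    """Steps (b) and (c): group neighbors' genes by GO term, keep the top term."""
    # (b) collate known genes of nearby diseases; the target's own genes are
    # never consulted, which mimics hiding them during cross-validation.
    candidates = set()
    for disease in nearby_diseases(target, edges, max_hops):
        candidates |= disease_genes.get(disease, set())

    term_to_genes = defaultdict(set)
    for gene in candidates:
        for term in gene_go_terms.get(gene, ()):
            term_to_genes[term].add(gene)
    if not term_to_genes:
        return set()

    # (c) assumed scoring: the term annotating the most candidate genes wins;
    # ties are broken alphabetically so the result is deterministic.
    best_term = min(term_to_genes, key=lambda t: (-len(term_to_genes[t]), t))
    return term_to_genes[best_term]


# Toy data with made-up identifiers (not taken from OMIM, EFO, or GO).
toy_edges = {
    "parent_term": {"disease_A", "disease_B", "disease_C"},
    "disease_A": {"parent_term"},
    "disease_B": {"parent_term"},
    "disease_C": {"parent_term"},
}
toy_disease_genes = {
    "disease_A": {"GENE1"},  # treated as hidden when predicting disease_A
    "disease_B": {"GENE1", "GENE2"},
    "disease_C": {"GENE3"},
}
toy_gene_go_terms = {
    "GENE1": {"GO:toy1"},
    "GENE2": {"GO:toy1"},
    "GENE3": {"GO:toy2"},
}

predicted = predict_genes("disease_A", toy_edges, toy_disease_genes, toy_gene_go_terms)
hit = bool(predicted & toy_disease_genes["disease_A"])
print(sorted(predicted), hit)
```

In this toy setup the hidden gene of `disease_A` appears among the candidates, which corresponds to one "success" in the cross-validation described above; the success rate would be the fraction of diseases for which `hit` is true.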