Rare disease variant curation from literature: assessing gaps with creatine transport deficiency in focus
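Main authors: | Lyons, Erica L.; Watson, Daniel; Alodadi, Mohammad S.; Haugabook, Sharie J.; Tawa, Gregory J.; Hannah-Shmouni, Fady; Porter, Forbes D.; Collins, Jack R.; Ottinger, Elizabeth A.; Mudunuri, Uma S. |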
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2023 |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10433598/ https://www.ncbi.nlm.nih.gov/pubmed/37587458 http://dx.doi.org/10.1186/s12864-023-09561-5 |
author | Lyons, Erica L.; Watson, Daniel; Alodadi, Mohammad S.; Haugabook, Sharie J.; Tawa, Gregory J.; Hannah-Shmouni, Fady; Porter, Forbes D.; Collins, Jack R.; Ottinger, Elizabeth A.; Mudunuri, Uma S. |
---|---|
collection | PubMed |
description | BACKGROUND: Approximately 4–8% of the world's population has a rare disease. Rare diseases are often difficult to diagnose, and many have no approved therapies. Genetic sequencing has the potential to shorten the diagnostic process, deepen mechanistic understanding, and facilitate research on therapeutic approaches, but it is limited by the difficulty of interpreting the pathogenicity of novel variants and by poor communication of known causative variants. It is unknown how many published rare disease variants are currently accessible in the public domain. RESULTS: This study investigated how well knowledge of variants reported in published manuscripts is translated into publicly accessible variant databases. Variants, symptoms, biochemical assay results, and protein function were curated from the literature on the SLC6A8 gene, which is associated with X-linked Creatine Transporter Deficiency (CTD), and reported as a highly annotated dataset of variants with clinical context and functional details. Variants were harmonized, their availability in existing variant databases was analyzed, and their pathogenicity assignments were compared with the predictions of impact prediction algorithms. Of the pathogenic variants found in PubMed articles, 24% were not captured in any of the databases used in this analysis, and only 65% of the published variants received an accurate pathogenicity prediction from at least one impact prediction algorithm. CONCLUSIONS: Despite being published in the literature, pathogenicity data on patient variants may remain inaccessible for genetic diagnosis, therapeutic target identification, mechanistic understanding, or hypothesis generation. Clinical and functional details presented in the literature are important for making pathogenicity assessments. Impact predictions remain imperfect but are improving, especially for single-nucleotide exonic variants; however, such predictions are less accurate or unavailable for intronic and multi-nucleotide variants. A promising approach to scaling literature mining of variants and assigning correct pathogenicity may be to develop text-mining workflows that use natural language processing to identify diseases, genes, and variants, combine them with impact prediction algorithms, and integrate details on clinical phenotypes and functional assessments. The curated variant list created by this effort includes contextual details that can support such efforts in variant curation for rare diseases. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12864-023-09561-5. |
format | Online Article Text |
id | pubmed-10433598 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
journal | BMC Genomics |
published online | 2023-08-16 |
license | © The Author(s) 2023. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
title | Rare disease variant curation from literature: assessing gaps with creatine transport deficiency in focus |
topic | Research |