
Rare disease variant curation from literature: assessing gaps with creatine transport deficiency in focus

BACKGROUND: Approximately 4–8% of the world suffers from a rare disease. Rare diseases are often difficult to diagnose, and many do not have approved therapies. Genetic sequencing has the potential to shorten the current diagnostic process, increase mechanistic understanding, and facilitate research...


Bibliographic Details
Main Authors: Lyons, Erica L., Watson, Daniel, Alodadi, Mohammad S., Haugabook, Sharie J., Tawa, Gregory J., Hannah-Shmouni, Fady, Porter, Forbes D., Collins, Jack R., Ottinger, Elizabeth A., Mudunuri, Uma S.
Format: Online Article Text
Language: English
Published: BioMed Central 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10433598/
https://www.ncbi.nlm.nih.gov/pubmed/37587458
http://dx.doi.org/10.1186/s12864-023-09561-5
author Lyons, Erica L.
Watson, Daniel
Alodadi, Mohammad S.
Haugabook, Sharie J.
Tawa, Gregory J.
Hannah-Shmouni, Fady
Porter, Forbes D.
Collins, Jack R.
Ottinger, Elizabeth A.
Mudunuri, Uma S.
collection PubMed
description BACKGROUND: Approximately 4–8% of the world suffers from a rare disease. Rare diseases are often difficult to diagnose, and many do not have approved therapies. Genetic sequencing has the potential to shorten the current diagnostic process, increase mechanistic understanding, and facilitate research on therapeutic approaches but is limited by the difficulty of novel variant pathogenicity interpretation and the communication of known causative variants. It is unknown how many published rare disease variants are currently accessible in the public domain.
RESULTS: This study investigated the translation of knowledge of variants reported in published manuscripts to publicly accessible variant databases. Variants, symptoms, biochemical assay results, and protein function from literature on the SLC6A8 gene associated with X-linked Creatine Transporter Deficiency (CTD) were curated and reported as a highly annotated dataset of variants with clinical context and functional details. Variants were harmonized, their availability in existing variant databases was analyzed and pathogenicity assignments were compared with impact algorithm predictions. 24% of the pathogenic variants found in PubMed articles were not captured in any database used in this analysis while only 65% of the published variants received an accurate pathogenicity prediction from at least one impact prediction algorithm.
CONCLUSIONS: Despite being published in the literature, pathogenicity data on patient variants may remain inaccessible for genetic diagnosis, therapeutic target identification, mechanistic understanding, or hypothesis generation. Clinical and functional details presented in the literature are important to make pathogenicity assessments. Impact predictions remain imperfect but are improving, especially for single nucleotide exonic variants, however such predictions are less accurate or unavailable for intronic and multi-nucleotide variants. Developing text mining workflows that use natural language processing for identifying diseases, genes and variants, along with impact prediction algorithms and integrating with details on clinical phenotypes and functional assessments might be a promising approach to scale literature mining of variants and assigning correct pathogenicity. The curated variants list created by this effort includes context details to improve any such efforts on variant curation for rare diseases.
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12864-023-09561-5.
format Online
Article
Text
id pubmed-10433598
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-10433598 2023-08-18 BMC Genomics Research BioMed Central 2023-08-16 /pmc/articles/PMC10433598/ /pubmed/37587458 http://dx.doi.org/10.1186/s12864-023-09561-5 Text en
© The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Rare disease variant curation from literature: assessing gaps with creatine transport deficiency in focus
topic Research