The consequences of data dispersion in genomics: a comparative analysis of data sources for precision medicine

Bibliographic Details
Main Authors: Costa, Mireia; García S., Alberto; Pastor, Oscar
Format: Online Article Text
Language: English
Published: BioMed Central 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10636939/
https://www.ncbi.nlm.nih.gov/pubmed/37946154
http://dx.doi.org/10.1186/s12911-023-02342-w
_version_ 1785146473803939840
author Costa, Mireia
García S., Alberto
Pastor, Oscar
author_facet Costa, Mireia
García S., Alberto
Pastor, Oscar
author_sort Costa, Mireia
collection PubMed
description BACKGROUND: Genomics-based clinical diagnosis has emerged as a novel medical approach to improve diagnosis and treatment. However, advances in sequencing techniques have increased the generation of genomics data dramatically. This has led to several data management problems, one of which is data dispersion (i.e., genomics data is scattered across hundreds of data repositories). In this context, geneticists try to remediate the above-mentioned problem by limiting the scope of their work to a single data source they know and trust. This work has studied the consequences of focusing on a single data source rather than considering the many different existing genomics data sources. METHODS: The analysis is based on the data associated with two groups of disorders (i.e., oncology and cardiology) accessible from six well-known genomic data sources (i.e., ClinVar, Ensembl, GWAS Catalog, LOVD, CIViC, and CardioDB). Two dimensions have been considered in this analysis, namely, completeness and concordance. Completeness has been evaluated at two levels. First, by analyzing the information provided by each data source with regard to a conceptual schema data model (i.e., the schema level). Second, by analyzing the DNA variations provided by each data source as related to any of the disorders selected (i.e., the data level). Concordance has been evaluated by comparing the consensus among the data sources regarding the clinical relevance of each variation and disorder. RESULTS: The data sources with the highest completeness at the schema level are ClinVar, Ensembl, and CIViC. ClinVar is the data source with the highest completeness at the data level for the oncology and cardiology disorders. However, there are clinically relevant variations that are exclusive to other data sources, and they must be considered in order to provide the best clinical diagnosis. Although the information available in the data sources is predominantly concordant, discordance among the analyzed data exists. This can lead to inaccurate diagnoses. CONCLUSION: Precision medicine analyses using a single genomics data source lead to incomplete results. Also, there are concordance problems that threaten the correctness of the genomics-based diagnosis results.
format Online
Article
Text
id pubmed-10636939
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-10636939 2023-11-15 The consequences of data dispersion in genomics: a comparative analysis of data sources for precision medicine Costa, Mireia García S., Alberto Pastor, Oscar BMC Med Inform Decis Mak Research BACKGROUND: Genomics-based clinical diagnosis has emerged as a novel medical approach to improve diagnosis and treatment. However, advances in sequencing techniques have increased the generation of genomics data dramatically. This has led to several data management problems, one of which is data dispersion (i.e., genomics data is scattered across hundreds of data repositories). In this context, geneticists try to remediate the above-mentioned problem by limiting the scope of their work to a single data source they know and trust. This work has studied the consequences of focusing on a single data source rather than considering the many different existing genomics data sources. METHODS: The analysis is based on the data associated with two groups of disorders (i.e., oncology and cardiology) accessible from six well-known genomic data sources (i.e., ClinVar, Ensembl, GWAS Catalog, LOVD, CIViC, and CardioDB). Two dimensions have been considered in this analysis, namely, completeness and concordance. Completeness has been evaluated at two levels. First, by analyzing the information provided by each data source with regard to a conceptual schema data model (i.e., the schema level). Second, by analyzing the DNA variations provided by each data source as related to any of the disorders selected (i.e., the data level). Concordance has been evaluated by comparing the consensus among the data sources regarding the clinical relevance of each variation and disorder. RESULTS: The data sources with the highest completeness at the schema level are ClinVar, Ensembl, and CIViC. ClinVar is the data source with the highest completeness at the data level for the oncology and cardiology disorders. However, there are clinically relevant variations that are exclusive to other data sources, and they must be considered in order to provide the best clinical diagnosis. Although the information available in the data sources is predominantly concordant, discordance among the analyzed data exists. This can lead to inaccurate diagnoses. CONCLUSION: Precision medicine analyses using a single genomics data source lead to incomplete results. Also, there are concordance problems that threaten the correctness of the genomics-based diagnosis results. BioMed Central 2023-11-09 /pmc/articles/PMC10636939/ /pubmed/37946154 http://dx.doi.org/10.1186/s12911-023-02342-w Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
spellingShingle Research
Costa, Mireia
García S., Alberto
Pastor, Oscar
The consequences of data dispersion in genomics: a comparative analysis of data sources for precision medicine
title The consequences of data dispersion in genomics: a comparative analysis of data sources for precision medicine
title_full The consequences of data dispersion in genomics: a comparative analysis of data sources for precision medicine
title_fullStr The consequences of data dispersion in genomics: a comparative analysis of data sources for precision medicine
title_full_unstemmed The consequences of data dispersion in genomics: a comparative analysis of data sources for precision medicine
title_short The consequences of data dispersion in genomics: a comparative analysis of data sources for precision medicine
title_sort consequences of data dispersion in genomics: a comparative analysis of data sources for precision medicine
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10636939/
https://www.ncbi.nlm.nih.gov/pubmed/37946154
http://dx.doi.org/10.1186/s12911-023-02342-w
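
The abstract in this record describes two measures: completeness at the data level and concordance among data sources. The sketch below is only one possible reading of those ideas, not the authors' procedure. It assumes data-level completeness can be approximated as the share of all known variations that a source contains, and that a variation-disorder pair is discordant when sources report different clinical relevance for it. The records, variant names, disorder names, and function names are all hypothetical.

```python
from collections import defaultdict

# Hypothetical toy records: (source, variation, disorder, clinical relevance).
# The source names come from the abstract; everything else is invented.
RECORDS = [
    ("ClinVar", "var1", "disorderA", "pathogenic"),
    ("ClinVar", "var2", "disorderA", "benign"),
    ("ClinVar", "var3", "disorderB", "pathogenic"),
    ("Ensembl", "var1", "disorderA", "pathogenic"),
    ("Ensembl", "var2", "disorderA", "pathogenic"),
    ("LOVD", "var4", "disorderB", "pathogenic"),
]


def variants_by_source(records):
    """Group the set of variations reported by each data source."""
    by_source = defaultdict(set)
    for source, variant, _disorder, _relevance in records:
        by_source[source].add(variant)
    return dict(by_source)


def data_level_completeness(records):
    """Fraction of all variations (union over sources) held by each source."""
    by_source = variants_by_source(records)
    universe = set().union(*by_source.values())
    return {s: len(v) / len(universe) for s, v in by_source.items()}


def exclusive_variants(records):
    """Variations that appear in exactly one source, keyed by that source."""
    by_source = variants_by_source(records)
    result = {}
    for source, variants in by_source.items():
        others = set().union(*(v for s, v in by_source.items() if s != source))
        result[source] = variants - others
    return result


def discordant_pairs(records):
    """(variation, disorder) pairs with more than one reported relevance."""
    labels = defaultdict(set)
    for _source, variant, disorder, relevance in records:
        labels[(variant, disorder)].add(relevance)
    return {pair for pair, seen in labels.items() if len(seen) > 1}


if __name__ == "__main__":
    print(data_level_completeness(RECORDS))
    print(exclusive_variants(RECORDS))
    print(discordant_pairs(RECORDS))
```

In this toy data the largest source still misses one variation that only a smaller source holds, and one pair carries conflicting labels, which mirrors the kind of gap and disagreement the abstract warns about when relying on a single source.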