Measuring quality of DNA sequence data via degradation

We formulate and apply a novel paradigm for characterization of genome data quality, which quantifies the effects of intentional degradation of quality. The rationale is that the higher the initial quality, the more fragile the genome and the greater the effects of degradation. We demonstrate that t...

Bibliographic Details
Main authors: Karr, Alan F., Hauzel, Jason, Porter, Adam A., Schaefer, Marcel
Format: Online Article Text
Language: English
Published: Public Library of Science 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9348684/
https://www.ncbi.nlm.nih.gov/pubmed/35921272
http://dx.doi.org/10.1371/journal.pone.0271970
author Karr, Alan F.
Hauzel, Jason
Porter, Adam A.
Schaefer, Marcel
collection PubMed
description We formulate and apply a novel paradigm for characterization of genome data quality, which quantifies the effects of intentional degradation of quality. The rationale is that the higher the initial quality, the more fragile the genome and the greater the effects of degradation. We demonstrate that this phenomenon is ubiquitous, and that quantified measures of degradation can be used for multiple purposes, illustrated by outlier detection. We focus on identifying outliers that may be problematic with respect to data quality, but might also be true anomalies or even attempts to subvert the database.
format Online
Article
Text
id pubmed-9348684
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-9348684 2022-08-04 Measuring quality of DNA sequence data via degradation Karr, Alan F. Hauzel, Jason Porter, Adam A. Schaefer, Marcel PLoS One Research Article We formulate and apply a novel paradigm for characterization of genome data quality, which quantifies the effects of intentional degradation of quality. The rationale is that the higher the initial quality, the more fragile the genome and the greater the effects of degradation. We demonstrate that this phenomenon is ubiquitous, and that quantified measures of degradation can be used for multiple purposes, illustrated by outlier detection. We focus on identifying outliers that may be problematic with respect to data quality, but might also be true anomalies or even attempts to subvert the database. Public Library of Science 2022-08-03 /pmc/articles/PMC9348684/ /pubmed/35921272 http://dx.doi.org/10.1371/journal.pone.0271970 Text en © 2022 Karr et al https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Measuring quality of DNA sequence data via degradation
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9348684/
https://www.ncbi.nlm.nih.gov/pubmed/35921272
http://dx.doi.org/10.1371/journal.pone.0271970