
Statistical guidelines for quality control of next-generation sequencing techniques

Bibliographic details
Main authors: Sprang, Maximilian; Krüger, Matteo; Andrade-Navarro, Miguel A; Fontaine, Jean-Fred
Format: Online, Article, Text
Language: English
Published: Life Science Alliance LLC 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8408346/
https://www.ncbi.nlm.nih.gov/pubmed/34462322
http://dx.doi.org/10.26508/lsa.202101113
author Sprang, Maximilian
Krüger, Matteo
Andrade-Navarro, Miguel A
Fontaine, Jean-Fred
collection PubMed
description More and more next-generation sequencing (NGS) data are made available every day. However, the quality of this data is not always guaranteed. Available quality control tools require profound knowledge to correctly interpret the multiplicity of quality features. Moreover, it is usually difficult to know if quality features are relevant in all experimental conditions. Therefore, the NGS community would highly benefit from condition-specific data-driven guidelines derived from many publicly available experiments, which reflect routinely generated NGS data. In this work, we have characterized well-known quality guidelines and related features in big datasets and concluded that they are too limited for assessing the quality of a given NGS file accurately. Therefore, we present new data-driven guidelines derived from the statistical analysis of many public datasets using quality features calculated by common bioinformatics tools. Thanks to this approach, we confirm the high relevance of genome mapping statistics to assess the quality of the data, and we demonstrate the limited scope of some quality features that are not relevant in all conditions. Our guidelines are available at https://cbdm.uni-mainz.de/ngs-guidelines.
format Online
Article
Text
id pubmed-8408346
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Life Science Alliance LLC
record_format MEDLINE/PubMed
spelling pubmed-8408346 2021-09-17; Life Sci Alliance; Research Articles; Life Science Alliance LLC 2021-08-30; Text en © 2021 Sprang et al. This article is available under a Creative Commons License (Attribution 4.0 International, as described at https://creativecommons.org/licenses/by/4.0/).
title Statistical guidelines for quality control of next-generation sequencing techniques
topic Research Articles