Statistical guidelines for quality control of next-generation sequencing techniques
More and more next-generation sequencing (NGS) data are made available every day. However, the quality of this data is not always guaranteed. Available quality control tools require profound knowledge to correctly interpret the multiplicity of quality features. Moreover, it is usually difficult to k...
Main authors: | Sprang, Maximilian; Krüger, Matteo; Andrade-Navarro, Miguel A; Fontaine, Jean-Fred |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Life Science Alliance LLC, 2021 |
Subjects: | Research Articles |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8408346/ https://www.ncbi.nlm.nih.gov/pubmed/34462322 http://dx.doi.org/10.26508/lsa.202101113 |
_version_ | 1783746806670163968 |
---|---|
author | Sprang, Maximilian Krüger, Matteo Andrade-Navarro, Miguel A Fontaine, Jean-Fred |
author_sort | Sprang, Maximilian |
collection | PubMed |
description | More and more next-generation sequencing (NGS) data are made available every day. However, the quality of this data is not always guaranteed. Available quality control tools require profound knowledge to correctly interpret the multiplicity of quality features. Moreover, it is usually difficult to know if quality features are relevant in all experimental conditions. Therefore, the NGS community would highly benefit from condition-specific data-driven guidelines derived from many publicly available experiments, which reflect routinely generated NGS data. In this work, we have characterized well-known quality guidelines and related features in big datasets and concluded that they are too limited for assessing the quality of a given NGS file accurately. Therefore, we present new data-driven guidelines derived from the statistical analysis of many public datasets using quality features calculated by common bioinformatics tools. Thanks to this approach, we confirm the high relevance of genome mapping statistics to assess the quality of the data, and we demonstrate the limited scope of some quality features that are not relevant in all conditions. Our guidelines are available at https://cbdm.uni-mainz.de/ngs-guidelines. |
format | Online Article Text |
id | pubmed-8408346 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Life Science Alliance LLC |
record_format | MEDLINE/PubMed |
spelling | pubmed-8408346 2021-09-17 Statistical guidelines for quality control of next-generation sequencing techniques Sprang, Maximilian Krüger, Matteo Andrade-Navarro, Miguel A Fontaine, Jean-Fred Life Sci Alliance Research Articles More and more next-generation sequencing (NGS) data are made available every day. However, the quality of this data is not always guaranteed. Available quality control tools require profound knowledge to correctly interpret the multiplicity of quality features. Moreover, it is usually difficult to know if quality features are relevant in all experimental conditions. Therefore, the NGS community would highly benefit from condition-specific data-driven guidelines derived from many publicly available experiments, which reflect routinely generated NGS data. In this work, we have characterized well-known quality guidelines and related features in big datasets and concluded that they are too limited for assessing the quality of a given NGS file accurately. Therefore, we present new data-driven guidelines derived from the statistical analysis of many public datasets using quality features calculated by common bioinformatics tools. Thanks to this approach, we confirm the high relevance of genome mapping statistics to assess the quality of the data, and we demonstrate the limited scope of some quality features that are not relevant in all conditions. Our guidelines are available at https://cbdm.uni-mainz.de/ngs-guidelines. Life Science Alliance LLC 2021-08-30 /pmc/articles/PMC8408346/ /pubmed/34462322 http://dx.doi.org/10.26508/lsa.202101113 Text en © 2021 Sprang et al. https://creativecommons.org/licenses/by/4.0/ This article is available under a Creative Commons License (Attribution 4.0 International, as described at https://creativecommons.org/licenses/by/4.0/). |
title | Statistical guidelines for quality control of next-generation sequencing techniques |
topic | Research Articles |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8408346/ https://www.ncbi.nlm.nih.gov/pubmed/34462322 http://dx.doi.org/10.26508/lsa.202101113 |