seqQscorer: automated quality control of next-generation sequencing data using machine learning


Bibliographic Details
Main authors: Albrecht, Steffen; Sprang, Maximilian; Andrade-Navarro, Miguel A.; Fontaine, Jean-Fred
Format: Online Article Text
Language: English
Published: BioMed Central 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7934511/
https://www.ncbi.nlm.nih.gov/pubmed/33673854
http://dx.doi.org/10.1186/s13059-021-02294-2
Description
Summary: Controlling quality of next-generation sequencing (NGS) data files is a necessary but complex task. To address this problem, we statistically characterize common NGS quality features and develop a novel quality control procedure involving tree-based and deep learning classification algorithms. Predictive models, validated on internal and external functional genomics datasets, are to some extent generalizable to data from unseen species. The derived statistical guidelines and predictive models represent a valuable resource for users of NGS data to better understand quality issues and perform automatic quality control. Our guidelines and software are available at https://github.com/salbrec/seqQscorer. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s13059-021-02294-2.