PathoQC: Computationally Efficient Read Preprocessing and Quality Control for High-Throughput Sequencing Data Sets

Quality control and read preprocessing are critical steps in the analysis of data sets generated from high-throughput genomic screens. In the most extreme cases, improper preprocessing can negatively affect downstream analyses and may lead to incorrect biological conclusions. Here, we present PathoQ...

Bibliographic Details
Main authors: Hong, Changjin, Manimaran, Solaiappan, Johnson, William Evan
Format: Online Article Text
Language: English
Published: Libertas Academica 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4429651/
https://www.ncbi.nlm.nih.gov/pubmed/25983538
http://dx.doi.org/10.4137/CIN.S13890
author Hong, Changjin
Manimaran, Solaiappan
Johnson, William Evan
author_facet Hong, Changjin
Manimaran, Solaiappan
Johnson, William Evan
author_sort Hong, Changjin
collection PubMed
description Quality control and read preprocessing are critical steps in the analysis of data sets generated from high-throughput genomic screens. In the most extreme cases, improper preprocessing can negatively affect downstream analyses and may lead to incorrect biological conclusions. Here, we present PathoQC, a streamlined toolkit that seamlessly combines the benefits of several popular quality control software approaches for preprocessing next-generation sequencing data. PathoQC provides a variety of quality control options appropriate for most high-throughput sequencing applications. PathoQC is primarily developed as a module in the PathoScope software suite for metagenomic analysis. However, PathoQC is also available as an open-source Python module that can run as a stand-alone application or can be easily integrated into any bioinformatics workflow. PathoQC achieves high performance by supporting parallel computation and is an effective tool that removes technical sequencing artifacts and facilitates robust downstream analysis. The PathoQC software package is available at http://sourceforge.net/projects/PathoScope/.
format Online
Article
Text
id pubmed-4429651
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher Libertas Academica
record_format MEDLINE/PubMed
spelling pubmed-44296512015-05-15 PathoQC: Computationally Efficient Read Preprocessing and Quality Control for High-Throughput Sequencing Data Sets Hong, Changjin Manimaran, Solaiappan Johnson, William Evan Cancer Inform Software or Database Review Quality control and read preprocessing are critical steps in the analysis of data sets generated from high-throughput genomic screens. In the most extreme cases, improper preprocessing can negatively affect downstream analyses and may lead to incorrect biological conclusions. Here, we present PathoQC, a streamlined toolkit that seamlessly combines the benefits of several popular quality control software approaches for preprocessing next-generation sequencing data. PathoQC provides a variety of quality control options appropriate for most high-throughput sequencing applications. PathoQC is primarily developed as a module in the PathoScope software suite for metagenomic analysis. However, PathoQC is also available as an open-source Python module that can run as a stand-alone application or can be easily integrated into any bioinformatics workflow. PathoQC achieves high performance by supporting parallel computation and is an effective tool that removes technical sequencing artifacts and facilitates robust downstream analysis. The PathoQC software package is available at http://sourceforge.net/projects/PathoScope/. Libertas Academica 2015-05-12 /pmc/articles/PMC4429651/ /pubmed/25983538 http://dx.doi.org/10.4137/CIN.S13890 Text en © 2014 the author(s), publisher and licensee Libertas Academica Limited This is an open-access article distributed under the terms of the Creative Commons CC-BY-NC 3.0 License.
spellingShingle Software or Database Review
Hong, Changjin
Manimaran, Solaiappan
Johnson, William Evan
PathoQC: Computationally Efficient Read Preprocessing and Quality Control for High-Throughput Sequencing Data Sets
title PathoQC: Computationally Efficient Read Preprocessing and Quality Control for High-Throughput Sequencing Data Sets
title_sort pathoqc: computationally efficient read preprocessing and quality control for high-throughput sequencing data sets
topic Software or Database Review
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4429651/
https://www.ncbi.nlm.nih.gov/pubmed/25983538
http://dx.doi.org/10.4137/CIN.S13890