
ClinQC: a tool for quality control and cleaning of Sanger and NGS data in clinical research


Bibliographic Details
Main Authors: Pandey, Ram Vinay, Pabinger, Stephan, Kriegner, Albert, Weinhäusel, Andreas
Format: Online Article Text
Language: English
Published: BioMed Central 2016
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4735967/
https://www.ncbi.nlm.nih.gov/pubmed/26830926
http://dx.doi.org/10.1186/s12859-016-0915-y
author Pandey, Ram Vinay
Pabinger, Stephan
Kriegner, Albert
Weinhäusel, Andreas
collection PubMed
description BACKGROUND: Traditional Sanger sequencing has been the gold standard method for genetic testing in the clinic, where it is used to perform single-gene tests. It is a cumbersome and expensive way to test several genes in heterogeneous diseases such as cancer. Next Generation Sequencing (NGS) technologies, which produce data at unprecedented speed and in a cost-effective manner, have overcome this limitation of Sanger sequencing. Therefore, for efficient and affordable genetic testing, NGS has been used as a complementary method alongside Sanger sequencing for the identification and confirmation of disease-causing mutations in clinical research. However, to identify potential disease-causing mutations with high sensitivity and specificity, it is essential to ensure high-quality sequencing data. Integrated software tools are nevertheless lacking that can analyze Sanger and NGS data together, eliminate platform-specific sequencing errors and low-quality reads, and support the analysis of data sets from several samples/patients in a single run. RESULTS: We have developed ClinQC, a flexible and user-friendly pipeline for format conversion, quality control, trimming and filtering of raw sequencing data generated from Sanger sequencing and three NGS platforms: Illumina, 454 and Ion Torrent. First, ClinQC converts input read files from their native formats to a common FASTQ format and removes adapters and PCR primers. Next, it splits bar-coded samples, filters duplicates, contamination and low-quality sequences, and generates a QC report. ClinQC outputs high-quality reads in FASTQ format with Sanger quality encoding, which can be used directly in downstream analysis. It can analyze data from hundreds of samples/patients in a single run and generates unified output files for both Sanger and NGS sequencing data.
Our tool is expected to be very useful for quality control and format conversion of Sanger and NGS data, facilitating improved downstream analysis and mutation screening. CONCLUSIONS: ClinQC is a powerful and easy-to-use pipeline for quality control and trimming in clinical research. ClinQC is written in Python with multiprocessing capability, runs on all major operating systems and is available at https://sourceforge.net/projects/clinqc. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-016-0915-y) contains supplementary material, which is available to authorized users.
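The cleaning steps named in the description (re-encoding qualities to Sanger format, trimming low-quality bases, filtering low-quality reads and removing duplicates) can be illustrated with a minimal Python sketch. This is not ClinQC's code or command-line interface. The names `Read`, `phred64_to_sanger`, `trim_3prime`, `mean_quality` and `clean`, and the thresholds `min_q=20` and `min_len=4`, are hypothetical and chosen only for illustration. The sketch assumes exact-sequence duplicate detection and simple 3' end trimming, which may differ from what ClinQC actually does.

```python
from typing import Iterable, Iterator, NamedTuple


class Read(NamedTuple):
    """One sequencing read (hypothetical container, not a ClinQC type)."""
    name: str
    seq: str
    qual: str  # quality string in Sanger (Phred+33) encoding


def phred64_to_sanger(qual: str) -> str:
    """Re-encode a Phred+64 quality string (older Illumina) as Phred+33."""
    return "".join(chr(ord(c) - 64 + 33) for c in qual)


def trim_3prime(read: Read, min_q: int = 20) -> Read:
    """Drop trailing bases whose Phred score is below min_q."""
    end = len(read.seq)
    while end > 0 and ord(read.qual[end - 1]) - 33 < min_q:
        end -= 1
    return Read(read.name, read.seq[:end], read.qual[:end])


def mean_quality(read: Read) -> float:
    """Average Phred score of a read; 0.0 for an empty read."""
    scores = [ord(c) - 33 for c in read.qual]
    return sum(scores) / len(scores) if scores else 0.0


def clean(reads: Iterable[Read], min_q: int = 20,
          min_len: int = 4) -> Iterator[Read]:
    """Trim each read, then skip short, low-quality and duplicate reads."""
    seen = set()  # sequences already emitted (exact-match duplicates)
    for read in reads:
        trimmed = trim_3prime(read, min_q)
        if len(trimmed.seq) < min_len or mean_quality(trimmed) < min_q:
            continue
        if trimmed.seq in seen:
            continue
        seen.add(trimmed.seq)
        yield trimmed
```

Under these assumptions, `clean` yields reads lazily, so a large FASTQ file could be streamed through it one record at a time. Adapter removal, barcode splitting and contamination screening are left out of the sketch.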
format Online
Article
Text
id pubmed-4735967
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4735967 2016-02-03 ClinQC: a tool for quality control and cleaning of Sanger and NGS data in clinical research Pandey, Ram Vinay Pabinger, Stephan Kriegner, Albert Weinhäusel, Andreas BMC Bioinformatics Software
BioMed Central 2016-02-02 /pmc/articles/PMC4735967/ /pubmed/26830926 http://dx.doi.org/10.1186/s12859-016-0915-y Text en
© Pandey et al. 2016 Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
topic Software