ClinQC: a tool for quality control and cleaning of Sanger and NGS data in clinical research
BACKGROUND: Traditional Sanger sequencing has been the gold-standard method for genetic testing in the clinic, where it is used for single-gene tests, but it is cumbersome and expensive for testing several genes in heterogeneous diseases such as cancer. With the advent of Next Generation Sequencing t...
Main authors: | Pandey, Ram Vinay; Pabinger, Stephan; Kriegner, Albert; Weinhäusel, Andreas |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2016 |
Subjects: | Software |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4735967/ https://www.ncbi.nlm.nih.gov/pubmed/26830926 http://dx.doi.org/10.1186/s12859-016-0915-y |
_version_ | 1782413180753936384 |
---|---|
author | Pandey, Ram Vinay Pabinger, Stephan Kriegner, Albert Weinhäusel, Andreas |
author_sort | Pandey, Ram Vinay |
collection | PubMed |
description | BACKGROUND: Traditional Sanger sequencing has been the gold-standard method for genetic testing in the clinic, where it is used for single-gene tests, but it is cumbersome and expensive for testing several genes in heterogeneous diseases such as cancer. Next Generation Sequencing (NGS) technologies, which produce data at unprecedented speed and in a cost-effective manner, have overcome this limitation of Sanger sequencing. For efficient and affordable genetic testing, NGS has therefore been used as a complementary method alongside Sanger sequencing to identify and confirm disease-causing mutations in clinical research. Identifying potential disease-causing mutations with high sensitivity and specificity requires high-quality sequencing data. However, integrated software tools that can analyze Sanger and NGS data together, eliminate platform-specific sequencing errors and low-quality reads, and support the analysis of data sets from several samples or patients in a single run are lacking. RESULTS: We have developed ClinQC, a flexible and user-friendly pipeline for format conversion, quality control, trimming and filtering of raw sequencing data generated by Sanger sequencing and three NGS platforms: Illumina, 454 and Ion Torrent. First, ClinQC converts input read files from their native formats to a common FASTQ format and removes adapters and PCR primers. Next, it splits bar-coded samples, filters duplicates, contamination and low-quality sequences, and generates a QC report. ClinQC outputs high-quality reads in FASTQ format with Sanger quality encoding, which can be used directly in downstream analysis. It can analyze data from hundreds of samples or patients in a single run and generates unified output files for both Sanger and NGS data. Our tool is expected to be very useful for quality control and format conversion of Sanger and NGS data, facilitating improved downstream analysis and mutation screening. CONCLUSIONS: ClinQC is a powerful and easy-to-use pipeline for quality control and trimming in clinical research. ClinQC is written in Python with multiprocessing capability, runs on all major operating systems and is available at https://sourceforge.net/projects/clinqc. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-016-0915-y) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-4735967 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-4735967 2016-02-03 ClinQC: a tool for quality control and cleaning of Sanger and NGS data in clinical research Pandey, Ram Vinay Pabinger, Stephan Kriegner, Albert Weinhäusel, Andreas BMC Bioinformatics Software BioMed Central 2016-02-02 /pmc/articles/PMC4735967/ /pubmed/26830926 http://dx.doi.org/10.1186/s12859-016-0915-y Text en © Pandey et al. 2016 Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
title | ClinQC: a tool for quality control and cleaning of Sanger and NGS data in clinical research |
title_sort | clinqc: a tool for quality control and cleaning of sanger and ngs data in clinical research |
topic | Software |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4735967/ https://www.ncbi.nlm.nih.gov/pubmed/26830926 http://dx.doi.org/10.1186/s12859-016-0915-y |
work_keys_str_mv | AT pandeyramvinay clinqcatoolforqualitycontrolandcleaningofsangerandngsdatainclinicalresearch AT pabingerstephan clinqcatoolforqualitycontrolandcleaningofsangerandngsdatainclinicalresearch AT kriegneralbert clinqcatoolforqualitycontrolandcleaningofsangerandngsdatainclinicalresearch AT weinhauselandreas clinqcatoolforqualitycontrolandcleaningofsangerandngsdatainclinicalresearch |
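
The description above mentions FASTQ output with Sanger quality encoding and the filtering of duplicates and low-quality sequences. The following is a minimal illustrative sketch of those two ideas only, not ClinQC's actual implementation. The function names, the mean-quality criterion, and the threshold of 20 are assumptions made for this example; the only fixed fact used is that Sanger encoding stores each Phred score as an ASCII character with an offset of 33.

```python
from statistics import mean

PHRED_OFFSET = 33  # Sanger (Phred+33) quality encoding


def parse_fastq(lines):
    """Yield (header, sequence, quality) tuples from 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # skip blank lines between records
        seq = next(it).rstrip("\n")
        next(it)  # '+' separator line, ignored
        qual = next(it).rstrip("\n")
        yield header, seq, qual


def phred_scores(qual):
    """Decode a Sanger-encoded quality string into integer Phred scores."""
    return [ord(ch) - PHRED_OFFSET for ch in qual]


def filter_reads(records, min_mean_quality=20):
    """Drop exact duplicate sequences and reads below a mean-quality cutoff.

    Both the criterion and the default cutoff are hypothetical choices
    for illustration.
    """
    seen = set()
    for header, seq, qual in records:
        if not qual or seq in seen:
            continue  # empty read or exact duplicate of an earlier read
        seen.add(seq)
        if mean(phred_scores(qual)) >= min_mean_quality:
            yield header, seq, qual


# Made-up example records: r2 duplicates r1, r3 has Phred score 0 throughout.
example = [
    "@r1", "ACGT", "+", "IIII",
    "@r2", "ACGT", "+", "IIII",
    "@r3", "GGCC", "+", "!!!!",
]
kept = list(filter_reads(parse_fastq(example)))
```

In this sketch only `r1` survives: `r2` is discarded as an exact duplicate and `r3` falls below the assumed cutoff. A real pipeline would also need adapter and primer trimming, barcode splitting, and contamination screening, which are not shown here.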