An Integrated SNP Mining and Utilization (ISMU) Pipeline for Next Generation Sequencing Data

Open source single nucleotide polymorphism (SNP) discovery pipelines for next generation sequencing data commonly require working knowledge of a command line interface, massive computational resources and expertise, which makes them daunting for biologists. Further, the SNP information generated may no...

Bibliographic Details
Main Authors: Azam, Sarwar; Rathore, Abhishek; Shah, Trushar M.; Telluri, Mohan; Amindala, BhanuPrakash; Ruperao, Pradeep; Katta, Mohan A. V. S. K.; Varshney, Rajeev K.
Format: Online Article Text
Language: English
Published: Public Library of Science 2014
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4086967/
https://www.ncbi.nlm.nih.gov/pubmed/25003610
http://dx.doi.org/10.1371/journal.pone.0101754
_version_ 1782324866965307392
author Azam, Sarwar
Rathore, Abhishek
Shah, Trushar M.
Telluri, Mohan
Amindala, BhanuPrakash
Ruperao, Pradeep
Katta, Mohan A. V. S. K.
Varshney, Rajeev K.
author_sort Azam, Sarwar
collection PubMed
description Open source single nucleotide polymorphism (SNP) discovery pipelines for next generation sequencing data commonly require working knowledge of a command line interface, massive computational resources and expertise, which makes them daunting for biologists. Further, the SNP information generated may not be readily usable for downstream processes such as genotyping. Hence, a comprehensive pipeline, Integrated SNP Mining and Utilization (ISMU), has been developed by integrating several open source next generation sequencing (NGS) tools with a graphical user interface for SNP discovery and the use of those SNPs in developing genotyping assays. The pipeline features pre-processing of raw data, integration of open source alignment tools (Bowtie2, BWA, Maq, NovoAlign and SOAP2), SNP prediction methods (SAMtools/SOAPsnp/CNS2snp and CbCC) and interfaces for developing genotyping assays. The pipeline outputs a list of high quality SNPs between all pairwise combinations of the genotypes analyzed, in addition to the reference genome/sequence. Visualization tools (Tablet and Flapjack) integrated into the pipeline enable inspection of the alignment and errors, if any. The pipeline also provides a confidence score or polymorphism information content value, with flanking sequences, for identified SNPs in the standard format required for developing marker genotyping assays (KASP and Golden Gate). The pipeline enables users to rapidly process a range of NGS datasets such as whole genome re-sequencing, restriction site associated DNA sequencing and transcriptome sequencing data. The pipeline is useful for members of the plant genetics and breeding community without computational expertise who want to discover SNPs and use them in genomics, genetics and breeding studies. The pipeline has been parallelized to process large next generation sequencing datasets. It has been developed in Java and is available at http://hpc.icrisat.cgiar.org/ISMU as free standalone software.
format Online
Article
Text
id pubmed-4086967
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-4086967 2014-07-14 An Integrated SNP Mining and Utilization (ISMU) Pipeline for Next Generation Sequencing Data PLoS One Research Article Public Library of Science 2014-07-08 /pmc/articles/PMC4086967/ /pubmed/25003610 http://dx.doi.org/10.1371/journal.pone.0101754 Text en © 2014 Azam et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title An Integrated SNP Mining and Utilization (ISMU) Pipeline for Next Generation Sequencing Data
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4086967/
https://www.ncbi.nlm.nih.gov/pubmed/25003610
http://dx.doi.org/10.1371/journal.pone.0101754