An Integrated Pipeline of Open Source Software Adapted for Multi-CPU Architectures: Use in the Large-Scale Identification of Single Nucleotide Polymorphisms

The large amounts of EST sequence data available from a single species of an organism as well as for several species within a genus provide an easy source of identification of intra- and interspecies single nucleotide polymorphisms (SNPs). In the case of model organisms, the data available are numer...

Bibliographic Details
Main Authors: Jayashree, B., Hanspal, Manindra S., Srinivasan, Rajgopal, Vigneshwaran, R., Varshney, Rajeev K., Spurthi, N., Eshwar, K., Ramesh, N., Chandra, S., Hoisington, David A.
Format: Text
Language: English
Published: Hindawi Publishing Corporation 2007
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2216057/
https://www.ncbi.nlm.nih.gov/pubmed/18273384
http://dx.doi.org/10.1155/2007/35604
author Jayashree, B.
Hanspal, Manindra S.
Srinivasan, Rajgopal
Vigneshwaran, R.
Varshney, Rajeev K.
Spurthi, N.
Eshwar, K.
Ramesh, N.
Chandra, S.
Hoisington, David A.
author_sort Jayashree, B.
collection PubMed
description The large amounts of EST sequence data available from a single species of an organism as well as for several species within a genus provide an easy source of identification of intra- and interspecies single nucleotide polymorphisms (SNPs). In the case of model organisms, the data available are numerous, given the degree of redundancy in the deposited EST data. There are several available bioinformatics tools that can be used to mine this data; however, using them requires a certain level of expertise: the tools have to be used sequentially with accompanying format conversion and steps like clustering and assembly of sequences become time-intensive jobs even for moderately sized datasets. We report here a pipeline of open source software extended to run on multiple CPU architectures that can be used to mine large EST datasets for SNPs and identify restriction sites for assaying the SNPs so that cost-effective CAPS assays can be developed for SNP genotyping in genetics and breeding applications. At the International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), the pipeline has been implemented to run on a Paracel high-performance system consisting of four dual AMD Opteron processors running Linux with MPICH. The pipeline can be accessed through user-friendly web interfaces at http://hpc.icrisat.cgiar.org/PBSWeb and is available on request for academic use. We have validated the developed pipeline by mining chickpea ESTs for interspecies SNPs, development of CAPS assays for SNP genotyping, and confirmation of restriction digestion pattern at the sequence level.
format Text
id pubmed-2216057
institution National Center for Biotechnology Information
language English
publishDate 2007
publisher Hindawi Publishing Corporation
record_format MEDLINE/PubMed
spelling pubmed-2216057 2008-02-13 Comp Funct Genomics Research Article
Hindawi Publishing Corporation 2007 2007-12-02 /pmc/articles/PMC2216057/ /pubmed/18273384 http://dx.doi.org/10.1155/2007/35604 Text en Copyright © 2007 B. Jayashree et al. https://creativecommons.org/licenses/by/3.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title An Integrated Pipeline of Open Source Software Adapted for Multi-CPU Architectures: Use in the Large-Scale Identification of Single Nucleotide Polymorphisms
topic Research Article