An Integrated Pipeline of Open Source Software Adapted for Multi-CPU Architectures: Use in the Large-Scale Identification of Single Nucleotide Polymorphisms
The large amounts of EST sequence data available from a single species of an organism as well as for several species within a genus provide an easy source of identification of intra- and interspecies single nucleotide polymorphisms (SNPs). In the case of model organisms, the data available are numer...
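Main Authors: | Jayashree, B., Hanspal, Manindra S., Srinivasan, Rajgopal, Vigneshwaran, R., Varshney, Rajeev K., Spurthi, N., Eshwar, K., Ramesh, N., Chandra, S., Hoisington, David A. |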
---|---|
Format: | Text |
Language: | English |
Published: | Hindawi Publishing Corporation, 2007 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2216057/ https://www.ncbi.nlm.nih.gov/pubmed/18273384 http://dx.doi.org/10.1155/2007/35604 |
_version_ | 1782149102184693760 |
---|---|
author | Jayashree, B. Hanspal, Manindra S. Srinivasan, Rajgopal Vigneshwaran, R. Varshney, Rajeev K. Spurthi, N. Eshwar, K. Ramesh, N. Chandra, S. Hoisington, David A. |
author_sort | Jayashree, B. |
collection | PubMed |
description | The large amounts of EST sequence data available from a single species of an organism as well as for several species within a genus provide an easy source of identification of intra- and interspecies single nucleotide polymorphisms (SNPs). In the case of model organisms, the data available are numerous, given the degree of redundancy in the deposited EST data. There are several available bioinformatics tools that can be used to mine this data; however, using them requires a certain level of expertise: the tools have to be used sequentially with accompanying format conversion and steps like clustering and assembly of sequences become time-intensive jobs even for moderately sized datasets. We report here a pipeline of open source software extended to run on multiple CPU architectures that can be used to mine large EST datasets for SNPs and identify restriction sites for assaying the SNPs so that cost-effective CAPS assays can be developed for SNP genotyping in genetics and breeding applications. At the International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), the pipeline has been implemented to run on a Paracel high-performance system consisting of four dual AMD Opteron processors running Linux with MPICH. The pipeline can be accessed through user-friendly web interfaces at http://hpc.icrisat.cgiar.org/PBSWeb and is available on request for academic use. We have validated the developed pipeline by mining chickpea ESTs for interspecies SNPs, development of CAPS assays for SNP genotyping, and confirmation of restriction digestion pattern at the sequence level. |
format | Text |
id | pubmed-2216057 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2007 |
publisher | Hindawi Publishing Corporation |
record_format | MEDLINE/PubMed |
spelling | pubmed-22160572008-02-13 An Integrated Pipeline of Open Source Software Adapted for Multi-CPU Architectures: Use in the Large-Scale Identification of Single Nucleotide Polymorphisms Jayashree, B. Hanspal, Manindra S. Srinivasan, Rajgopal Vigneshwaran, R. Varshney, Rajeev K. Spurthi, N. Eshwar, K. Ramesh, N. Chandra, S. Hoisington, David A. Comp Funct Genomics Research Article The large amounts of EST sequence data available from a single species of an organism as well as for several species within a genus provide an easy source of identification of intra- and interspecies single nucleotide polymorphisms (SNPs). In the case of model organisms, the data available are numerous, given the degree of redundancy in the deposited EST data. There are several available bioinformatics tools that can be used to mine this data; however, using them requires a certain level of expertise: the tools have to be used sequentially with accompanying format conversion and steps like clustering and assembly of sequences become time-intensive jobs even for moderately sized datasets. We report here a pipeline of open source software extended to run on multiple CPU architectures that can be used to mine large EST datasets for SNPs and identify restriction sites for assaying the SNPs so that cost-effective CAPS assays can be developed for SNP genotyping in genetics and breeding applications. At the International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), the pipeline has been implemented to run on a Paracel high-performance system consisting of four dual AMD Opteron processors running Linux with MPICH. The pipeline can be accessed through user-friendly web interfaces at http://hpc.icrisat.cgiar.org/PBSWeb and is available on request for academic use. We have validated the developed pipeline by mining chickpea ESTs for interspecies SNPs, development of CAPS assays for SNP genotyping, and confirmation of restriction digestion pattern at the sequence level. 
Hindawi Publishing Corporation 2007 2007-12-02 /pmc/articles/PMC2216057/ /pubmed/18273384 http://dx.doi.org/10.1155/2007/35604 Text en Copyright © 2007 B. Jayashree et al. https://creativecommons.org/licenses/by/3.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Research Article Jayashree, B. Hanspal, Manindra S. Srinivasan, Rajgopal Vigneshwaran, R. Varshney, Rajeev K. Spurthi, N. Eshwar, K. Ramesh, N. Chandra, S. Hoisington, David A. An Integrated Pipeline of Open Source Software Adapted for Multi-CPU Architectures: Use in the Large-Scale Identification of Single Nucleotide Polymorphisms |
title | An Integrated Pipeline of Open Source Software Adapted for Multi-CPU Architectures: Use in the Large-Scale Identification of Single Nucleotide Polymorphisms |
title_sort | integrated pipeline of open source software adapted for multi-cpu architectures: use in the large-scale identification of single nucleotide polymorphisms |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2216057/ https://www.ncbi.nlm.nih.gov/pubmed/18273384 http://dx.doi.org/10.1155/2007/35604 |
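The description in this record says the pipeline identifies restriction sites for assaying SNPs so that CAPS assays can be developed. The record does not show how the pipeline does this, so the following is only a minimal sketch of the general idea: a SNP is a CAPS candidate when an enzyme's recognition site overlaps the SNP position in one allele but not in the other. The function names, the small enzyme table, and the example sequences are assumptions made for illustration and are not taken from the article.

```python
# Illustrative sketch only; not the pipeline's actual code.
# Recognition sequences of three common restriction enzymes. All three are
# palindromic, so scanning one strand is sufficient for this example.
ENZYMES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
}


def sites(seq, motif):
    """Return the 0-based start positions of motif in seq (overlaps allowed)."""
    width = len(motif)
    return [i for i in range(len(seq) - width + 1) if seq[i:i + width] == motif]


def caps_candidates(allele_a, allele_b, snp_pos, enzymes=ENZYMES):
    """Return enzymes whose site overlaps the SNP in exactly one allele.

    allele_a and allele_b are hypothetical aligned sequences of equal length
    that differ at the 0-based index snp_pos.
    """
    if len(allele_a) != len(allele_b):
        raise ValueError("alleles must be aligned to the same length")
    hits = []
    for name, motif in enzymes.items():
        def covers(seq):
            # True if any occurrence of the motif spans the SNP position.
            return any(p <= snp_pos < p + len(motif) for p in sites(seq, motif))
        if covers(allele_a) != covers(allele_b):
            hits.append(name)
    return sorted(hits)


# Made-up alleles: the A/G difference at index 7 removes the GAATTC site.
print(caps_candidates("TTACGGAATTCAGGT", "TTACGGAGTTCAGGT", 7))
```

Digesting the amplified fragment with a reported enzyme would then give different banding patterns for the two alleles, which is what makes the assay readable on a gel. A real workflow would draw on a full enzyme database and handle non-palindromic and degenerate recognition sequences, both of which this sketch omits.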