
Hydra: a scalable proteomic search engine which utilizes the Hadoop distributed computing framework


Bibliographic Details
Main authors:	Lewis, Steven; Csordas, Attila; Killcoyne, Sarah; Hermjakob, Henning; Hoopmann, Michael R; Moritz, Robert L; Deutsch, Eric W; Boyle, John
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2012
Subjects:	Software
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3538679/ https://www.ncbi.nlm.nih.gov/pubmed/23216909 http://dx.doi.org/10.1186/1471-2105-13-324

author	Lewis, Steven Csordas, Attila Killcoyne, Sarah Hermjakob, Henning Hoopmann, Michael R Moritz, Robert L Deutsch, Eric W Boyle, John
collection	PubMed
description	BACKGROUND: For shotgun mass spectrometry based proteomics the most computationally expensive step is in matching the spectra against an increasingly large database of sequences and their post-translational modifications with known masses. Each mass spectrometer can generate data at an astonishingly high rate, and the scope of what is searched for is continually increasing. Therefore solutions for improving our ability to perform these searches are needed. RESULTS: We present a sequence database search engine that is specifically designed to run efficiently on the Hadoop MapReduce distributed computing framework. The search engine implements the K-score algorithm, generating comparable output for the same input files as the original implementation. The scalability of the system is shown, and the architecture required for the development of such distributed processing is discussed. CONCLUSION: The software is scalable in its ability to handle a large peptide database, numerous modifications and large numbers of spectra. Performance scales with the number of processors in the cluster, allowing throughput to expand with the available resources.
format	Online Article Text
id	pubmed-3538679
institution	National Center for Biotechnology Information
language	English
publishDate	2012
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-3538679 2013-01-10 BMC Bioinformatics Software BioMed Central 2012-12-05 /pmc/articles/PMC3538679/ /pubmed/23216909 http://dx.doi.org/10.1186/1471-2105-13-324 Text en Copyright ©2012 Lewis et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Hydra: a scalable proteomic search engine which utilizes the Hadoop distributed computing framework
topic	Software
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3538679/ https://www.ncbi.nlm.nih.gov/pubmed/23216909 http://dx.doi.org/10.1186/1471-2105-13-324


Ejemplares similares