Developing eThread Pipeline Using SAGA-Pilot Abstraction for Large-Scale Structural Bioinformatics

While most computational annotation approaches are sequence-based, threading methods are becoming increasingly attractive because the structural information they predict could uncover the underlying function. However, threading tools are generally compute-intensive and the number of protein sequ...

Bibliographic Details
Main Authors:	Ragothaman, Anjani; Boddu, Sairam Chowdary; Kim, Nayong; Feinstein, Wei; Brylinski, Michal; Jha, Shantenu; Kim, Joohyun
Format:	Online Article Text
Language:	English
Published:	Hindawi Publishing Corporation 2014
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4066679/ https://www.ncbi.nlm.nih.gov/pubmed/24995285 http://dx.doi.org/10.1155/2014/348725

author	Ragothaman, Anjani; Boddu, Sairam Chowdary; Kim, Nayong; Feinstein, Wei; Brylinski, Michal; Jha, Shantenu; Kim, Joohyun
author_sort	Ragothaman, Anjani
collection	PubMed
description	While most computational annotation approaches are sequence-based, threading methods are becoming increasingly attractive because the structural information they predict could uncover the underlying function. However, threading tools are generally compute-intensive, and even small genomes such as those of prokaryotes typically contain many thousands of protein sequences, which prohibits the use of threading as a genome-wide structural systems biology tool. To leverage the utility of threading, we have developed a pipeline for eThread, a meta-threading protein structure modeling tool, that can use computational resources efficiently and effectively. We employ a pilot-based approach that supports seamless data and task-level parallelism and manages large variation in workload and computational requirements. Our scalable pipeline is deployed on Amazon EC2 and can efficiently select resources based upon task requirements. We present a runtime analysis to characterize the computational complexity of eThread and the EC2 infrastructure. Based on the results, we suggest a pathway to an optimized solution with respect to metrics such as time-to-solution or cost-to-solution. Our eThread pipeline can scale to support a large number of sequences and is expected to be a viable solution for genome-scale structural bioinformatics and structure-based annotation, particularly for small genomes such as those of prokaryotes. The developed pipeline is easily extensible to other types of distributed cyberinfrastructure.
format	Online Article Text
id	pubmed-4066679
institution	National Center for Biotechnology Information
language	English
publishDate	2014
publisher	Hindawi Publishing Corporation
record_format	MEDLINE/PubMed
spelling	pubmed-4066679 2014-07-03 Developing eThread Pipeline Using SAGA-Pilot Abstraction for Large-Scale Structural Bioinformatics Ragothaman, Anjani; Boddu, Sairam Chowdary; Kim, Nayong; Feinstein, Wei; Brylinski, Michal; Jha, Shantenu; Kim, Joohyun Biomed Res Int Research Article Hindawi Publishing Corporation 2014 2014-06-09 /pmc/articles/PMC4066679/ /pubmed/24995285 http://dx.doi.org/10.1155/2014/348725 Text en Copyright © 2014 Anjani Ragothaman et al.
https://creativecommons.org/licenses/by/3.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Developing eThread Pipeline Using SAGA-Pilot Abstraction for Large-Scale Structural Bioinformatics
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4066679/ https://www.ncbi.nlm.nih.gov/pubmed/24995285 http://dx.doi.org/10.1155/2014/348725