
MPI-PHYLIP: Parallelizing Computationally Intensive Phylogenetic Analysis Routines for the Analysis of Large Protein Families

BACKGROUND: Phylogenetic study of protein sequences provides unique and valuable insights into the molecular and genetic basis of important medical and epidemiological problems as well as insights about the origins and development of physiological features in present day organisms. Consensus phyloge...


Bibliographic Details
Main authors:	Ropelewski, Alexander J., Nicholas, Hugh B., Gonzalez Mendez, Ricardo R.
Format:	Text
Language:	English
Published:	Public Library of Science 2010
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2981553/ https://www.ncbi.nlm.nih.gov/pubmed/21085574 http://dx.doi.org/10.1371/journal.pone.0013999

collection	PubMed
description	BACKGROUND: Phylogenetic study of protein sequences provides unique and valuable insights into the molecular and genetic basis of important medical and epidemiological problems as well as insights about the origins and development of physiological features in present day organisms. Consensus phylogenies based on the bootstrap and other resampling methods play a crucial part in analyzing the robustness of the trees produced for these analyses. METHODOLOGY: Our focus was to increase the number of bootstrap replications that can be performed on large protein datasets using the maximum parsimony, distance matrix, and maximum likelihood methods. We have modified the PHYLIP package using MPI to enable large-scale phylogenetic study of protein sequences, using a statistically robust number of bootstrapped datasets, to be performed in a moderate amount of time. This paper discusses the methodology used to parallelize the PHYLIP programs and reports the performance of the parallel PHYLIP programs that are relevant to the study of protein evolution on several protein datasets. CONCLUSIONS: Calculations that currently take a few days on a state of the art desktop workstation are reduced to calculations that can be performed over lunchtime on a modern parallel computer. Of the three protein methods tested, the maximum likelihood method scales the best, followed by the distance method, and then the maximum parsimony method. However, the maximum likelihood method requires significant memory resources, which limits its application to more moderately sized protein datasets.
id	pubmed-2981553
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
spelling	pubmed-2981553 2010-11-17 PLoS One Research Article Public Library of Science 2010-11-15 /pmc/articles/PMC2981553/ /pubmed/21085574 http://dx.doi.org/10.1371/journal.pone.0013999 Text en Ropelewski et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
