
Reconstructing evolutionary trees in parallel for massive sequences

BACKGROUND: Building evolutionary trees for massive unaligned DNA sequences is challenging and crucial. However, reconstructing an evolutionary tree for ultra-large sequences is hard. Massive multiple sequence alignment is also challenging and time- and space-consuming. Hadoop and Spark are developed re...


Bibliographic Details
Main authors:	Zou, Quan; Wan, Shixiang; Zeng, Xiangxiang; Ma, Zhanshan Sam
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2017
Subjects:	Research
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5751538/ https://www.ncbi.nlm.nih.gov/pubmed/29297337 http://dx.doi.org/10.1186/s12918-017-0476-3

_version_	1783289966663565312
author	Zou, Quan Wan, Shixiang Zeng, Xiangxiang Ma, Zhanshan Sam
author_sort	Zou, Quan
collection	PubMed
description	BACKGROUND: Building evolutionary trees for massive unaligned DNA sequences is challenging and crucial. However, reconstructing an evolutionary tree for ultra-large sequences is hard. Massive multiple sequence alignment is also challenging and time- and space-consuming. Hadoop and Spark have been developed recently and bring new hope for classical computational biology problems. In this paper, we tried to solve multiple sequence alignment and evolutionary reconstruction in parallel. RESULTS: HPTree, which is developed in this paper, can process large DNA sequence files quickly. It works well on files larger than 1 GB and performs better than other evolutionary reconstruction tools. Users can use HPTree to reconstruct evolutionary trees on computer clusters or cloud platforms (e.g., Amazon Cloud). HPTree could help with population evolution research and metagenomics analysis. CONCLUSIONS: In this paper, we employ the Hadoop and Spark platforms and design an evolutionary tree reconstruction software tool for massive unaligned DNA sequences. Clustering and multiple sequence alignment are done in parallel. A neighbour-joining model was employed to build the evolutionary tree. We released our software together with its source code at http://lab.malab.cn/soft/HPtree/.
format	Online Article Text
id	pubmed-5751538
institution	National Center for Biotechnology Information
language	English
publishDate	2017
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-5751538 2018-01-05 Reconstructing evolutionary trees in parallel for massive sequences Zou, Quan Wan, Shixiang Zeng, Xiangxiang Ma, Zhanshan Sam BMC Syst Biol Research BioMed Central 2017-12-14 /pmc/articles/PMC5751538/ /pubmed/29297337 http://dx.doi.org/10.1186/s12918-017-0476-3 Text en © The Author(s) 2017. Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title	Reconstructing evolutionary trees in parallel for massive sequences
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5751538/ https://www.ncbi.nlm.nih.gov/pubmed/29297337 http://dx.doi.org/10.1186/s12918-017-0476-3


Similar Items