
Reconstructing evolutionary trees in parallel for massive sequences

Bibliographic details
Main authors: Zou, Quan; Wan, Shixiang; Zeng, Xiangxiang; Ma, Zhanshan Sam
Format: Online article (text)
Language: English
Published: BioMed Central, 2017
Subjects: Research
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5751538/
https://www.ncbi.nlm.nih.gov/pubmed/29297337
http://dx.doi.org/10.1186/s12918-017-0476-3
collection PubMed
description BACKGROUND: Building evolutionary trees for massive sets of unaligned DNA sequences is both crucial and challenging. Reconstructing an evolutionary tree for ultra-large sequence sets is hard, and massive multiple sequence alignment is itself time- and space-consuming. The recently developed Hadoop and Spark platforms offer new promise for such classical computational biology problems. In this paper, we address multiple sequence alignment and evolutionary tree reconstruction in parallel.
RESULTS: HPTree, the tool developed in this paper, processes large DNA sequence files quickly. It works well on files larger than 1 GB and performs better than other evolutionary tree reconstruction tools. Users can run HPTree to reconstruct evolutionary trees on computer clusters or cloud platforms (e.g., Amazon Cloud). HPTree could support population evolution research and metagenomic analysis.
CONCLUSIONS: In this paper, we employ the Hadoop and Spark platforms to design an evolutionary tree reconstruction tool for massive unaligned DNA sequences. Clustering and multiple sequence alignment are performed in parallel, and a neighbour-joining model is used to build the evolutionary tree. The software and its source code are openly available at http://lab.malab.cn/soft/HPtree/.
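The abstract says only that a neighbour-joining model builds the tree; this record does not describe how HPTree implements it. As an illustration of the method itself, the following is a minimal single-machine Python sketch of the textbook neighbour-joining algorithm (Saitou and Nei, 1987), not HPTree's code. The function names `p_distance` and `neighbor_joining`, the use of uncorrected p-distances, and the Newick output format are illustrative assumptions, and the parallel clustering and alignment stages on Hadoop/Spark are not shown.

```python
from typing import Dict, List


def p_distance(a: str, b: str) -> float:
    """Proportion of mismatched sites between two aligned, equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be aligned, non-empty, and of equal length")
    return sum(1 for x, y in zip(a, b) if x != y) / len(a)


def neighbor_joining(names: List[str], dist: List[List[float]]) -> str:
    """Textbook neighbour-joining on a symmetric distance matrix (illustrative sketch).

    Returns the unrooted tree as a Newick string. Internal node labels are the
    Newick strings of the subtrees joined so far, so taxon names must be unique.
    """
    if len(names) < 3:
        raise ValueError("neighbour-joining needs at least three taxa")
    # d[a][b]: current distance between active nodes a and b.
    d: Dict[str, Dict[str, float]] = {
        a: {b: dist[i][j] for j, b in enumerate(names)} for i, a in enumerate(names)
    }
    active = list(names)
    while len(active) > 3:
        n = len(active)
        # Row sums r(i) = sum over active k of d(i, k).
        r = {a: sum(d[a][k] for k in active) for a in active}
        # Choose the pair minimising Q(i, j) = (n - 2) d(i, j) - r(i) - r(j).
        best = None
        for x in range(n):
            for y in range(x + 1, n):
                a, b = active[x], active[y]
                q = (n - 2) * d[a][b] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        # Branch lengths from a and b to their new parent node u.
        la = 0.5 * d[a][b] + (r[a] - r[b]) / (2 * (n - 2))
        lb = d[a][b] - la
        u = f"({a}:{la:.4g},{b}:{lb:.4g})"
        # Distance from u to every other active node k.
        d[u] = {u: 0.0}
        for k in active:
            if k not in (a, b):
                d[u][k] = d[k][u] = 0.5 * (d[a][k] + d[b][k] - d[a][b])
        active = [k for k in active if k not in (a, b)] + [u]
    # Three nodes remain: attach them to one central node.
    x, y, z = active
    lx = 0.5 * (d[x][y] + d[x][z] - d[y][z])
    ly = 0.5 * (d[x][y] + d[y][z] - d[x][z])
    lz = 0.5 * (d[x][z] + d[y][z] - d[x][y])
    return f"({x}:{lx:.4g},{y}:{ly:.4g},{z}:{lz:.4g});"


if __name__ == "__main__":
    # Small additive five-taxon matrix (illustrative data only).
    taxa = ["a", "b", "c", "d", "e"]
    matrix = [
        [0, 5, 9, 9, 8],
        [5, 0, 10, 10, 9],
        [9, 10, 0, 8, 7],
        [9, 10, 8, 0, 3],
        [8, 9, 7, 3, 0],
    ]
    print(neighbor_joining(taxa, matrix))
    # prints (d:2,e:1,(c:4,(a:2,b:3):3):2);
```

This naive version takes O(n³) time and keeps the full O(n²) distance matrix in memory, so it only illustrates the joining rule. For aligned sequences, pairwise `p_distance` values would fill `matrix`.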
id pubmed-5751538
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-5751538 2018-01-05. BMC Syst Biol, Research. BioMed Central, 2017-12-14.
© The Author(s) 2017. Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
topic Research