Reconstructing evolutionary trees in parallel for massive sequences
BACKGROUND: Building evolutionary trees for massive unaligned DNA sequences is both crucial and challenging, and reconstructing an evolutionary tree for ultra-large sequences is especially hard. Massive multiple sequence alignment is likewise challenging and costly in time and space. Hadoop and Spark are developed re...
Main authors: | Zou, Quan; Wan, Shixiang; Zeng, Xiangxiang; Ma, Zhanshan Sam |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2017 |
Subjects: | Research |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5751538/ https://www.ncbi.nlm.nih.gov/pubmed/29297337 http://dx.doi.org/10.1186/s12918-017-0476-3 |
_version_ | 1783289966663565312 |
---|---|
author | Zou, Quan Wan, Shixiang Zeng, Xiangxiang Ma, Zhanshan Sam |
author_sort | Zou, Quan |
collection | PubMed |
description | BACKGROUND: Building evolutionary trees for massive unaligned DNA sequences is both crucial and challenging, and reconstructing an evolutionary tree for ultra-large sequences is especially hard. Massive multiple sequence alignment is likewise challenging and costly in time and space. Hadoop and Spark, both developed recently, offer new hope for classical computational biology problems. In this paper, we attempt to solve multiple sequence alignment and evolutionary reconstruction in parallel. RESULTS: HPTree, the tool developed in this paper, can process large DNA sequence files quickly. It works well on files larger than 1 GB and performs better than other evolutionary reconstruction tools. Users can run HPTree to reconstruct evolutionary trees on computer clusters or cloud platforms (e.g., Amazon Cloud). HPTree could aid population evolution research and metagenomics analysis. CONCLUSIONS: In this paper, we employ the Hadoop and Spark platforms to design an evolutionary tree reconstruction tool for massive unaligned DNA sequences. Clustering and multiple sequence alignment are done in parallel, and the neighbour-joining method is used to build the evolutionary tree. The software and its source code are available at http://lab.malab.cn/soft/HPtree/. |
format | Online Article Text |
id | pubmed-5751538 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-5751538 2018-01-05 Reconstructing evolutionary trees in parallel for massive sequences Zou, Quan Wan, Shixiang Zeng, Xiangxiang Ma, Zhanshan Sam BMC Syst Biol Research BACKGROUND: Building evolutionary trees for massive unaligned DNA sequences is both crucial and challenging, and reconstructing an evolutionary tree for ultra-large sequences is especially hard. Massive multiple sequence alignment is likewise challenging and costly in time and space. Hadoop and Spark, both developed recently, offer new hope for classical computational biology problems. In this paper, we attempt to solve multiple sequence alignment and evolutionary reconstruction in parallel. RESULTS: HPTree, the tool developed in this paper, can process large DNA sequence files quickly. It works well on files larger than 1 GB and performs better than other evolutionary reconstruction tools. Users can run HPTree to reconstruct evolutionary trees on computer clusters or cloud platforms (e.g., Amazon Cloud). HPTree could aid population evolution research and metagenomics analysis. CONCLUSIONS: In this paper, we employ the Hadoop and Spark platforms to design an evolutionary tree reconstruction tool for massive unaligned DNA sequences. Clustering and multiple sequence alignment are done in parallel, and the neighbour-joining method is used to build the evolutionary tree. The software and its source code are available at http://lab.malab.cn/soft/HPtree/. BioMed Central 2017-12-14 /pmc/articles/PMC5751538/ /pubmed/29297337 http://dx.doi.org/10.1186/s12918-017-0476-3 Text en © The Author(s). 2017 Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research Zou, Quan Wan, Shixiang Zeng, Xiangxiang Ma, Zhanshan Sam Reconstructing evolutionary trees in parallel for massive sequences |
title | Reconstructing evolutionary trees in parallel for massive sequences |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5751538/ https://www.ncbi.nlm.nih.gov/pubmed/29297337 http://dx.doi.org/10.1186/s12918-017-0476-3 |
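
The description above states that the neighbour-joining method was used for tree building. As a rough illustration of that step only, here is a minimal single-machine sketch of textbook neighbour-joining in Python. It is not HPTree's Hadoop/Spark implementation; the function name `neighbor_joining`, the internal node labels (`node0`, `node1`, ...), and the toy distance matrix are all assumptions made for this example.

```python
from itertools import combinations


def neighbor_joining(names, dist):
    """Textbook neighbour-joining (illustrative sketch, not HPTree's code).

    names: list of taxon labels (must not collide with "node0", "node1", ...).
    dist:  symmetric nested dict, dist[a][b] = distance between a and b.
    Returns a list of (node, neighbour, branch_length) edges of the unrooted tree.
    """
    d = {a: dict(dist[a]) for a in names}  # working copy of the distances
    active = list(names)                   # nodes still waiting to be joined
    edges = []
    next_id = 0
    while len(active) > 2:
        n = len(active)
        # Total distance from each active node to all other active nodes.
        r = {a: sum(d[a][b] for b in active if b != a) for a in active}
        # Join the pair that minimises the Q criterion.
        i, j = min(
            combinations(active, 2),
            key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]],
        )
        # Branch lengths from the new internal node to i and to j.
        len_i = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        len_j = d[i][j] - len_i
        u = f"node{next_id}"  # hypothetical label for the new internal node
        next_id += 1
        # Distances from the new node to every remaining active node.
        d[u] = {}
        for k in active:
            if k not in (i, j):
                d[u][k] = d[k][u] = 0.5 * (d[i][k] + d[j][k] - d[i][j])
        edges.append((u, i, len_i))
        edges.append((u, j, len_j))
        active = [k for k in active if k not in (i, j)] + [u]
    # Connect the last two remaining nodes.
    a, b = active
    edges.append((a, b, d[a][b]))
    return edges


# Toy five-taxon distance matrix, made up for illustration only.
example_names = ["a", "b", "c", "d", "e"]
example_matrix = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
example_dist = {
    example_names[x]: {example_names[y]: example_matrix[x][y] for y in range(5)}
    for x in range(5)
}

if __name__ == "__main__":
    for node, neighbour, length in neighbor_joining(example_names, example_dist):
        print(node, neighbour, length)
```

This sketch recomputes the full Q criterion at every step, so it runs in cubic time and is suited only to small inputs. A tool aimed at massive sequence sets would presumably need to distribute the alignment and distance computation, which this example does not attempt.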