EasyCGTree: a pipeline for prokaryotic phylogenomic analysis based on core gene sets
BACKGROUND: Genome-scale phylogenetic analysis based on core gene sets is routinely used in microbiological research. However, the techniques are still not approachable for individuals with little bioinformatics experience. Here, we present EasyCGTree, a user-friendly and cross-platform pipeline to...
Main Authors: | Zhang, Dao-Feng; He, Wei; Shao, Zongze; Ahmed, Iftikhar; Zhang, Yuqin; Li, Wen-Jun; Zhao, Zhe |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2023 |
Subjects: | Software |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10576351/ https://www.ncbi.nlm.nih.gov/pubmed/37838689 http://dx.doi.org/10.1186/s12859-023-05527-2 |
_version_ | 1785121103418490880 |
---|---|
author | Zhang, Dao-Feng; He, Wei; Shao, Zongze; Ahmed, Iftikhar; Zhang, Yuqin; Li, Wen-Jun; Zhao, Zhe |
author_sort | Zhang, Dao-Feng |
collection | PubMed |
description | BACKGROUND: Genome-scale phylogenetic analysis based on core gene sets is routinely used in microbiological research. However, the techniques are still not approachable for individuals with little bioinformatics experience. Here, we present EasyCGTree, a user-friendly and cross-platform pipeline to reconstruct genome-scale maximum-likelihood (ML) phylogenetic trees using supermatrix (SM) and supertree (ST) approaches. RESULTS: EasyCGTree was implemented in the Perl programming language and was built using a collection of published, reputable programs. All the programs were precompiled as standalone executable files and are contained in the EasyCGTree package. It runs once a Perl environment is installed. Several profile hidden Markov models (HMMs) of core gene sets were prepared in advance to construct a profile HMM database (PHD) that is enclosed in the package and available for homolog searching. Customized gene sets can also be used to build profile HMMs and add them to the PHD via EasyCGTree. Taking 43 genomes of the genus Paracoccus as the testing data set, consensus (a variant of the typical SM), SM, and ST trees were inferred via EasyCGTree successfully, and the SM trees were compared with those inferred via the pipelines UBCG and bcgTree, using the metrics of cophenetic correlation coefficients (CCC) and Robinson–Foulds distance (topological distance). The results suggested that EasyCGTree can infer SM trees with nearly identical topology (distance < 0.1) and accuracy (CCC > 0.99) to those of trees inferred with the two pipelines. CONCLUSIONS: EasyCGTree is an all-in-one automatic pipeline from input data to phylogenomic tree with guaranteed accuracy, and is much easier to install and use than the reference pipelines. In addition, ST is conveniently implemented in EasyCGTree and can be used to explore prokaryotic evolutionary signals from a different perspective. EasyCGTree version 4 is freely available for Linux and Windows users on GitHub (https://github.com/zdf1987/EasyCGTree4). SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-023-05527-2. |
format | Online Article Text |
id | pubmed-10576351 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-10576351 2023-10-15 EasyCGTree: a pipeline for prokaryotic phylogenomic analysis based on core gene sets Zhang, Dao-Feng; He, Wei; Shao, Zongze; Ahmed, Iftikhar; Zhang, Yuqin; Li, Wen-Jun; Zhao, Zhe BMC Bioinformatics Software BioMed Central 2023-10-14 /pmc/articles/PMC10576351/ /pubmed/37838689 http://dx.doi.org/10.1186/s12859-023-05527-2 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
title | EasyCGTree: a pipeline for prokaryotic phylogenomic analysis based on core gene sets |
topic | Software |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10576351/ https://www.ncbi.nlm.nih.gov/pubmed/37838689 http://dx.doi.org/10.1186/s12859-023-05527-2 |
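
The description above compares trees by Robinson–Foulds (topological) distance and reports values below 0.1. As a rough illustration of that metric only, the sketch below counts the nontrivial bipartitions found in one unrooted tree but not the other, then divides by the maximum 2(n − 3) for binary trees on n leaves. The record does not say how the authors normalized the distance or which tool they used, so the normalization, the nested-tuple tree encoding, and the function names `leaf_sets`, `splits`, and `rf_distance` are all assumptions made for this example.

```python
def leaf_sets(tree, out):
    """Return the leaves under `tree`; append each internal node's leaf set to `out`.

    A tree is a leaf name (str) or a tuple of subtrees (hypothetical encoding).
    """
    if isinstance(tree, str):
        return frozenset([tree])
    leaves = frozenset().union(*(leaf_sets(child, out) for child in tree))
    out.append(leaves)
    return leaves


def splits(tree):
    """Return (all leaves, set of nontrivial bipartitions) for an unrooted reading of `tree`."""
    clades = []
    all_leaves = leaf_sets(tree, clades)
    ref = min(all_leaves)  # reference leaf used to pick one canonical side per split
    result = set()
    for clade in clades:
        # Represent each bipartition by the side that does NOT contain `ref`.
        side = all_leaves - clade if ref in clade else clade
        # Keep only splits with at least two leaves on both sides.
        if 2 <= len(side) <= len(all_leaves) - 2:
            result.add(side)
    return all_leaves, result


def rf_distance(tree_a, tree_b):
    """Return (raw RF, RF normalized by 2 * (n - 3)) for two trees on the same leaves."""
    leaves_a, splits_a = splits(tree_a)
    leaves_b, splits_b = splits(tree_b)
    if leaves_a != leaves_b:
        raise ValueError("trees must share the same leaf set")
    raw = len(splits_a ^ splits_b)  # splits present in exactly one tree
    max_rf = 2 * (len(leaves_a) - 3)  # upper bound for two binary unrooted trees
    return raw, (raw / max_rf if max_rf > 0 else 0.0)


# Toy five-taxon trees (made-up data, not from the paper).
tree_a = ((("A", "B"), "C"), ("D", "E"))
tree_b = ((("A", "C"), "B"), ("D", "E"))
print(rf_distance(tree_a, tree_b))
```

On these toy trees the two topologies share the split separating D and E from the rest and differ in one split each, so two of the four possible splits disagree. Real comparisons on 43 genomes would normally read Newick files with an established phylogenetics library rather than hand-built tuples.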