EasyCGTree: a pipeline for prokaryotic phylogenomic analysis based on core gene sets

BACKGROUND: Genome-scale phylogenetic analysis based on core gene sets is routinely used in microbiological research. However, the techniques are still not approachable for individuals with little bioinformatics experience. Here, we present EasyCGTree, a user-friendly and cross-platform pipeline to...

Bibliographic Details
Main authors: Zhang, Dao-Feng; He, Wei; Shao, Zongze; Ahmed, Iftikhar; Zhang, Yuqin; Li, Wen-Jun; Zhao, Zhe
Format: Online Article Text
Language: English
Published: BioMed Central 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10576351/
https://www.ncbi.nlm.nih.gov/pubmed/37838689
http://dx.doi.org/10.1186/s12859-023-05527-2
author Zhang, Dao-Feng
He, Wei
Shao, Zongze
Ahmed, Iftikhar
Zhang, Yuqin
Li, Wen-Jun
Zhao, Zhe
author_sort Zhang, Dao-Feng
collection PubMed
description BACKGROUND: Genome-scale phylogenetic analysis based on core gene sets is routinely used in microbiological research. However, the techniques are still not approachable for individuals with little bioinformatics experience. Here, we present EasyCGTree, a user-friendly and cross-platform pipeline to reconstruct genome-scale maximum-likelihood (ML) phylogenetic trees using supermatrix (SM) and supertree (ST) approaches. RESULTS: EasyCGTree was implemented in the Perl programming language and was built using a collection of published, reputable programs. All the programs were precompiled as standalone executable files and are contained in the EasyCGTree package. It can run once a Perl environment is installed. Several profile hidden Markov models (HMMs) of core gene sets were prepared in advance to construct a profile HMM database (PHD) that is enclosed in the package and available for homolog searching. Customized gene sets can also be used to build profile HMMs and add them to the PHD via EasyCGTree. Taking 43 genomes of the genus Paracoccus as the testing data set, consensus (a variant of the typical SM), SM, and ST trees were inferred via EasyCGTree successfully, and the SM trees were compared with those inferred via the pipelines UBCG and bcgTree, using the metrics of cophenetic correlation coefficients (CCC) and Robinson–Foulds distance (topological distance). The results suggested that EasyCGTree can infer SM trees with nearly identical topology (distance < 0.1) and accuracy (CCC > 0.99) to those of trees inferred with the two pipelines. CONCLUSIONS: EasyCGTree is an all-in-one automatic pipeline from input data to phylogenomic tree with guaranteed accuracy, and is much easier to install and use than the reference pipelines. In addition, ST is conveniently implemented in EasyCGTree and can be used to explore prokaryotic evolutionary signals from a different perspective. EasyCGTree version 4 is freely available for Linux and Windows users at GitHub (https://github.com/zdf1987/EasyCGTree4). SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-023-05527-2.
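The abstract compares tree topologies with the Robinson–Foulds distance and reports values below 0.1, which suggests a normalized form of the metric. The following Python sketch illustrates the general idea only; it is not taken from EasyCGTree, UBCG, or bcgTree. It assumes trees are given as nested tuples of leaf names and that the distance is normalized by the total number of non-trivial splits in both trees; the record does not state which normalization the authors used, and the function names `clades`, `splits`, and `normalized_rf` are hypothetical.

```python
def clades(tree):
    """Return (leaf set, list of leaf sets for every internal node)."""
    if isinstance(tree, str):          # a leaf is just its name
        return frozenset([tree]), []
    leaves = frozenset()
    found = []
    for child in tree:
        child_leaves, child_clades = clades(child)
        leaves |= child_leaves
        found.extend(child_clades)
    found.append(leaves)
    return leaves, found


def splits(tree):
    """Non-trivial bipartitions of the leaf set, treating the tree as unrooted."""
    all_leaves, found = clades(tree)
    ref = min(all_leaves)              # fixed reference leaf for a canonical side
    result = set()
    for clade in found:
        other = all_leaves - clade
        if len(clade) > 1 and len(other) > 1:
            # Store the side that does not contain the reference leaf,
            # so the same split always has the same representation.
            result.add(other if ref in clade else clade)
    return result


def normalized_rf(tree_a, tree_b):
    """Symmetric difference of split sets divided by the total split count
    (assumed normalization; 0.0 means identical topology)."""
    sa, sb = splits(tree_a), splits(tree_b)
    total = len(sa) + len(sb)
    return len(sa ^ sb) / total if total else 0.0


# Two hypothetical five-taxon trees that differ in one internal branch.
t1 = ((("A", "B"), "C"), ("D", "E"))
t2 = ((("A", "C"), "B"), ("D", "E"))
print(normalized_rf(t1, t1))
print(normalized_rf(t1, t2))
```

Under this assumed normalization, a value below 0.1 would mean that fewer than one in ten of the splits across the two trees are unshared.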
format Online
Article
Text
id pubmed-10576351
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-10576351 2023-10-15 EasyCGTree: a pipeline for prokaryotic phylogenomic analysis based on core gene sets Zhang, Dao-Feng He, Wei Shao, Zongze Ahmed, Iftikhar Zhang, Yuqin Li, Wen-Jun Zhao, Zhe BMC Bioinformatics Software BACKGROUND: Genome-scale phylogenetic analysis based on core gene sets is routinely used in microbiological research. However, the techniques are still not approachable for individuals with little bioinformatics experience. Here, we present EasyCGTree, a user-friendly and cross-platform pipeline to reconstruct genome-scale maximum-likelihood (ML) phylogenetic trees using supermatrix (SM) and supertree (ST) approaches. RESULTS: EasyCGTree was implemented in the Perl programming language and was built using a collection of published, reputable programs. All the programs were precompiled as standalone executable files and are contained in the EasyCGTree package. It can run once a Perl environment is installed. Several profile hidden Markov models (HMMs) of core gene sets were prepared in advance to construct a profile HMM database (PHD) that is enclosed in the package and available for homolog searching. Customized gene sets can also be used to build profile HMMs and add them to the PHD via EasyCGTree. Taking 43 genomes of the genus Paracoccus as the testing data set, consensus (a variant of the typical SM), SM, and ST trees were inferred via EasyCGTree successfully, and the SM trees were compared with those inferred via the pipelines UBCG and bcgTree, using the metrics of cophenetic correlation coefficients (CCC) and Robinson–Foulds distance (topological distance). The results suggested that EasyCGTree can infer SM trees with nearly identical topology (distance < 0.1) and accuracy (CCC > 0.99) to those of trees inferred with the two pipelines. CONCLUSIONS: EasyCGTree is an all-in-one automatic pipeline from input data to phylogenomic tree with guaranteed accuracy, and is much easier to install and use than the reference pipelines. In addition, ST is conveniently implemented in EasyCGTree and can be used to explore prokaryotic evolutionary signals from a different perspective. EasyCGTree version 4 is freely available for Linux and Windows users at GitHub (https://github.com/zdf1987/EasyCGTree4). SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-023-05527-2. BioMed Central 2023-10-14 /pmc/articles/PMC10576351/ /pubmed/37838689 http://dx.doi.org/10.1186/s12859-023-05527-2 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title EasyCGTree: a pipeline for prokaryotic phylogenomic analysis based on core gene sets
topic Software
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10576351/
https://www.ncbi.nlm.nih.gov/pubmed/37838689
http://dx.doi.org/10.1186/s12859-023-05527-2