A completeness-independent method for pre-selection of closely related genomes for species delineation in prokaryotes

BACKGROUND: Whole-genome approaches are widely preferred for species delineation in prokaryotes. However, these methods require pairwise alignments and calculations at the whole-genome level and thus are computationally intensive. To address this problem, a strategy consisting of sieving (pre-select...

Bibliographic Details
Main authors: Zhou, Yizhuang; Zheng, Jifang; Wu, Yepeng; Zhang, Wenting; Jin, Junfei
Format: Online Article Text
Language: English
Published: BioMed Central 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7045542/
https://www.ncbi.nlm.nih.gov/pubmed/32102653
http://dx.doi.org/10.1186/s12864-020-6597-x
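
The record's description refers to sieving genome pairs by a tetranucleotide frequency correlation coefficient. As a rough illustration only, the sketch below correlates raw tetranucleotide frequencies counted on both strands and keeps candidates above a cutoff. It is a simplification and not the published TETRA or FRAGTE implementation: published TETRA is usually described as correlating z-scores of observed versus expected counts, and the fragment selection used by FRAGTE is not detailed in this record. The function names (`tetra_freqs`, `tetra_correlation`, `sieve`) and the cutoff value are hypothetical.

```python
from itertools import product
from math import sqrt

# All 256 tetranucleotides in a fixed order, so vectors are comparable.
KMERS = ["".join(p) for p in product("ACGT", repeat=4)]
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def tetra_freqs(seq):
    """Relative tetranucleotide frequencies counted on both strands.

    Windows containing characters other than A/C/G/T are skipped.
    """
    seq = seq.upper()
    counts = dict.fromkeys(KMERS, 0)
    for strand in (seq, revcomp(seq)):
        for i in range(len(strand) - 3):
            kmer = strand[i:i + 4]
            if kmer in counts:
                counts[kmer] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence has no valid tetranucleotides")
    return [counts[k] / total for k in KMERS]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined for a constant vector")
    return cov / sqrt(var_x * var_y)


def tetra_correlation(seq_a, seq_b):
    """Simplified correlation of raw tetranucleotide frequencies."""
    return pearson(tetra_freqs(seq_a), tetra_freqs(seq_b))


def sieve(query, candidates, cutoff):
    """Names of candidates whose correlation with the query meets the cutoff.

    `cutoff` is an assumed, user-supplied threshold; this record does not
    state the value used in the paper.
    """
    return [name for name, seq in candidates.items()
            if tetra_correlation(query, seq) >= cutoff]
```

In a sieving workflow of the kind the description outlines, only the genomes returned by `sieve` would go on to the expensive pairwise alignment step.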
author Zhou, Yizhuang
Zheng, Jifang
Wu, Yepeng
Zhang, Wenting
Jin, Junfei
collection PubMed
description BACKGROUND: Whole-genome approaches are widely preferred for species delineation in prokaryotes. However, these methods require pairwise alignments and calculations at the whole-genome level and are therefore computationally intensive. To address this problem, a strategy consisting of sieving (pre-selecting closely related genomes) followed by alignment and calculation has been proposed. RESULTS: Here, we first test a published approach called “genome-wide tetranucleotide frequency correlation coefficient” (TETRA), which is specifically designed for sieving. Our results show that sieving by TETRA requires >40% completeness for both genomes of a pair to yield >95% sensitivity, indicating that TETRA is completeness-dependent. Accordingly, we develop a novel algorithm called “fragment tetranucleotide frequency correlation coefficient” (FRAGTE), which uses fragments rather than whole genomes for sieving. Our results show that FRAGTE achieves ~100% sensitivity and high specificity on simulated genomes, real genomes and metagenome-assembled genomes, demonstrating that FRAGTE is completeness-independent. Additionally, FRAGTE passes fewer genomes on to subsequent alignment and calculation, which greatly improves the computational efficiency of the post-sieving step. Beyond this improvement, FRAGTE also reduces the computational cost of the sieving process itself. Consequently, FRAGTE substantially improves run efficiency both during sieving and afterwards (subsequent alignment and calculation), and the two gains together accelerate genome-wide species delineation. CONCLUSIONS: FRAGTE is a completeness-independent algorithm for sieving. Owing to its high sensitivity, high specificity, markedly reduced number of sieved genomes and markedly improved runtime, FRAGTE will help whole-genome approaches facilitate taxonomic studies in prokaryotes.
format Online
Article
Text
id pubmed-7045542
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-7045542 2020-03-03 BMC Genomics Methodology Article BioMed Central 2020-02-26 /pmc/articles/PMC7045542/ /pubmed/32102653 http://dx.doi.org/10.1186/s12864-020-6597-x Text en
© The Author(s). 2020 Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title A completeness-independent method for pre-selection of closely related genomes for species delineation in prokaryotes
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7045542/
https://www.ncbi.nlm.nih.gov/pubmed/32102653
http://dx.doi.org/10.1186/s12864-020-6597-x