Calculating Orthologs in Bacteria and Archaea: A Divide and Conquer Approach

Among proteins, orthologs are defined as those that are derived by vertical descent from a single progenitor in the last common ancestor of their host organisms. Our goal is to compute a complete set of protein orthologs derived from all currently available complete bacterial and archaeal genomes. T...

Bibliographic Details
Main authors: Halachev, Mihail R., Loman, Nicholas J., Pallen, Mark J.
Format: Online Article Text
Language: English
Published: Public Library of Science 2011
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3236195/
https://www.ncbi.nlm.nih.gov/pubmed/22174796
http://dx.doi.org/10.1371/journal.pone.0028388
author Halachev, Mihail R.
Loman, Nicholas J.
Pallen, Mark J.
collection PubMed
description Among proteins, orthologs are defined as those that are derived by vertical descent from a single progenitor in the last common ancestor of their host organisms. Our goal is to compute a complete set of protein orthologs derived from all currently available complete bacterial and archaeal genomes. Traditional approaches typically rely on all-against-all BLAST searching, which is prohibitively expensive in terms of hardware requirements or computational time (requiring an estimated 18 months or more on a typical server). Here, we present xBASE-Orth, a system for ongoing ortholog annotation, which applies a “divide and conquer” approach and adopts a pragmatic scheme that trades accuracy for speed. Starting at the species level, xBASE-Orth carefully constructs and uses pan-genomes as proxies for the full collections of coding sequences at each level as it progressively climbs the taxonomic tree using the previously computed data. This leads to a significant decrease in the number of alignments that need to be performed, which translates into faster computation, making ortholog computation possible on a global scale. Using xBASE-Orth, we analyzed an NCBI collection of 1,288 bacterial and 94 archaeal complete genomes with more than 4 million coding sequences in 5 weeks and predicted more than 700 million ortholog pairs, clustered in 175,531 orthologous groups. We have also identified sets of highly conserved bacterial and archaeal orthologs and in so doing have highlighted anomalies in genome annotation and in the proposed composition of the minimal bacterial genome. In summary, our approach allows for scalable and efficient computation of the bacterial and archaeal ortholog annotations. In addition, due to its hierarchical nature, it is suitable for incorporating novel complete genomes and alternative genome annotations.
The computed ortholog data and a continuously evolving set of applications based on it are integrated in the xBASE database, available at http://www.xbase.ac.uk/.
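The divide-and-conquer idea in the abstract can be illustrated with a toy sketch. This is not the xBASE-Orth implementation: the genomes, the taxonomy, the `is_hit` test (a stand-in for a significant BLAST alignment), and the rule of using the first member of each group as its representative are all assumptions made for illustration. The sketch only shows why aligning pan-genome representatives while climbing a taxonomic tree needs fewer alignments than an all-against-all comparison.

```python
# Hypothetical toy data: each protein is (protein_id, family). A shared family
# label stands in for a significant BLAST hit; real input would be sequences.
GENOMES = {
    "strain_A1": [("A1_p1", "dnaA"), ("A1_p2", "recA"), ("A1_p3", "toxX")],
    "strain_A2": [("A2_p1", "dnaA"), ("A2_p2", "recA")],
    "strain_B1": [("B1_p1", "dnaA"), ("B1_p2", "recA"), ("B1_p3", "pilY")],
    "strain_B2": [("B2_p1", "dnaA"), ("B2_p2", "pilY")],
}

# Hypothetical taxonomy: internal node -> children (leaves are genome names).
TAXONOMY = {
    "genus": ["species_A", "species_B"],
    "species_A": ["strain_A1", "strain_A2"],
    "species_B": ["strain_B1", "strain_B2"],
}


class AlignmentCounter:
    """Counts how many pairwise alignments were requested."""

    def __init__(self):
        self.alignments = 0


def is_hit(protein_a, protein_b, counter):
    """Stand-in for one pairwise alignment (assumption: same family = hit)."""
    counter.alignments += 1
    return protein_a[1] == protein_b[1]


def build_pan_genome(node, counter):
    """Return a list of groups for `node`; group[0] acts as representative."""
    if node in GENOMES:
        # Leaf: every protein starts as its own group (toy: no paralogs).
        return [[protein] for protein in GENOMES[node]]
    merged = []
    for child in TAXONOMY[node]:
        child_groups = build_pan_genome(child, counter)
        known = len(merged)  # groups contributed by earlier children only
        for group in child_groups:
            for target in merged[:known]:
                if is_hit(group[0], target[0], counter):
                    target.extend(group)  # same group: reuse lower-level result
                    break
            else:
                merged.append(group)  # no hit: new entry in the pan-genome
    return merged


def all_against_all_count(n):
    """Number of unordered pairs among n sequences."""
    return n * (n - 1) // 2


counter = AlignmentCounter()
groups = build_pan_genome("genus", counter)
total_proteins = sum(len(proteins) for proteins in GENOMES.values())
naive = all_against_all_count(total_proteins)

print(f"groups: {len(groups)}")
print(f"hierarchical alignments: {counter.alignments}")
print(f"all-against-all alignments: {naive}")
```

In this toy case the hierarchical pass performs fewer alignments than the pairwise count for the same proteins, because each level compares only one representative per group. The accuracy trade-off mentioned in the abstract also shows up here: a representative can miss a hit that another member of its group would have found.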
format Online
Article
Text
id pubmed-3236195
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-3236195 2011-12-15 Calculating Orthologs in Bacteria and Archaea: A Divide and Conquer Approach Halachev, Mihail R. Loman, Nicholas J. Pallen, Mark J. PLoS One Research Article Public Library of Science 2011-12-12 /pmc/articles/PMC3236195/ /pubmed/22174796 http://dx.doi.org/10.1371/journal.pone.0028388 Text en Halachev et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title Calculating Orthologs in Bacteria and Archaea: A Divide and Conquer Approach
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3236195/
https://www.ncbi.nlm.nih.gov/pubmed/22174796
http://dx.doi.org/10.1371/journal.pone.0028388