Calculating Orthologs in Bacteria and Archaea: A Divide and Conquer Approach
Among proteins, orthologs are defined as those that are derived by vertical descent from a single progenitor in the last common ancestor of their host organisms. Our goal is to compute a complete set of protein orthologs derived from all currently available complete bacterial and archaeal genomes. T...
Main authors: | Halachev, Mihail R.; Loman, Nicholas J.; Pallen, Mark J. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science 2011 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3236195/ https://www.ncbi.nlm.nih.gov/pubmed/22174796 http://dx.doi.org/10.1371/journal.pone.0028388 |
_version_ | 1782218701763772416 |
---|---|
author | Halachev, Mihail R. Loman, Nicholas J. Pallen, Mark J. |
author_facet | Halachev, Mihail R. Loman, Nicholas J. Pallen, Mark J. |
author_sort | Halachev, Mihail R. |
collection | PubMed |
description | Among proteins, orthologs are defined as those that are derived by vertical descent from a single progenitor in the last common ancestor of their host organisms. Our goal is to compute a complete set of protein orthologs derived from all currently available complete bacterial and archaeal genomes. Traditional approaches typically rely on all-against-all BLAST searching which is prohibitively expensive in terms of hardware requirements or computational time (requiring an estimated 18 months or more on a typical server). Here, we present xBASE-Orth, a system for ongoing ortholog annotation, which applies a “divide and conquer” approach and adopts a pragmatic scheme that trades accuracy for speed. Starting at species level, xBASE-Orth carefully constructs and uses pan-genomes as proxies for the full collections of coding sequences at each level as it progressively climbs the taxonomic tree using the previously computed data. This leads to a significant decrease in the number of alignments that need to be performed, which translates into faster computation, making ortholog computation possible on a global scale. Using xBASE-Orth, we analyzed an NCBI collection of 1,288 bacterial and 94 archaeal complete genomes with more than 4 million coding sequences in 5 weeks and predicted more than 700 million ortholog pairs, clustered in 175,531 orthologous groups. We have also identified sets of highly conserved bacterial and archaeal orthologs and in so doing have highlighted anomalies in genome annotation and in the proposed composition of the minimal bacterial genome. In summary, our approach allows for scalable and efficient computation of the bacterial and archaeal ortholog annotations. In addition, due to its hierarchical nature, it is suitable for incorporating novel complete genomes and alternative genome annotations. The computed ortholog data and a continuously evolving set of applications based on it are integrated in the xBASE database, available at http://www.xbase.ac.uk/. |
format | Online Article Text |
id | pubmed-3236195 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2011 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-3236195 2011-12-15 Calculating Orthologs in Bacteria and Archaea: A Divide and Conquer Approach Halachev, Mihail R. Loman, Nicholas J. Pallen, Mark J. PLoS One Research Article Among proteins, orthologs are defined as those that are derived by vertical descent from a single progenitor in the last common ancestor of their host organisms. Our goal is to compute a complete set of protein orthologs derived from all currently available complete bacterial and archaeal genomes. Traditional approaches typically rely on all-against-all BLAST searching which is prohibitively expensive in terms of hardware requirements or computational time (requiring an estimated 18 months or more on a typical server). Here, we present xBASE-Orth, a system for ongoing ortholog annotation, which applies a “divide and conquer” approach and adopts a pragmatic scheme that trades accuracy for speed. Starting at species level, xBASE-Orth carefully constructs and uses pan-genomes as proxies for the full collections of coding sequences at each level as it progressively climbs the taxonomic tree using the previously computed data. This leads to a significant decrease in the number of alignments that need to be performed, which translates into faster computation, making ortholog computation possible on a global scale. Using xBASE-Orth, we analyzed an NCBI collection of 1,288 bacterial and 94 archaeal complete genomes with more than 4 million coding sequences in 5 weeks and predicted more than 700 million ortholog pairs, clustered in 175,531 orthologous groups. We have also identified sets of highly conserved bacterial and archaeal orthologs and in so doing have highlighted anomalies in genome annotation and in the proposed composition of the minimal bacterial genome. In summary, our approach allows for scalable and efficient computation of the bacterial and archaeal ortholog annotations. In addition, due to its hierarchical nature, it is suitable for incorporating novel complete genomes and alternative genome annotations. The computed ortholog data and a continuously evolving set of applications based on it are integrated in the xBASE database, available at http://www.xbase.ac.uk/. Public Library of Science 2011-12-12 /pmc/articles/PMC3236195/ /pubmed/22174796 http://dx.doi.org/10.1371/journal.pone.0028388 Text en Halachev et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited. |
spellingShingle | Research Article Halachev, Mihail R. Loman, Nicholas J. Pallen, Mark J. Calculating Orthologs in Bacteria and Archaea: A Divide and Conquer Approach |
title | Calculating Orthologs in Bacteria and Archaea: A Divide and Conquer Approach |
title_full | Calculating Orthologs in Bacteria and Archaea: A Divide and Conquer Approach |
title_fullStr | Calculating Orthologs in Bacteria and Archaea: A Divide and Conquer Approach |
title_full_unstemmed | Calculating Orthologs in Bacteria and Archaea: A Divide and Conquer Approach |
title_short | Calculating Orthologs in Bacteria and Archaea: A Divide and Conquer Approach |
title_sort | calculating orthologs in bacteria and archaea: a divide and conquer approach |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3236195/ https://www.ncbi.nlm.nih.gov/pubmed/22174796 http://dx.doi.org/10.1371/journal.pone.0028388 |
work_keys_str_mv | AT halachevmihailr calculatingorthologsinbacteriaandarchaeaadivideandconquerapproach AT lomannicholasj calculatingorthologsinbacteriaandarchaeaadivideandconquerapproach AT pallenmarkj calculatingorthologsinbacteriaandarchaeaadivideandconquerapproach |
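
The abstract in this record describes building pan-genomes at species level and reusing them as proxies while climbing the taxonomic tree. The following is a minimal Python sketch of that general idea, not the xBASE-Orth implementation. Everything in it is an assumption made for illustration: the greedy representative-based clustering, the toy predicate `same_gene_family` (a stand-in for a real alignment such as BLAST), the dictionary layout of `tree`, and all function and sequence names.

```python
from typing import Callable, Dict, List, Tuple

Cluster = List[str]                    # first element acts as the representative
Similar = Callable[[str, str], bool]   # hypothetical stand-in for an alignment test


def merge_clusters(pangenome: List[Cluster], incoming: List[Cluster],
                   similar: Similar) -> int:
    """Greedily merge incoming clusters into pangenome; return comparisons made."""
    comparisons = 0
    for cluster in incoming:
        for existing in pangenome:
            comparisons += 1
            # Only the representatives are compared, never the full membership.
            if similar(existing[0], cluster[0]):
                existing.extend(cluster)
                break
        else:
            # No representative matched: start a new group (copy to avoid aliasing).
            pangenome.append(list(cluster))
    return comparisons


def build_pangenome(node: Dict, similar: Similar) -> Tuple[List[Cluster], int]:
    """Build a pan-genome bottom-up for a taxonomic node."""
    pangenome: List[Cluster] = []
    comparisons = 0
    if "genomes" in node:
        # Species level: every coding sequence enters as a singleton cluster.
        for genome in node["genomes"]:
            singletons = [[seq] for seq in genome]
            comparisons += merge_clusters(pangenome, singletons, similar)
    else:
        # Higher level: reuse each child's pan-genome instead of its raw sequences.
        for child in node["children"]:
            child_pan, child_cmp = build_pangenome(child, similar)
            comparisons += child_cmp
            comparisons += merge_clusters(pangenome, child_pan, similar)
    return pangenome, comparisons


def same_gene_family(a: str, b: str) -> bool:
    """Toy similarity: sequences named '<family>_<genome>' match on family."""
    return a.split("_")[0] == b.split("_")[0]


# Hypothetical two-species tree with made-up sequence labels.
tree = {"children": [
    {"genomes": [["dnaA_1", "recA_1"], ["dnaA_2", "recA_2", "lacZ_2"]]},
    {"genomes": [["dnaA_3", "gyrB_3"], ["dnaA_4", "gyrB_4"]]},
]}

pangenome, comparisons = build_pangenome(tree, same_gene_family)
total = sum(len(cluster) for cluster in pangenome)
all_pairs = total * (total - 1) // 2   # cost of an all-against-all comparison

for cluster in pangenome:
    print(cluster)
print(f"{comparisons} comparisons versus {all_pairs} all-against-all pairs")
```

Because each level compares only cluster representatives, the number of comparisons grows with the size of the pan-genomes rather than with the full set of coding sequences. A greedy scheme like this one can miss or misassign borderline matches, which illustrates the kind of accuracy-for-speed trade-off the abstract mentions.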