eggNOG v4.0: nested orthology inference across 3686 organisms

With the increasing availability of various ‘omics data, high-quality orthology assignment is crucial for evolutionary and functional genomics studies. We here present the fourth version of the eggNOG database (available at http://eggnog.embl.de) that derives nonsupervised orthologous groups (NOGs)...

Bibliographic Details
Main Authors: Powell, Sean, Forslund, Kristoffer, Szklarczyk, Damian, Trachana, Kalliopi, Roth, Alexander, Huerta-Cepas, Jaime, Gabaldón, Toni, Rattei, Thomas, Creevey, Chris, Kuhn, Michael, Jensen, Lars J., von Mering, Christian, Bork, Peer
Format: Online Article Text
Language: English
Published: Oxford University Press 2014
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3964997/
https://www.ncbi.nlm.nih.gov/pubmed/24297252
http://dx.doi.org/10.1093/nar/gkt1253
_version_ 1782479275656478720
author Powell, Sean
Forslund, Kristoffer
Szklarczyk, Damian
Trachana, Kalliopi
Roth, Alexander
Huerta-Cepas, Jaime
Gabaldón, Toni
Rattei, Thomas
Creevey, Chris
Kuhn, Michael
Jensen, Lars J.
von Mering, Christian
Bork, Peer
collection PubMed
description With the increasing availability of various ‘omics data, high-quality orthology assignment is crucial for evolutionary and functional genomics studies. We here present the fourth version of the eggNOG database (available at http://eggnog.embl.de) that derives nonsupervised orthologous groups (NOGs) from complete genomes, and then applies a comprehensive characterization and analysis pipeline to the resulting gene families. Compared with the previous version, we have more than tripled the underlying species set to cover 3686 organisms, keeping track with genome project completions while prioritizing the inclusion of high-quality genomes to minimize error propagation from incomplete proteome sets. Major technological advances include (i) a robust and scalable procedure for the identification and inclusion of high-quality genomes, (ii) provision of orthologous groups for 107 different taxonomic levels compared with 41 in eggNOGv3, (iii) identification and annotation of particularly closely related orthologous groups, facilitating analysis of related gene families, (iv) improvements of the clustering and functional annotation approach, (v) adoption of a revised tree building procedure based on the multiple alignments generated during the process and (vi) implementation of quality control procedures throughout the entire pipeline. As in previous versions, eggNOGv4 provides multiple sequence alignments and maximum-likelihood trees, as well as broad functional annotation. Users can access the complete database of orthologous groups via a web interface, as well as through bulk download.
format Online
Article
Text
id pubmed-3964997
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-39649972014-03-25 eggNOG v4.0: nested orthology inference across 3686 organisms Powell, Sean Forslund, Kristoffer Szklarczyk, Damian Trachana, Kalliopi Roth, Alexander Huerta-Cepas, Jaime Gabaldón, Toni Rattei, Thomas Creevey, Chris Kuhn, Michael Jensen, Lars J. von Mering, Christian Bork, Peer Nucleic Acids Res II. Protein sequence and structure, motifs and domains With the increasing availability of various ‘omics data, high-quality orthology assignment is crucial for evolutionary and functional genomics studies. We here present the fourth version of the eggNOG database (available at http://eggnog.embl.de) that derives nonsupervised orthologous groups (NOGs) from complete genomes, and then applies a comprehensive characterization and analysis pipeline to the resulting gene families. Compared with the previous version, we have more than tripled the underlying species set to cover 3686 organisms, keeping track with genome project completions while prioritizing the inclusion of high-quality genomes to minimize error propagation from incomplete proteome sets. Major technological advances include (i) a robust and scalable procedure for the identification and inclusion of high-quality genomes, (ii) provision of orthologous groups for 107 different taxonomic levels compared with 41 in eggNOGv3, (iii) identification and annotation of particularly closely related orthologous groups, facilitating analysis of related gene families, (iv) improvements of the clustering and functional annotation approach, (v) adoption of a revised tree building procedure based on the multiple alignments generated during the process and (vi) implementation of quality control procedures throughout the entire pipeline. As in previous versions, eggNOGv4 provides multiple sequence alignments and maximum-likelihood trees, as well as broad functional annotation. Users can access the complete database of orthologous groups via a web interface, as well as through bulk download. 
Oxford University Press 2014-01-01 2013-11-30 /pmc/articles/PMC3964997/ /pubmed/24297252 http://dx.doi.org/10.1093/nar/gkt1253 Text en © The Author(s) 2013. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/3.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/), which permits non-commercial reuse, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title eggNOG v4.0: nested orthology inference across 3686 organisms
topic II. Protein sequence and structure, motifs and domains
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3964997/
https://www.ncbi.nlm.nih.gov/pubmed/24297252
http://dx.doi.org/10.1093/nar/gkt1253