EDGAR3.0: comparative genomics and phylogenomics on a scalable infrastructure

The EDGAR platform, a web server providing databases of precomputed orthology data for thousands of microbial genomes, is one of the most established tools in the field of comparative genomics and phylogenomics. Based on precomputed gene alignments, EDGAR allows quick identification of the different...

Bibliographic Details
Main Authors:	Dieckmann, Marius Alfred, Beyvers, Sebastian, Nkouamedjo-Fankep, Rudel Christian, Hanel, Patrick Harald Georg, Jelonek, Lukas, Blom, Jochen, Goesmann, Alexander
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2021
Subjects:	Web Server Issue
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8262741/ https://www.ncbi.nlm.nih.gov/pubmed/33988716 http://dx.doi.org/10.1093/nar/gkab341

_version_	1783719243019190272
author	Dieckmann, Marius Alfred Beyvers, Sebastian Nkouamedjo-Fankep, Rudel Christian Hanel, Patrick Harald Georg Jelonek, Lukas Blom, Jochen Goesmann, Alexander
author_facet	Dieckmann, Marius Alfred Beyvers, Sebastian Nkouamedjo-Fankep, Rudel Christian Hanel, Patrick Harald Georg Jelonek, Lukas Blom, Jochen Goesmann, Alexander
author_sort	Dieckmann, Marius Alfred
collection	PubMed
description	The EDGAR platform, a web server providing databases of precomputed orthology data for thousands of microbial genomes, is one of the most established tools in the field of comparative genomics and phylogenomics. Based on precomputed gene alignments, EDGAR allows quick identification of the differential gene content, i.e. the pan genome, the core genome, or singleton genes. Furthermore, EDGAR features a wide range of analyses and visualizations like Venn diagrams, synteny plots, phylogenetic trees, as well as Amino Acid Identity (AAI) and Average Nucleotide Identity (ANI) matrices. During the last few years, the average number of genomes analyzed in an EDGAR project increased by two orders of magnitude. To handle this massive increase, a completely new technical backend infrastructure for the EDGAR platform was designed and launched as EDGAR3.0. For the calculation of new EDGAR3.0 projects, we are now using a scalable Kubernetes cluster running in a cloud environment. A new storage infrastructure was developed using a file-based high-performance storage backend which ensures timely data handling and efficient access. The new data backend guarantees a memory-efficient calculation of orthologs, and parallelization has led to drastically reduced processing times. Based on the advanced technical infrastructure, new analysis features could be implemented, including POCP and FastANI genome similarity indices, UpSet intersecting set visualization, and circular genome plots. Also, the public database section of EDGAR was largely updated and now offers access to 24,317 genomes in 749 free-to-use projects. In summary, EDGAR3.0 provides a new, scalable infrastructure for comprehensive microbial comparative gene content analysis. The web server is accessible at http://edgar3.computational.bio.
format	Online Article Text
id	pubmed-8262741
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-82627412021-07-08 EDGAR3.0: comparative genomics and phylogenomics on a scalable infrastructure Dieckmann, Marius Alfred Beyvers, Sebastian Nkouamedjo-Fankep, Rudel Christian Hanel, Patrick Harald Georg Jelonek, Lukas Blom, Jochen Goesmann, Alexander Nucleic Acids Res Web Server Issue The EDGAR platform, a web server providing databases of precomputed orthology data for thousands of microbial genomes, is one of the most established tools in the field of comparative genomics and phylogenomics. Based on precomputed gene alignments, EDGAR allows quick identification of the differential gene content, i.e. the pan genome, the core genome, or singleton genes. Furthermore, EDGAR features a wide range of analyses and visualizations like Venn diagrams, synteny plots, phylogenetic trees, as well as Amino Acid Identity (AAI) and Average Nucleotide Identity (ANI) matrices. During the last few years, the average number of genomes analyzed in an EDGAR project increased by two orders of magnitude. To handle this massive increase, a completely new technical backend infrastructure for the EDGAR platform was designed and launched as EDGAR3.0. For the calculation of new EDGAR3.0 projects, we are now using a scalable Kubernetes cluster running in a cloud environment. A new storage infrastructure was developed using a file-based high-performance storage backend which ensures timely data handling and efficient access. The new data backend guarantees a memory efficient calculation of orthologs, and parallelization has led to drastically reduced processing times. Based on the advanced technical infrastructure new analysis features could be implemented including POCP and FastANI genomes similarity indices, UpSet intersecting set visualization, and circular genome plots. Also the public database section of EDGAR was largely updated and now offers access to 24,317 genomes in 749 free-to-use projects. 
In summary, EDGAR 3.0 provides a new, scalable infrastructure for comprehensive microbial comparative gene content analysis. The web server is accessible at http://edgar3.computational.bio. Oxford University Press 2021-05-14 /pmc/articles/PMC8262741/ /pubmed/33988716 http://dx.doi.org/10.1093/nar/gkab341 Text en © The Author(s) 2021. Published by Oxford University Press on behalf of Nucleic Acids Research. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle	Web Server Issue Dieckmann, Marius Alfred Beyvers, Sebastian Nkouamedjo-Fankep, Rudel Christian Hanel, Patrick Harald Georg Jelonek, Lukas Blom, Jochen Goesmann, Alexander EDGAR3.0: comparative genomics and phylogenomics on a scalable infrastructure
title	EDGAR3.0: comparative genomics and phylogenomics on a scalable infrastructure
title_full	EDGAR3.0: comparative genomics and phylogenomics on a scalable infrastructure
title_fullStr	EDGAR3.0: comparative genomics and phylogenomics on a scalable infrastructure
title_full_unstemmed	EDGAR3.0: comparative genomics and phylogenomics on a scalable infrastructure
title_short	EDGAR3.0: comparative genomics and phylogenomics on a scalable infrastructure
title_sort	edgar3.0: comparative genomics and phylogenomics on a scalable infrastructure
topic	Web Server Issue
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8262741/ https://www.ncbi.nlm.nih.gov/pubmed/33988716 http://dx.doi.org/10.1093/nar/gkab341
work_keys_str_mv	AT dieckmannmariusalfred edgar30comparativegenomicsandphylogenomicsonascalableinfrastructure AT beyverssebastian edgar30comparativegenomicsandphylogenomicsonascalableinfrastructure AT nkouamedjofankeprudelchristian edgar30comparativegenomicsandphylogenomicsonascalableinfrastructure AT hanelpatrickharaldgeorg edgar30comparativegenomicsandphylogenomicsonascalableinfrastructure AT jeloneklukas edgar30comparativegenomicsandphylogenomicsonascalableinfrastructure AT blomjochen edgar30comparativegenomicsandphylogenomicsonascalableinfrastructure AT goesmannalexander edgar30comparativegenomicsandphylogenomicsonascalableinfrastructure

Similar Items