PPanGGOLiN: Depicting microbial diversity via a partitioned pangenome graph

The use of comparative genomics for functional, evolutionary, and epidemiological studies requires methods to classify gene families in terms of occurrence in a given species. These methods usually lack multivariate statistical models to infer the partitions and the optimal number of classes and don...

Bibliographic Details
Main authors: Gautreau, Guillaume, Bazin, Adelme, Gachet, Mathieu, Planel, Rémi, Burlot, Laura, Dubois, Mathieu, Perrin, Amandine, Médigue, Claudine, Calteau, Alexandra, Cruveiller, Stéphane, Matias, Catherine, Ambroise, Christophe, Rocha, Eduardo P. C., Vallenet, David
Format: Online Article Text
Language: English
Published: Public Library of Science 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7108747/
https://www.ncbi.nlm.nih.gov/pubmed/32191703
http://dx.doi.org/10.1371/journal.pcbi.1007732
author Gautreau, Guillaume
Bazin, Adelme
Gachet, Mathieu
Planel, Rémi
Burlot, Laura
Dubois, Mathieu
Perrin, Amandine
Médigue, Claudine
Calteau, Alexandra
Cruveiller, Stéphane
Matias, Catherine
Ambroise, Christophe
Rocha, Eduardo P. C.
Vallenet, David
author_sort Gautreau, Guillaume
collection PubMed
description The use of comparative genomics for functional, evolutionary, and epidemiological studies requires methods to classify gene families in terms of occurrence in a given species. These methods usually lack multivariate statistical models to infer the partitions and the optimal number of classes and do not account for genome organization. We introduce a graph structure to model pangenomes in which nodes represent gene families and edges represent genomic neighborhood. Our method, named PPanGGOLiN, partitions nodes using an Expectation-Maximization algorithm based on a multivariate Bernoulli Mixture Model coupled with a Markov Random Field. This approach takes into account the topology of the graph and the presence/absence of genes in pangenomes to classify gene families into persistent, cloud, and one or several shell partitions. By analyzing the partitioned pangenome graphs of isolate genomes from 439 species and metagenome-assembled genomes from 78 species, we demonstrate that our method is effective in estimating the persistent genome. Interestingly, it shows that the shell genome is a key element for understanding genome dynamics, presumably because it reflects how genes present at intermediate frequencies drive adaptation of species, and its proportion in genomes is independent of genome size. The graph-based approach proposed by PPanGGOLiN is useful to depict the overall genomic diversity of thousands of strains in a compact structure and provides an effective basis for very large scale comparative genomics. The software is freely available at https://github.com/labgem/PPanGGOLiN.
format Online
Article
Text
id pubmed-7108747
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-71087472020-04-03 PPanGGOLiN: Depicting microbial diversity via a partitioned pangenome graph Gautreau, Guillaume Bazin, Adelme Gachet, Mathieu Planel, Rémi Burlot, Laura Dubois, Mathieu Perrin, Amandine Médigue, Claudine Calteau, Alexandra Cruveiller, Stéphane Matias, Catherine Ambroise, Christophe Rocha, Eduardo P. C. Vallenet, David PLoS Comput Biol Research Article
Public Library of Science 2020-03-19 /pmc/articles/PMC7108747/ /pubmed/32191703 http://dx.doi.org/10.1371/journal.pcbi.1007732 Text en © 2020 Gautreau et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title PPanGGOLiN: Depicting microbial diversity via a partitioned pangenome graph
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7108747/
https://www.ncbi.nlm.nih.gov/pubmed/32191703
http://dx.doi.org/10.1371/journal.pcbi.1007732