First Steps in the Analysis of Prokaryotic Pan-Genomes

The pan-genome is defined as the set of orthologous and unique genes of a specific group of organisms. It is composed of the core genome, the accessory genome, and species- or strain-specific genes. A pan-genome is considered open or closed based on the alpha value of Heaps' law. In an open...

Bibliographic Details
Main authors: Costa, Sávio Souza; Guimarães, Luís Carlos; Silva, Artur; Soares, Siomar Castro; Baraúna, Rafael Azevedo
Format: Online Article Text
Language: English
Published: SAGE Publications 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7418249/
https://www.ncbi.nlm.nih.gov/pubmed/32843837
http://dx.doi.org/10.1177/1177932220938064
author Costa, Sávio Souza
Guimarães, Luís Carlos
Silva, Artur
Soares, Siomar Castro
Baraúna, Rafael Azevedo
author_facet Costa, Sávio Souza
Guimarães, Luís Carlos
Silva, Artur
Soares, Siomar Castro
Baraúna, Rafael Azevedo
author_sort Costa, Sávio Souza
collection PubMed
description The pan-genome is defined as the set of orthologous and unique genes of a specific group of organisms. It is composed of the core genome, the accessory genome, and species- or strain-specific genes. A pan-genome is considered open or closed based on the alpha value of Heaps' law. In an open pan-genome, the number of gene families continues to increase as new genomes are added to the analysis, whereas in a closed pan-genome the number of gene families does not increase considerably. The first step of a pan-genome analysis is the homogenization of genome annotation: all genomes should be annotated with the same software, such as GeneMark or RAST. Subsequently, one of several software tools is used to calculate the pan-genome, such as BPGA, GET_HOMOLOGUES, or PGAP, among others. This review presents these initial steps for those who want to perform a pan-genome analysis and explains the key concepts of the area. Furthermore, we present the pan-genomic analysis of 9 bacterial species, which are the species with the highest number of genomes deposited in GenBank. We also show the influence of the identity and coverage parameters on the prediction of orthologous and paralogous genes. Finally, we outline perspectives for several research areas in which pan-genome analysis can be used to address important questions.
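The open/closed criterion mentioned in the description can be illustrated with a short sketch. It assumes the power-law form commonly used for this purpose in pan-genome studies, n = kappa * N^(-alpha), where n is the number of new gene families contributed by the N-th genome; under that form a pan-genome is usually taken as closed when alpha > 1 and open when alpha <= 1. The formula, the function names, and the synthetic counts below are illustrative assumptions and are not taken from the article or from any of the tools it names.

```python
import math


def fit_heaps_law(genome_counts, new_gene_counts):
    """Estimate (kappa, alpha) for n = kappa * N**(-alpha).

    Uses ordinary least squares on the log-log transformed data:
    log(n) = log(kappa) - alpha * log(N).
    All counts must be strictly positive, because logarithms are taken.
    """
    xs = [math.log(n_genomes) for n_genomes in genome_counts]
    ys = [math.log(n_new) for n_new in new_gene_counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


def classify_pan_genome(alpha):
    """Assumed convention: alpha > 1 means closed, otherwise open."""
    return "closed" if alpha > 1.0 else "open"


# Synthetic (hypothetical) data: new gene families seen when adding genome N.
genomes = list(range(2, 21))
new_genes_open = [500.0 * n ** -0.6 for n in genomes]
new_genes_closed = [500.0 * n ** -1.5 for n in genomes]

kappa_open, alpha_open = fit_heaps_law(genomes, new_genes_open)
kappa_closed, alpha_closed = fit_heaps_law(genomes, new_genes_closed)

print(f"alpha = {alpha_open:.2f}: {classify_pan_genome(alpha_open)}")
print(f"alpha = {alpha_closed:.2f}: {classify_pan_genome(alpha_closed)}")
```

With real data, the new-gene counts are typically averaged over many random orderings of the genomes before fitting, and a nonlinear fit may be preferred over the log-log regression used here for brevity.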
format Online
Article
Text
id pubmed-7418249
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher SAGE Publications
record_format MEDLINE/PubMed
spelling pubmed-7418249 2020-08-24 First Steps in the Analysis of Prokaryotic Pan-Genomes Costa, Sávio Souza Guimarães, Luís Carlos Silva, Artur Soares, Siomar Castro Baraúna, Rafael Azevedo Bioinform Biol Insights Review
SAGE Publications 2020-08-07 /pmc/articles/PMC7418249/ /pubmed/32843837 http://dx.doi.org/10.1177/1177932220938064 Text en © The Author(s) 2020 https://creativecommons.org/licenses/by-nc/4.0/ This article is distributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 License (https://creativecommons.org/licenses/by-nc/4.0/) which permits non-commercial use, reproduction and distribution of the work without further permission provided the original work is attributed as specified on the SAGE and Open Access pages (https://us.sagepub.com/en-us/nam/open-access-at-sage).
spellingShingle Review
Costa, Sávio Souza
Guimarães, Luís Carlos
Silva, Artur
Soares, Siomar Castro
Baraúna, Rafael Azevedo
First Steps in the Analysis of Prokaryotic Pan-Genomes
title First Steps in the Analysis of Prokaryotic Pan-Genomes
title_full First Steps in the Analysis of Prokaryotic Pan-Genomes
title_fullStr First Steps in the Analysis of Prokaryotic Pan-Genomes
title_full_unstemmed First Steps in the Analysis of Prokaryotic Pan-Genomes
title_short First Steps in the Analysis of Prokaryotic Pan-Genomes
title_sort first steps in the analysis of prokaryotic pan-genomes
topic Review
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7418249/
https://www.ncbi.nlm.nih.gov/pubmed/32843837
http://dx.doi.org/10.1177/1177932220938064