Computational workflow for analysis of gain and loss of genes in distantly related genomes
BACKGROUND: Early evolution of animals led to profound changes in body plan organization, symmetry and the rise of tissue complexity including formation of muscular and nervous systems. This process was associated with massive restructuring of animal genomes as well as deletion, acquisition and rapi...
Main Authors: | Ptitsyn, Andrey; Moroz, Leonid L |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2012 |
Subjects: | Proceedings |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3439731/ https://www.ncbi.nlm.nih.gov/pubmed/23046496 http://dx.doi.org/10.1186/1471-2105-13-S15-S5 |
_version_ | 1782243056490119168 |
---|---|
author | Ptitsyn, Andrey; Moroz, Leonid L |
collection | PubMed |
description | BACKGROUND: Early animal evolution brought profound changes in body plan organization and symmetry, and a rise in tissue complexity that included the formation of muscular and nervous systems. This process was associated with massive restructuring of animal genomes, as well as the deletion, acquisition and rapid differentiation of genes inherited from a common metazoan ancestor. Here, we present a simple but efficient workflow for elucidating gene gain and gene loss within major branches of the animal kingdom. METHODS: We designed a pipeline of sequence comparison, clustering and functional annotation, using 12 major phyla as illustrative examples. Specifically, as input we used sets of ab initio predicted gene models from the genomes of six bilaterians, three basal metazoans (Cnidaria, Placozoa, Porifera), two unicellular eukaryotes (Monosiga and Capsaspora) and the green plant Arabidopsis as an outgroup. Because of the large volume of data, the software required a high-performance Linux cluster. The final results can be imported into standard spreadsheet software and queried for the numbers and specific sets of genes that are absent from specific genomes, uniquely present in them, or shared among different taxa. RESULTS AND CONCLUSIONS: The software is open source and available free of charge. It allows the user to address specific questions about gene gain and gene loss in particular genomes, and queries over user-defined groups of genomes can be formulated as a type of logical expression. For example, our analysis of 12 sequenced genomes indicated that these genomes possess at least 90,000 unique genes and gene families, suggesting enormous diversity of the genome repertoire in the animal kingdom. Approximately 9% of these gene families are shared universally (homologous) among all genomes, 53% are unique to specific taxa, and the rest are shared between two or more distantly related genomes. |
format | Online Article Text |
id | pubmed-3439731 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2012 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-3439731 2012-09-17 Computational workflow for analysis of gain and loss of genes in distantly related genomes Ptitsyn, Andrey; Moroz, Leonid L BMC Bioinformatics Proceedings BioMed Central 2012-09-11 /pmc/articles/PMC3439731/ /pubmed/23046496 http://dx.doi.org/10.1186/1471-2105-13-S15-S5 Text en Copyright ©2012 Ptitsyn and Moroz; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | Computational workflow for analysis of gain and loss of genes in distantly related genomes |
topic | Proceedings |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3439731/ https://www.ncbi.nlm.nih.gov/pubmed/23046496 http://dx.doi.org/10.1186/1471-2105-13-S15-S5 |
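
The description in this record says the final results can be queried for genes absent from specific genomes, uniquely present, or shared among taxa, and that such queries can be written as a kind of logical expression. The Python sketch below illustrates that idea on a toy presence/absence table. It is not the authors' software: the family IDs, genome labels, data, and the function names `classify` and `query` are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from typing import Callable, Dict, FrozenSet, List

Presence = Dict[str, FrozenSet[str]]

# Hypothetical toy table: gene family ID -> genomes in which it was detected.
# None of these IDs or assignments come from the paper.
PRESENCE: Presence = {
    "fam001": frozenset({"Bilateria", "Cnidaria", "Placozoa",
                         "Porifera", "Monosiga", "Arabidopsis"}),
    "fam002": frozenset({"Bilateria", "Cnidaria"}),
    "fam003": frozenset({"Porifera"}),
    "fam004": frozenset({"Cnidaria", "Placozoa", "Porifera"}),
    "fam005": frozenset({"Monosiga", "Arabidopsis"}),
}

# Every genome label that appears anywhere in the table.
ALL_GENOMES: FrozenSet[str] = frozenset().union(*PRESENCE.values())


def classify(found_in: FrozenSet[str]) -> str:
    """Label a family as universal, taxon-specific, or shared by a subset."""
    if found_in == ALL_GENOMES:
        return "universal"
    if len(found_in) == 1:
        return "taxon-specific"
    return "shared"


def query(presence: Presence,
          predicate: Callable[[FrozenSet[str]], bool]) -> List[str]:
    """Return sorted family IDs whose genome set satisfies the predicate."""
    return sorted(fam for fam, found_in in presence.items()
                  if predicate(found_in))


# Tally of families per category.
summary = Counter(classify(found_in) for found_in in PRESENCE.values())

animals = {"Bilateria", "Cnidaria", "Placozoa", "Porifera"}
outgroups = {"Monosiga", "Arabidopsis"}

# Logical expression: in at least one animal genome AND in no outgroup genome.
animal_only = query(PRESENCE,
                    lambda g: bool(g & animals) and not (g & outgroups))

# Logical expression: in all three basal metazoans AND absent from Bilateria
# (a candidate loss in this toy setup).
lost_in_bilateria = query(
    PRESENCE,
    lambda g: {"Cnidaria", "Placozoa", "Porifera"} <= g
    and "Bilateria" not in g,
)

print(dict(summary))
print("animal-only families:", animal_only)
print("candidate losses in Bilateria:", lost_in_bilateria)
```

Representing each family as a set of genomes keeps every query a plain Boolean expression over set membership, which mirrors the spreadsheet-style filtering the description mentions. A real table with tens of thousands of families would more likely be loaded from a file than written inline.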