
GeneSeqToFamily: a Galaxy workflow to find gene families based on the Ensembl Compara GeneTrees pipeline

BACKGROUND: Gene duplication is a major factor contributing to evolutionary novelty, and the contraction or expansion of gene families has often been associated with morphological, physiological, and environmental adaptations. The study of homologous genes helps us to understand the evolution of gen...


Bibliographic Details
Main Authors: Thanki, Anil S, Soranzo, Nicola, Haerty, Wilfried, Davey, Robert P
Format: Online Article Text
Language: English
Published: Oxford University Press 2018
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5863215/
https://www.ncbi.nlm.nih.gov/pubmed/29425291
http://dx.doi.org/10.1093/gigascience/giy005
author Thanki, Anil S
Soranzo, Nicola
Haerty, Wilfried
Davey, Robert P
collection PubMed
description BACKGROUND: Gene duplication is a major factor contributing to evolutionary novelty, and the contraction or expansion of gene families has often been associated with morphological, physiological, and environmental adaptations. The study of homologous genes helps us to understand the evolution of gene families. It plays a vital role in finding ancestral gene duplication events as well as identifying genes that have diverged from a common ancestor under positive selection. There are various tools available, such as MSOAR, OrthoMCL, and HomoloGene, to identify gene families and visualize syntenic information between species, providing an overview of syntenic region evolution at the family level. Unfortunately, none of them provide information about structural changes within genes, such as the conservation of ancestral exon boundaries among multiple genomes. The Ensembl GeneTrees computational pipeline generates gene trees based on coding sequences, provides details about exon conservation, and is used in the Ensembl Compara project to discover gene families. FINDINGS: A certain amount of expertise is required to configure and run the Ensembl Compara GeneTrees pipeline via command line. Therefore, we converted this pipeline into a Galaxy workflow, called GeneSeqToFamily, and provided additional functionality. This workflow uses existing tools from the Galaxy ToolShed, as well as providing additional wrappers and tools that are required to run the workflow. CONCLUSIONS: GeneSeqToFamily represents the Ensembl GeneTrees pipeline as a set of interconnected Galaxy tools, so they can be run interactively within Galaxy's user-friendly workflow environment while still providing the flexibility to tailor the analysis by changing configurations and tools if necessary. Additional tools allow users to subsequently visualize the gene families produced by the workflow, using the Aequatus.js interactive tool, which has been developed as part of the Aequatus software project.
format Online
Article
Text
id pubmed-5863215
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-5863215 2018-03-29 GeneSeqToFamily: a Galaxy workflow to find gene families based on the Ensembl Compara GeneTrees pipeline Thanki, Anil S Soranzo, Nicola Haerty, Wilfried Davey, Robert P Gigascience Technical Note Oxford University Press 2018-02-07 /pmc/articles/PMC5863215/ /pubmed/29425291 http://dx.doi.org/10.1093/gigascience/giy005 Text en © The Authors 2018. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title GeneSeqToFamily: a Galaxy workflow to find gene families based on the Ensembl Compara GeneTrees pipeline
topic Technical Note
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5863215/
https://www.ncbi.nlm.nih.gov/pubmed/29425291
http://dx.doi.org/10.1093/gigascience/giy005