
fagin: synteny-based phylostratigraphy and finer classification of young genes

BACKGROUND: With every new genome that is sequenced, thousands of species-specific genes (orphans) are found, some originating from ultra-rapid mutations of existing genes, many others originating de novo from non-genic regions of the genome. If some of these genes survive across speciations, then e...


Bibliographic Details
Main Authors: Arendsee, Zebulun; Li, Jing; Singh, Urminder; Bhandary, Priyanka; Seetharam, Arun; Wurtele, Eve Syrkin
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6712868/
https://www.ncbi.nlm.nih.gov/pubmed/31455236
http://dx.doi.org/10.1186/s12859-019-3023-y
_version_ 1783446770635767808
author Arendsee, Zebulun
Li, Jing
Singh, Urminder
Bhandary, Priyanka
Seetharam, Arun
Wurtele, Eve Syrkin
author_facet Arendsee, Zebulun
Li, Jing
Singh, Urminder
Bhandary, Priyanka
Seetharam, Arun
Wurtele, Eve Syrkin
author_sort Arendsee, Zebulun
collection PubMed
description BACKGROUND: With every new genome that is sequenced, thousands of species-specific genes (orphans) are found, some originating from ultra-rapid mutations of existing genes, many others originating de novo from non-genic regions of the genome. If some of these genes survive across speciations, then extant organisms will contain a patchwork of genes whose ancestors first appeared at different times. Standard phylostratigraphy, the technique of partitioning genes by their age, is based solely on protein similarity algorithms. However, this approach relies on negative evidence: a failure to detect a homolog of a query gene. An alternative approach is to limit the search for homologs to syntenic regions. Then, genes can be positively identified as de novo orphans by tracing them to non-coding sequences in related species. RESULTS: We have developed a synteny-based pipeline in the R framework. Fagin determines the genomic context of each query gene in a focal species compared to homologous sequence in target species. We tested the fagin pipeline on two focal species, Arabidopsis thaliana (plus four target species in Brassicaceae) and Saccharomyces cerevisiae (plus six target species in Saccharomyces). Using microsynteny maps, fagin classified the homology relationship of each query gene against each target genome into three main classes, and further subclasses: AAic (has a coding syntenic homolog), NTic (has a non-coding syntenic homolog), and Unknown (has no detected syntenic homolog). fagin inferred over half the “Unknown” A. thaliana query genes, and about 20% for S. cerevisiae, as lacking a syntenic homolog because of local indels or scrambled synteny. CONCLUSIONS: fagin augments standard phylostratigraphy, and extends synteny-based phylostratigraphy with an automated, customizable, and detailed contextual analysis.
By comparing synteny-based phylostrata to standard phylostrata, fagin systematically identifies those orphans and lineage-specific genes that are well-supported to have originated de novo. Analyzing within-species genomes should distinguish orphan genes that may have originated through rapid divergence from de novo orphans. Fagin also delineates whether a gene has no syntenic homolog because of technical or biological reasons. These analyses indicate that some orphans may be associated with regions of high genomic perturbation. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-019-3023-y) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-6712868
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6712868 2019-09-04 fagin: synteny-based phylostratigraphy and finer classification of young genes Arendsee, Zebulun Li, Jing Singh, Urminder Bhandary, Priyanka Seetharam, Arun Wurtele, Eve Syrkin BMC Bioinformatics Software BACKGROUND: With every new genome that is sequenced, thousands of species-specific genes (orphans) are found, some originating from ultra-rapid mutations of existing genes, many others originating de novo from non-genic regions of the genome. If some of these genes survive across speciations, then extant organisms will contain a patchwork of genes whose ancestors first appeared at different times. Standard phylostratigraphy, the technique of partitioning genes by their age, is based solely on protein similarity algorithms. However, this approach relies on negative evidence: a failure to detect a homolog of a query gene. An alternative approach is to limit the search for homologs to syntenic regions. Then, genes can be positively identified as de novo orphans by tracing them to non-coding sequences in related species. RESULTS: We have developed a synteny-based pipeline in the R framework. Fagin determines the genomic context of each query gene in a focal species compared to homologous sequence in target species. We tested the fagin pipeline on two focal species, Arabidopsis thaliana (plus four target species in Brassicaceae) and Saccharomyces cerevisiae (plus six target species in Saccharomyces). Using microsynteny maps, fagin classified the homology relationship of each query gene against each target genome into three main classes, and further subclasses: AAic (has a coding syntenic homolog), NTic (has a non-coding syntenic homolog), and Unknown (has no detected syntenic homolog). fagin inferred over half the “Unknown” A. thaliana query genes, and about 20% for S. cerevisiae, as lacking a syntenic homolog because of local indels or scrambled synteny.
CONCLUSIONS: fagin augments standard phylostratigraphy, and extends synteny-based phylostratigraphy with an automated, customizable, and detailed contextual analysis. By comparing synteny-based phylostrata to standard phylostrata, fagin systematically identifies those orphans and lineage-specific genes that are well-supported to have originated de novo. Analyzing within-species genomes should distinguish orphan genes that may have originated through rapid divergence from de novo orphans. Fagin also delineates whether a gene has no syntenic homolog because of technical or biological reasons. These analyses indicate that some orphans may be associated with regions of high genomic perturbation. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-019-3023-y) contains supplementary material, which is available to authorized users. BioMed Central 2019-08-27 /pmc/articles/PMC6712868/ /pubmed/31455236 http://dx.doi.org/10.1186/s12859-019-3023-y Text en © The Author(s). 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Software
Arendsee, Zebulun
Li, Jing
Singh, Urminder
Bhandary, Priyanka
Seetharam, Arun
Wurtele, Eve Syrkin
fagin: synteny-based phylostratigraphy and finer classification of young genes
title fagin: synteny-based phylostratigraphy and finer classification of young genes
title_full fagin: synteny-based phylostratigraphy and finer classification of young genes
title_fullStr fagin: synteny-based phylostratigraphy and finer classification of young genes
title_full_unstemmed fagin: synteny-based phylostratigraphy and finer classification of young genes
title_short fagin: synteny-based phylostratigraphy and finer classification of young genes
title_sort fagin: synteny-based phylostratigraphy and finer classification of young genes
topic Software
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6712868/
https://www.ncbi.nlm.nih.gov/pubmed/31455236
http://dx.doi.org/10.1186/s12859-019-3023-y
work_keys_str_mv AT arendseezebulun faginsyntenybasedphylostratigraphyandfinerclassificationofyounggenes
AT lijing faginsyntenybasedphylostratigraphyandfinerclassificationofyounggenes
AT singhurminder faginsyntenybasedphylostratigraphyandfinerclassificationofyounggenes
AT bhandarypriyanka faginsyntenybasedphylostratigraphyandfinerclassificationofyounggenes
AT seetharamarun faginsyntenybasedphylostratigraphyandfinerclassificationofyounggenes
AT wurteleevesyrkin faginsyntenybasedphylostratigraphyandfinerclassificationofyounggenes