fagin: synteny-based phylostratigraphy and finer classification of young genes
BACKGROUND: With every new genome that is sequenced, thousands of species-specific genes (orphans) are found, some originating from ultra-rapid mutations of existing genes, many others originating de novo from non-genic regions of the genome. If some of these genes survive across speciations, then e...
Main Authors: | Arendsee, Zebulun; Li, Jing; Singh, Urminder; Bhandary, Priyanka; Seetharam, Arun; Wurtele, Eve Syrkin |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2019 |
Subjects: | Software |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6712868/ https://www.ncbi.nlm.nih.gov/pubmed/31455236 http://dx.doi.org/10.1186/s12859-019-3023-y |
_version_ | 1783446770635767808 |
---|---|
author | Arendsee, Zebulun Li, Jing Singh, Urminder Bhandary, Priyanka Seetharam, Arun Wurtele, Eve Syrkin |
author_sort | Arendsee, Zebulun |
collection | PubMed |
description | BACKGROUND: With every new genome that is sequenced, thousands of species-specific genes (orphans) are found, some originating from ultra-rapid mutations of existing genes, many others originating de novo from non-genic regions of the genome. If some of these genes survive across speciations, then extant organisms will contain a patchwork of genes whose ancestors first appeared at different times. Standard phylostratigraphy, the technique of partitioning genes by their age, is based solely on protein similarity algorithms. However, this approach relies on negative evidence: a failure to detect a homolog of a query gene. An alternative approach is to limit the search for homologs to syntenic regions. Then, genes can be positively identified as de novo orphans by tracing them to non-coding sequences in related species. RESULTS: We have developed a synteny-based pipeline in the R framework. Fagin determines the genomic context of each query gene in a focal species relative to homologous sequences in target species. We tested the fagin pipeline on two focal species, Arabidopsis thaliana (plus four target species in Brassicaceae) and Saccharomyces cerevisiae (plus six target species in Saccharomyces). Using microsynteny maps, fagin classified the homology relationship of each query gene against each target genome into three main classes, with further subclasses: AAic (has a coding syntenic homolog), NTic (has a non-coding syntenic homolog), and Unknown (has no detected syntenic homolog). For over half of the “Unknown” A. thaliana query genes, and about 20% of those in S. cerevisiae, fagin inferred that the syntenic homolog was missing because of local indels or scrambled synteny. CONCLUSIONS: fagin augments standard phylostratigraphy, and extends synteny-based phylostratigraphy with an automated, customizable, and detailed contextual analysis.
By comparing synteny-based phylostrata to standard phylostrata, fagin systematically identifies those orphans and lineage-specific genes that are well supported as having originated de novo. Analyzing genomes within a species should distinguish orphan genes that may have originated through rapid divergence from those that originated de novo. Fagin also delineates whether a gene lacks a syntenic homolog for technical or biological reasons. These analyses indicate that some orphans may be associated with regions of high genomic perturbation. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-019-3023-y) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-6712868 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-6712868 2019-09-04 fagin: synteny-based phylostratigraphy and finer classification of young genes Arendsee, Zebulun Li, Jing Singh, Urminder Bhandary, Priyanka Seetharam, Arun Wurtele, Eve Syrkin BMC Bioinformatics Software BioMed Central 2019-08-27 /pmc/articles/PMC6712868/ /pubmed/31455236 http://dx.doi.org/10.1186/s12859-019-3023-y Text en © The Author(s). 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
title | fagin: synteny-based phylostratigraphy and finer classification of young genes |
topic | Software |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6712868/ https://www.ncbi.nlm.nih.gov/pubmed/31455236 http://dx.doi.org/10.1186/s12859-019-3023-y |