
Coding exon-structure aware realigner (CESAR) utilizes genome alignments for accurate comparative gene annotation

Identifying coding genes is an essential step in genome annotation. Here, we utilize existing whole genome alignments to detect conserved coding exons and then map gene annotations from one genome to many aligned genomes. We show that genome alignments contain thousands of spurious frameshifts and s...


Bibliographic Details
Main authors: Sharma, Virag, Elghafari, Anas, Hiller, Michael
Format: Online Article Text
Language: English
Published: Oxford University Press 2016
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4914097/
https://www.ncbi.nlm.nih.gov/pubmed/27016733
http://dx.doi.org/10.1093/nar/gkw210
author Sharma, Virag
Elghafari, Anas
Hiller, Michael
collection PubMed
description Identifying coding genes is an essential step in genome annotation. Here, we utilize existing whole genome alignments to detect conserved coding exons and then map gene annotations from one genome to many aligned genomes. We show that genome alignments contain thousands of spurious frameshifts and splice site mutations in exons that are truly conserved. To overcome these limitations, we have developed CESAR (Coding Exon-Structure Aware Realigner) that realigns coding exons, while considering reading frame and splice sites of each exon. CESAR effectively avoids spurious frameshifts in conserved genes and detects 91% of shifted splice sites. This results in the identification of thousands of additional conserved exons and 99% of the exons that lack inactivating mutations match real exons. Finally, to demonstrate the potential of using CESAR for comparative gene annotation, we applied it to 188 788 exons of 19 865 human genes to annotate human genes in 99 other vertebrates. These comparative gene annotations are available as a resource (http://bds.mpi-cbg.de/hillerlab/CESAR/). CESAR (https://github.com/hillerlab/CESAR/) can readily be applied to other alignments to accurately annotate coding genes in many other vertebrate and invertebrate genomes.
format Online
Article
Text
id pubmed-4914097
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-4914097 2016-06-22 Coding exon-structure aware realigner (CESAR) utilizes genome alignments for accurate comparative gene annotation Sharma, Virag Elghafari, Anas Hiller, Michael Nucleic Acids Res Methods Online Oxford University Press 2016-06-20 2016-03-25 /pmc/articles/PMC4914097/ /pubmed/27016733 http://dx.doi.org/10.1093/nar/gkw210 Text en © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title Coding exon-structure aware realigner (CESAR) utilizes genome alignments for accurate comparative gene annotation
topic Methods Online
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4914097/
https://www.ncbi.nlm.nih.gov/pubmed/27016733
http://dx.doi.org/10.1093/nar/gkw210