Coding exon-structure aware realigner (CESAR) utilizes genome alignments for accurate comparative gene annotation
Identifying coding genes is an essential step in genome annotation. Here, we utilize existing whole genome alignments to detect conserved coding exons and then map gene annotations from one genome to many aligned genomes. We show that genome alignments contain thousands of spurious frameshifts and s...
Main authors: | Sharma, Virag; Elghafari, Anas; Hiller, Michael |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2016 |
Subjects: | Methods Online |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4914097/ https://www.ncbi.nlm.nih.gov/pubmed/27016733 http://dx.doi.org/10.1093/nar/gkw210 |
_version_ | 1782438508939444224 |
---|---|
author | Sharma, Virag; Elghafari, Anas; Hiller, Michael |
author_facet | Sharma, Virag; Elghafari, Anas; Hiller, Michael |
author_sort | Sharma, Virag |
collection | PubMed |
description | Identifying coding genes is an essential step in genome annotation. Here, we utilize existing whole genome alignments to detect conserved coding exons and then map gene annotations from one genome to many aligned genomes. We show that genome alignments contain thousands of spurious frameshifts and splice site mutations in exons that are truly conserved. To overcome these limitations, we have developed CESAR (Coding Exon-Structure Aware Realigner) that realigns coding exons, while considering reading frame and splice sites of each exon. CESAR effectively avoids spurious frameshifts in conserved genes and detects 91% of shifted splice sites. This results in the identification of thousands of additional conserved exons and 99% of the exons that lack inactivating mutations match real exons. Finally, to demonstrate the potential of using CESAR for comparative gene annotation, we applied it to 188 788 exons of 19 865 human genes to annotate human genes in 99 other vertebrates. These comparative gene annotations are available as a resource (http://bds.mpi-cbg.de/hillerlab/CESAR/). CESAR (https://github.com/hillerlab/CESAR/) can readily be applied to other alignments to accurately annotate coding genes in many other vertebrate and invertebrate genomes. |
format | Online Article Text |
id | pubmed-4914097 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-4914097 2016-06-22 Coding exon-structure aware realigner (CESAR) utilizes genome alignments for accurate comparative gene annotation Sharma, Virag Elghafari, Anas Hiller, Michael Nucleic Acids Res Methods Online Identifying coding genes is an essential step in genome annotation. Here, we utilize existing whole genome alignments to detect conserved coding exons and then map gene annotations from one genome to many aligned genomes. We show that genome alignments contain thousands of spurious frameshifts and splice site mutations in exons that are truly conserved. To overcome these limitations, we have developed CESAR (Coding Exon-Structure Aware Realigner) that realigns coding exons, while considering reading frame and splice sites of each exon. CESAR effectively avoids spurious frameshifts in conserved genes and detects 91% of shifted splice sites. This results in the identification of thousands of additional conserved exons and 99% of the exons that lack inactivating mutations match real exons. Finally, to demonstrate the potential of using CESAR for comparative gene annotation, we applied it to 188 788 exons of 19 865 human genes to annotate human genes in 99 other vertebrates. These comparative gene annotations are available as a resource (http://bds.mpi-cbg.de/hillerlab/CESAR/). CESAR (https://github.com/hillerlab/CESAR/) can readily be applied to other alignments to accurately annotate coding genes in many other vertebrate and invertebrate genomes. Oxford University Press 2016-06-20 2016-03-25 /pmc/articles/PMC4914097/ /pubmed/27016733 http://dx.doi.org/10.1093/nar/gkw210 Text en © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
spellingShingle | Methods Online Sharma, Virag Elghafari, Anas Hiller, Michael Coding exon-structure aware realigner (CESAR) utilizes genome alignments for accurate comparative gene annotation |
title | Coding exon-structure aware realigner (CESAR) utilizes genome alignments for accurate comparative gene annotation |
title_full | Coding exon-structure aware realigner (CESAR) utilizes genome alignments for accurate comparative gene annotation |
title_fullStr | Coding exon-structure aware realigner (CESAR) utilizes genome alignments for accurate comparative gene annotation |
title_full_unstemmed | Coding exon-structure aware realigner (CESAR) utilizes genome alignments for accurate comparative gene annotation |
title_short | Coding exon-structure aware realigner (CESAR) utilizes genome alignments for accurate comparative gene annotation |
title_sort | coding exon-structure aware realigner (cesar) utilizes genome alignments for accurate comparative gene annotation |
topic | Methods Online |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4914097/ https://www.ncbi.nlm.nih.gov/pubmed/27016733 http://dx.doi.org/10.1093/nar/gkw210 |
work_keys_str_mv | AT sharmavirag codingexonstructureawarerealignercesarutilizesgenomealignmentsforaccuratecomparativegeneannotation AT elghafarianas codingexonstructureawarerealignercesarutilizesgenomealignmentsforaccuratecomparativegeneannotation AT hillermichael codingexonstructureawarerealignercesarutilizesgenomealignmentsforaccuratecomparativegeneannotation |