Using intron position conservation for homology-based gene prediction

Annotation of protein-coding genes is very important in bioinformatics and biology and has a decisive influence on many downstream analyses. Homology-based gene prediction programs allow for transferring knowledge about protein-coding genes from an annotated organism to an organism of interest. Here...

Bibliographic Details
Main authors: Keilwagen, Jens; Wenk, Michael; Erickson, Jessica L.; Schattat, Martin H.; Grau, Jan; Hartung, Frank
Format: Online Article Text
Language: English
Published: Oxford University Press 2016
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4872089/
https://www.ncbi.nlm.nih.gov/pubmed/26893356
http://dx.doi.org/10.1093/nar/gkw092
author Keilwagen, Jens
Wenk, Michael
Erickson, Jessica L.
Schattat, Martin H.
Grau, Jan
Hartung, Frank
collection PubMed
description Annotation of protein-coding genes is very important in bioinformatics and biology and has a decisive influence on many downstream analyses. Homology-based gene prediction programs allow for transferring knowledge about protein-coding genes from an annotated organism to an organism of interest. Here, we present a homology-based gene prediction program called GeMoMa. GeMoMa utilizes the conservation of intron positions within genes to predict related genes in other organisms. We assess the performance of GeMoMa and compare it with state-of-the-art competitors on plant and animal genomes using an extended best reciprocal hit approach. We find that GeMoMa often makes more precise predictions than its competitors yielding a substantially increased number of correct transcripts. Subsequently, we exemplarily validate GeMoMa predictions using Sanger sequencing. Finally, we use RNA-seq data to compare the predictions of homology-based gene prediction programs, and find again that GeMoMa performs well. Hence, we conclude that exploiting intron position conservation improves homology-based gene prediction, and we make GeMoMa freely available as command-line tool and Galaxy integration.
format Online
Article
Text
id pubmed-4872089
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-4872089 2016-05-27 Using intron position conservation for homology-based gene prediction Keilwagen, Jens Wenk, Michael Erickson, Jessica L. Schattat, Martin H. Grau, Jan Hartung, Frank Nucleic Acids Res Methods Online Oxford University Press 2016-05-19 2016-02-17 /pmc/articles/PMC4872089/ /pubmed/26893356 http://dx.doi.org/10.1093/nar/gkw092 Text en © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
For commercial re-use, please contact journals.permissions@oup.com
title Using intron position conservation for homology-based gene prediction
topic Methods Online