Assessment and refinement of eukaryotic gene structure prediction with gene-structure-aware multiple protein sequence alignment
BACKGROUND: Accurate computational identification of eukaryotic gene organization is a long-standing problem. Despite the fundamental importance of precise annotation of genes encoded in newly sequenced genomes, the accuracy of predicted gene structures has not been critically evaluated, mostly due...
Main authors: | Gotoh, Osamu; Morita, Mariko; Nelson, David R |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2014 |
Subjects: | Research Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4065584/ https://www.ncbi.nlm.nih.gov/pubmed/24927652 http://dx.doi.org/10.1186/1471-2105-15-189 |
_version_ | 1782322110866128896 |
---|---|
author | Gotoh, Osamu Morita, Mariko Nelson, David R |
author_facet | Gotoh, Osamu Morita, Mariko Nelson, David R |
author_sort | Gotoh, Osamu |
collection | PubMed |
description | BACKGROUND: Accurate computational identification of eukaryotic gene organization is a long-standing problem. Despite the fundamental importance of precise annotation of genes encoded in newly sequenced genomes, the accuracy of predicted gene structures has not been critically evaluated, mostly due to the scarcity of proper assessment methods. RESULTS: We present a gene-structure-aware multiple sequence alignment method for gene prediction using amino acid sequences translated from homologous genes from many genomes. The approach provides rich information concerning the reliability of each predicted gene structure. We have also devised an iterative method that attempts to improve the structures of suspiciously predicted genes based on a spliced alignment algorithm using consensus sequences or reliable homologs as templates. Application of our methods to cytochrome P450 and ribosomal proteins from 47 plant genomes indicated that 50 ~ 60 % of the annotated gene structures are likely to contain some defects. Whereas more than half of the defect-containing genes may be intrinsically broken, i.e. they are pseudogenes or gene fragments, located in unfinished sequencing areas, or corresponding to non-productive isoforms, the defects found in a majority of the remaining gene candidates can be remedied by our iterative refinement method. CONCLUSIONS: Refinement of eukaryotic gene structures mediated by gene-structure-aware multiple protein sequence alignment is a useful strategy to dramatically improve the overall prediction quality of a set of homologous genes. Our method will be applicable to various families of protein-coding genes if their domain structures are evolutionarily stable. It is also feasible to apply our method to gene families from all kingdoms of life, not just plants. |
format | Online Article Text |
id | pubmed-4065584 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2014 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-4065584 2014-06-22 Assessment and refinement of eukaryotic gene structure prediction with gene-structure-aware multiple protein sequence alignment Gotoh, Osamu Morita, Mariko Nelson, David R BMC Bioinformatics Research Article BACKGROUND: Accurate computational identification of eukaryotic gene organization is a long-standing problem. Despite the fundamental importance of precise annotation of genes encoded in newly sequenced genomes, the accuracy of predicted gene structures has not been critically evaluated, mostly due to the scarcity of proper assessment methods. RESULTS: We present a gene-structure-aware multiple sequence alignment method for gene prediction using amino acid sequences translated from homologous genes from many genomes. The approach provides rich information concerning the reliability of each predicted gene structure. We have also devised an iterative method that attempts to improve the structures of suspiciously predicted genes based on a spliced alignment algorithm using consensus sequences or reliable homologs as templates. Application of our methods to cytochrome P450 and ribosomal proteins from 47 plant genomes indicated that 50 ~ 60 % of the annotated gene structures are likely to contain some defects. Whereas more than half of the defect-containing genes may be intrinsically broken, i.e. they are pseudogenes or gene fragments, located in unfinished sequencing areas, or corresponding to non-productive isoforms, the defects found in a majority of the remaining gene candidates can be remedied by our iterative refinement method. CONCLUSIONS: Refinement of eukaryotic gene structures mediated by gene-structure-aware multiple protein sequence alignment is a useful strategy to dramatically improve the overall prediction quality of a set of homologous genes. Our method will be applicable to various families of protein-coding genes if their domain structures are evolutionarily stable. It is also feasible to apply our method to gene families from all kingdoms of life, not just plants. BioMed Central 2014-06-14 /pmc/articles/PMC4065584/ /pubmed/24927652 http://dx.doi.org/10.1186/1471-2105-15-189 Text en Copyright © 2014 Gotoh et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/4.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research Article Gotoh, Osamu Morita, Mariko Nelson, David R Assessment and refinement of eukaryotic gene structure prediction with gene-structure-aware multiple protein sequence alignment |
title | Assessment and refinement of eukaryotic gene structure prediction with gene-structure-aware multiple protein sequence alignment |
title_full | Assessment and refinement of eukaryotic gene structure prediction with gene-structure-aware multiple protein sequence alignment |
title_fullStr | Assessment and refinement of eukaryotic gene structure prediction with gene-structure-aware multiple protein sequence alignment |
title_full_unstemmed | Assessment and refinement of eukaryotic gene structure prediction with gene-structure-aware multiple protein sequence alignment |
title_short | Assessment and refinement of eukaryotic gene structure prediction with gene-structure-aware multiple protein sequence alignment |
title_sort | assessment and refinement of eukaryotic gene structure prediction with gene-structure-aware multiple protein sequence alignment |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4065584/ https://www.ncbi.nlm.nih.gov/pubmed/24927652 http://dx.doi.org/10.1186/1471-2105-15-189 |
work_keys_str_mv | AT gotohosamu assessmentandrefinementofeukaryoticgenestructurepredictionwithgenestructureawaremultipleproteinsequencealignment AT moritamariko assessmentandrefinementofeukaryoticgenestructurepredictionwithgenestructureawaremultipleproteinsequencealignment AT nelsondavidr assessmentandrefinementofeukaryoticgenestructurepredictionwithgenestructureawaremultipleproteinsequencealignment |