Intron length distributions and gene prediction
Main authors: | Roy, Scott William; Penny, David |
---|---|
Format: | Text |
Language: | English |
Published: | Oxford University Press, 2007 |
Subjects: | Genomics |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1950532/ https://www.ncbi.nlm.nih.gov/pubmed/17617639 http://dx.doi.org/10.1093/nar/gkm281 |
_version_ | 1782134557136388096 |
---|---|
author | Roy, Scott William; Penny, David |
author_facet | Roy, Scott William; Penny, David |
author_sort | Roy, Scott William |
collection | PubMed |
description | Accurate gene prediction in eukaryotes is a difficult and subtle problem. Here we point out a useful feature of expected distributions of spliceosomal intron lengths. Since introns are removed from transcripts prior to translation, intron lengths are not expected to respect coding frame; thus the number of genomic introns that are a multiple of three bases (‘3n introns’) should be similar to the number that are a multiple of three plus one bases (or plus two bases). Skewed predicted intron length distributions thus suggest systematic errors in intron prediction. For instance, a genome-wide excess of 3n introns suggests that many internal exonic sequences have been incorrectly called introns, whereas a deficit of 3n introns suggests that many 3n introns that lack stop codons have been mistaken for exonic sequence. A survey of genomic annotations for 29 diverse eukaryotic species showed that skew in intron length distributions is a common problem. We discuss several examples of skews in genome-wide intron length distributions that indicate systematic problems with gene prediction. We suggest that evaluation of length distributions of predicted introns is a fast and simple method for detecting a variety of possible systematic biases in gene prediction or even problems with genome assemblies, and discuss ways in which these insights could be incorporated into genome annotation protocols. |
format | Text |
id | pubmed-1950532 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2007 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-1950532 2007-08-22. Intron length distributions and gene prediction. Roy, Scott William; Penny, David. Nucleic Acids Res. Genomics. Oxford University Press 2007-07 2007-07-07. /pmc/articles/PMC1950532/ /pubmed/17617639 http://dx.doi.org/10.1093/nar/gkm281. Text, en. © 2007 The Author(s). This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Genomics Roy, Scott William Penny, David Intron length distributions and gene prediction |
title | Intron length distributions and gene prediction |
topic | Genomics |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1950532/ https://www.ncbi.nlm.nih.gov/pubmed/17617639 http://dx.doi.org/10.1093/nar/gkm281 |
work_keys_str_mv | AT royscottwilliam intronlengthdistributionsandgeneprediction AT pennydavid intronlengthdistributionsandgeneprediction |
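The core check described in the abstract, counting predicted introns by length modulo 3 and asking whether the 3n, 3n+1 and 3n+2 classes are roughly equal, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' procedure: the input is assumed to be a plain list of predicted intron lengths in bases, "roughly equal" is simplified to "equal thirds", and the chi-square goodness-of-fit test, the `alpha = 0.01` threshold, and the function names (`intron_length_classes`, `chi_square_uniform`, `classify_skew`) are hypothetical choices made for this example.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Tuple


def intron_length_classes(lengths: Iterable[int]) -> Dict[int, int]:
    """Count introns by length mod 3: 0 = '3n', 1 = '3n+1', 2 = '3n+2'."""
    counts = Counter(length % 3 for length in lengths)
    return {r: counts.get(r, 0) for r in (0, 1, 2)}


def chi_square_uniform(counts: Dict[int, int]) -> Tuple[float, float]:
    """Chi-square test of the three classes against equal expected counts.

    Assumption: equal thirds is the null expectation. With 3 classes there
    are 2 degrees of freedom, where the chi-square survival function is
    exactly exp(-x / 2), so no statistics library is needed.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no intron lengths supplied")
    expected = total / 3
    stat = sum((obs - expected) ** 2 / expected for obs in counts.values())
    return stat, math.exp(-stat / 2)


def classify_skew(lengths: Iterable[int], alpha: float = 0.01) -> str:
    """Hypothetical helper: report the direction of any significant skew."""
    counts = intron_length_classes(lengths)
    _, p_value = chi_square_uniform(counts)
    if p_value >= alpha:
        return "no_skew"
    expected = sum(counts.values()) / 3
    if counts[0] > expected:
        # Excess of 3n introns: exonic sequence may have been called intron.
        return "excess_3n"
    if counts[0] < expected:
        # Deficit of 3n introns: 3n introns lacking stop codons may have
        # been absorbed into exons.
        return "deficit_3n"
    # The 3n class is on target; the imbalance is between 3n+1 and 3n+2.
    return "skew_between_3n+1_and_3n+2"


if __name__ == "__main__":
    balanced = list(range(60, 360))  # 100 lengths in each class
    print(classify_skew(balanced))  # -> no_skew
    print(classify_skew([60, 61, 62] * 100 + [99] * 60))  # -> excess_3n
    print(classify_skew([61, 62] * 100 + [60] * 40))  # -> deficit_3n
```

In practice the lengths would come from a genome annotation (for example, gaps between consecutive exons of the same transcript), and a real survey would likely want to control for intron-length classes that are legitimately biased; those refinements are outside this sketch.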