
Comparative Evaluation of Intron Prediction Methods and Detection of Plant Genome Annotation Using Intron Length Distributions

Intron prediction is an important problem in genome annotation, which is constantly being updated. Using the genomes of two model plants (rice and Arabidopsis), we compared two well-known intron prediction tools: the BLAST-Like Alignment Tool (BLAT) and Sim4cc. The results showed that each of the tools had its own adva...


Bibliographic Details
Main Authors: Yang, Long; Cho, Hwan-Gue
Format: Online Article Text
Language: English
Published: Korea Genome Organization 2012
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3475488/
https://www.ncbi.nlm.nih.gov/pubmed/23105930
http://dx.doi.org/10.5808/GI.2012.10.1.58
author Yang, Long
Cho, Hwan-Gue
collection PubMed
description Intron prediction is an important problem in genome annotation, which is constantly being updated. Using the genomes of two model plants (rice and Arabidopsis), we compared two well-known intron prediction tools: the BLAST-Like Alignment Tool (BLAT) and Sim4cc. Each tool had its own advantages and disadvantages. BLAT predicted more than 99% of all genomic introns, with a small number of false-positive introns. Sim4cc found the correct introns with a false-negative rate of 1.02% to 4.85%, but it needed a longer run time than BLAT. Further, we evaluated the intron information of 10 complete plant genomes. Because introns are non-coding sequences, their lengths are not constrained by the triplet codon frame, so an intron length falls into one of three phases: a multiple of three bases (3n), a multiple of three plus one (3n + 1), or a multiple of three plus two (3n + 2). It is widely accepted that the percentages of 3n, 3n + 1, and 3n + 2 introns are quite similar within a genome. Our study showed that 80% (8/10) of the species had similar numbers of introns in the three phases. The percentage of 3n introns was excessive in Ostreococcus lucimarinus (47.7%) and deficient in Ostreococcus tauri (29.1%). This discrepancy could be the result of errors in intron prediction. We suggest that a three-phase evaluation is a fast and effective method of detecting intron annotation problems.
format Online
Article
Text
id pubmed-3475488
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher Korea Genome Organization
record_format MEDLINE/PubMed
spelling pubmed-3475488 2012-10-26 Genomics Inf Article
Korea Genome Organization 2012-03 2012-03-31 /pmc/articles/PMC3475488/ /pubmed/23105930 http://dx.doi.org/10.5808/GI.2012.10.1.58 Text en Copyright © 2012 by The Korea Genome Organization http://creativecommons.org/licenses/by-nc/3.0 It is identical to the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/).
title Comparative Evaluation of Intron Prediction Methods and Detection of Plant Genome Annotation Using Intron Length Distributions
topic Article
work_keys_str_mv AT yanglong comparativeevaluationofintronpredictionmethodsanddetectionofplantgenomeannotationusingintronlengthdistributions
AT chohwangue comparativeevaluationofintronpredictionmethodsanddetectionofplantgenomeannotationusingintronlengthdistributions
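
The three-phase check described in the abstract can be sketched in a few lines of Python. This is only an illustration of the idea, not the authors' implementation: the function names `phase_fractions` and `flag_skewed` are hypothetical, and the default `tolerance` of 0.04 is an arbitrary assumption rather than a threshold taken from the paper.

```python
from collections import Counter


def phase_fractions(intron_lengths):
    """Return the fractions of introns whose length is 3n, 3n + 1, and 3n + 2."""
    # Classify each intron by its length modulo 3.
    counts = Counter(length % 3 for length in intron_lengths)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no intron lengths given")
    return tuple(counts.get(remainder, 0) / total for remainder in range(3))


def flag_skewed(fractions, tolerance=0.04):
    """Return True if any phase share deviates from 1/3 by more than `tolerance`.

    The tolerance is an illustrative assumption, not a value from the paper.
    """
    return any(abs(fraction - 1 / 3) > tolerance for fraction in fractions)


if __name__ == "__main__":
    # Hypothetical intron lengths; real input would come from a genome annotation.
    lengths = [90, 91, 92, 93, 94, 95]
    fractions = phase_fractions(lengths)
    print(fractions, flag_skewed(fractions))
```

Under this assumed tolerance, an annotation whose 3n share is 47.7% or 29.1% would be flagged, since both values differ from one third by more than 0.04. A chi-squared goodness-of-fit test against equal proportions would be a more principled alternative to a fixed cutoff.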