Modeling one thousand intron length distributions with fitild

Bibliographic Details
Main author: Gotoh, Osamu
Format: Online Article Text
Language: English
Published: Oxford University Press 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6157073/
https://www.ncbi.nlm.nih.gov/pubmed/29722882
http://dx.doi.org/10.1093/bioinformatics/bty353
Description
Summary: MOTIVATION: Intron length distribution (ILD) is a specific feature of a genome that exhibits extensive species-specific variation. Whereas ILD contributes to up to 30% of the total information content for intron recognition in some species, rendering it an important component of computational gene prediction, very few studies have been conducted to quantitatively characterize ILDs of various species. RESULTS: We developed a set of computer programs (fitild, compild, etc.) to build statistical models of ILDs and compare them with one another. Each ILD of more than 1000 genomes was fitted with fitild to a statistical model consisting of one, two, or three components of Frechet distributions. Several measures of distances between ILDs were calculated by compild. A theoretical model was presented to better understand the origin of the observed shape of an ILD. AVAILABILITY AND IMPLEMENTATION: The C++ source codes are available at https://github.com/ogotoh/fitild.git/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
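
Illustrative note: the summary describes fitting each ILD with a mixture of one to three Frechet components. The Python sketch below shows only what such a mixture density looks like, using the standard three-parameter Frechet form (shape, scale, location). It is not the fitild implementation, which is written in C++, and the function names and parameter values are hypothetical, chosen purely for illustration.

```python
import math

def frechet_pdf(x, alpha, scale, loc=0.0):
    """Frechet density with shape alpha, scale, and location loc (zero for x <= loc)."""
    if x <= loc:
        return 0.0
    z = (x - loc) / scale
    return (alpha / scale) * z ** (-1.0 - alpha) * math.exp(-z ** (-alpha))

def frechet_cdf(x, alpha, scale, loc=0.0):
    """Frechet cumulative distribution function."""
    if x <= loc:
        return 0.0
    z = (x - loc) / scale
    return math.exp(-z ** (-alpha))

def mixture_pdf(x, components):
    """Density of a mixture; components is a list of (weight, alpha, scale, loc)."""
    return sum(w * frechet_pdf(x, a, s, m) for w, a, s, m in components)

def mixture_cdf(x, components):
    """Cumulative distribution of the same mixture."""
    return sum(w * frechet_cdf(x, a, s, m) for w, a, s, m in components)

def log_likelihood(lengths, components):
    """Log-likelihood of observed lengths; every length must exceed at least one loc."""
    return sum(math.log(mixture_pdf(x, components)) for x in lengths)

# Hypothetical two-component model: a sharp short-intron peak plus a heavy long tail.
# These numbers are made up for illustration and are not taken from the paper.
components = [(0.6, 5.0, 60.0, 0.0), (0.4, 1.5, 800.0, 0.0)]
lengths = [55, 62, 70, 95, 400, 1200, 5300]
print(log_likelihood(lengths, components))
```

A fitting procedure would search for the weights and per-component parameters that maximize `log_likelihood`; how fitild performs that search is not stated in this record.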