Dinucleotide controlled null models for comparative RNA gene prediction

BACKGROUND: Comparative prediction of RNA structures can be used to identify functional noncoding RNAs in genomic screens. It was shown recently by Babak et al. [BMC Bioinformatics. 8:33] that RNA gene prediction programs can be biased by the genomic dinucleotide content, in particular those program...


Bibliographic Details
Main authors:	Gesell, Tanja; Washietl, Stefan
Format:	Text
Language:	English
Published:	BioMed Central 2008
Subjects:	Methodology Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2453142/ https://www.ncbi.nlm.nih.gov/pubmed/18505553 http://dx.doi.org/10.1186/1471-2105-9-248

author	Gesell, Tanja; Washietl, Stefan
collection	PubMed
description	BACKGROUND: Comparative prediction of RNA structures can be used to identify functional noncoding RNAs in genomic screens. It was shown recently by Babak et al. [BMC Bioinformatics. 8:33] that RNA gene prediction programs can be biased by the genomic dinucleotide content, in particular those programs using a thermodynamic folding model including stacking energies. As a consequence, there is a need for dinucleotide-preserving control strategies to assess the significance of such predictions. While there have been randomization algorithms for single sequences for many years, the problem has remained challenging for multiple alignments and there is currently no algorithm available. RESULTS: We present a program called SISSIz that simulates multiple alignments of a given average dinucleotide content. Meeting additional requirements of an accurate null model, the randomized alignments are on average of the same sequence diversity and preserve local conservation and gap patterns. We make use of a phylogenetic substitution model that includes overlapping dependencies and site-specific rates. Using fast heuristics and a distance-based approach, a tree is estimated under this model, which is used to guide the simulations. The new algorithm is tested on vertebrate genomic alignments and the effect on RNA structure predictions is studied. In addition, we directly combined the new null model with the RNAalifold consensus folding algorithm, giving a new variant of a thermodynamic structure-based RNA gene finding program that is not biased by the dinucleotide content. CONCLUSION: SISSIz implements an efficient algorithm to randomize multiple alignments preserving dinucleotide content. It can be used to get more accurate estimates of false positive rates of existing programs, to produce negative controls for the training of machine-learning-based programs, or as a standalone RNA gene finding program. Other applications in comparative genomics that require randomization of multiple alignments can be considered. AVAILABILITY: SISSIz is available as open source C code that can be compiled for every major platform and downloaded here: .
format	Text
id	pubmed-2453142
institution	National Center for Biotechnology Information
language	English
publishDate	2008
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-2453142 2008-07-11 Dinucleotide controlled null models for comparative RNA gene prediction Gesell, Tanja; Washietl, Stefan BMC Bioinformatics Methodology Article BioMed Central 2008-05-27 /pmc/articles/PMC2453142/ /pubmed/18505553 http://dx.doi.org/10.1186/1471-2105-9-248 Text en Copyright © 2008 Gesell and Washietl; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Dinucleotide controlled null models for comparative RNA gene prediction
topic	Methodology Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2453142/ https://www.ncbi.nlm.nih.gov/pubmed/18505553 http://dx.doi.org/10.1186/1471-2105-9-248
