Considerations in the identification of functional RNA structural elements in genomic alignments

Bibliographic Details
Main Authors: Babak, Tomas; Blencowe, Benjamin J; Hughes, Timothy R
Format: Text
Language: English
Published: BioMed Central 2007
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1803800/
https://www.ncbi.nlm.nih.gov/pubmed/17263882
http://dx.doi.org/10.1186/1471-2105-8-33
Collection: PubMed
Description:
BACKGROUND: Accurate identification of novel, functional noncoding (nc) RNA features in genome sequence has proven more difficult than for exons. Current algorithms identify and score potential RNA secondary structures on the basis of thermodynamic stability, conservation, and/or covariance in sequence alignments. Neither the algorithms nor the information gained from the individual inputs has been independently assessed. Furthermore, due to issues in modelling background signal, it has been difficult to gauge the precision of these algorithms on a genomic scale, in which even a seemingly small false-positive rate can result in a vast excess of false discoveries.

RESULTS: We developed a shuffling algorithm, shuffle-pair.pl, that simultaneously preserves dinucleotide frequency, gaps, and local conservation in pairwise sequence alignments. We used shuffle-pair.pl to assess precision and recall of six ncRNA search tools (MSARI, QRNA, ddbRNA, RNAz, EvoFold, and several variants of simple thermodynamic stability) on a test set of 3046 alignments of known ncRNAs. Relative to mononucleotide shuffling, preserving dinucleotide content when shuffling the alignments drastically increased the estimated false-positive detection rates for ncRNA elements. This precluded evaluation of higher-order alignments, which cannot be adequately shuffled while maintaining both dinucleotide content and alignment structure. On pairwise alignments, none of the covariance-based tools performed markedly better than thermodynamic scoring alone. Although the high false-positive rates call into question the veracity of any individual predicted secondary structural element in our analysis, we nevertheless identified intriguing global trends in human genome alignments.
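What "preserving dinucleotide frequency" means for a shuffle can be illustrated on a single sequence. The sketch below is not shuffle-pair.pl, whose internals are not described in this record, and unlike that script it ignores gaps and alignment columns entirely. It is a minimal Python sketch of the classic Altschul-Erickson doublet-preserving shuffle, implemented as a random Eulerian walk over the multigraph whose edges are the sequence's adjacent base pairs. The function names `dinucleotide_shuffle` and `_forms_tree` are hypothetical.

```python
import random
from typing import Dict, List, Optional


def dinucleotide_shuffle(seq: str, rng: Optional[random.Random] = None) -> str:
    """Return a random permutation of `seq` with the same dinucleotide counts.

    Hypothetical helper: a sketch of the Altschul-Erickson doublet-preserving
    shuffle via a random Eulerian walk. Single sequence only: no gaps, no
    alignment columns (unlike shuffle-pair.pl).
    """
    rng = rng or random.Random()
    if len(seq) < 3:
        return seq  # too short for any other arrangement to exist
    # Each adjacent pair seq[i] -> seq[i+1] is one directed edge of a multigraph.
    edges: Dict[str, List[str]] = {}
    for a, b in zip(seq, seq[1:]):
        edges.setdefault(a, []).append(b)
    first, last = seq[0], seq[-1]

    # Step 1: for every vertex except the final one, pick the edge it leaves by
    # for the last time; retry until those edges form a tree pointing to `last`.
    while True:
        last_exit = {v: rng.randrange(len(out)) for v, out in edges.items() if v != last}
        if _forms_tree(edges, last_exit, last):
            break

    # Step 2: shuffle each vertex's remaining exits; the chosen last exit goes last.
    order: Dict[str, List[str]] = {}
    for v, out in edges.items():
        k = last_exit.get(v)
        rest = out[:] if k is None else out[:k] + out[k + 1:]
        rng.shuffle(rest)
        order[v] = rest if k is None else rest + [out[k]]

    # Step 3: walk from the original first base, consuming each exit list in order.
    pos = {v: 0 for v in order}
    result, cur = [first], first
    for _ in range(len(seq) - 1):
        nxt = order[cur][pos[cur]]
        pos[cur] += 1
        result.append(nxt)
        cur = nxt
    return "".join(result)


def _forms_tree(edges: Dict[str, List[str]], last_exit: Dict[str, int], last: str) -> bool:
    """True if following every vertex's last exit reaches `last` without a cycle."""
    for start in last_exit:
        seen = set()
        cur = start
        while cur != last:
            if cur in seen:
                return False
            seen.add(cur)
            cur = edges[cur][last_exit[cur]]
    return True
```

Usage: `dinucleotide_shuffle("ACGGCAUCGA", random.Random(0))` returns a permutation with exactly the same dinucleotide counts (and the same first and last base) as the input; passing a seeded `random.Random` makes a set of negative controls reproducible. A mononucleotide shuffle would instead be a plain `rng.shuffle` of the bases, which preserves only base composition.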
The distribution of ncRNA prediction scores in 75-base windows overlapping UTRs, introns, and intergenic regions, analyzed using both thermodynamic stability and EvoFold (which has no thermodynamic component), was significantly higher for real than for shuffled sequence, while the distribution for coding sequences was lower than that of corresponding shuffles.

CONCLUSION: Accurate prediction of novel RNA structural elements in genome sequence remains a difficult problem, and development of an appropriate negative-control strategy for multiple alignments is an important practical challenge. Nonetheless, the general trends we observed for the distributions of predicted ncRNAs across genomic features are biologically meaningful, supporting the presence of secondary structural elements in many 3' UTRs, and providing evidence for evolutionary selection against secondary structures in coding regions.
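Two quantitative points in the abstract, that a small false-positive rate becomes a large excess of false discoveries at genome scale and that real windows score higher (or, for coding sequence, lower) than their shuffles, can be made concrete with a few lines of arithmetic. This is a minimal sketch under stated assumptions: scores come from some per-window predictor (any of the tools above), the shuffled windows are taken as a faithful null model, and the statistics shown (an empirical FDR-style precision estimate and a normalized Mann-Whitney U) are illustrative choices, not necessarily the tests used in the paper. All function names are hypothetical.

```python
from typing import Sequence


def expected_false_positives(n_windows: int, fpr: float) -> float:
    """Expected null windows passing the threshold when n_windows are scanned."""
    return n_windows * fpr


def empirical_fpr(shuffled_scores: Sequence[float], threshold: float) -> float:
    """Fraction of shuffled (null) windows scoring at or above threshold."""
    return sum(s >= threshold for s in shuffled_scores) / len(shuffled_scores)


def estimated_precision(real_scores: Sequence[float],
                        shuffled_scores: Sequence[float],
                        threshold: float) -> float:
    """Rough precision at threshold: 1 - expected null hits / observed hits.

    Assumes the shuffles model the null. Every real window is treated as a
    potential null, so the estimate tends to be conservative.
    """
    hits = sum(s >= threshold for s in real_scores)
    if hits == 0:
        return float("nan")  # no predictions, so precision is undefined
    null_hits = expected_false_positives(len(real_scores),
                                         empirical_fpr(shuffled_scores, threshold))
    return max(0.0, 1.0 - null_hits / hits)


def prob_real_exceeds_shuffled(real_scores: Sequence[float],
                               shuffled_scores: Sequence[float]) -> float:
    """P(real > shuffled) + 0.5 * P(tie), i.e. the normalized Mann-Whitney U.

    0.5 means neither set tends to score higher; above 0.5 real windows tend
    to outscore their shuffles, below 0.5 the reverse.
    """
    total = 0.0
    for r in real_scores:
        for s in shuffled_scores:
            if r > s:
                total += 1.0
            elif r == s:
                total += 0.5
    return total / (len(real_scores) * len(shuffled_scores))
```

For example, with hypothetical numbers, scanning 10,000,000 windows at a false-positive rate of 0.1% gives `expected_false_positives(10_000_000, 0.001)`, about 10,000 null hits, which can easily outnumber the genuine structural elements present. Computing the same threshold statistics against mononucleotide versus dinucleotide shuffles is one way to see how strongly the choice of null model drives the estimated precision.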
Record ID: pubmed-1803800
Institution: National Center for Biotechnology Information
Record Format: MEDLINE/PubMed
Journal: BMC Bioinformatics
Article Type: Research Article
Publication Date: 2007-01-30
Rights: Copyright © 2007 Babak et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.