Identification of consensus RNA secondary structures using suffix arrays

BACKGROUND: The identification of a consensus RNA motif often consists in finding a conserved secondary structure with minimum free energy in an ensemble of aligned sequences. However, an alignment is often difficult to obtain without prior structural information. Thus the need for tools to automate...

Bibliographic Details
Main Authors: Anwar, Mohammad, Nguyen, Truong, Turcotte, Marcel
Format: Text
Language: English
Published: BioMed Central 2006
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1475642/
https://www.ncbi.nlm.nih.gov/pubmed/16677380
http://dx.doi.org/10.1186/1471-2105-7-244
author Anwar, Mohammad
Nguyen, Truong
Turcotte, Marcel
collection PubMed
description BACKGROUND: The identification of a consensus RNA motif often consists in finding a conserved secondary structure with minimum free energy in an ensemble of aligned sequences. However, an alignment is often difficult to obtain without prior structural information. Tools are therefore needed to automate this process. RESULTS: We present an algorithm called Seed to identify all the conserved RNA secondary structure motifs in a set of unaligned sequences. The search space is defined as the set of all the secondary structure motifs inducible from a seed sequence. A general-to-specific search finds all the motifs that are conserved. Suffix arrays are used both to enumerate all the biological palindromes efficiently and to match RNA secondary structure expressions. We assessed the ability of this approach to uncover known structures using four datasets. The enumeration of the motifs relies only on the secondary structure definition and conservation, which allows scoring schemes to be evaluated independently. Twelve simple objective functions based on free energy were evaluated for their potential to discriminate native folds from the rest. CONCLUSION: Our evaluation shows that 1) support and exclusion constraints are sufficient to make an exhaustive search of the secondary structure space feasible; 2) the search space induced from a seed sequence contains known motifs; and 3) simple objective functions, consisting of a combination of the free energy of matching sequences, can generally identify motifs with high positive predictive value and sensitivity to known motifs.
format Text
id pubmed-1475642
institution National Center for Biotechnology Information
language English
publishDate 2006
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-1475642 2006-06-12 Identification of consensus RNA secondary structures using suffix arrays Anwar, Mohammad Nguyen, Truong Turcotte, Marcel BMC Bioinformatics Research Article BioMed Central 2006-05-05 /pmc/articles/PMC1475642/ /pubmed/16677380 http://dx.doi.org/10.1186/1471-2105-7-244 Text en Copyright © 2006 Anwar et al; licensee BioMed Central Ltd.
title Identification of consensus RNA secondary structures using suffix arrays
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1475642/
https://www.ncbi.nlm.nih.gov/pubmed/16677380
http://dx.doi.org/10.1186/1471-2105-7-244
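
The abstract in this record states that suffix arrays are used to enumerate biological palindromes, that is, a subsequence followed further downstream by its reverse complement, which can pair with it to form a stem. The record does not describe how Seed implements this, so the following Python is only a minimal sketch of the general idea under stated assumptions: every function name is hypothetical, the suffix array is built by naive sorting rather than a linear-time construction, only Watson-Crick pairs are considered (G-U wobble pairs are ignored), and the stem length and minimum loop size are illustrative parameters.

```python
# Illustrative sketch only; not the Seed implementation.
# Assumption: Watson-Crick pairing only (no G-U wobble pairs).
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def build_suffix_array(text):
    """Return the start positions of all suffixes in lexicographic order.

    Naive construction by sorting suffixes; fine for short sequences.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def reverse_complement(s):
    """Reverse the sequence and complement every base."""
    return "".join(COMPLEMENT[b] for b in reversed(s))


def find_occurrences(text, sa, pattern):
    """Return sorted start positions of pattern, using binary search on sa."""
    k = len(pattern)
    # Lower bound: first suffix whose k-length prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + k] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    # Upper bound: first suffix whose k-length prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + k] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])


def biological_palindromes(seq, stem_len=3, min_loop=3):
    """List (i, j, arm, partner) where seq[i:i+stem_len] can pair with
    its reverse complement starting at j, at least min_loop bases downstream.
    """
    sa = build_suffix_array(seq)
    hits = []
    for i in range(len(seq) - stem_len + 1):
        arm = seq[i:i + stem_len]
        partner = reverse_complement(arm)
        for j in find_occurrences(seq, sa, partner):
            if j >= i + stem_len + min_loop:
                hits.append((i, j, arm, partner))
    return hits


if __name__ == "__main__":
    for hit in biological_palindromes("GGGAAAACCC"):
        print(hit)
```

Each lookup costs a binary search over the suffix array rather than a scan of the whole sequence, which is the property that makes exhaustive enumeration plausible. The sketch assumes the input contains only the bases A, C, G and U; any other symbol would raise a `KeyError` in `reverse_complement`.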