Enhancement of accuracy and efficiency for RNA secondary structure prediction by sequence segmentation and MapReduce

BACKGROUND: Ribonucleic acid (RNA) molecules play important roles in many biological processes including gene expression and regulation. Their secondary structures are crucial for the RNA functionality, and the prediction of the secondary structures is widely studied. Our previous research shows tha...

Bibliographic Details
Main authors:	Zhang, Boyu; Yehdego, Daniel T; Johnson, Kyle L; Leung, Ming-Ying; Taufer, Michela
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2013
Subjects:	Research
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3952952/ https://www.ncbi.nlm.nih.gov/pubmed/24564983 http://dx.doi.org/10.1186/1472-6807-13-S1-S3

_version_	1782307282312232960
author	Zhang, Boyu; Yehdego, Daniel T; Johnson, Kyle L; Leung, Ming-Ying; Taufer, Michela
author_sort	Zhang, Boyu
collection	PubMed
description	BACKGROUND: Ribonucleic acid (RNA) molecules play important roles in many biological processes including gene expression and regulation. Their secondary structures are crucial for the RNA functionality, and the prediction of the secondary structures is widely studied. Our previous research shows that cutting long sequences into shorter chunks, predicting secondary structures of the chunks independently using thermodynamic methods, and reconstructing the entire secondary structure from the predicted chunk structures can yield better accuracy than predicting the secondary structure using the RNA sequence as a whole. The chunking, prediction, and reconstruction processes can use different methods and parameters, some of which produce more accurate predictions than others. In this paper, we study the prediction accuracy and efficiency of three different chunking methods using seven popular secondary structure prediction programs that apply to two datasets of RNA with known secondary structures, which include both pseudoknotted and non-pseudoknotted sequences, as well as a family of viral genome RNAs whose structures have not been predicted before. Our modularized MapReduce framework based on Hadoop allows us to study the problem in a parallel and robust environment. RESULTS: On average, the maximum accuracy retention values are larger than one for our chunking methods and the seven prediction programs over 50 non-pseudoknotted sequences, meaning that the secondary structure predicted using chunking is more similar to the real structure than the secondary structure predicted by using the whole sequence. We observe similar results for the 23 pseudoknotted sequences, except for the NUPACK program using the centered chunking method. The performance analysis for 14 long RNA sequences from the Nodaviridae virus family outlines how the coarse-grained mapping of chunking and predictions in the MapReduce framework exhibits shorter turnaround times for short RNA sequences. However, as the lengths of the RNA sequences increase, the fine-grained mapping can surpass the coarse-grained mapping in performance. CONCLUSIONS: By using our MapReduce framework together with statistical analysis on the accuracy retention results, we observe how the inversion-based chunking methods can outperform predictions using the whole sequence. Our chunk-based approach also enables us to predict secondary structures for very long RNA sequences, which is not feasible with traditional methods alone.
format	Online Article Text
id	pubmed-3952952
institution	National Center for Biotechnology Information
language	English
publishDate	2013
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-3952952 2014-03-24 Enhancement of accuracy and efficiency for RNA secondary structure prediction by sequence segmentation and MapReduce Zhang, Boyu Yehdego, Daniel T Johnson, Kyle L Leung, Ming-Ying Taufer, Michela BMC Struct Biol Research BACKGROUND: Ribonucleic acid (RNA) molecules play important roles in many biological processes including gene expression and regulation. Their secondary structures are crucial for the RNA functionality, and the prediction of the secondary structures is widely studied. Our previous research shows that cutting long sequences into shorter chunks, predicting secondary structures of the chunks independently using thermodynamic methods, and reconstructing the entire secondary structure from the predicted chunk structures can yield better accuracy than predicting the secondary structure using the RNA sequence as a whole. The chunking, prediction, and reconstruction processes can use different methods and parameters, some of which produce more accurate predictions than others. In this paper, we study the prediction accuracy and efficiency of three different chunking methods using seven popular secondary structure prediction programs that apply to two datasets of RNA with known secondary structures, which include both pseudoknotted and non-pseudoknotted sequences, as well as a family of viral genome RNAs whose structures have not been predicted before. Our modularized MapReduce framework based on Hadoop allows us to study the problem in a parallel and robust environment. RESULTS: On average, the maximum accuracy retention values are larger than one for our chunking methods and the seven prediction programs over 50 non-pseudoknotted sequences, meaning that the secondary structure predicted using chunking is more similar to the real structure than the secondary structure predicted by using the whole sequence. We observe similar results for the 23 pseudoknotted sequences, except for the NUPACK program using the centered chunking method. The performance analysis for 14 long RNA sequences from the Nodaviridae virus family outlines how the coarse-grained mapping of chunking and predictions in the MapReduce framework exhibits shorter turnaround times for short RNA sequences. However, as the lengths of the RNA sequences increase, the fine-grained mapping can surpass the coarse-grained mapping in performance. CONCLUSIONS: By using our MapReduce framework together with statistical analysis on the accuracy retention results, we observe how the inversion-based chunking methods can outperform predictions using the whole sequence. Our chunk-based approach also enables us to predict secondary structures for very long RNA sequences, which is not feasible with traditional methods alone. BioMed Central 2013-11-08 /pmc/articles/PMC3952952/ /pubmed/24564983 http://dx.doi.org/10.1186/1472-6807-13-S1-S3 Text en Copyright © 2013 Zhang et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title	Enhancement of accuracy and efficiency for RNA secondary structure prediction by sequence segmentation and MapReduce
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3952952/ https://www.ncbi.nlm.nih.gov/pubmed/24564983 http://dx.doi.org/10.1186/1472-6807-13-S1-S3

Similar items