Parallel computation of genome-scale RNA secondary structure to detect structural constraints on human genome

BACKGROUND: RNA secondary structure around splice sites is known to assist normal splicing by promoting spliceosome recognition. However, analyzing the structural properties of entire intronic regions or pre-mRNA sequences has been difficult hitherto, owing to serious experimental and computational...

Bibliographic Details
Main Authors:	Kawaguchi, Risa, Kiryu, Hisanori
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2016
Subjects:	Methodology Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4858847/ https://www.ncbi.nlm.nih.gov/pubmed/27153986 http://dx.doi.org/10.1186/s12859-016-1067-9

_version_	1782430866944819200
author	Kawaguchi, Risa Kiryu, Hisanori
author_facet	Kawaguchi, Risa Kiryu, Hisanori
author_sort	Kawaguchi, Risa
collection	PubMed
description	BACKGROUND: RNA secondary structure around splice sites is known to assist normal splicing by promoting spliceosome recognition. However, analyzing the structural properties of entire intronic regions or pre-mRNA sequences has been difficult hitherto, owing to serious experimental and computational limitations, such as low read coverage and numerical problems. RESULTS: Our novel software, “ParasoR”, is designed to run on a computer cluster and enables the exact computation of various structural features of long RNA sequences under the constraint of maximal base-pairing distance. ParasoR divides dynamic programming (DP) matrices into smaller pieces, such that each piece can be computed by a separate computer node without losing the connectivity information between the pieces. ParasoR directly computes the ratios of DP variables to avoid the reduction of numerical precision caused by the cancellation of a large number of Boltzmann factors. The structural preferences of mRNAs computed by ParasoR show a high concordance with those determined by high-throughput sequencing analyses. Using ParasoR, we investigated the global structural preferences of transcribed regions in the human genome. A genome-wide folding simulation indicated that transcribed regions are significantly more structural than intergenic regions after removing repeat sequences and k-mer frequency bias. In particular, we observed a highly significant preference for base pairing over entire intronic regions as compared to their antisense sequences, as well as to intergenic regions. A comparison between pre-mRNAs and mRNAs showed that coding regions become more accessible after splicing, indicating constraints for translational efficiency. Such changes are correlated with gene expression levels, as well as GC content, and are enriched among genes associated with cytoskeleton and kinase functions.
CONCLUSIONS: We have shown that ParasoR is very useful for analyzing the structural properties of long RNA sequences such as mRNAs, pre-mRNAs, and long non-coding RNAs whose lengths can be more than a million bases in the human genome. In our analyses, transcribed regions including introns are indicated to be subject to various types of structural constraints that cannot be explained from simple sequence composition biases. ParasoR is freely available at https://github.com/carushi/ParasoR. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-016-1067-9) contains supplementary material, which is available to authorized users.
format	Online Article Text
id	pubmed-4858847
institution	National Center for Biotechnology Information
language	English
publishDate	2016
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-4858847 2016-06-02 Parallel computation of genome-scale RNA secondary structure to detect structural constraints on human genome Kawaguchi, Risa Kiryu, Hisanori BMC Bioinformatics Methodology Article BACKGROUND: RNA secondary structure around splice sites is known to assist normal splicing by promoting spliceosome recognition. However, analyzing the structural properties of entire intronic regions or pre-mRNA sequences has been difficult hitherto, owing to serious experimental and computational limitations, such as low read coverage and numerical problems. RESULTS: Our novel software, “ParasoR”, is designed to run on a computer cluster and enables the exact computation of various structural features of long RNA sequences under the constraint of maximal base-pairing distance. ParasoR divides dynamic programming (DP) matrices into smaller pieces, such that each piece can be computed by a separate computer node without losing the connectivity information between the pieces. ParasoR directly computes the ratios of DP variables to avoid the reduction of numerical precision caused by the cancellation of a large number of Boltzmann factors. The structural preferences of mRNAs computed by ParasoR show a high concordance with those determined by high-throughput sequencing analyses. Using ParasoR, we investigated the global structural preferences of transcribed regions in the human genome. A genome-wide folding simulation indicated that transcribed regions are significantly more structural than intergenic regions after removing repeat sequences and k-mer frequency bias. In particular, we observed a highly significant preference for base pairing over entire intronic regions as compared to their antisense sequences, as well as to intergenic regions. A comparison between pre-mRNAs and mRNAs showed that coding regions become more accessible after splicing, indicating constraints for translational efficiency.
Such changes are correlated with gene expression levels, as well as GC content, and are enriched among genes associated with cytoskeleton and kinase functions. CONCLUSIONS: We have shown that ParasoR is very useful for analyzing the structural properties of long RNA sequences such as mRNAs, pre-mRNAs, and long non-coding RNAs whose lengths can be more than a million bases in the human genome. In our analyses, transcribed regions including introns are indicated to be subject to various types of structural constraints that cannot be explained from simple sequence composition biases. ParasoR is freely available at https://github.com/carushi/ParasoR. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-016-1067-9) contains supplementary material, which is available to authorized users. BioMed Central 2016-05-06 /pmc/articles/PMC4858847/ /pubmed/27153986 http://dx.doi.org/10.1186/s12859-016-1067-9 Text en © Kawaguchi and Kiryu. 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle	Methodology Article Kawaguchi, Risa Kiryu, Hisanori Parallel computation of genome-scale RNA secondary structure to detect structural constraints on human genome
title	Parallel computation of genome-scale RNA secondary structure to detect structural constraints on human genome
title_full	Parallel computation of genome-scale RNA secondary structure to detect structural constraints on human genome
title_fullStr	Parallel computation of genome-scale RNA secondary structure to detect structural constraints on human genome
title_full_unstemmed	Parallel computation of genome-scale RNA secondary structure to detect structural constraints on human genome
title_short	Parallel computation of genome-scale RNA secondary structure to detect structural constraints on human genome
title_sort	parallel computation of genome-scale rna secondary structure to detect structural constraints on human genome
topic	Methodology Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4858847/ https://www.ncbi.nlm.nih.gov/pubmed/27153986 http://dx.doi.org/10.1186/s12859-016-1067-9
work_keys_str_mv	AT kawaguchirisa parallelcomputationofgenomescalernasecondarystructuretodetectstructuralconstraintsonhumangenome AT kiryuhisanori parallelcomputationofgenomescalernasecondarystructuretodetectstructuralconstraintsonhumangenome
