
Discovery of Novel ncRNA Sequences in Multiple Genome Alignments on the Basis of Conserved and Stable Secondary Structures

Recently, non-coding RNAs (ncRNAs) have been discovered with novel functions, and it has been appreciated that there is pervasive transcription of genomes. Moreover, many novel ncRNAs are not conserved on the primary sequence level. Therefore, de novo computational ncRNA detection that is accurate a...


Bibliographic Details
Main Authors: Fu, Yinghan, Xu, Zhenjiang Zech, Lu, Zhi J., Zhao, Shan, Mathews, David H.
Format: Online Article Text
Language: English
Published: Public Library of Science 2015
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4468099/
https://www.ncbi.nlm.nih.gov/pubmed/26075601
http://dx.doi.org/10.1371/journal.pone.0130200
author Fu, Yinghan
Xu, Zhenjiang Zech
Lu, Zhi J.
Zhao, Shan
Mathews, David H.
collection PubMed
description Recently, non-coding RNAs (ncRNAs) have been discovered with novel functions, and it has been appreciated that there is pervasive transcription of genomes. Moreover, many novel ncRNAs are not conserved at the primary sequence level. Therefore, de novo computational ncRNA detection that is accurate and efficient is desirable. The purpose of this study is to develop an ncRNA detection method based on conservation of structure in more than two genomes. A new method called Multifind, using Multilign, was developed. Multilign predicts the common secondary structure for multiple input sequences. Multifind then uses measures of structure conservation to estimate the probability that the input sequences are a conserved ncRNA using a classification support vector machine. Multilign is based on Dynalign, which folds and aligns two sequences simultaneously using a scoring scheme that does not include sequence identity; its structure prediction quality is therefore not affected by input sequence diversity. Additionally, ensemble defect was introduced to Multifind as an additional discriminating feature that quantifies the compactness of the folding space for a sequence. Benchmarks showed that Multifind performs better than RNAz and LocARNATE+RNAz, a method that uses RNAz on structure alignments generated by LocARNATE, on testing sequences extracted from the Rfam database. For de novo ncRNA discovery in three genomes, Multifind and LocARNATE+RNAz had an advantage over RNAz in low-similarity regions of genome alignments. Additionally, Multifind and LocARNATE+RNAz found different subsets of known ncRNA sequences, suggesting the two approaches are complementary.
format Online
Article
Text
id pubmed-4468099
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-4468099 2015-06-25 Discovery of Novel ncRNA Sequences in Multiple Genome Alignments on the Basis of Conserved and Stable Secondary Structures Fu, Yinghan Xu, Zhenjiang Zech Lu, Zhi J. Zhao, Shan Mathews, David H. PLoS One Research Article
Public Library of Science 2015-06-15 /pmc/articles/PMC4468099/ /pubmed/26075601 http://dx.doi.org/10.1371/journal.pone.0130200 Text en © 2015 Fu et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title Discovery of Novel ncRNA Sequences in Multiple Genome Alignments on the Basis of Conserved and Stable Secondary Structures
topic Research Article