Discovery of Novel ncRNA Sequences in Multiple Genome Alignments on the Basis of Conserved and Stable Secondary Structures
Main authors: | Fu, Yinghan; Xu, Zhenjiang Zech; Lu, Zhi J.; Zhao, Shan; Mathews, David H. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science, 2015 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4468099/ https://www.ncbi.nlm.nih.gov/pubmed/26075601 http://dx.doi.org/10.1371/journal.pone.0130200 |
Field | Value |
---|---|
author | Fu, Yinghan; Xu, Zhenjiang Zech; Lu, Zhi J.; Zhao, Shan; Mathews, David H. |
collection | PubMed |
description | Non-coding RNAs (ncRNAs) with novel functions have recently been discovered, and it is now appreciated that transcription of genomes is pervasive. Moreover, many novel ncRNAs are not conserved at the primary sequence level. Accurate and efficient de novo computational ncRNA detection is therefore desirable. The purpose of this study is to develop an ncRNA detection method based on conservation of structure in more than two genomes. A new method called Multifind, using Multilign, was developed. Multilign predicts the common secondary structure for multiple input sequences. Multifind then passes measures of structure conservation to a classification support vector machine, which estimates the probability that the input sequences are a conserved ncRNA. Multilign is based on Dynalign, which folds and aligns two sequences simultaneously using a scoring scheme that does not include sequence identity, so its structure prediction quality is not affected by input sequence diversity. Ensemble defect was also added to Multifind as a discriminating feature that quantifies the compactness of the folding space for a sequence. On test sequences extracted from the Rfam database, benchmarks showed that Multifind performs better than RNAz and LocARNATE+RNAz (a method that applies RNAz to structural alignments generated by LocARNATE). For de novo ncRNA discovery in three genomes, Multifind and LocARNATE+RNAz had an advantage over RNAz in low-similarity regions of genome alignments. Multifind and LocARNATE+RNAz also found different subsets of known ncRNA sequences, suggesting that the two approaches are complementary. |
format | Online Article Text |
id | pubmed-4468099 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-4468099 (2015-06-25). PLoS One, Research Article. Public Library of Science, 2015-06-15. /pmc/articles/PMC4468099/ /pubmed/26075601 http://dx.doi.org/10.1371/journal.pone.0130200. Text, en. © 2015 Fu et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited. |
title | Discovery of Novel ncRNA Sequences in Multiple Genome Alignments on the Basis of Conserved and Stable Secondary Structures |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4468099/ https://www.ncbi.nlm.nih.gov/pubmed/26075601 http://dx.doi.org/10.1371/journal.pone.0130200 |
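
The abstract names ensemble defect as a Multifind feature but does not define it. Nucleic-acid design tools such as NUPACK use a standard definition: the ensemble defect of a sequence with respect to a reference structure is the expected number of nucleotides whose pairing state differs from that structure under the Boltzmann base-pair probabilities. The sketch below computes that quantity. It is not Multifind's code. The input format is a hypothetical choice for illustration: a dict of 0-based pair probabilities keyed by (i, j) with i < j, as a partition-function calculation might produce, plus a dot-bracket reference structure. The function names are likewise hypothetical.

```python
from typing import Dict, List, Tuple


def parse_dot_bracket(structure: str) -> List[int]:
    """Return partner[i] = j if nucleotide i pairs with j (0-based), else -1."""
    partner = [-1] * len(structure)
    stack: List[int] = []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            j = stack.pop()
            partner[i], partner[j] = j, i
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {i}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return partner


def ensemble_defect(pair_prob: Dict[Tuple[int, int], float], structure: str) -> float:
    """Expected number of nucleotides whose pairing state differs from `structure`.

    Assumption: pair_prob maps (i, j) with i < j (0-based) to the probability
    that i pairs with j; pairs absent from the dict have probability 0.
    """
    n = len(structure)
    partner = parse_dot_bracket(structure)
    # paired[i] = total probability that nucleotide i is paired with anything
    paired = [0.0] * n
    for (i, j), p in pair_prob.items():
        paired[i] += p
        paired[j] += p
    defect = 0.0
    for i in range(n):
        j = partner[i]
        if j >= 0:
            # Reference says "i pairs with j": penalty is the chance it does not.
            defect += 1.0 - pair_prob.get((min(i, j), max(i, j)), 0.0)
        else:
            # Reference says "i is unpaired": penalty is the chance it is paired.
            defect += paired[i]
    return defect


if __name__ == "__main__":
    probs = {(0, 5): 0.9, (1, 4): 0.8}  # toy probabilities, not real data
    print(round(ensemble_defect(probs, "((..))"), 3))  # 0.6
    print(round(ensemble_defect(probs, "......"), 3))  # 3.4
```

A value near 0 means the ensemble is concentrated on the reference structure. Values approaching the sequence length N indicate a diffuse folding space, which is presumably the sense in which the abstract calls the feature a measure of compactness. The abstract does not state which reference structure Multifind uses or whether the defect is normalized by N.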