Analysis of strand-specific RNA-seq data using machine learning reveals the structures of transcription units in Clostridium thermocellum
Main authors: | Chou, Wen-Chi; Ma, Qin; Yang, Shihui; Cao, Sha; Klingeman, Dawn M.; Brown, Steven D.; Xu, Ying |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2015 |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4446414/ https://www.ncbi.nlm.nih.gov/pubmed/25765651 http://dx.doi.org/10.1093/nar/gkv177 |
author | Chou, Wen-Chi; Ma, Qin; Yang, Shihui; Cao, Sha; Klingeman, Dawn M.; Brown, Steven D.; Xu, Ying |
---|---|
collection | PubMed |
description | Identification of transcription units (TUs) encoded in a bacterial genome is essential to the elucidation of transcriptional regulation in the organism. To gain a detailed understanding of the dynamically composed TU structures, we have used four strand-specific RNA-seq (ssRNA-seq) datasets collected under two experimental conditions to derive the genomic TU organization of Clostridium thermocellum using a machine-learning approach. Our method accurately predicted the genomic boundaries of individual TUs based on two sets of parameters measuring the RNA-seq expression patterns across the genome: expression-level continuity and variance. A total of 2590 distinct TUs were predicted from the four RNA-seq datasets. Among the predicted TUs, 44% have multiple genes. We assessed our prediction method on an independent set of RNA-seq data with longer reads. The evaluation confirmed the high quality of the predicted TUs. Functional enrichment analyses of a selected subset of the predicted TUs revealed interesting biology. To demonstrate the generality of the prediction method, we have also applied the method to RNA-seq data collected from Escherichia coli and achieved high prediction accuracies. The TU prediction program named SeqTU is publicly available at https://code.google.com/p/seqtu/. We expect that the predicted TUs can serve as the baseline information for studying transcriptional and post-transcriptional regulation in C. thermocellum and other bacteria. |
format | Online Article Text |
id | pubmed-4446414 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
journal | Nucleic Acids Res (Methods Online) |
dates | 2015-06-15 2015-05-26 2015-03-12 |
rights | © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | Analysis of strand-specific RNA-seq data using machine learning reveals the structures of transcription units in Clostridium thermocellum |
topic | Methods Online |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4446414/ https://www.ncbi.nlm.nih.gov/pubmed/25765651 http://dx.doi.org/10.1093/nar/gkv177 |