Analysis of strand-specific RNA-seq data using machine learning reveals the structures of transcription units in Clostridium thermocellum

Identification of transcription units (TUs) encoded in a bacterial genome is essential to elucidation of transcriptional regulation of the organism. To gain a detailed understanding of the dynamically composed TU structures, we have used four strand-specific RNA-seq (ssRNA-seq) datasets collected un...

Bibliographic Details
Main Authors: Chou, Wen-Chi; Ma, Qin; Yang, Shihui; Cao, Sha; Klingeman, Dawn M.; Brown, Steven D.; Xu, Ying
Format: Online Article Text
Language: English
Published: Oxford University Press 2015
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4446414/
https://www.ncbi.nlm.nih.gov/pubmed/25765651
http://dx.doi.org/10.1093/nar/gkv177
_version_ 1782373419241701376
author Chou, Wen-Chi
Ma, Qin
Yang, Shihui
Cao, Sha
Klingeman, Dawn M.
Brown, Steven D.
Xu, Ying
author_sort Chou, Wen-Chi
collection PubMed
description Identification of transcription units (TUs) encoded in a bacterial genome is essential to elucidation of transcriptional regulation of the organism. To gain a detailed understanding of the dynamically composed TU structures, we have used four strand-specific RNA-seq (ssRNA-seq) datasets collected under two experimental conditions to derive the genomic TU organization of Clostridium thermocellum using a machine-learning approach. Our method accurately predicted the genomic boundaries of individual TUs based on two sets of parameters measuring the RNA-seq expression patterns across the genome: expression-level continuity and variance. A total of 2590 distinct TUs are predicted based on the four RNA-seq datasets. Among the predicted TUs, 44% have multiple genes. We assessed our prediction method on an independent set of RNA-seq data with longer reads. The evaluation confirmed the high quality of the predicted TUs. Functional enrichment analyses on a selected subset of the predicted TUs revealed interesting biology. To demonstrate the generality of the prediction method, we have also applied the method to RNA-seq data collected on Escherichia coli and achieved high prediction accuracies. The TU prediction program named SeqTU is publicly available at https://code.google.com/p/seqtu/. We expect that the predicted TUs can serve as the baseline information for studying transcriptional and post-transcriptional regulation in C. thermocellum and other bacteria.
format Online
Article
Text
id pubmed-4446414
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-4446414 2015-06-15 Analysis of strand-specific RNA-seq data using machine learning reveals the structures of transcription units in Clostridium thermocellum Chou, Wen-Chi Ma, Qin Yang, Shihui Cao, Sha Klingeman, Dawn M. Brown, Steven D. Xu, Ying Nucleic Acids Res Methods Online Identification of transcription units (TUs) encoded in a bacterial genome is essential to elucidation of transcriptional regulation of the organism. To gain a detailed understanding of the dynamically composed TU structures, we have used four strand-specific RNA-seq (ssRNA-seq) datasets collected under two experimental conditions to derive the genomic TU organization of Clostridium thermocellum using a machine-learning approach. Our method accurately predicted the genomic boundaries of individual TUs based on two sets of parameters measuring the RNA-seq expression patterns across the genome: expression-level continuity and variance. A total of 2590 distinct TUs are predicted based on the four RNA-seq datasets. Among the predicted TUs, 44% have multiple genes. We assessed our prediction method on an independent set of RNA-seq data with longer reads. The evaluation confirmed the high quality of the predicted TUs. Functional enrichment analyses on a selected subset of the predicted TUs revealed interesting biology. To demonstrate the generality of the prediction method, we have also applied the method to RNA-seq data collected on Escherichia coli and achieved high prediction accuracies. The TU prediction program named SeqTU is publicly available at https://code.google.com/p/seqtu/. We expect that the predicted TUs can serve as the baseline information for studying transcriptional and post-transcriptional regulation in C. thermocellum and other bacteria. Oxford University Press 2015-05-26 2015-03-12 /pmc/articles/PMC4446414/ /pubmed/25765651 http://dx.doi.org/10.1093/nar/gkv177 Text en © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Methods Online
Chou, Wen-Chi
Ma, Qin
Yang, Shihui
Cao, Sha
Klingeman, Dawn M.
Brown, Steven D.
Xu, Ying
Analysis of strand-specific RNA-seq data using machine learning reveals the structures of transcription units in Clostridium thermocellum
title Analysis of strand-specific RNA-seq data using machine learning reveals the structures of transcription units in Clostridium thermocellum
topic Methods Online
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4446414/
https://www.ncbi.nlm.nih.gov/pubmed/25765651
http://dx.doi.org/10.1093/nar/gkv177