
Alignment-free Transcriptomic and Metatranscriptomic Comparison Using Sequencing Signatures with Variable Length Markov Chains

Bibliographic Details
Main authors: Liao, Weinan; Ren, Jie; Wang, Kun; Wang, Shun; Zeng, Feng; Wang, Ying; Sun, Fengzhu
Format: Online article, text
Language: English
Published: Nature Publishing Group, 2016
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5120338/
https://www.ncbi.nlm.nih.gov/pubmed/27876823
http://dx.doi.org/10.1038/srep37243
Abstract: Comparing microbial sequencing datasets is critical for understanding the dynamics of microbial communities. Alignment-based tools for analyzing metagenomic datasets require reference sequences and read alignments. Available alignment-free dissimilarity approaches model background sequences with a Fixed Order Markov Chain (FOMC) and have yielded promising results for comparing microbial communities. However, the number of FOMC parameters grows exponentially with the order of the Markov Chain (MC). At a fixed high order, these parameters may not be estimated accurately because sequencing depth is limited. In this study, we investigate an alternative to FOMC: modeling background sequences in metatranscriptomic data with the data-driven Variable Length Markov Chain (VLMC). VLMC, originally designed for long sequences, was extended to high-throughput sequencing reads, and strategies for estimating its parameters were developed. Because its number of parameters is flexible, VLMC avoids estimating the vast number of parameters that a high-order MC requires when sequencing depth is limited. Unlike FOMC, whose order is selected manually, VLMC determines the MC order adaptively. Several VLMC-based beta diversity measures were applied to compare bacterial RNA-Seq and metatranscriptomic datasets. Experiments show that VLMC outperforms FOMC in modeling background sequences in transcriptomic and metatranscriptomic samples. A software pipeline is available at https://d2vlmc.codeplex.com.
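
The two modeling claims above can be illustrated with a small sketch. A k-th order FOMC over {A, C, G, T} has 4^k contexts with 3 free transition probabilities each, i.e. 3·4^k parameters, whereas a VLMC keeps only the contexts the data support. The Python code below is a minimal sketch under stated assumptions, not the authors' d2vlmc pipeline: the pruning rule (keep a context when its count times the Kullback-Leibler divergence from its parent's next-base distribution reaches a threshold), the pseudocount, the cutoffs `min_count` and `threshold`, the toy data generator `simulate_reads`, and every function name are hypothetical. Each sample's own background model supplies expected k-word counts, and the centered counts are compared with d2S, used here only as one example of an alignment-free beta diversity measure built on a background model.

```python
# Hypothetical sketch: rules, thresholds and names are assumptions, not d2vlmc.
import math
import random
from collections import Counter
from itertools import product

ALPHABET = "ACGT"
ALPHA_SET = set(ALPHABET)


def fomc_free_parameters(order):
    # 4**order contexts, each with 4 next-base probabilities summing to 1.
    return 3 * 4 ** order


def fomc_contexts(order):
    # A fixed-order model keeps every context of length <= order.
    return {"".join(t) for r in range(order + 1) for t in product(ALPHABET, repeat=r)}


def context_counts(reads, max_order):
    # counts[ctx][a]: how often base a follows context ctx within one read.
    counts = {}
    for read in reads:
        for i, a in enumerate(read):
            if a not in ALPHA_SET:
                continue
            for r in range(min(i, max_order) + 1):
                ctx = read[i - r:i]
                if not set(ctx) <= ALPHA_SET:
                    break  # longer contexts contain the same invalid symbol
                counts.setdefault(ctx, Counter())[a] += 1
    return counts


def transition(counts, ctx, pseudo=0.5):
    # Next-base distribution for ctx, smoothed with a pseudocount.
    c = counts.get(ctx, Counter())
    total = sum(c.values()) + pseudo * len(ALPHABET)
    return {a: (c[a] + pseudo) / total for a in ALPHABET}


def prune_vlmc(counts, min_count=20, threshold=3.0):
    # Keep a context if it is well supported and its next-base distribution
    # differs from its parent's (the context minus its oldest base).
    kept = {""}
    for ctx, c in counts.items():
        n = sum(c.values())
        if not ctx or n < min_count:
            continue
        p, q = transition(counts, ctx), transition(counts, ctx[1:])
        gain = n * sum(p[a] * math.log(p[a] / q[a]) for a in ALPHABET)
        if gain >= threshold:
            kept.update(ctx[j:] for j in range(len(ctx)))  # and its suffixes
    return kept


def word_prob(word, counts, kept):
    # Each base is predicted from the longest kept suffix of the bases
    # before it in the word; "" (the root) is always kept.
    prob = 1.0
    for i, a in enumerate(word):
        ctx = next(word[j:i] for j in range(i + 1) if word[j:i] in kept)
        prob *= transition(counts, ctx)[a]
    return prob


def centered_counts(reads, k, counts, kept):
    # Observed count of each k-word minus its expected count (positions * p_w).
    observed = Counter(r[i:i + k] for r in reads for i in range(len(r) - k + 1))
    positions = sum(max(len(r) - k + 1, 0) for r in reads)
    words = ("".join(t) for t in product(ALPHABET, repeat=k))
    return {w: observed[w] - positions * word_prob(w, counts, kept) for w in words}


def d2s(x, y):
    # d2S dissimilarity of two centered count vectors over the same k-words.
    num = sx = sy = 0.0
    for w in x:
        norm = math.hypot(x[w], y[w])
        if norm == 0:
            continue
        num += x[w] * y[w] / norm
        sx += x[w] * x[w] / norm
        sy += y[w] * y[w] / norm
    return 0.5 * (1 - num / math.sqrt(sx * sy))


def simulate_reads(n_reads, length, seed, a_then_c=False):
    # Toy data: uniform random bases; optionally every A is followed by C.
    rng = random.Random(seed)
    reads = []
    for _ in range(n_reads):
        read = [rng.choice(ALPHABET)]
        while len(read) < length:
            read.append("C" if a_then_c and read[-1] == "A" else rng.choice(ALPHABET))
        reads.append("".join(read))
    return reads
```

To compare two samples with this sketch, fit `context_counts` and `prune_vlmc` to each sample separately, compute `centered_counts` with the same k for both, and pass the two results to `d2s`, which returns a value in [0, 1]. Passing `fomc_contexts(order)` instead of the pruned set to `word_prob` (with counts collected using `max_order >= order`) gives the corresponding fixed-order background, which makes the two models easy to compare on the same reads.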
Journal: Scientific Reports (Sci Rep), Nature Publishing Group, published 2016-11-23
License: Copyright © 2016, The Author(s). This work is licensed under a Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/). The images or other third-party material in this article are included in the article's Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material.