Kmerator Suite: design of specific k-mer signatures and automatic metadata discovery in large RNA-seq datasets
The huge body of publicly available RNA-sequencing (RNA-seq) libraries is a treasure trove of functional information that allows the expression of known or novel transcripts in tissues to be quantified. However, transcript quantification commonly relies on alignment methods requiring a lot of computational resour...
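Main authors: | Riquier, Sébastien; Bessiere, Chloé; Guibert, Benoit; Bouge, Anne-Laure; Boureux, Anthony; Ruffle, Florence; Audoux, Jérôme; Gilbert, Nicolas; Xue, Haoliang; Gautheret, Daniel; Commes, Thérèse |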
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2021 |
Subjects: | Methods Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8221386/ https://www.ncbi.nlm.nih.gov/pubmed/34179780 http://dx.doi.org/10.1093/nargab/lqab058 |
_version_ | 1783711320338595840 |
---|---|
author | Riquier, Sébastien; Bessiere, Chloé; Guibert, Benoit; Bouge, Anne-Laure; Boureux, Anthony; Ruffle, Florence; Audoux, Jérôme; Gilbert, Nicolas; Xue, Haoliang; Gautheret, Daniel; Commes, Thérèse |
author_sort | Riquier, Sébastien |
collection | PubMed |
description | The huge body of publicly available RNA-sequencing (RNA-seq) libraries is a treasure trove of functional information that allows the expression of known or novel transcripts in tissues to be quantified. However, transcript quantification commonly relies on alignment methods that require a lot of computational resources and processing time and therefore do not scale easily to large datasets. K-mer decomposition constitutes a new way to process RNA-seq data for the identification of transcriptional signatures, as k-mers can be used to quantify gene expression accurately in a less resource-consuming way. We present the Kmerator Suite, a set of three tools designed to extract specific k-mer signatures, quantify these k-mers in RNA-seq datasets and quickly visualize the characteristics of large datasets. The core tool, Kmerator, produces specific k-mers for 97% of human genes, enabling gene expression to be measured with high accuracy in simulated datasets. KmerExploR, a direct application of Kmerator, uses a set of predictor gene-specific k-mers to infer metadata, including library protocol, sample features or contamination, from RNA-seq datasets. KmerExploR results are visualized through a user-friendly interface. Moreover, we demonstrate that the Kmerator Suite can be used for advanced queries targeting known or new biomarkers such as mutations, gene fusions or long non-coding RNAs for human health applications. |
format | Online Article Text |
id | pubmed-8221386 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-8221386 2021-06-24 Kmerator Suite: design of specific k-mer signatures and automatic metadata discovery in large RNA-seq datasets. Riquier, Sébastien; Bessiere, Chloé; Guibert, Benoit; Bouge, Anne-Laure; Boureux, Anthony; Ruffle, Florence; Audoux, Jérôme; Gilbert, Nicolas; Xue, Haoliang; Gautheret, Daniel; Commes, Thérèse. NAR Genom Bioinform. Methods Article. Oxford University Press 2021-06-23 /pmc/articles/PMC8221386/ /pubmed/34179780 http://dx.doi.org/10.1093/nargab/lqab058 Text en © The Author(s) 2021. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics. This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
title | Kmerator Suite: design of specific k-mer signatures and automatic metadata discovery in large RNA-seq datasets |
topic | Methods Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8221386/ https://www.ncbi.nlm.nih.gov/pubmed/34179780 http://dx.doi.org/10.1093/nargab/lqab058 |
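
The description above states that reads can be decomposed into k-mers and that k-mers specific to one gene can serve as a signature for quantifying it. The following is a minimal sketch of that idea on toy data, not the Kmerator implementation: the function names, the sequences, the value k = 4 and the mean-count score are all assumptions made for illustration, and reverse complements, sequencing errors and genome-wide specificity checks are ignored.

```python
from collections import defaultdict


def kmers(seq, k):
    """Return every substring of length k, in order (the k-mer decomposition)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def specific_kmers(references, k):
    """Map each reference name to the k-mers that occur in no other reference."""
    owners = defaultdict(set)
    for name, seq in references.items():
        for kmer in kmers(seq, k):
            owners[kmer].add(name)
    result = {name: [] for name in references}
    for name, seq in references.items():
        seen = set()
        for kmer in kmers(seq, k):
            # Keep a k-mer only if this reference is its sole owner.
            if owners[kmer] == {name} and kmer not in seen:
                seen.add(kmer)
                result[name].append(kmer)
    return result


def mean_count(signature, reads, k):
    """Average number of occurrences of the signature k-mers across all reads."""
    if not signature:
        return 0.0
    counts = dict.fromkeys(signature, 0)
    for read in reads:
        for kmer in kmers(read, k):
            if kmer in counts:
                counts[kmer] += 1
    return sum(counts.values()) / len(counts)


# Hypothetical toy sequences; real signatures use much longer k-mers.
references = {"geneA": "ACGTACGGA", "geneB": "TTACGGACC"}
reads = ["CGTACG", "ACGTAC"]

signatures = specific_kmers(references, k=4)
for name, signature in signatures.items():
    print(name, signature, mean_count(signature, reads, k=4))
```

In this toy case the k-mers TACG, ACGG and CGGA are shared by both sequences and are therefore excluded from both signatures, so only the reads' matches to gene-specific k-mers contribute to each score.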