
LSTrAP: efficiently combining RNA sequencing data into co-expression networks

BACKGROUND: Since experimental elucidation of gene function is often laborious, various in silico methods have been developed to predict gene function of uncharacterized genes. Since functionally related genes are often expressed in the same tissues, conditions and developmental stages (co-expressed...


Bibliographic Details
Main Authors:	Proost, Sebastian, Krawczyk, Agnieszka, Mutwil, Marek
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2017
Subjects:	Software
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5634843/ https://www.ncbi.nlm.nih.gov/pubmed/29017446 http://dx.doi.org/10.1186/s12859-017-1861-z

_version_	1783270171321827328
author	Proost, Sebastian Krawczyk, Agnieszka Mutwil, Marek
author_facet	Proost, Sebastian Krawczyk, Agnieszka Mutwil, Marek
author_sort	Proost, Sebastian
collection	PubMed
description	BACKGROUND: Since experimental elucidation of gene function is often laborious, various in silico methods have been developed to predict gene function of uncharacterized genes. Since functionally related genes are often expressed in the same tissues, conditions and developmental stages (co-expressed), functional annotation of characterized genes can be transferred to co-expressed genes lacking annotation. With genome-wide expression data available, the construction of co-expression networks, where genes are nodes and edges connect significantly co-expressed genes, provides unprecedented opportunities to predict gene function. However, the construction of such networks requires large volumes of high-quality data, multiple processing steps and a considerable amount of computation power. While efficient tools exist to process RNA-Seq data, pipelines which combine them to construct co-expression networks efficiently are currently lacking. RESULTS: LSTrAP (Large-Scale Transcriptome Analysis Pipeline), presented here, combines all essential tools to construct co-expression networks based on RNA-Seq data into a single, efficient workflow. By supporting parallel computing on computer cluster infrastructure, processing hundreds of samples becomes feasible as shown here for Arabidopsis thaliana and Sorghum bicolor, which comprised 876 and 215 samples respectively. The former was used here to show how the quality control, included in LSTrAP, can detect spurious or low-quality samples. The latter was used to show how co-expression networks are able to group known photosynthesis genes and imply a role in this process of several, currently uncharacterized, genes. CONCLUSIONS: LSTrAP combines the most popular and performant methods to construct co-expression networks from RNA-Seq data into a single workflow. This allows large amounts of expression data, required to construct co-expression networks, to be processed efficiently and consistently across hundreds of samples. LSTrAP is implemented in Python 3.4 (or higher) and available under MIT license from https://github.molgen.mpg.de/proost/LSTrAP ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-017-1861-z) contains supplementary material, which is available to authorized users.
format	Online Article Text
id	pubmed-5634843
institution	National Center for Biotechnology Information
language	English
publishDate	2017
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-5634843 2017-10-19 LSTrAP: efficiently combining RNA sequencing data into co-expression networks Proost, Sebastian Krawczyk, Agnieszka Mutwil, Marek BMC Bioinformatics Software BACKGROUND: Since experimental elucidation of gene function is often laborious, various in silico methods have been developed to predict gene function of uncharacterized genes. Since functionally related genes are often expressed in the same tissues, conditions and developmental stages (co-expressed), functional annotation of characterized genes can be transferred to co-expressed genes lacking annotation. With genome-wide expression data available, the construction of co-expression networks, where genes are nodes and edges connect significantly co-expressed genes, provides unprecedented opportunities to predict gene function. However, the construction of such networks requires large volumes of high-quality data, multiple processing steps and a considerable amount of computation power. While efficient tools exist to process RNA-Seq data, pipelines which combine them to construct co-expression networks efficiently are currently lacking. RESULTS: LSTrAP (Large-Scale Transcriptome Analysis Pipeline), presented here, combines all essential tools to construct co-expression networks based on RNA-Seq data into a single, efficient workflow. By supporting parallel computing on computer cluster infrastructure, processing hundreds of samples becomes feasible as shown here for Arabidopsis thaliana and Sorghum bicolor, which comprised 876 and 215 samples respectively. The former was used here to show how the quality control, included in LSTrAP, can detect spurious or low-quality samples. The latter was used to show how co-expression networks are able to group known photosynthesis genes and imply a role in this process of several, currently uncharacterized, genes. CONCLUSIONS: LSTrAP combines the most popular and performant methods to construct co-expression networks from RNA-Seq data into a single workflow. This allows large amounts of expression data, required to construct co-expression networks, to be processed efficiently and consistently across hundreds of samples. LSTrAP is implemented in Python 3.4 (or higher) and available under MIT license from https://github.molgen.mpg.de/proost/LSTrAP ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-017-1861-z) contains supplementary material, which is available to authorized users. BioMed Central 2017-10-10 /pmc/articles/PMC5634843/ /pubmed/29017446 http://dx.doi.org/10.1186/s12859-017-1861-z Text en © The Author(s). 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle	Software Proost, Sebastian Krawczyk, Agnieszka Mutwil, Marek LSTrAP: efficiently combining RNA sequencing data into co-expression networks
title	LSTrAP: efficiently combining RNA sequencing data into co-expression networks
title_full	LSTrAP: efficiently combining RNA sequencing data into co-expression networks
title_fullStr	LSTrAP: efficiently combining RNA sequencing data into co-expression networks
title_full_unstemmed	LSTrAP: efficiently combining RNA sequencing data into co-expression networks
title_short	LSTrAP: efficiently combining RNA sequencing data into co-expression networks
title_sort	lstrap: efficiently combining rna sequencing data into co-expression networks
topic	Software
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5634843/ https://www.ncbi.nlm.nih.gov/pubmed/29017446 http://dx.doi.org/10.1186/s12859-017-1861-z
work_keys_str_mv	AT proostsebastian lstrapefficientlycombiningrnasequencingdataintocoexpressionnetworks AT krawczykagnieszka lstrapefficientlycombiningrnasequencingdataintocoexpressionnetworks AT mutwilmarek lstrapefficientlycombiningrnasequencingdataintocoexpressionnetworks

