Evaluation and improvement of the regulatory inference for large co-expression networks with limited sample size

BACKGROUND: Co-expression has been widely used to identify novel regulatory relationships using high throughput measurements, such as microarray and RNA-seq data. Evaluation studies on co-expression network analysis methods mostly focus on networks of small or medium size of up to a few hundred node...

Bibliographic Details
Main Authors:	Guo, Wenbin; Calixto, Cristiane P. G.; Tzioutziou, Nikoleta; Lin, Ping; Waugh, Robbie; Brown, John W. S.; Zhang, Runxuan
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2017
Subjects:	Methodology Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5477119/ https://www.ncbi.nlm.nih.gov/pubmed/28629365 http://dx.doi.org/10.1186/s12918-017-0440-2

author	Guo, Wenbin Calixto, Cristiane P. G. Tzioutziou, Nikoleta Lin, Ping Waugh, Robbie Brown, John W. S. Zhang, Runxuan
author_sort	Guo, Wenbin
collection	PubMed
description	BACKGROUND: Co-expression has been widely used to identify novel regulatory relationships using high-throughput measurements, such as microarray and RNA-seq data. Evaluation studies on co-expression network analysis methods mostly focus on networks of small or medium size, up to a few hundred nodes. For large networks, simulated expression data usually consist of hundreds or thousands of profiles with different perturbations or knock-outs, which is uncommon in real experiments because of the cost and the amount of work required. Thus, the performance of co-expression network analysis methods on large co-expression networks of a few thousand nodes, with only a small number of profiles from a single perturbation, which more accurately reflects normal experimental conditions, is generally uncharacterized. METHODS: We proposed a novel network inference method based on Relevance Low order Partial Correlation (RLowPC). The RLowPC method uses a two-step approach to select high-confidence edges: it first reduces the search space by picking only the top-ranked genes from an initial partial correlation analysis, and then computes the partial correlations in the confined search space by removing only the linear dependencies from the shared neighbours, largely ignoring the genes showing lower association. RESULTS: We selected six co-expression-based methods with good performance in evaluation studies from the literature: Partial correlation, PCIT, ARACNE, MRNET, MRNETB and CLR. The evaluation of these methods was carried out on simulated time-series data with network sizes ranging from 100 to 3000 nodes. Simulation results show low precision and recall for all of the above methods for large networks with a small number of expression profiles. We improved the inference significantly by refining the top-weighted edges in the pre-inferred partial correlation networks using RLowPC. We also found improved performance when large networks were partitioned into smaller co-expressed modules and method performance was assessed within these modules. CONCLUSIONS: The evaluation results show that current methods suffer from low precision and recall for large co-expression networks where only a small number of profiles are available. The proposed RLowPC method effectively reduces the indirect edges predicted as regulatory relationships and increases the precision of top-ranked predictions. Partitioning large networks into smaller, highly co-expressed modules also helps to improve the performance of network inference methods. The RLowPC R package for network construction, refinement and evaluation is available at GitHub: https://github.com/wyguo/RLowPC. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12918-017-0440-2) contains supplementary material, which is available to authorized users.
format	Online Article Text
id	pubmed-5477119
institution	National Center for Biotechnology Information
language	English
publishDate	2017
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-5477119 2017-06-22 Evaluation and improvement of the regulatory inference for large co-expression networks with limited sample size Guo, Wenbin Calixto, Cristiane P. G. Tzioutziou, Nikoleta Lin, Ping Waugh, Robbie Brown, John W. S. Zhang, Runxuan BMC Syst Biol Methodology Article BioMed Central 2017-06-19 /pmc/articles/PMC5477119/ /pubmed/28629365 http://dx.doi.org/10.1186/s12918-017-0440-2 Text en © The Author(s). 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title	Evaluation and improvement of the regulatory inference for large co-expression networks with limited sample size
topic	Methodology Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5477119/ https://www.ncbi.nlm.nih.gov/pubmed/28629365 http://dx.doi.org/10.1186/s12918-017-0440-2
