Denoising large-scale biological data using network filters

Bibliographic details
Main authors: Kavran, Andrew J., Clauset, Aaron
Format: Online Article Text
Language: English
Published: BioMed Central 2021
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7992843/
https://www.ncbi.nlm.nih.gov/pubmed/33765911
http://dx.doi.org/10.1186/s12859-021-04075-x
collection PubMed
description BACKGROUND: Large-scale biological data sets are often contaminated by noise, which can impede accurate inferences about underlying processes. Such measurement noise can arise from endogenous biological factors like cell cycle and life history variation, and from exogenous technical factors like sample preparation and instrument variation. RESULTS: We describe a general method for automatically reducing noise in large-scale biological data sets. This method uses an interaction network to identify groups of correlated or anti-correlated measurements that can be combined or “filtered” to better recover an underlying biological signal. Similar to the process of denoising an image, a single network filter may be applied to an entire system, or the system may be first decomposed into distinct modules and a different filter applied to each. Applied to synthetic data with known network structure and signal, network filters accurately reduce noise across a wide range of noise levels and structures. Applied to a machine learning task of predicting changes in human protein expression in healthy and cancerous tissues, network filtering prior to training increases accuracy by up to 43% compared to using unfiltered data. CONCLUSIONS: Network filters are a general way to denoise biological data and can account for both correlation and anti-correlation between different measurements. Furthermore, we find that partitioning a network prior to filtering can significantly reduce errors in networks with heterogeneous data and correlation patterns, and this approach outperforms existing diffusion-based methods. Our results on proteomics data indicate the broad potential utility of network filters to applications in systems biology. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-021-04075-x.
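The abstract describes network filtering only at a high level. The sketch below is not the authors' implementation; it is a minimal illustration under stated assumptions: the interaction network is a hypothetical signed adjacency dict (+1 for a correlated pair, -1 for an anti-correlated pair), measurements are standardized (e.g. z-scores) so an anti-correlated neighbor can be folded in by flipping its sign, and each node's value is replaced by a mean or median over itself and its neighbors. The names `network_filter`, `modular_filter`, and `_neighborhood`, and the choice to ignore neighbors outside a node's module, are all assumptions made for illustration; the paper's actual filters and partitioning method may differ.

```python
from statistics import mean, median
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical signed-network representation: node -> [(neighbor, sign), ...],
# where sign = +1 marks a correlated pair and -1 an anti-correlated pair.
SignedGraph = Dict[str, List[Tuple[str, int]]]
Aggregator = Callable[[List[float]], float]


def _neighborhood(
    node: str,
    values: Dict[str, float],
    graph: SignedGraph,
    module_of: Optional[Dict[str, int]] = None,
) -> List[float]:
    """Collect a node's own value plus sign-adjusted values of its neighbors.

    Assumes values are standardized (e.g. z-scores), so flipping the sign of an
    anti-correlated neighbor puts it on the same footing as the node itself.
    If module_of is given, neighbors in other modules are ignored (assumption).
    """
    samples = [values[node]]
    for nbr, sign in graph.get(node, []):
        if nbr not in values:
            continue  # neighbor has no measurement
        if module_of is not None and module_of.get(nbr) != module_of.get(node):
            continue  # keep the filter inside the node's module
        samples.append(sign * values[nbr])
    return samples


def network_filter(
    values: Dict[str, float], graph: SignedGraph, agg: Aggregator = mean
) -> Dict[str, float]:
    """Apply a single filter (e.g. mean or median) to every node in the network."""
    return {n: agg(_neighborhood(n, values, graph)) for n in values}


def modular_filter(
    values: Dict[str, float],
    graph: SignedGraph,
    module_of: Dict[str, int],
    filters: Dict[int, Aggregator],
    default: Aggregator = mean,
) -> Dict[str, float]:
    """Filter each module separately, with a possibly different filter per module."""
    return {
        n: filters.get(module_of[n], default)(_neighborhood(n, values, graph, module_of))
        for n in values
    }


if __name__ == "__main__":
    # Toy data: a-b correlated, a-c anti-correlated, d isolated.
    values = {"a": 1.0, "b": 0.8, "c": -1.2, "d": 5.0}
    graph = {"a": [("b", 1), ("c", -1)], "b": [("a", 1)], "c": [("a", -1)]}
    print(network_filter(values, graph))
    modules = {"a": 0, "b": 0, "c": 1, "d": 1}
    print(modular_filter(values, graph, modules, {0: mean, 1: median}))
```

In this toy example the anti-correlated reading for `c` reinforces `a` rather than cancelling it, because its sign is flipped before averaging. A median is less sensitive than a mean to a single outlying neighbor, which is one plausible reason to assign different filters to modules with different noise patterns.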
id pubmed-7992843
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
BMC Bioinformatics, Research Article. BioMed Central, 2021-03-25.
© The Author(s) 2021. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
topic Research Article