
BatchI: Batch effect Identification in high-throughput screening data using a dynamic programming algorithm


Bibliographic Details
Main Authors:	Papiez, Anna; Marczyk, Michal; Polanska, Joanna; Polanski, Andrzej
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2019
Subjects:	Original Papers
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6546123/ https://www.ncbi.nlm.nih.gov/pubmed/30357412 http://dx.doi.org/10.1093/bioinformatics/bty900

author	Papiez, Anna; Marczyk, Michal; Polanska, Joanna; Polanski, Andrzej
collection	PubMed
description	MOTIVATION: In contemporary biological experiments, bias that interferes with the measurements requires careful handling. Batch effects are an important source of bias in high-throughput biological experiments, and diverse methods for removing them have been established. These include various normalization techniques, yet many require knowledge of the number of batches and the assignment of samples to batches. Only a few can deal with the problem of identifying a batch effect of unknown structure. For this reason, an original batch identification algorithm based on dynamic programming is introduced for omics data that can be sorted on a timescale. RESULTS: The BatchI algorithm is based on partitioning a series of high-throughput experiment samples into sub-series corresponding to estimated batches. The dynamic programming method is used to split the data so that dispersion between batches is maximal while within-batch dispersion remains minimal. The procedure has been tested on a number of available datasets with and without prior information about batch partitioning. Datasets with a priori identified batches were split accordingly, with agreement measured by the weighted average Dice Index. Batch effect correction is justified by higher intra-group correlation. In the datasets without prior batch information, the identified batch divisions lead to improvement of parameters and quality of biological information, as shown by a literature study and Information Content. The outcome of the algorithm serves as a starting point for correction methods. It has been demonstrated that omitting the essential step of batch effect control may lead to the loss of valuable potential discoveries. AVAILABILITY AND IMPLEMENTATION: The implementation is available within the BatchI R package at http://zaed.aei.polsl.pl/index.php/pl/111-software. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format	Online Article Text
id	pubmed-6546123
institution	National Center for Biotechnology Information
language	English
publishDate	2019
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-6546123 2019-06-13 Bioinformatics Original Papers Oxford University Press 2019-06-01 2018-10-24 /pmc/articles/PMC6546123/ /pubmed/30357412 http://dx.doi.org/10.1093/bioinformatics/bty900 Text en © The Author(s) 2018. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title	BatchI: Batch effect Identification in high-throughput screening data using a dynamic programming algorithm
topic	Original Papers
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6546123/ https://www.ncbi.nlm.nih.gov/pubmed/30357412 http://dx.doi.org/10.1093/bioinformatics/bty900
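
The description above outlines partitioning an ordered series of samples into sub-series by dynamic programming. The following is a minimal illustrative sketch of that general idea, not the BatchI implementation: it assumes each sample has already been reduced to a single numeric summary value, that the number of batches k is given, and that dispersion is measured as the sum of squared deviations from the segment mean. The function name partition_series and all parameter names are hypothetical. For a fixed k, minimizing the within-segment sum of squares is equivalent to maximizing the between-segment sum of squares, because the two add up to the constant total sum of squares.

```python
from typing import List, Sequence, Tuple


def partition_series(values: Sequence[float], k: int) -> Tuple[float, List[int]]:
    """Split `values` into k contiguous segments with minimal total
    within-segment sum of squared deviations.

    Returns (total_cost, boundaries), where `boundaries` holds the start
    index of every segment except the first.
    Illustrative only; the real BatchI package may differ in every detail.
    """
    n = len(values)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and len(values)")

    # Prefix sums of values and squared values give O(1) segment costs.
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for idx, v in enumerate(values):
        s1[idx + 1] = s1[idx] + v
        s2[idx + 1] = s2[idx] + v * v

    def cost(i: int, j: int) -> float:
        # Sum of squared deviations from the mean of values[i:j].
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # best[m][j]: minimal cost of splitting values[:j] into m segments.
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    # prev[m][j]: start index of the last segment in that optimal split.
    prev = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for m in range(1, k + 1):
        for j in range(m, n + 1):
            for i in range(m - 1, j):
                candidate = best[m - 1][i] + cost(i, j)
                if candidate < best[m][j]:
                    best[m][j] = candidate
                    prev[m][j] = i

    # Walk back through the stored split points to recover the boundaries.
    boundaries: List[int] = []
    j = n
    for m in range(k, 1, -1):
        j = prev[m][j]
        boundaries.append(j)
    boundaries.reverse()
    return best[k][n], boundaries
```

As a usage example under the same assumptions, partition_series([1.0, 1.1, 0.9, 5.0, 5.2, 4.8, 9.0, 9.1], 3) returns the boundaries [3, 6], meaning the second segment starts at index 3 and the third at index 6. This sketch takes k as an input and runs in O(k n^2) time; how to choose the number of batches is outside its scope.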
