PeacoQC: Peak‐based selection of high quality cytometry data
In cytometry analysis, a large number of markers is measured for thousands or millions of cells, resulting in high‐dimensional datasets. During the measurement of these samples, erroneous events can occur such as clogs, speed changes, slow uptake of the sample etc., which can influence the downstrea...
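Main authors: | Emmaneel, Annelies; Quintelier, Katrien; Sichien, Dorine; Rybakowska, Paulina; Marañón, Concepción; Alarcón‐Riquelme, Marta E.; Van Isterdael, Gert; Van Gassen, Sofie; Saeys, Yvan |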
---|---|
Format: | Online Article Text |
Language: | English |
Published: | John Wiley & Sons, Inc., 2021 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9293479/ https://www.ncbi.nlm.nih.gov/pubmed/34549881 http://dx.doi.org/10.1002/cyto.a.24501 |
author | Emmaneel, Annelies; Quintelier, Katrien; Sichien, Dorine; Rybakowska, Paulina; Marañón, Concepción; Alarcón‐Riquelme, Marta E.; Van Isterdael, Gert; Van Gassen, Sofie; Saeys, Yvan |
collection | PubMed |
description | In cytometry analysis, a large number of markers is measured for thousands or millions of cells, resulting in high‐dimensional datasets. During the measurement of these samples, erroneous events can occur, such as clogs, speed changes and slow uptake of the sample, which can influence the downstream analysis and even lead to false discoveries. As these issues can be difficult to detect manually, an automated approach is recommended. To filter out these erroneous events, we created a novel quality control algorithm, Peak Extraction And Cleaning Oriented Quality Control (PeacoQC), that allows for automated cleaning of cytometry data. The algorithm determines density peaks per channel and uses them to remove low‐quality events, based on their position in the isolation tree and on their mean absolute deviation (MAD) distance to these density peaks. To evaluate PeacoQC's cleaning capability, it was compared with three existing quality control algorithms (flowAI, flowClean and flowCut) on a wide variety of datasets. PeacoQC was able to filter out all the different types of anomalies in flow, mass and spectral cytometry data, while the other methods struggled with at least one type. In the quantitative comparison, PeacoQC obtained the highest median balanced accuracy and a running time similar to that of the other algorithms, while scaling better to large files. To ensure that the parameters chosen in the PeacoQC algorithm are robust, the cleaning tool was run on 16 public datasets. After inspection, only one sample was found for which the parameters should be further optimized; the other 15 datasets were analyzed correctly, indicating a robust parameter choice. Overall, we present a fast and accurate quality control algorithm that outperforms existing tools and ensures high‐quality data that can be used for further downstream analysis. An R implementation is available. |
id | pubmed-9293479 |
institution | National Center for Biotechnology Information |
record_format | MEDLINE/PubMed |
source | Cytometry Part A (Cytometry A), Computational Article. John Wiley & Sons, Inc. Published online 2021-10-03; issue 2022-04. Index record pubmed-9293479, 2022-07-20. |
license | © 2021 The Authors. Cytometry Part A published by Wiley Periodicals LLC on behalf of International Society for Advancement of Cytometry. This is an open access article under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits use, distribution and reproduction in any medium, provided the original work is properly cited and is not used for commercial purposes. |
topic | Computational Article |
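The MAD step in the description field can be illustrated with a minimal Python sketch. It is not the PeacoQC R implementation: the binning of events in acquisition order, the histogram-based peak finder, and the names `bin_peaks`, `mad_outlier_bins`, `bin_size`, `n_cells` and `k` are hypothetical choices made for illustration, and the isolation-tree step is omitted entirely. A `balanced_accuracy` helper (the mean of sensitivity and specificity) shows the metric named in the quantitative comparison.

```python
import statistics
from typing import List, Sequence


def bin_peaks(values: Sequence[float], bin_size: int, n_cells: int = 20) -> List[float]:
    """Approximate the density peak of each consecutive bin of events.

    Assumes events are in acquisition order. The peak of a bin is the centre
    of its most populated histogram cell, a crude stand-in (hypothetical) for
    proper density-based peak detection.
    """
    peaks = []
    # Only complete bins are used; trailing events that do not fill a bin are ignored.
    for start in range(0, len(values) - bin_size + 1, bin_size):
        chunk = values[start:start + bin_size]
        lo, hi = min(chunk), max(chunk)
        if hi == lo:
            peaks.append(lo)
            continue
        width = (hi - lo) / n_cells
        counts = [0] * n_cells
        for v in chunk:
            # Clamp so that the maximum value falls into the last cell.
            counts[min(int((v - lo) / width), n_cells - 1)] += 1
        best = counts.index(max(counts))
        peaks.append(lo + (best + 0.5) * width)
    return peaks


def mad_outlier_bins(peaks: Sequence[float], k: float = 6.0) -> List[int]:
    """Indices of bins whose peak lies more than k MADs from the median peak.

    k is a hypothetical threshold chosen for this sketch.
    """
    med = statistics.median(peaks)
    mad = statistics.median(abs(p - med) for p in peaks)
    return [i for i, p in enumerate(peaks) if abs(p - med) > k * mad]


def balanced_accuracy(y_true: Sequence[bool], y_pred: Sequence[bool]) -> float:
    """Mean of sensitivity and specificity; True marks an event to be removed.

    Assumes both classes occur in y_true.
    """
    tp = sum(t and p for t, p in zip(y_true, y_pred))
    tn = sum(not t and not p for t, p in zip(y_true, y_pred))
    pos = sum(y_true)
    neg = len(y_true) - pos
    return 0.5 * (tp / pos + tn / neg)
```

All events in a flagged bin would then be marked for removal, and comparing those marks against a manual annotation with `balanced_accuracy` gives a score of the kind used in the comparison with flowAI, flowClean and flowCut.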