
COT: an efficient and accurate method for detecting marker genes among many subtypes

MOTIVATION: Ideally, a molecularly distinct subtype would be composed of molecular features that are expressed uniquely in the subtype of interest but in no others, so-called marker genes (MGs). MGs play a critical role in the characterization, classification or deconvolution of tissue or cell subtyp...


Bibliographic Details
Main Authors: Lu, Yingzhou, Wu, Chiung-Ting, Parker, Sarah J, Cheng, Zuolin, Saylor, Georgia, Van Eyk, Jennifer E, Yu, Guoqiang, Clarke, Robert, Herrington, David M, Wang, Yue
Format: Online Article Text
Language: English
Published: Oxford University Press 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9163574/
https://www.ncbi.nlm.nih.gov/pubmed/35673616
http://dx.doi.org/10.1093/bioadv/vbac037
author Lu, Yingzhou
Wu, Chiung-Ting
Parker, Sarah J
Cheng, Zuolin
Saylor, Georgia
Van Eyk, Jennifer E
Yu, Guoqiang
Clarke, Robert
Herrington, David M
Wang, Yue
collection PubMed
description MOTIVATION: Ideally, a molecularly distinct subtype would be composed of molecular features that are expressed uniquely in the subtype of interest but in no others, so-called marker genes (MGs). MGs play a critical role in the characterization, classification or deconvolution of tissue or cell subtypes. We and others have recognized that the test statistics used by most methods do not exactly satisfy the MG definition and often identify inaccurate MGs. RESULTS: We report an efficient and accurate data-driven method, formulated as a Cosine-based One-sample Test (COT) in scatter space, to detect MGs among many subtypes using subtype expression profiles. Fundamentally different from existing approaches, the test statistic in COT precisely matches the mathematical definition of an ideal MG. We demonstrate the performance and utility of COT on both simulated and real gene expression and proteomics data. The open-source Python/R tool will allow biologists to efficiently detect MGs and perform a more comprehensive and unbiased molecular characterization of tissue or cell subtypes in many biomedical contexts. Nevertheless, COT complements, rather than replaces, existing methods. AVAILABILITY AND IMPLEMENTATION: The Python COT software, with a detailed user’s manual and a vignette, is freely available at https://github.com/MintaYLu/COT. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics Advances online.
format Online
Article
Text
id pubmed-9163574
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-9163574 2022-06-05 COT: an efficient and accurate method for detecting marker genes among many subtypes Lu, Yingzhou Wu, Chiung-Ting Parker, Sarah J Cheng, Zuolin Saylor, Georgia Van Eyk, Jennifer E Yu, Guoqiang Clarke, Robert Herrington, David M Wang, Yue Bioinform Adv Application Note Oxford University Press 2022-05-27 /pmc/articles/PMC9163574/ /pubmed/35673616 http://dx.doi.org/10.1093/bioadv/vbac037 Text en © The Author(s) 2022. Published by Oxford University Press.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title COT: an efficient and accurate method for detecting marker genes among many subtypes
topic Application Note
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9163574/
https://www.ncbi.nlm.nih.gov/pubmed/35673616
http://dx.doi.org/10.1093/bioadv/vbac037
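The abstract describes a cosine-based statistic that matches the definition of an ideal MG but does not state the formula. As a minimal sketch, assume that an ideal MG of subtype k has a non-negative subtype-mean expression profile lying exactly on the k-th coordinate axis, so the score is the largest cosine between a gene's profile and any axis. The function name, gene names and values below are hypothetical, the one-sample significance test is not reproduced, and this is not the API of the COT package.

```python
import math

def cosine_marker_score(profile):
    """Return (subtype_index, cosine) for one gene's subtype-mean profile.

    Assumption: expression values are non-negative, so the cosine with the
    k-th axis unit vector is profile[k] / ||profile||. A score of 1.0 means
    the gene is expressed in exactly one subtype.
    """
    norm = math.sqrt(sum(x * x for x in profile))
    if norm == 0.0:
        return None, 0.0  # gene not expressed in any subtype
    # The closest axis is the one for the subtype with the highest expression.
    k = max(range(len(profile)), key=lambda i: profile[i])
    return k, profile[k] / norm

# Hypothetical subtype-mean expression values for three subtypes.
genes = {
    "geneA": [9.0, 0.0, 0.0],  # expressed in subtype 0 only
    "geneB": [3.0, 4.0, 0.0],  # shared between subtypes 0 and 1
    "geneC": [5.0, 5.0, 5.0],  # uniformly expressed
}
scores = {name: cosine_marker_score(p) for name, p in genes.items()}

# Rank genes from most to least marker-like.
for name, (k, cos) in sorted(scores.items(), key=lambda kv: -kv[1][1]):
    print(f"{name}: subtype {k}, cosine {cos:.3f}")
```

Under this assumption, a uniformly expressed gene scores 1/sqrt(K) for K subtypes, the lowest value a non-negative, non-zero profile can take, which is why the cosine separates subtype-specific genes from housekeeping-like ones.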