SigMat: a classification scheme for gene signature matching
MOTIVATION: Several large-scale efforts have been made to collect gene expression signatures from a variety of biological conditions, such as response of cell lines to treatment with drugs, or tumor samples with different characteristics. These gene signature collections are utilized through bioinfo...
Main authors: | Xiao, Jinfeng; Blatti, Charles; Sinha, Saurabh |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2018 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6022536/ https://www.ncbi.nlm.nih.gov/pubmed/29950002 http://dx.doi.org/10.1093/bioinformatics/bty251 |
_version_ | 1783335699413467136 |
---|---|
author | Xiao, Jinfeng Blatti, Charles Sinha, Saurabh |
author_facet | Xiao, Jinfeng Blatti, Charles Sinha, Saurabh |
author_sort | Xiao, Jinfeng |
collection | PubMed |
description | MOTIVATION: Several large-scale efforts have been made to collect gene expression signatures from a variety of biological conditions, such as response of cell lines to treatment with drugs, or tumor samples with different characteristics. These gene signature collections are utilized through bioinformatics tools for ‘signature matching’, whereby a researcher studying an expression profile can identify previously cataloged biological conditions most related to their profile. Signature matching tools typically retrieve from the collection the signature that has highest similarity to the user-provided profile. Alternatively, classification models may be applied where each biological condition in the signature collection is a class label; however, such models are trained on the collection of available signatures and may not generalize to the novel cellular context or cell line of the researcher’s expression profile. RESULTS: We present an advanced multi-way classification algorithm for signature matching, called SigMat, that is trained on a large signature collection from a well-studied cellular context, but can also classify signatures from other cell types by relying on an additional, small collection of signatures representing the target cell type. It uses these ‘tuning data’ to learn two additional parameters that help adapt its predictions for other cellular contexts. SigMat outperforms other similarity scores and classification methods in identifying the correct label of a query expression profile from as many as 244 or 500 candidate classes (drug treatments) cataloged by the LINCS L1000 project. SigMat retains its high accuracy in cross-cell line applications even when the amount of tuning data is severely limited. AVAILABILITY AND IMPLEMENTATION: SigMat is available on GitHub at https://github.com/JinfengXiao/SigMat. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. |
format | Online Article Text |
id | pubmed-6022536 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-6022536 2018-07-10 SigMat: a classification scheme for gene signature matching Xiao, Jinfeng Blatti, Charles Sinha, Saurabh Bioinformatics Ismb 2018–Intelligent Systems for Molecular Biology Proceedings MOTIVATION: Several large-scale efforts have been made to collect gene expression signatures from a variety of biological conditions, such as response of cell lines to treatment with drugs, or tumor samples with different characteristics. These gene signature collections are utilized through bioinformatics tools for ‘signature matching’, whereby a researcher studying an expression profile can identify previously cataloged biological conditions most related to their profile. Signature matching tools typically retrieve from the collection the signature that has highest similarity to the user-provided profile. Alternatively, classification models may be applied where each biological condition in the signature collection is a class label; however, such models are trained on the collection of available signatures and may not generalize to the novel cellular context or cell line of the researcher’s expression profile. RESULTS: We present an advanced multi-way classification algorithm for signature matching, called SigMat, that is trained on a large signature collection from a well-studied cellular context, but can also classify signatures from other cell types by relying on an additional, small collection of signatures representing the target cell type. It uses these ‘tuning data’ to learn two additional parameters that help adapt its predictions for other cellular contexts. SigMat outperforms other similarity scores and classification methods in identifying the correct label of a query expression profile from as many as 244 or 500 candidate classes (drug treatments) cataloged by the LINCS L1000 project. SigMat retains its high accuracy in cross-cell line applications even when the amount of tuning data is severely limited. AVAILABILITY AND IMPLEMENTATION: SigMat is available on GitHub at https://github.com/JinfengXiao/SigMat. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. Oxford University Press 2018-07-01 2018-06-27 /pmc/articles/PMC6022536/ /pubmed/29950002 http://dx.doi.org/10.1093/bioinformatics/bty251 Text en © The Author(s) 2018. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
spellingShingle | Ismb 2018–Intelligent Systems for Molecular Biology Proceedings Xiao, Jinfeng Blatti, Charles Sinha, Saurabh SigMat: a classification scheme for gene signature matching |
title | SigMat: a classification scheme for gene signature matching |
title_full | SigMat: a classification scheme for gene signature matching |
title_fullStr | SigMat: a classification scheme for gene signature matching |
title_full_unstemmed | SigMat: a classification scheme for gene signature matching |
title_short | SigMat: a classification scheme for gene signature matching |
title_sort | sigmat: a classification scheme for gene signature matching |
topic | Ismb 2018–Intelligent Systems for Molecular Biology Proceedings |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6022536/ https://www.ncbi.nlm.nih.gov/pubmed/29950002 http://dx.doi.org/10.1093/bioinformatics/bty251 |
work_keys_str_mv | AT xiaojinfeng sigmataclassificationschemeforgenesignaturematching AT blatticharles sigmataclassificationschemeforgenesignaturematching AT sinhasaurabh sigmataclassificationschemeforgenesignaturematching |
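
The abstract notes that signature matching tools typically retrieve the cataloged signature with the highest similarity to the user-provided profile. The sketch below illustrates only that baseline idea; it is not SigMat and does not reproduce its tuning parameters. Cosine similarity is an assumed choice of score, and the function names, condition labels, and expression values are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length expression vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def best_match(query, library):
    """Return the label of the library signature most similar to the query.

    `library` maps a condition label to its signature vector (hypothetical layout).
    """
    return max(library, key=lambda label: cosine(query, library[label]))


# Toy collection: three made-up drug signatures over four genes.
library = {
    "drug_A": [1.0, -0.5, 0.0, 2.0],
    "drug_B": [-1.0, 0.8, 0.3, -0.2],
    "drug_C": [0.1, 0.1, -1.5, 0.4],
}
query = [0.9, -0.4, 0.1, 1.8]
match = best_match(query, library)
```

A retrieval of this kind returns a single nearest label and has no mechanism for adapting to a new cell line, which is the limitation the abstract says SigMat addresses with a small set of tuning data.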