
Machine learning for RNA sequencing-based intrinsic subtyping of breast cancer

Stratification of breast cancer (BC) into molecular subtypes by multigene expression assays is of demonstrated clinical utility. In principle, global RNA-sequencing (RNA-seq) should enable reconstructing existing transcriptional classifications of BC samples. Yet, it is not clear whether adaptation...


Bibliographic Details
Main authors: Cascianelli, Silvia; Molineris, Ivan; Isella, Claudio; Masseroli, Marco; Medico, Enzo
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7442834/
https://www.ncbi.nlm.nih.gov/pubmed/32826944
http://dx.doi.org/10.1038/s41598-020-70832-2
author Cascianelli, Silvia
Molineris, Ivan
Isella, Claudio
Masseroli, Marco
Medico, Enzo
collection PubMed
description Stratification of breast cancer (BC) into molecular subtypes by multigene expression assays is of demonstrated clinical utility. In principle, global RNA-sequencing (RNA-seq) should enable reconstructing existing transcriptional classifications of BC samples. Yet, it is not clear whether adaptation to RNA-seq of classifiers originally developed using PCR or microarrays, or reconstruction through machine learning (ML), is preferable. Hence, we focused on the robustness and portability of PAM50, a nearest-centroid classifier developed on microarray data to identify five BC “intrinsic subtypes”. We found that standard PAM50 is profoundly affected by the composition of the sample cohort used for reference construction, and we propose a strategy, named AWCA, to mitigate this issue, improving classification robustness, with over 90% concordance, and prognostic ability; we also show that AWCA-based PAM50 can even be applied as a single-sample method. Furthermore, we explored five supervised learners to build robust, single-sample intrinsic subtype callers via RNA-seq. From our ML-based survey, regularized multiclass logistic regression (mLR) displayed the best performance, further increased by ad-hoc gene selection on the global transcriptome. On external test sets, mLR classifications reached 90% concordance with PAM50-based calls, without the need for a reference sample; mLR's proven robustness and prognostic ability make it an equally valuable single-sample method to strengthen BC subtyping.
format Online
Article
Text
id pubmed-7442834
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-7442834 2020-08-26 Machine learning for RNA sequencing-based intrinsic subtyping of breast cancer Cascianelli, Silvia Molineris, Ivan Isella, Claudio Masseroli, Marco Medico, Enzo Sci Rep Article
Nature Publishing Group UK 2020-08-21 /pmc/articles/PMC7442834/ /pubmed/32826944 http://dx.doi.org/10.1038/s41598-020-70832-2 Text en © The Author(s) 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
title Machine learning for RNA sequencing-based intrinsic subtyping of breast cancer
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7442834/
https://www.ncbi.nlm.nih.gov/pubmed/32826944
http://dx.doi.org/10.1038/s41598-020-70832-2