Machine Learning Classifiers for Endometriosis Using Transcriptomics and Methylomics Data

Endometriosis is a complex and common gynecological disorder yet a poorly understood disease affecting about 176 million women worldwide and causing significant impact on their quality of life and economic burden. Neither a definitive clinical symptom nor a minimally invasive diagnostic method is av...

Descripción completa

Bibliographic Details
Main Authors: Akter, Sadia; Xu, Dong; Nagel, Susan C.; Bromfield, John J.; Pelch, Katherine; Wilshire, Gilbert B.; Joshi, Trupti
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2019
Subjects: Genetics
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6737999/
https://www.ncbi.nlm.nih.gov/pubmed/31552087
http://dx.doi.org/10.3389/fgene.2019.00766
collection PubMed
description Endometriosis is a common and complex gynecological disorder that remains poorly understood. It affects about 176 million women worldwide, with a significant impact on quality of life and a substantial economic burden. Neither a definitive clinical symptom nor a minimally invasive diagnostic method is available, which leads to an average diagnostic latency of 4 to 11 years. Over the last several decades, the discovery of relevant biological patterns in microarray expression and next generation sequencing (NGS) data has been advanced by applying various machine learning tools. We performed a machine learning analysis using 38 RNA-seq and 80 enrichment-based DNA methylation (MBD-seq) datasets. We tested how well various supervised machine learning methods, such as decision tree, partial least squares discriminant analysis (PLSDA), support vector machine, and random forest, classify endometriosis versus control samples when trained on both transcriptomics and methylomics data. The assessment considered two ways of improving classification performance: a) the effect of three different normalization techniques and b) the effect of differential analysis using a generalized linear model (GLM). Multiple machine learning experiments identified several candidate biomarker genes, including NOTCH3, SNAPC2, B4GALNT1, SMAP2, DDB2, GTF3C5, and PTOV1 from the transcriptomics data analysis and TRPM6, RASSF2, TNIP2, RP3-522J7.6, FGD3, and MFSD14B from the methylomics data analysis. We concluded that an appropriate machine learning diagnostic pipeline for endometriosis should use TMM (trimmed mean of M-values) normalization for transcriptomics data, quantile or voom normalization for methylomics data, and GLM for feature space reduction and classification performance maximization.
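The abstract recommends quantile or voom normalization for the MBD-seq (methylomics) counts. A minimal sketch of those two preprocessing steps follows, assuming a features-by-samples count matrix held in NumPy. The function names `log_cpm` and `quantile_normalize` are hypothetical, and this is not the authors' code. `log_cpm` reproduces only the log2 counts-per-million transform that limma's `voom` applies before it estimates precision weights (the weights are omitted here), and `quantile_normalize` is a simplified quantile normalization that breaks ties by sort order instead of averaging them.

```python
import numpy as np


def log_cpm(counts: np.ndarray) -> np.ndarray:
    """log2 counts-per-million with the offsets used by voom (weights omitted).

    counts: features x samples matrix of raw read counts.
    Adds 0.5 to each count and 1 to each library size so zeros stay finite.
    """
    counts = np.asarray(counts, dtype=float)
    lib_size = counts.sum(axis=0)  # total reads per sample (column)
    return np.log2((counts + 0.5) / (lib_size + 1.0) * 1e6)


def quantile_normalize(x: np.ndarray) -> np.ndarray:
    """Give every column (sample) the same empirical distribution.

    Simplified sketch (hypothetical helper): ties are broken by sort order,
    not averaged as some established implementations do.
    """
    x = np.asarray(x, dtype=float)
    n_rows, n_cols = x.shape
    # order[i, j] is the row holding the i-th smallest value of column j.
    order = np.argsort(x, axis=0, kind="stable")
    # Invert the permutation: ranks[r, j] is the rank of x[r, j] in column j.
    ranks = np.empty_like(order)
    ranks[order, np.arange(n_cols)] = np.arange(n_rows)[:, None]
    # Reference distribution: mean of the i-th smallest values across samples.
    reference = np.sort(x, axis=0).mean(axis=1)
    # Replace each value with the reference value at the same rank.
    return reference[ranks]


if __name__ == "__main__":
    # Toy counts: 4 features (rows) x 3 samples (columns), made up for illustration.
    counts = np.array([[5, 4, 3], [2, 1, 4], [3, 4, 6], [4, 2, 8]])
    normalized = quantile_normalize(log_cpm(counts))
    print(np.round(normalized, 3))
```

Chaining the two, as in `quantile_normalize(log_cpm(counts))`, roughly mirrors calling `voom` with `normalize.method = "quantile"`, which quantile-normalizes the log-CPM values between samples; established implementations may differ in tie handling and other details.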
id pubmed-6737999
institution National Center for Biotechnology Information
journal Front Genet
published 2019-09-04
rights Copyright © 2019 Akter, Xu, Nagel, Bromfield, Pelch, Wilshire and Joshi. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY), http://creativecommons.org/licenses/by/4.0/. The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
topic Genetics