Novel biomarker genes which distinguish between smokers and chronic obstructive pulmonary disease patients with machine learning approach

BACKGROUND: Chronic obstructive pulmonary disease (COPD) is a combination of progressive lung diseases. The diagnosis of COPD is generally based on pulmonary function testing; however, prognosis remains difficult for smokers and patients with early-stage COPD because of the complexity and heterogen...

Bibliographic Details
Main Authors: Matsumura, Kazushi; Ito, Shigeaki
Format: Online Article Text
Language: English
Published: BioMed Central 2020
Subjects: Technical Advance
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6998147/
https://www.ncbi.nlm.nih.gov/pubmed/32013930
http://dx.doi.org/10.1186/s12890-020-1062-9
_version_ 1783493807839379456
author Matsumura, Kazushi
Ito, Shigeaki
author_facet Matsumura, Kazushi
Ito, Shigeaki
author_sort Matsumura, Kazushi
collection PubMed
description BACKGROUND: Chronic obstructive pulmonary disease (COPD) is a combination of progressive lung diseases. The diagnosis of COPD is generally based on pulmonary function testing; however, prognosis remains difficult for smokers and patients with early-stage COPD because of the complexity and heterogeneity of the pathogenesis. Computational analysis of omics data is expected to be one way to resolve such complexity. METHODS: We obtained transcriptomic data from in vitro exposures of human bronchial epithelial cells to inducers of early COPD events in order to identify potential descriptive marker genes. Using the identified genes, we applied a machine learning technique to publicly available transcriptome data obtained from lung specimens of COPD and non-COPD patients to develop a model that can reflect the risk continuum across smoking and COPD. RESULTS: The expression levels of 15 genes were commonly altered among in vitro tissues exposed to known inducers of earlier COPD events (exposure to cigarette smoke, DNA damage, oxidative stress, and inflammation), and 10 of these genes and their corresponding proteins have not previously been reported as COPD biomarkers. Although these genes were able to predict each group with 65% accuracy, the accuracy with which they were able to discriminate COPD subjects from smokers was only 29%. Furthermore, logistic regression enabled the conversion of gene expression levels to a numerical index, which we named the “potential risk factor (PRF)” index. The highest significant index value was recorded in COPD subjects (0.56 at the median), followed by smokers (0.30) and non-smokers (0.02). In vitro tissues exposed to cigarette smoke displayed dose-dependent increases in PRF, suggesting its utility for prospective risk estimation of tobacco products. CONCLUSIONS: Our experiment-based transcriptomic analysis identified novel genes associated with COPD, and the 15 genes could distinguish smokers and COPD subjects from non-smokers via machine-learning classification with remarkable accuracy. We also propose a PRF index that can quantitatively reflect the risk continuum across smoking and COPD pathogenesis, and we believe it will provide an improved understanding of smoking effects and new insights into COPD.
format Online
Article
Text
id pubmed-6998147
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6998147 2020-02-05 Novel biomarker genes which distinguish between smokers and chronic obstructive pulmonary disease patients with machine learning approach Matsumura, Kazushi Ito, Shigeaki BMC Pulm Med Technical Advance BACKGROUND: Chronic obstructive pulmonary disease (COPD) is a combination of progressive lung diseases. The diagnosis of COPD is generally based on pulmonary function testing; however, prognosis remains difficult for smokers and patients with early-stage COPD because of the complexity and heterogeneity of the pathogenesis. Computational analysis of omics data is expected to be one way to resolve such complexity. METHODS: We obtained transcriptomic data from in vitro exposures of human bronchial epithelial cells to inducers of early COPD events in order to identify potential descriptive marker genes. Using the identified genes, we applied a machine learning technique to publicly available transcriptome data obtained from lung specimens of COPD and non-COPD patients to develop a model that can reflect the risk continuum across smoking and COPD. RESULTS: The expression levels of 15 genes were commonly altered among in vitro tissues exposed to known inducers of earlier COPD events (exposure to cigarette smoke, DNA damage, oxidative stress, and inflammation), and 10 of these genes and their corresponding proteins have not previously been reported as COPD biomarkers. Although these genes were able to predict each group with 65% accuracy, the accuracy with which they were able to discriminate COPD subjects from smokers was only 29%. Furthermore, logistic regression enabled the conversion of gene expression levels to a numerical index, which we named the “potential risk factor (PRF)” index. The highest significant index value was recorded in COPD subjects (0.56 at the median), followed by smokers (0.30) and non-smokers (0.02). In vitro tissues exposed to cigarette smoke displayed dose-dependent increases in PRF, suggesting its utility for prospective risk estimation of tobacco products. CONCLUSIONS: Our experiment-based transcriptomic analysis identified novel genes associated with COPD, and the 15 genes could distinguish smokers and COPD subjects from non-smokers via machine-learning classification with remarkable accuracy. We also propose a PRF index that can quantitatively reflect the risk continuum across smoking and COPD pathogenesis, and we believe it will provide an improved understanding of smoking effects and new insights into COPD. BioMed Central 2020-02-03 /pmc/articles/PMC6998147/ /pubmed/32013930 http://dx.doi.org/10.1186/s12890-020-1062-9 Text en © The Author(s). 2020 Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Technical Advance
Matsumura, Kazushi
Ito, Shigeaki
Novel biomarker genes which distinguish between smokers and chronic obstructive pulmonary disease patients with machine learning approach
title Novel biomarker genes which distinguish between smokers and chronic obstructive pulmonary disease patients with machine learning approach
title_full Novel biomarker genes which distinguish between smokers and chronic obstructive pulmonary disease patients with machine learning approach
title_fullStr Novel biomarker genes which distinguish between smokers and chronic obstructive pulmonary disease patients with machine learning approach
title_full_unstemmed Novel biomarker genes which distinguish between smokers and chronic obstructive pulmonary disease patients with machine learning approach
title_short Novel biomarker genes which distinguish between smokers and chronic obstructive pulmonary disease patients with machine learning approach
title_sort novel biomarker genes which distinguish between smokers and chronic obstructive pulmonary disease patients with machine learning approach
topic Technical Advance
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6998147/
https://www.ncbi.nlm.nih.gov/pubmed/32013930
http://dx.doi.org/10.1186/s12890-020-1062-9
work_keys_str_mv AT matsumurakazushi novelbiomarkergeneswhichdistinguishbetweensmokersandchronicobstructivepulmonarydiseasepatientswithmachinelearningapproach
AT itoshigeaki novelbiomarkergeneswhichdistinguishbetweensmokersandchronicobstructivepulmonarydiseasepatientswithmachinelearningapproach
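
Illustrative note: the abstract in this record says that logistic regression was used to convert the expression levels of the 15 marker genes into a numerical "potential risk factor (PRF)" index, reported as a median per group. The sketch below shows one way such an index could be computed. It is not the authors' implementation. The abstract does not state the model specification, so the following are assumptions: a binary model trained on non-smokers (label 0) versus COPD subjects (label 1), the index taken as the predicted probability, and smokers scored with the fitted model. All expression values are synthetic, and the helper names (`fit_logistic`, `risk_index`, `simulate_group`) are hypothetical.

```python
import math
import random
import statistics

N_GENES = 15  # the abstract reports 15 marker genes; all data below is synthetic


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X, y, lr=0.1, epochs=300):
    """Fit binary logistic regression by batch gradient descent."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Prediction error for this sample under the current weights.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def risk_index(x, w, b):
    """Map one expression profile to an index in [0, 1] (assumed PRF analogue)."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def simulate_group(rng, shift, n):
    """Synthetic expression profiles: unit-variance noise around a group shift."""
    return [[rng.gauss(shift, 1.0) for _ in range(N_GENES)] for _ in range(n)]


rng = random.Random(0)
groups = {
    "non-smokers": simulate_group(rng, 0.0, 40),
    "smokers": simulate_group(rng, 0.5, 40),  # assumed intermediate shift
    "COPD": simulate_group(rng, 1.0, 40),
}

# Assumption: train on the two extreme groups only, then score all three.
X_train = groups["non-smokers"] + groups["COPD"]
y_train = [0] * len(groups["non-smokers"]) + [1] * len(groups["COPD"])
w, b = fit_logistic(X_train, y_train)

scores = {name: [risk_index(x, w, b) for x in profiles]
          for name, profiles in groups.items()}
medians = {name: statistics.median(vals) for name, vals in scores.items()}

for name, value in medians.items():
    print(f"{name}: median index = {value:.2f}")
```

Because the index is a probability, it is bounded between 0 and 1, and group medians can be compared directly, which matches how the abstract reports its values. The numbers printed by this sketch come from synthetic data and bear no relation to the published medians.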