
Identification of usual interstitial pneumonia pattern using RNA-Seq and machine learning: challenges and solutions


Bibliographic Details
Main Authors: Choi, Yoonha, Liu, Tiffany Ting, Pankratz, Daniel G., Colby, Thomas V., Barth, Neil M., Lynch, David A., Walsh, P. Sean, Raghu, Ganesh, Kennedy, Giulia C., Huang, Jing
Format: Online Article Text
Language: English
Published: BioMed Central 2018
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5954282/
https://www.ncbi.nlm.nih.gov/pubmed/29764379
http://dx.doi.org/10.1186/s12864-018-4467-6
collection PubMed
description BACKGROUND: We developed a classifier using RNA sequencing data that identifies the usual interstitial pneumonia (UIP) pattern for the diagnosis of idiopathic pulmonary fibrosis. We addressed significant challenges, including limited sample size, biological and technical sample heterogeneity, and reagent and assay batch effects. RESULTS: We identified inter- and intra-patient heterogeneity, particularly within the non-UIP group. The models classified UIP on transbronchial biopsy samples with a receiver-operating characteristic area under the curve of ~0.9 in cross-validation. Using in silico mixed samples in training, we prospectively defined a decision boundary to optimize specificity at ≥85%. The penalized logistic regression model showed greater reproducibility across technical replicates and was chosen as the final model. The final model showed sensitivity of 70% and specificity of 88% in the test set. CONCLUSIONS: We demonstrated that the suggested methodologies appropriately addressed the challenges of sample size, disease heterogeneity and technical batch effects, and developed a highly accurate and robust classifier leveraging RNA sequencing for the classification of UIP. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12864-018-4467-6) contains supplementary material, which is available to authorized users.
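The abstract reports sensitivity and specificity at a decision boundary chosen so that specificity is at least 85%, but it does not spell out how the cutoff is computed. As a minimal illustration only, assuming a classifier that outputs one score per sample with higher scores meaning UIP, the sketch below picks the smallest cutoff that meets a target specificity on non-UIP scores and then reports sensitivity and specificity. The function names, the score convention and the toy scores are hypothetical; the authors also used in silico mixed samples when setting the boundary, which this sketch does not model.

```python
import math
from typing import Sequence, Tuple


def cutoff_for_specificity(neg_scores: Sequence[float], target_spec: float = 0.85) -> float:
    """Return the smallest cutoff t such that at least target_spec of the
    negative (non-UIP) scores satisfy score <= t, i.e. are called non-UIP.

    Assumed convention (hypothetical, not from the paper): score > t -> UIP.
    """
    if not neg_scores:
        raise ValueError("need at least one negative score")
    if not 0.0 < target_spec <= 1.0:
        raise ValueError("target_spec must be in (0, 1]")
    ordered = sorted(neg_scores)
    # k = number of negatives that must fall at or below the cutoff.
    # The small epsilon guards against float error such as 0.85 * n landing
    # a hair above an integer; max(1, ...) keeps k a valid 1-based rank.
    k = max(1, math.ceil(target_spec * len(ordered) - 1e-9))
    # The k-th smallest negative score is the smallest cutoff that leaves
    # at least k negatives at or below it.
    return ordered[k - 1]


def sensitivity_specificity(
    pos_scores: Sequence[float], neg_scores: Sequence[float], cutoff: float
) -> Tuple[float, float]:
    """Sensitivity: fraction of UIP samples called UIP (score > cutoff).
    Specificity: fraction of non-UIP samples called non-UIP (score <= cutoff)."""
    sens = sum(s > cutoff for s in pos_scores) / len(pos_scores)
    spec = sum(s <= cutoff for s in neg_scores) / len(neg_scores)
    return sens, spec


if __name__ == "__main__":
    # Toy scores invented purely for illustration; not data from the study.
    neg = list(range(20))  # 20 hypothetical non-UIP scores: 0..19
    pos = [5, 10, 15, 16, 17, 18, 19, 20, 25, 30]  # 10 hypothetical UIP scores
    t = cutoff_for_specificity(neg, 0.85)
    sens, spec = sensitivity_specificity(pos, neg, t)
    print(t, sens, spec)  # prints: 16 0.6 0.85
```

Fixing the cutoff on negatives first and only then reading off sensitivity mirrors the abstract's ordering (specificity targeted, sensitivity reported), but on a real held-out test set the achieved specificity can differ from the target, as the reported 88% illustrates.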
id pubmed-5954282
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-5954282 2018-05-21 BMC Genomics Research BioMed Central 2018-05-09 /pmc/articles/PMC5954282/ /pubmed/29764379 http://dx.doi.org/10.1186/s12864-018-4467-6 Text en © The Author(s) 2018
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
topic Research