Complementary feature selection from alternative splicing events and gene expression for phenotype prediction

MOTIVATION: A central task of bioinformatics is to develop sensitive and specific means of providing medical prognoses from biomarker patterns. Common methods to predict phenotypes in RNA-Seq datasets utilize machine learning algorithms trained via gene expression. Isoforms, however, generated from...

Bibliographic Details
Main Authors: Labuzzetta, Charles J, Antonio, Margaret L, Watson, Patricia M, Wilson, Robert C, Laboissonniere, Lauren A, Trimarchi, Jeffrey M, Genc, Baris, Ozdinler, P Hande, Watson, Dennis K, Anderson, Paul E
Format: Online Article Text
Language: English
Published: Oxford University Press 2016
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6276944/
https://www.ncbi.nlm.nih.gov/pubmed/27587658
http://dx.doi.org/10.1093/bioinformatics/btw430
_version_ 1783378086861996032
author Labuzzetta, Charles J
Antonio, Margaret L
Watson, Patricia M
Wilson, Robert C
Laboissonniere, Lauren A
Trimarchi, Jeffrey M
Genc, Baris
Ozdinler, P Hande
Watson, Dennis K
Anderson, Paul E
collection PubMed
description MOTIVATION: A central task of bioinformatics is to develop sensitive and specific means of providing medical prognoses from biomarker patterns. Common methods to predict phenotypes in RNA-Seq datasets utilize machine learning algorithms trained via gene expression. Isoforms, however, generated from alternative splicing, may provide a novel and complementary set of transcripts for phenotype prediction. In contrast to gene expression, the number of isoforms increases significantly due to numerous alternative splicing patterns, resulting in a prioritization problem for many machine learning algorithms. This study identifies the empirically optimal methods of transcript quantification, feature engineering and filtering steps using phenotype prediction accuracy as a metric. At the same time, the complementary nature of gene and isoform data is analyzed and the feasibility of identifying isoforms as biomarker candidates is examined. RESULTS: Isoform features are complementary to gene features, providing non-redundant information and enhanced predictive power when prioritized and filtered. A univariate filtering algorithm, which selects up to the N highest-ranking features for phenotype prediction, is described and evaluated in this study. An empirical comparison of pipelines for isoform quantification is reported by performing cross-validation prediction tests with datasets from human non-small cell lung cancer (NSCLC) patients, human patients with chronic obstructive pulmonary disease (COPD) and amyotrophic lateral sclerosis (ALS) transgenic mice, each including samples of diseased and non-diseased phenotypes. AVAILABILITY AND IMPLEMENTATION: https://github.com/clabuzze/Phenotype-Prediction-Pipeline.git CONTACT: clabuzze@iastate.edu, antoniom@bc.edu, watsondk@musc.edu, andersonpe2@cofc.edu
format Online
Article
Text
id pubmed-6276944
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-6276944 2018-12-11 Bioinformatics Oxford University Press 2016-09-01 2016-08-29 /pmc/articles/PMC6276944/ /pubmed/27587658 http://dx.doi.org/10.1093/bioinformatics/btw430 Text en © The Author 2016. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title Complementary feature selection from alternative splicing events and gene expression for phenotype prediction
topic ECCB 2016: The 15th European Conference on Computational Biology
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6276944/
https://www.ncbi.nlm.nih.gov/pubmed/27587658
http://dx.doi.org/10.1093/bioinformatics/btw430