Complementary feature selection from alternative splicing events and gene expression for phenotype prediction
MOTIVATION: A central task of bioinformatics is to develop sensitive and specific means of providing medical prognoses from biomarker patterns. Common methods to predict phenotypes in RNA-Seq datasets utilize machine learning algorithms trained via gene expression. Isoforms, however, generated from...
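Main authors: | Labuzzetta, Charles J; Antonio, Margaret L; Watson, Patricia M; Wilson, Robert C; Laboissonniere, Lauren A; Trimarchi, Jeffrey M; Genc, Baris; Ozdinler, P Hande; Watson, Dennis K; Anderson, Paul E |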
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2016 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6276944/ https://www.ncbi.nlm.nih.gov/pubmed/27587658 http://dx.doi.org/10.1093/bioinformatics/btw430 |
_version_ | 1783378086861996032 |
---|---|
author | Labuzzetta, Charles J Antonio, Margaret L Watson, Patricia M Wilson, Robert C Laboissonniere, Lauren A Trimarchi, Jeffrey M Genc, Baris Ozdinler, P Hande Watson, Dennis K Anderson, Paul E |
author_facet | Labuzzetta, Charles J Antonio, Margaret L Watson, Patricia M Wilson, Robert C Laboissonniere, Lauren A Trimarchi, Jeffrey M Genc, Baris Ozdinler, P Hande Watson, Dennis K Anderson, Paul E |
author_sort | Labuzzetta, Charles J |
collection | PubMed |
description | MOTIVATION: A central task of bioinformatics is to develop sensitive and specific means of providing medical prognoses from biomarker patterns. Common methods to predict phenotypes in RNA-Seq datasets utilize machine learning algorithms trained via gene expression. Isoforms, however, generated from alternative splicing, may provide a novel and complementary set of transcripts for phenotype prediction. In contrast to gene expression, the number of isoforms increases significantly due to numerous alternative splicing patterns, resulting in a prioritization problem for many machine learning algorithms. This study identifies the empirically optimal methods of transcript quantification, feature engineering and filtering steps using phenotype prediction accuracy as a metric. At the same time, the complementary nature of gene and isoform data is analyzed and the feasibility of identifying isoforms as biomarker candidates is examined. RESULTS: Isoform features are complementary to gene features, providing non-redundant information and enhanced predictive power when prioritized and filtered. A univariate filtering algorithm, which selects up to the N highest ranking features for phenotype prediction is described and evaluated in this study. An empirical comparison of pipelines for isoform quantification is reported by performing cross-validation prediction tests with datasets from human non-small cell lung cancer (NSCLC) patients, human patients with chronic obstructive pulmonary disease (COPD) and amyotrophic lateral sclerosis (ALS) transgenic mice, each including samples of diseased and non-diseased phenotypes. AVAILABILITY AND IMPLEMENTATION: https://github.com/clabuzze/Phenotype-Prediction-Pipeline.git CONTACT: clabuzze@iastate.edu, antoniom@bc.edu, watsondk@musc.edu, andersonpe2@cofc.edu |
format | Online Article Text |
id | pubmed-6276944 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-6276944 2018-12-11 Complementary feature selection from alternative splicing events and gene expression for phenotype prediction Labuzzetta, Charles J Antonio, Margaret L Watson, Patricia M Wilson, Robert C Laboissonniere, Lauren A Trimarchi, Jeffrey M Genc, Baris Ozdinler, P Hande Watson, Dennis K Anderson, Paul E Bioinformatics ECCB 2016: The 15th European Conference on Computational Biology MOTIVATION: A central task of bioinformatics is to develop sensitive and specific means of providing medical prognoses from biomarker patterns. Common methods to predict phenotypes in RNA-Seq datasets utilize machine learning algorithms trained via gene expression. Isoforms, however, generated from alternative splicing, may provide a novel and complementary set of transcripts for phenotype prediction. In contrast to gene expression, the number of isoforms increases significantly due to numerous alternative splicing patterns, resulting in a prioritization problem for many machine learning algorithms. This study identifies the empirically optimal methods of transcript quantification, feature engineering and filtering steps using phenotype prediction accuracy as a metric. At the same time, the complementary nature of gene and isoform data is analyzed and the feasibility of identifying isoforms as biomarker candidates is examined. RESULTS: Isoform features are complementary to gene features, providing non-redundant information and enhanced predictive power when prioritized and filtered. A univariate filtering algorithm, which selects up to the N highest ranking features for phenotype prediction is described and evaluated in this study. An empirical comparison of pipelines for isoform quantification is reported by performing cross-validation prediction tests with datasets from human non-small cell lung cancer (NSCLC) patients, human patients with chronic obstructive pulmonary disease (COPD) and amyotrophic lateral sclerosis (ALS) transgenic mice, each including samples of diseased and non-diseased phenotypes. AVAILABILITY AND IMPLEMENTATION: https://github.com/clabuzze/Phenotype-Prediction-Pipeline.git CONTACT: clabuzze@iastate.edu, antoniom@bc.edu, watsondk@musc.edu, andersonpe2@cofc.edu Oxford University Press 2016-09-01 2016-08-29 /pmc/articles/PMC6276944/ /pubmed/27587658 http://dx.doi.org/10.1093/bioinformatics/btw430 Text en © The Author 2016. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
spellingShingle | ECCB 2016: The 15th European Conference on Computational Biology Labuzzetta, Charles J Antonio, Margaret L Watson, Patricia M Wilson, Robert C Laboissonniere, Lauren A Trimarchi, Jeffrey M Genc, Baris Ozdinler, P Hande Watson, Dennis K Anderson, Paul E Complementary feature selection from alternative splicing events and gene expression for phenotype prediction |
title | Complementary feature selection from alternative splicing events and gene expression for phenotype prediction |
title_full | Complementary feature selection from alternative splicing events and gene expression for phenotype prediction |
title_fullStr | Complementary feature selection from alternative splicing events and gene expression for phenotype prediction |
title_full_unstemmed | Complementary feature selection from alternative splicing events and gene expression for phenotype prediction |
title_short | Complementary feature selection from alternative splicing events and gene expression for phenotype prediction |
title_sort | complementary feature selection from alternative splicing events and gene expression for phenotype prediction |
topic | ECCB 2016: The 15th European Conference on Computational Biology |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6276944/ https://www.ncbi.nlm.nih.gov/pubmed/27587658 http://dx.doi.org/10.1093/bioinformatics/btw430 |
work_keys_str_mv | AT labuzzettacharlesj complementaryfeatureselectionfromalternativesplicingeventsandgeneexpressionforphenotypeprediction AT antoniomargaretl complementaryfeatureselectionfromalternativesplicingeventsandgeneexpressionforphenotypeprediction AT watsonpatriciam complementaryfeatureselectionfromalternativesplicingeventsandgeneexpressionforphenotypeprediction AT wilsonrobertc complementaryfeatureselectionfromalternativesplicingeventsandgeneexpressionforphenotypeprediction AT laboissonnierelaurena complementaryfeatureselectionfromalternativesplicingeventsandgeneexpressionforphenotypeprediction AT trimarchijeffreym complementaryfeatureselectionfromalternativesplicingeventsandgeneexpressionforphenotypeprediction AT gencbaris complementaryfeatureselectionfromalternativesplicingeventsandgeneexpressionforphenotypeprediction AT ozdinlerphande complementaryfeatureselectionfromalternativesplicingeventsandgeneexpressionforphenotypeprediction AT watsondennisk complementaryfeatureselectionfromalternativesplicingeventsandgeneexpressionforphenotypeprediction AT andersonpaule complementaryfeatureselectionfromalternativesplicingeventsandgeneexpressionforphenotypeprediction |
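
The description field above mentions a univariate filtering algorithm that selects up to the N highest-ranking features for phenotype prediction. The record does not say how features are ranked, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a two-class phenotype and uses the absolute Welch t-statistic as the ranking score; the function names `welch_t` and `select_top_n`, the feature names, and the toy values are all hypothetical.

```python
from math import inf, sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Absolute Welch t-statistic between two groups of values."""
    diff = abs(mean(a) - mean(b))
    denom = sqrt(variance(a) / len(a) + variance(b) / len(b))
    if denom == 0.0:
        # Both groups are constant: perfectly separated or identical.
        return inf if diff > 0 else 0.0
    return diff / denom


def select_top_n(features, labels, n):
    """Return the names of up to n features, best score first.

    features: dict mapping feature name -> list of per-sample values
    labels:   list of 0/1 phenotype labels, one per sample
    The scoring statistic is an assumption made for this sketch.
    """
    scores = {}
    for name, values in features.items():
        diseased = [v for v, y in zip(values, labels) if y == 1]
        healthy = [v for v, y in zip(values, labels) if y == 0]
        scores[name] = welch_t(diseased, healthy)
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:n]  # fewer than n names if fewer features exist


# Toy data invented for illustration: two gene and two isoform features.
labels = [1, 1, 1, 0, 0, 0]
features = {
    "geneA": [5.0, 5.2, 4.9, 5.1, 5.0, 4.8],
    "geneB": [9.0, 8.5, 9.2, 3.1, 2.9, 3.3],
    "isoA.1": [1.0, 1.1, 0.9, 4.0, 4.2, 3.9],
    "isoA.2": [4.0, 4.5, 3.5, 1.0, 0.5, 1.5],
}
top = select_top_n(features, labels, n=2)
print(top)
```

Because each feature is scored independently of the others, gene-level and isoform-level features can be pooled and ranked together, which is one simple way to let the two feature types compete for the N available slots. In a cross-validation setting, the filter would need to be fitted on the training folds only to avoid leaking label information into the test fold.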