
Complementary feature selection from alternative splicing events and gene expression for phenotype prediction


Bibliographic Details
Main authors:	Labuzzetta, Charles J, Antonio, Margaret L, Watson, Patricia M, Wilson, Robert C, Laboissonniere, Lauren A, Trimarchi, Jeffrey M, Genc, Baris, Ozdinler, P Hande, Watson, Dennis K, Anderson, Paul E
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2016
Subjects:	ECCB 2016: The 15th European Conference on Computational Biology
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6276944/ https://www.ncbi.nlm.nih.gov/pubmed/27587658 http://dx.doi.org/10.1093/bioinformatics/btw430

Description
Summary:	MOTIVATION: A central task of bioinformatics is to develop sensitive and specific means of providing medical prognoses from biomarker patterns. Common methods to predict phenotypes in RNA-Seq datasets use machine learning algorithms trained on gene expression. Isoforms generated by alternative splicing, however, may provide a novel and complementary set of transcripts for phenotype prediction. Because of the numerous alternative splicing patterns, isoforms far outnumber genes, which creates a feature prioritization problem for many machine learning algorithms. This study identifies the empirically optimal methods of transcript quantification, feature engineering and filtering, using phenotype prediction accuracy as the metric. It also analyzes the complementary nature of gene and isoform data and examines the feasibility of identifying isoforms as biomarker candidates. RESULTS: Isoform features are complementary to gene features, providing non-redundant information and enhanced predictive power when prioritized and filtered. A univariate filtering algorithm, which selects up to the N highest-ranking features for phenotype prediction, is described and evaluated. An empirical comparison of isoform quantification pipelines is reported, based on cross-validation prediction tests with datasets from human non-small cell lung cancer (NSCLC) patients, human patients with chronic obstructive pulmonary disease (COPD), and amyotrophic lateral sclerosis (ALS) transgenic mice, each including samples of diseased and non-diseased phenotypes. AVAILABILITY AND IMPLEMENTATION: https://github.com/clabuzze/Phenotype-Prediction-Pipeline.git CONTACT: clabuzze@iastate.edu, antoniom@bc.edu, watsondk@musc.edu, andersonpe2@cofc.edu
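The univariate filter summarized above (keep up to the N highest-ranking features, then evaluate by cross-validation) can be illustrated with a minimal Python sketch. It assumes details the summary does not give. Each feature is scored independently by the absolute Welch t-statistic. A nearest-centroid classifier stands in for the learner. Selection is repeated inside every cross-validation fold. The function names (`welch_t`, `select_top_n`, `nearest_centroid_predict`, `cross_validated_accuracy`) and the toy matrix are hypothetical illustrations and are not taken from the linked repository. Columns could hold gene features, isoform features, or both concatenated per sample.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Absolute Welch t-statistic (an assumed ranking score, not the paper's)."""
    denom = sqrt(variance(a) / len(a) + variance(b) / len(b))
    if denom == 0.0:
        return 0.0  # constant feature in both classes carries no signal
    return abs(mean(a) - mean(b)) / denom


def select_top_n(X, y, n):
    """Score every column independently; return up to n best column indices."""
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append((welch_t(pos, neg), j))
    # Highest score first; ties broken by column index for determinism.
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [j for _, j in scores[:n]]


def nearest_centroid_predict(X_train, y_train, X_test, features):
    """Stand-in classifier: assign each test row to the closer class centroid."""
    centroids = {}
    for c in (0, 1):
        rows = [row for row, label in zip(X_train, y_train) if label == c]
        centroids[c] = [mean(r[j] for r in rows) for j in features]
    preds = []
    for row in X_test:
        x = [row[j] for j in features]
        dist = {c: sum((a - b) ** 2 for a, b in zip(x, cen))
                for c, cen in centroids.items()}
        preds.append(min(dist, key=dist.get))
    return preds


def cross_validated_accuracy(X, y, n, k=3):
    """k-fold CV in which the top-n filter is refit on each training fold."""
    idx = list(range(len(y)))
    folds = [idx[f::k] for f in range(k)]  # interleaved folds
    correct = 0
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in idx if i not in held_out]
        X_tr = [X[i] for i in train_idx]
        y_tr = [y[i] for i in train_idx]
        features = select_top_n(X_tr, y_tr, n)  # ranking sees training rows only
        preds = nearest_centroid_predict(X_tr, y_tr, [X[i] for i in test_idx], features)
        correct += sum(p == y[i] for p, i in zip(preds, test_idx))
    return correct / len(y)


# Hypothetical toy data: 12 samples x 3 features; label 1 = diseased.
# Column 0 separates the classes; columns 1 and 2 do not.
X = [
    [1.0, 3.0, 2.0], [1.2, 7.0, 2.1], [0.9, 1.0, 1.9],
    [5.0, 2.0, 2.2], [5.1, 8.0, 1.8], [4.9, 4.0, 2.0],
    [1.1, 6.0, 2.2], [0.8, 2.0, 1.8], [1.0, 5.0, 2.0],
    [5.2, 3.0, 2.1], [4.8, 7.0, 1.9], [5.0, 1.0, 2.0],
]
y = [0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1]

print(select_top_n(X, y, 1))  # [0]
print(cross_validated_accuracy(X, y, n=1, k=3))  # 1.0
```

Refitting the filter inside each fold is the key design choice in this sketch. If features were ranked on the full dataset before cross-validation, information from the held-out samples would leak into the selected set and inflate the reported accuracy. Because `select_top_n` slices the ranked list, it returns all columns when N exceeds the number of features, which matches the "up to N" wording.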


Similar items