
Multivariate models from RNA-Seq SNVs yield candidate molecular targets for biomarker discovery: SNV-DA

BACKGROUND: It has recently been shown that significant and accurate single nucleotide variants (SNVs) can be reliably called from RNA-Seq data. These may provide another source of features for multivariate predictive modeling of disease phenotype for the prioritization of candidate biomarkers. The...


Bibliographic Details
Main Authors:	Paul, Matt R., Levitt, Nicholas P., Moore, David E., Watson, Patricia M., Wilson, Robert C., Denlinger, Chadrick E., Watson, Dennis K., Anderson, Paul E.
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2016
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4815211/ https://www.ncbi.nlm.nih.gov/pubmed/27029813 http://dx.doi.org/10.1186/s12864-016-2542-4

author	Paul, Matt R. Levitt, Nicholas P. Moore, David E. Watson, Patricia M. Wilson, Robert C. Denlinger, Chadrick E. Watson, Dennis K. Anderson, Paul E.
author_sort	Paul, Matt R.
collection	PubMed
description	BACKGROUND: It has recently been shown that significant and accurate single nucleotide variants (SNVs) can be reliably called from RNA-Seq data. These may provide another source of features for multivariate predictive modeling of disease phenotype for the prioritization of candidate biomarkers. The continuous nature of SNV allele fraction features allows the concurrent investigation of several genomic phenomena, including allele-specific expression, clonal expansion and/or deletion, and copy number variation. RESULTS: The proposed software pipeline and package, SNV Discriminant Analysis (SNV-DA), was applied to two RNA-Seq datasets with varying sample sizes sequenced at different depths: a dataset containing primary tumors from twenty patients with different disease outcomes in lung adenocarcinoma and a larger dataset of primary tumors representing two major breast cancer subtypes, estrogen receptor positive and triple negative. Predictive models were generated using the machine learning algorithm, sparse projections to latent structures discriminant analysis. Training sets composed of RNA-Seq SNV features limited to genomic regions of origin (e.g. exonic or intronic) and/or RNA-editing sites were shown to produce models with accurate predictive performances, were discriminant towards true label groupings, and were able to produce SNV rankings significantly different from those of univariate tests. Furthermore, the utility of the proposed methodology is supported by its comparable performance to traditional models as well as the enrichment of selected SNVs located in genes previously associated with cancer and genes showing allele-specific expression. As proof of concept, we highlight the discovery of a previously unannotated intergenic locus that is associated with epigenetic regulatory marks in cancer and whose significant allele-specific expression is correlated with ER+ status; hereafter named ER+ associated hotspot (ERPAHS). CONCLUSION: The use of models from RNA-Seq SNVs to identify and prioritize candidate molecular targets for biomarker discovery is supported by the ability of the proposed method to produce significantly accurate predictive models that are discriminant towards true label groupings. Importantly, the proposed methodology allows investigation of mutations outside of exonic regions and identification of interesting expressed loci not included in traditional gene annotations. An implementation of the proposed methodology is provided that allows the user to specify SNV filtering criteria and cross-validation design during model creation and evaluation. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12864-016-2542-4) contains supplementary material, which is available to authorized users.
format	Online Article Text
id	pubmed-4815211
institution	National Center for Biotechnology Information
language	English
publishDate	2016
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-48152112016-04-01 Multivariate models from RNA-Seq SNVs yield candidate molecular targets for biomarker discovery: SNV-DA Paul, Matt R. Levitt, Nicholas P. Moore, David E. Watson, Patricia M. Wilson, Robert C. Denlinger, Chadrick E. Watson, Dennis K. Anderson, Paul E. BMC Genomics Research Article BioMed Central 2016-03-31 /pmc/articles/PMC4815211/ /pubmed/27029813 http://dx.doi.org/10.1186/s12864-016-2542-4 Text en © Paul et al. 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title	Multivariate models from RNA-Seq SNVs yield candidate molecular targets for biomarker discovery: SNV-DA
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4815211/ https://www.ncbi.nlm.nih.gov/pubmed/27029813 http://dx.doi.org/10.1186/s12864-016-2542-4

