
CC-PROMISE effectively integrates two forms of molecular data with multiple biologically related endpoints

BACKGROUND: As new technologies allow investigators to collect multiple forms of molecular data (genomic, epigenomic, transcriptomic, etc.) and multiple endpoints on a clinical trial cohort, it will become necessary to effectively integrate all these data in a way that reliably identifies biologicall...


Bibliographic Details
Main authors: Cao, Xueyuan; Crews, Kristine R.; Downing, James; Lamba, Jatinder; Pounds, Stanley B.
Format: Online Article Text
Language: English
Published: BioMed Central 2016
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5073973/
https://www.ncbi.nlm.nih.gov/pubmed/27766934
http://dx.doi.org/10.1186/s12859-016-1217-0
_version_ 1782461669443633152
author Cao, Xueyuan
Crews, Kristine R.
Downing, James
Lamba, Jatinder
Pounds, Stanley B.
author_sort Cao, Xueyuan
collection PubMed
description BACKGROUND: As new technologies allow investigators to collect multiple forms of molecular data (genomic, epigenomic, transcriptomic, etc.) and multiple endpoints on a clinical trial cohort, it will become necessary to effectively integrate all these data in a way that reliably identifies biologically important genes. METHODS: We introduce CC-PROMISE as an integrated data analysis method that combines components of canonical correlation (CC) and projection onto the most interesting evidence (PROMISE). For each gene, CC-PROMISE first uses CC to compute scores that represent the association of two forms of molecular data with each other. Next, these scores are substituted into PROMISE to evaluate the statistical evidence that the molecular data show a biologically meaningful relationship with the endpoints. RESULTS: CC-PROMISE shows outstanding performance in simulation studies and an example application involving pediatric leukemia. In simulation studies, CC-PROMISE controls the type I error (misleading significance) rate very near the nominal level across 100 distinct null settings in which no molecular-endpoint association exists. Also, CC-PROMISE has better statistical power than three other methods that control type I error in 396 of 400 (99%) alternative settings for which a molecular-endpoint association is present; the power advantage of CC-PROMISE exceeds 30% in 127 of the 400 (32%) alternative settings. These advantages of CC-PROMISE are also observed in an example application. CONCLUSION: CC-PROMISE very effectively identifies genes for which some form of molecular data shows a biologically meaningful association with multiple related endpoints. AVAILABILITY: The R package CCPROMISE is currently available from www.stjuderesearch.org/site/depts/biostats/software. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-016-1217-0) contains supplementary material, which is available to authorized users.
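The two-step idea in the METHODS summary (canonical correlation scores for one gene, then a PROMISE-style test against the endpoints) can be illustrated with a minimal Python sketch. This is not the CCPROMISE R package and not necessarily the authors' exact procedure. The function names, the simulated data, the sign convention for the scores, the use of Pearson correlations as the per-endpoint statistics, and the two-sided permutation p-value are all assumptions made for illustration.

```python
import numpy as np

def first_canonical_scores(x, y):
    """First pair of canonical variate scores for blocks x (n x p) and y (n x q).

    Uses the QR + SVD formulation: the singular values of Qx^T Qy are the
    canonical correlations. Assumes both centered blocks have full column rank.
    """
    xc = x - x.mean(axis=0)
    yc = y - y.mean(axis=0)
    qx, _ = np.linalg.qr(xc)
    qy, _ = np.linalg.qr(yc)
    u, s, vt = np.linalg.svd(qx.T @ qy)
    sx, sy = qx @ u[:, 0], qy @ vt[0]
    # Assumption: fix the arbitrary SVD sign so that the x-score correlates
    # positively with the first column of x.
    if np.corrcoef(sx, xc[:, 0])[0, 1] < 0:
        sx, sy = -sx, -sy
    return sx, sy, s[0]

def promise_like_pvalue(score, endpoints, pattern, n_perm=1000, seed=0):
    """Project per-endpoint correlations onto an expected sign pattern and
    assess the result by permuting endpoint rows jointly (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    pattern = np.asarray(pattern, dtype=float)

    def stat(e):
        r = np.array([np.corrcoef(score, e[:, j])[0, 1] for j in range(e.shape[1])])
        return float(r @ pattern) / len(pattern)

    observed = stat(endpoints)
    # Permuting whole rows keeps the correlation among endpoints intact.
    perm = np.array([stat(endpoints[rng.permutation(len(score))])
                     for _ in range(n_perm)])
    p = (1 + np.sum(np.abs(perm) >= abs(observed))) / (n_perm + 1)
    return observed, p

# Simulated data for one gene: two forms of molecular data driven by a shared
# latent signal z, and two endpoints expected to move in opposite directions.
rng = np.random.default_rng(1)
n = 80
z = rng.normal(size=n)
form_a = z[:, None] + 0.5 * rng.normal(size=(n, 3))   # e.g. three features
form_b = z[:, None] + rng.normal(size=(n, 1))         # e.g. one feature
endpoints = np.column_stack([z + rng.normal(size=n), -z + rng.normal(size=n)])

sx, sy, rho = first_canonical_scores(form_a, form_b)
observed, p_value = promise_like_pvalue(sx, endpoints, pattern=[1, -1], n_perm=500)
print(f"canonical correlation = {rho:.3f}")
print(f"projected statistic = {observed:.3f}, permutation p = {p_value:.4f}")
```

The pattern vector encodes which direction of association counts as biologically meaningful for each endpoint; here `[1, -1]` is an arbitrary choice matching the simulated data.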
format Online
Article
Text
id pubmed-5073973
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5073973 2016-10-27 CC-PROMISE effectively integrates two forms of molecular data with multiple biologically related endpoints Cao, Xueyuan Crews, Kristine R. Downing, James Lamba, Jatinder Pounds, Stanley B. BMC Bioinformatics Research BioMed Central 2016-10-06 /pmc/articles/PMC5073973/ /pubmed/27766934 http://dx.doi.org/10.1186/s12859-016-1217-0 Text en © The Author(s) 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title CC-PROMISE effectively integrates two forms of molecular data with multiple biologically related endpoints
title_sort cc-promise effectively integrates two forms of molecular data with multiple biologically related endpoints
topic Research