
Methodological considerations for identifying multiple plasma proteins associated with all-cause mortality in a population-based prospective cohort

Bibliographic Details
Main Authors: Drake, Isabel; Hindy, George; Almgren, Peter; Engström, Gunnar; Nilsson, Jan; Melander, Olle; Orho-Melander, Marju
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK, 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7990913/
https://www.ncbi.nlm.nih.gov/pubmed/33762603
http://dx.doi.org/10.1038/s41598-021-85991-z
collection PubMed
description Novel methods to characterize the plasma proteome have made it possible to examine a wide range of proteins in large longitudinal cohort studies, but the complexity of the human proteome makes it difficult to identify robust protein-disease associations. Nevertheless, identifying individuals at high risk of early mortality is a central issue in clinical decision making, and novel biomarkers may help improve risk stratification. With adjustment for established risk factors, we examined the associations between 138 plasma proteins, measured using two proximity extension assays, and long-term risk of all-cause mortality in 3,918 participants of the population-based Malmö Diet and Cancer Study. To examine the reproducibility of protein-mortality associations, we used a two-step random-split approach to simulate a discovery and a replication cohort and conducted analyses using four different methods: Cox regression, stepwise Cox regression, Lasso-Cox regression, and random survival forest (RSF). In the total study population, we identified eight proteins that were associated with all-cause mortality after adjustment for established risk factors and Bonferroni correction for multiple testing. In the two-step analyses, the number of proteins selected for model inclusion in both random samples ranged from 6 to 21, depending on the method used. However, only three proteins were consistently included in both samples across all four methods: growth/differentiation factor-15 (GDF-15), N-terminal pro-B-type natriuretic peptide, and epididymal secretory protein E4. Using the total study population, the C-statistic for a model including established risk factors was 0.7222, increasing to 0.7284 with the inclusion of the most predictive protein (GDF-15; P < 0.0001). All multiple-protein models showed further improvement in the C-statistic compared with the single-protein model (all P < 0.0001).
We identified several plasma proteins associated with increased risk of all-cause mortality independently of established risk factors. Further investigation into the putative causal role of these proteins in longevity is needed. In addition, the examined methods for identifying multiple proteins showed a tendency to overfit by including several putative false-positive findings. Thus, the reproducibility of findings obtained with such approaches may be limited.
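Two quantities in the abstract can be stated precisely: the Bonferroni-corrected significance threshold and the C-statistic used to compare models. The Python sketch below is illustrative only. The function names `bonferroni_threshold` and `harrell_c` are hypothetical, and it assumes a family-wise α of 0.05 across the 138 proteins and Harrell's estimator of the C-statistic; the abstract specifies neither.

```python
from itertools import combinations

# Hypothetical helper names for illustration; not taken from the study's code.


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test P-value cutoff that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


def harrell_c(times, events, risk_scores) -> float:
    """Harrell's concordance index (C-statistic) for right-censored data.

    times: follow-up time per participant
    events: 1 if the participant died during follow-up, 0 if censored
    risk_scores: model-predicted risk (higher = expected to die sooner)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Let `a` be the member of the pair with the shorter follow-up.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Simplification (assumption): pairs with tied times, or whose earlier
        # member was censored, are treated as not comparable.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # earlier death had the higher predicted risk
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    # Toy data for illustration only, not from the Malmö Diet and Cancer Study.
    times = [2, 4, 6, 8]
    events = [1, 1, 0, 1]
    print(bonferroni_threshold(0.05, 138))
    print(harrell_c(times, events, [0.9, 0.7, 0.4, 0.1]))  # prints 1.0
```

Under these assumptions, each protein would need P < 0.05/138 ≈ 3.6 × 10⁻⁴ to pass correction, and the C-statistic gain reported for GDF-15 is 0.7284 − 0.7222 = 0.0062. The pairwise loop is O(n²). For 3,918 participants that is about 7.7 million pairs, which is feasible but slow in pure Python.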
id pubmed-7990913
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
Published in Sci Rep (Nature Publishing Group UK), 2021-03-24. © The Author(s) 2021. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.