Accurate Computation of Survival Statistics in Genome-Wide Studies

Bibliographic Details
Main Authors: Vandin, Fabio; Papoutsaki, Alexandra; Raphael, Benjamin J.; Upfal, Eli
Format: Online Article Text
Language: English
Published: Public Library of Science 2015
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4423942/
https://www.ncbi.nlm.nih.gov/pubmed/25950620
http://dx.doi.org/10.1371/journal.pcbi.1004071
collection PubMed
description A key challenge in genomics is to identify genetic variants that distinguish patients with different survival times following diagnosis or treatment. While the log-rank test is widely used for this purpose, nearly all implementations of the log-rank test rely on an asymptotic approximation that is not appropriate in many genomics applications. This is because the two populations determined by a genetic variant may have very different sizes, and the evaluation of many possible variants demands highly accurate computation of very small p-values. We demonstrate this problem for cancer genomics data where the standard log-rank test leads to many false positive associations between somatic mutations and survival time. We develop and analyze a novel algorithm, Exact Log-rank Test (ExaLT), that accurately computes the p-value of the log-rank statistic under an exact distribution that is appropriate for populations of any size. We demonstrate the advantages of ExaLT on data from published cancer genomics studies, finding significant differences from the reported p-values. We analyze somatic mutations in six cancer types from The Cancer Genome Atlas (TCGA), finding mutations with known association to survival as well as several novel associations. In contrast, standard implementations of the log-rank test report dozens to hundreds of likely false positive associations as more significant than these known associations.
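The abstract's central point, that the normal (asymptotic) approximation to the log-rank statistic can badly understate p-values when one population is very small, can be illustrated with a minimal Python sketch. This is not ExaLT and does not reproduce the authors' exact distribution or algorithm: it computes the standard standardized log-rank statistic, its usual two-sided normal p-value, and, for comparison, an exact p-value under a simple permutation distribution obtained by enumerating every assignment of group labels (feasible only for toy sample sizes). The function names and the toy data are hypothetical.

```python
import math
from itertools import combinations


def logrank_z(times, events, group):
    """Standardized two-sample log-rank statistic Z (hypothetical helper).

    times: follow-up times; events: 1 = death observed, 0 = censored;
    group: 1 if the patient is in population P1 (e.g. carries the variant), else 0.
    """
    observed_minus_expected = 0.0
    variance = 0.0
    # Distinct times at which at least one death occurs, in increasing order.
    death_times = sorted({t for t, e in zip(times, events) if e})
    for t in death_times:
        at_risk = [i for i, ti in enumerate(times) if ti >= t]
        n = len(at_risk)
        n1 = sum(group[i] for i in at_risk)
        d = sum(1 for i in at_risk if times[i] == t and events[i])
        d1 = sum(1 for i in at_risk if times[i] == t and events[i] and group[i])
        # Observed minus expected deaths in P1 at this time.
        observed_minus_expected += d1 - d * n1 / n
        if n > 1:
            # Hypergeometric variance of d1 given n, n1 and d.
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0
    return observed_minus_expected / math.sqrt(variance)


def asymptotic_p(z):
    # Two-sided p-value from the standard normal approximation.
    return math.erfc(abs(z) / math.sqrt(2))


def permutation_p(times, events, group):
    # Exact two-sided p-value under a permutation null (NOT ExaLT): every way of
    # choosing which n1 patients form P1 is treated as equally likely.
    n, n1 = len(group), sum(group)
    z_obs = abs(logrank_z(times, events, group))
    extreme = total = 0
    for chosen in combinations(range(n), n1):
        labels = [0] * n
        for i in chosen:
            labels[i] = 1
        total += 1
        if abs(logrank_z(times, events, labels)) >= z_obs - 1e-12:
            extreme += 1
    return extreme / total


if __name__ == "__main__":
    # Toy data: 12 patients, all deaths observed; only 2 carry the variant
    # and they are the first two to die.
    times = list(range(1, 13))
    events = [1] * 12
    group = [1, 1] + [0] * 10
    z = logrank_z(times, events, group)
    print("Z =", z)
    print("asymptotic p =", asymptotic_p(z))
    print("permutation p =", permutation_p(times, events, group))
```

With only two carriers in a cohort of 12, the enumeration has just C(12, 2) = 66 label assignments, so no exact p-value can be smaller than 1/66, while the normal approximation reports a value more than an order of magnitude smaller. This is the kind of unbalanced-population gap the abstract describes.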
id pubmed-4423942
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
PLoS Comput Biol, Research Article, published 2015-05-07.
© 2015 Vandin et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
topic Research Article