
Recovering the raw data behind a non-parametric survival curve

BACKGROUND: Researchers often wish to carry out additional calculations or analyses using the survival data from one or more studies of other authors. When it is not possible to obtain the raw data directly, reconstruction techniques provide a valuable alternative. Several authors have proposed meth...

Bibliographic Details
Main authors: Liu, Zhihui; Rich, Benjamin; Hanley, James A
Format: Online Article Text
Language: English
Published: BioMed Central 2014
Subjects: Methodology
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4293001/
https://www.ncbi.nlm.nih.gov/pubmed/25551437
http://dx.doi.org/10.1186/2046-4053-3-151
author Liu, Zhihui
Rich, Benjamin
Hanley, James A
collection PubMed
description BACKGROUND: Researchers often wish to carry out additional calculations or analyses using the survival data from one or more studies of other authors. When it is not possible to obtain the raw data directly, reconstruction techniques provide a valuable alternative. Several authors have proposed methods/tools for extracting data from such curves using digitizing software. Instead of using a digitizer to read in the coordinates from a raster image, we propose directly reading in the lines of the PostScript file of a vector image.
METHODS: Using examples, and a formal error analysis, we illustrate the extent to which, with what accuracy and precision, and in what circumstances, this information can be recovered from the various electronic formats in which such curves are published. We focus on the additional precision, and elimination of observer variation, achieved by using vector-based formats rendered by PostScript, rather than the lower resolution image-based formats that have been analyzed up to now. We provide some R code to process these.
RESULTS: If the raster-based images are available, one can reliably recover much of the original information that seems to be ‘hidden’ beneath published survival curves. If the original images can be obtained as a PostScript file, the data recovered from it can then be either input into these tools or processed directly. We found that the PostScript used by Stata discloses considerably more of the data hidden behind survival curves than that generated by other statistical packages.
CONCLUSIONS: When it is not possible to obtain the raw data from the authors, reconstruction techniques are a valuable alternative. Compared with previous approaches, one advantage of ours is that there is no observer variation: there is no need to repeat the digitization process, since the extraction is completely replicable.
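The idea of reading curve coordinates directly from the lines of a PostScript file, rather than digitizing a raster image, can be illustrated with a minimal Python sketch. This is not the authors' R code. It assumes the curve is drawn with the literal PostScript operators `moveto` and `lineto` preceded by numeric operands; files produced by real statistical packages often alias or transform these, which the sketch does not handle. The function names `extract_path_points` and `to_data_scale`, the sample path, and the axis calibration values are all hypothetical.

```python
import re

# Matches "<x> <y> moveto" or "<x> <y> lineto". The operator names are standard
# PostScript; real files often alias them (an assumption this sketch ignores).
PATH_OP = re.compile(r"(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s+(moveto|lineto)\b")


def extract_path_points(ps_text):
    """Return (x, y, operator) tuples in the order they appear in the text."""
    return [(float(x), float(y), op) for x, y, op in PATH_OP.findall(ps_text)]


def to_data_scale(points, x_axis, y_axis):
    """Linearly map device coordinates to (time, survival) pairs.

    x_axis and y_axis are ((device_lo, data_lo), (device_hi, data_hi)):
    two known tick positions per axis, which the user must supply.
    """
    (xd0, xv0), (xd1, xv1) = x_axis
    (yd0, yv0), (yd1, yv1) = y_axis
    scaled = []
    for x, y, _ in points:
        t = xv0 + (x - xd0) * (xv1 - xv0) / (xd1 - xd0)
        s = yv0 + (y - yd0) * (yv1 - yv0) / (yd1 - yd0)
        scaled.append((t, s))
    return scaled


# Hypothetical step-function path; not taken from any published figure.
sample = """newpath
100 500 moveto
200 500 lineto
200 400 lineto
300 400 lineto
stroke"""

points = extract_path_points(sample)
curve = to_data_scale(
    points,
    x_axis=((100, 0.0), (300, 10.0)),  # assumed: device x 100..300 spans time 0..10
    y_axis=((100, 0.0), (500, 1.0)),   # assumed: device y 100..500 spans survival 0..1
)
```

Because the coordinates are parsed from text rather than read off an image by eye, running the extraction again on the same file yields the same points, which is the replicability the abstract emphasizes.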
format Online
Article
Text
id pubmed-4293001
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4293001 2015-01-14 Recovering the raw data behind a non-parametric survival curve Liu, Zhihui Rich, Benjamin Hanley, James A Syst Rev Methodology BioMed Central 2014-12-30 /pmc/articles/PMC4293001/ /pubmed/25551437 http://dx.doi.org/10.1186/2046-4053-3-151 Text en © Liu et al.; licensee BioMed Central. 2015 This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Recovering the raw data behind a non-parametric survival curve
topic Methodology