
Curated compendium of human transcriptional biomarker data

One important use of genome-wide transcriptional profiles is to identify relationships between transcription levels and patient outcomes. These translational insights can guide the development of biomarkers for clinical application. Data from thousands of translational-biomarker studies have been de...


Bibliographic Details
Main Authors: Golightly, Nathan P., Bell, Avery, Bischoff, Anna I., Hollingsworth, Parker D., Piccolo, Stephen R.
Format: Online Article Text
Language: English
Published: Nature Publishing Group 2018
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5903354/
https://www.ncbi.nlm.nih.gov/pubmed/29664470
http://dx.doi.org/10.1038/sdata.2018.66
author Golightly, Nathan P.
Bell, Avery
Bischoff, Anna I.
Hollingsworth, Parker D.
Piccolo, Stephen R.
author_sort Golightly, Nathan P.
collection PubMed
description One important use of genome-wide transcriptional profiles is to identify relationships between transcription levels and patient outcomes. These translational insights can guide the development of biomarkers for clinical application. Data from thousands of translational-biomarker studies have been deposited in public repositories, enabling reuse. However, data-reuse efforts require considerable time and expertise because transcriptional data are generated using heterogeneous profiling technologies, preprocessed using diverse normalization procedures, and annotated in non-standard ways. To address this problem, we curated 45 publicly available, translational-biomarker datasets from a variety of human diseases. To increase the data's utility, we reprocessed the raw expression data using a uniform computational pipeline, addressed quality-control problems, mapped the clinical annotations to a controlled vocabulary, and prepared consistently structured, analysis-ready data files. These data, along with scripts we used to prepare the data, are available in a public repository. We believe these data will be particularly useful to researchers seeking to perform benchmarking studies—for example, to compare and optimize machine-learning algorithms' ability to predict biomedical outcomes.
format Online
Article
Text
id pubmed-5903354
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Nature Publishing Group
record_format MEDLINE/PubMed
spelling pubmed-5903354 2018-05-01 Curated compendium of human transcriptional biomarker data Golightly, Nathan P. Bell, Avery Bischoff, Anna I. Hollingsworth, Parker D. Piccolo, Stephen R. Sci Data Data Descriptor
Nature Publishing Group 2018-04-17 /pmc/articles/PMC5903354/ /pubmed/29664470 http://dx.doi.org/10.1038/sdata.2018.66 Text en Copyright © 2018, The Author(s) http://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/ The Creative Commons Public Domain Dedication waiver http://creativecommons.org/publicdomain/zero/1.0/ applies to the metadata files made available in this article.
title Curated compendium of human transcriptional biomarker data
topic Data Descriptor
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5903354/
https://www.ncbi.nlm.nih.gov/pubmed/29664470
http://dx.doi.org/10.1038/sdata.2018.66