Curated compendium of human transcriptional biomarker data
One important use of genome-wide transcriptional profiles is to identify relationships between transcription levels and patient outcomes. These translational insights can guide the development of biomarkers for clinical application. Data from thousands of translational-biomarker studies have been de...
Main authors: | Golightly, Nathan P.; Bell, Avery; Bischoff, Anna I.; Hollingsworth, Parker D.; Piccolo, Stephen R. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group, 2018 |
Subjects: | Data Descriptor |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5903354/ https://www.ncbi.nlm.nih.gov/pubmed/29664470 http://dx.doi.org/10.1038/sdata.2018.66 |
_version_ | 1783314929452843008 |
---|---|
author | Golightly, Nathan P. Bell, Avery Bischoff, Anna I. Hollingsworth, Parker D. Piccolo, Stephen R. |
author_facet | Golightly, Nathan P. Bell, Avery Bischoff, Anna I. Hollingsworth, Parker D. Piccolo, Stephen R. |
author_sort | Golightly, Nathan P. |
collection | PubMed |
description | One important use of genome-wide transcriptional profiles is to identify relationships between transcription levels and patient outcomes. These translational insights can guide the development of biomarkers for clinical application. Data from thousands of translational-biomarker studies have been deposited in public repositories, enabling reuse. However, data-reuse efforts require considerable time and expertise because transcriptional data are generated using heterogeneous profiling technologies, preprocessed using diverse normalization procedures, and annotated in non-standard ways. To address this problem, we curated 45 publicly available, translational-biomarker datasets from a variety of human diseases. To increase the data's utility, we reprocessed the raw expression data using a uniform computational pipeline, addressed quality-control problems, mapped the clinical annotations to a controlled vocabulary, and prepared consistently structured, analysis-ready data files. These data, along with scripts we used to prepare the data, are available in a public repository. We believe these data will be particularly useful to researchers seeking to perform benchmarking studies—for example, to compare and optimize machine-learning algorithms' ability to predict biomedical outcomes. |
format | Online Article Text |
id | pubmed-5903354 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | Nature Publishing Group |
record_format | MEDLINE/PubMed |
spelling | pubmed-59033542018-05-01 Curated compendium of human transcriptional biomarker data Golightly, Nathan P. Bell, Avery Bischoff, Anna I. Hollingsworth, Parker D. Piccolo, Stephen R. Sci Data Data Descriptor One important use of genome-wide transcriptional profiles is to identify relationships between transcription levels and patient outcomes. These translational insights can guide the development of biomarkers for clinical application. Data from thousands of translational-biomarker studies have been deposited in public repositories, enabling reuse. However, data-reuse efforts require considerable time and expertise because transcriptional data are generated using heterogeneous profiling technologies, preprocessed using diverse normalization procedures, and annotated in non-standard ways. To address this problem, we curated 45 publicly available, translational-biomarker datasets from a variety of human diseases. To increase the data's utility, we reprocessed the raw expression data using a uniform computational pipeline, addressed quality-control problems, mapped the clinical annotations to a controlled vocabulary, and prepared consistently structured, analysis-ready data files. These data, along with scripts we used to prepare the data, are available in a public repository. We believe these data will be particularly useful to researchers seeking to perform benchmarking studies—for example, to compare and optimize machine-learning algorithms' ability to predict biomedical outcomes. 
Nature Publishing Group 2018-04-17 /pmc/articles/PMC5903354/ /pubmed/29664470 http://dx.doi.org/10.1038/sdata.2018.66 Text en Copyright © 2018, The Author(s) http://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/ The Creative Commons Public Domain Dedication waiver http://creativecommons.org/publicdomain/zero/1.0/ applies to the metadata files made available in this article. |
spellingShingle | Data Descriptor Golightly, Nathan P. Bell, Avery Bischoff, Anna I. Hollingsworth, Parker D. Piccolo, Stephen R. Curated compendium of human transcriptional biomarker data |
title | Curated compendium of human transcriptional biomarker data |
title_full | Curated compendium of human transcriptional biomarker data |
title_fullStr | Curated compendium of human transcriptional biomarker data |
title_full_unstemmed | Curated compendium of human transcriptional biomarker data |
title_short | Curated compendium of human transcriptional biomarker data |
title_sort | curated compendium of human transcriptional biomarker data |
topic | Data Descriptor |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5903354/ https://www.ncbi.nlm.nih.gov/pubmed/29664470 http://dx.doi.org/10.1038/sdata.2018.66 |
work_keys_str_mv | AT golightlynathanp curatedcompendiumofhumantranscriptionalbiomarkerdata AT bellavery curatedcompendiumofhumantranscriptionalbiomarkerdata AT bischoffannai curatedcompendiumofhumantranscriptionalbiomarkerdata AT hollingsworthparkerd curatedcompendiumofhumantranscriptionalbiomarkerdata AT piccolostephenr curatedcompendiumofhumantranscriptionalbiomarkerdata |