Large-scale labeling and assessment of sex bias in publicly available expression data

BACKGROUND: Women are at more than 1.5-fold higher risk for clinically relevant adverse drug events. While this higher prevalence is partially due to gender-related effects, biological sex differences likely also impact drug response. Publicly available gene expression databases provide a unique opp...

Bibliographic Details
Main Authors: Flynn, Emily; Chang, Annie; Altman, Russ B.
Format: Online Article Text
Language: English
Published: BioMed Central 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8011224/
https://www.ncbi.nlm.nih.gov/pubmed/33784977
http://dx.doi.org/10.1186/s12859-021-04070-2
author Flynn, Emily
Chang, Annie
Altman, Russ B.
author_sort Flynn, Emily
collection PubMed
description BACKGROUND: Women are at more than 1.5-fold higher risk for clinically relevant adverse drug events. While this higher prevalence is partially due to gender-related effects, biological sex differences likely also impact drug response. Publicly available gene expression databases provide a unique opportunity for examining drug response at a cellular level. However, missingness and heterogeneity of metadata prevent large-scale identification of drug exposure studies and limit assessments of sex bias. To address this, we trained organism-specific models to infer sample sex from gene expression data, and used entity normalization to map metadata cell line and drug mentions to existing ontologies. Using this method, we inferred sex labels for 450,371 human and 245,107 mouse microarray and RNA-seq samples from refine.bio. RESULTS: Overall, we find slight female bias (52.1%) in human samples and male bias (62.5%) in mouse samples; this corresponds to a majority of mixed sex studies in humans and of single sex studies in mice, split between female-only and male-only (25.8% vs. 18.9% in human and 21.6% vs. 31.1% in mouse, respectively). In drug studies, we find limited evidence for sex-sampling bias overall; however, specific categories of drugs, including human cancer and mouse nervous system drugs, are enriched in female-only and male-only studies, respectively. We leverage our expression-based sex labels to further examine the complexity of cell line sex and assess the frequency of metadata sex label misannotations (2–5%). CONCLUSIONS: Our results demonstrate limited overall sex bias, while highlighting high bias in specific subfields and underscoring the importance of including sex labels to better understand the underlying biology. We make our inferred and normalized labels, along with flags for misannotated samples, publicly available to catalyze the routine use of sex as a study variable in future analyses.
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-021-04070-2.
format Online
Article
Text
id pubmed-8011224
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-8011224 2021-04-01 Large-scale labeling and assessment of sex bias in publicly available expression data Flynn, Emily Chang, Annie Altman, Russ B. BMC Bioinformatics Research Article BioMed Central 2021-03-30 /pmc/articles/PMC8011224/ /pubmed/33784977 http://dx.doi.org/10.1186/s12859-021-04070-2 Text en
© The Author(s) 2021 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Large-scale labeling and assessment of sex bias in publicly available expression data
topic Research Article