Reference genome and transcriptome informed by the sex chromosome complement of the sample increase ability to detect sex differences in gene expression from RNA-Seq data

Bibliographic Details
Main Authors: Olney, Kimberly C.; Brotman, Sarah M.; Andrews, Jocelyn P.; Valverde-Vesling, Valeria A.; Wilson, Melissa A.
Format: Online Article Text
Language: English
Published: BioMed Central 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7374973/
https://www.ncbi.nlm.nih.gov/pubmed/32693839
http://dx.doi.org/10.1186/s13293-020-00312-9
author Olney, Kimberly C.
Brotman, Sarah M.
Andrews, Jocelyn P.
Valverde-Vesling, Valeria A.
Wilson, Melissa A.
collection PubMed
description BACKGROUND: Human X and Y chromosomes share an evolutionary origin and, as a consequence, sequence similarity. We investigated whether the sequence homology between the X and Y chromosomes affects the alignment of RNA-Seq reads and estimates of differential expression. We tested the effects of using reference genomes and reference transcriptomes informed by the sex chromosome complement of the sample’s genome on the measurements of RNA-Seq abundance and sex differences in expression.
RESULTS: The default genome includes the entire human reference genome (GRCh38), including the entire sequence of the X and Y chromosomes. We created two sex chromosome complement informed reference genomes. One sex chromosome complement informed reference genome was used for samples that lacked a Y chromosome; for this reference genome version, we hard-masked the entire Y chromosome. For the other sex chromosome complement informed reference genome, to be used for samples with a Y chromosome, we hard-masked only the pseudoautosomal regions of the Y chromosome, because these regions are duplicated identically in the reference genome on the X chromosome. We analyzed the transcript abundance in the whole blood, brain cortex, breast, liver, and thyroid tissues from 20 genetic female (46, XX) and 20 genetic male (46, XY) samples. Each sample was aligned twice: once to the default reference genome and then independently aligned to a reference genome informed by the sex chromosome complement of the sample, repeated using two different read aligners, HISAT and STAR. We then quantified sex differences in gene expression using featureCounts to get the raw count estimates followed by Limma/Voom for normalization and differential expression. We additionally created sex chromosome complement informed transcriptome references for use in pseudo-alignment using Salmon. Transcript abundance was quantified twice for each sample: once to the default target transcripts and then independently to target transcripts informed by the sex chromosome complement of the sample.
CONCLUSIONS: We show that regardless of the choice of the read aligner, using an alignment protocol informed by the sex chromosome complement of the sample results in higher expression estimates on the pseudoautosomal regions of the X chromosome in both genetic male and genetic female samples, as well as an increased number of unique genes being called as differentially expressed between the sexes. We additionally show that using a pseudo-alignment approach informed on the sex chromosome complement of the sample eliminates Y-linked expression in female XX samples.
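The hard-masking step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code: the function names (`hard_mask`, `informed_reference`), the toy sequences, and the toy pseudoautosomal intervals are all hypothetical and chosen only for illustration. The sketch assumes that "hard-masking" means replacing bases with `N`, that a genome is held in memory as a dictionary of sequence strings, and that intervals are 0-based and half-open.

```python
from typing import Dict, Iterable, Tuple

Interval = Tuple[int, int]  # 0-based, half-open [start, end); an assumption of this sketch


def hard_mask(seq: str, intervals: Iterable[Interval]) -> str:
    """Return seq with every base inside the given intervals replaced by 'N'."""
    masked = list(seq)
    for start, end in intervals:
        # Clamp each interval to the sequence bounds before masking.
        for i in range(max(start, 0), min(end, len(seq))):
            masked[i] = "N"
    return "".join(masked)


def informed_reference(
    genome: Dict[str, str],
    has_y: bool,
    y_par: Iterable[Interval],
    y_name: str = "chrY",
) -> Dict[str, str]:
    """Build a reference informed by the sample's sex chromosome complement.

    Samples without a Y chromosome get the entire Y sequence masked;
    samples with a Y chromosome get only the Y pseudoautosomal regions masked.
    """
    out = dict(genome)  # shallow copy; other chromosomes are left unchanged
    y_seq = genome[y_name]
    if has_y:
        out[y_name] = hard_mask(y_seq, y_par)
    else:
        out[y_name] = "N" * len(y_seq)
    return out


# Toy data (hypothetical): real use would read GRCh38 and real PAR coordinates.
toy_genome = {"chrX": "ACGTACGTAC", "chrY": "ACGTTTGGCA"}
toy_y_par = [(0, 3), (8, 10)]

xx_ref = informed_reference(toy_genome, has_y=False, y_par=toy_y_par)
xy_ref = informed_reference(toy_genome, has_y=True, y_par=toy_y_par)

print(xx_ref["chrY"])  # NNNNNNNNNN
print(xy_ref["chrY"])  # NNNTTTGGNN
```

In this sketch the X chromosome is never modified, so reads from the pseudoautosomal regions have a single unmasked copy to align to, which is the rationale the abstract gives for masking those regions on the Y chromosome.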
format Online
Article
Text
id pubmed-7374973
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-7374973 2020-07-22
Biol Sex Differ
Research
BioMed Central 2020-07-21
/pmc/articles/PMC7374973/
/pubmed/32693839
http://dx.doi.org/10.1186/s13293-020-00312-9
Text en
© The Author(s) 2020. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Reference genome and transcriptome informed by the sex chromosome complement of the sample increase ability to detect sex differences in gene expression from RNA-Seq data
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7374973/
https://www.ncbi.nlm.nih.gov/pubmed/32693839
http://dx.doi.org/10.1186/s13293-020-00312-9