
Confounding factors in profiling of locus-specific human endogenous retrovirus (HERV) transcript signatures in primary T cells using multi-study-derived datasets



Bibliographic Details
Main authors: Hamann, Martin V., Adiba, Maisha, Lange, Ulrike C.
Format: Online Article Text
Language: English
Published: BioMed Central 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10068191/
https://www.ncbi.nlm.nih.gov/pubmed/37013607
http://dx.doi.org/10.1186/s12920-023-01486-y
author Hamann, Martin V.
Adiba, Maisha
Lange, Ulrike C.
collection PubMed
description BACKGROUND: Human endogenous retroviruses (HERV) are repetitive sequence elements that make up a substantial part of the human genome. Their role in development has been well documented and there is now mounting evidence that dysregulated HERV expression also contributes to various human diseases. While research on HERV elements has in the past been hampered by their high sequence similarity, advanced sequencing technology and analytical tools have empowered the field. For the first time, we are now able to undertake locus-specific HERV analysis, deciphering expression patterns, regulatory networks and biological functions of these elements. To do so, we inevitably rely on omics datasets available through the public domain. However, technical parameters differ between studies, making inter-study analysis challenging. Here we address the issue of confounding factors for profiling locus-specific HERV transcriptomes using datasets from multiple sources.
METHODS: We collected RNAseq datasets of CD4 and CD8 primary T cells and extracted HERV expression profiles for 3220 elements, representing the most intact, near full-length proviruses. Looking at sequencing parameters and batch effects, we compared HERV signatures across datasets and determined permissive features for HERV expression analysis from multiple-source data.
RESULTS: Among the sequencing parameters considered, sequencing depth has the strongest influence on the resulting HERV signature. Deeper sequencing broadens the spectrum of expressed HERV elements. Sequencing mode and read length are secondary parameters. Nevertheless, we find that HERV signatures from smaller RNAseq datasets reliably reveal the most abundantly expressed HERV elements. Overall, HERV signatures between samples and studies overlap substantially, indicating a robust HERV transcript signature in CD4 and CD8 T cells. Moreover, we find that measures of batch effect reduction are critical to uncover genic and HERV expression differences between cell types. Once batch effects were reduced, differences in the HERV transcriptome between the ontologically closely related CD4 and CD8 T cells became apparent.
CONCLUSION: In our systematic approach to determine sequencing and analysis parameters for detection of locus-specific HERV expression, we provide evidence that analysis of RNAseq datasets from multiple studies can increase confidence in biological findings. When generating de novo HERV expression datasets, we recommend increased sequencing depth (≥ 100 million reads) compared to standard genic transcriptome pipelines. Finally, batch effect reduction measures need to be implemented to allow for differential expression analysis.
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12920-023-01486-y.
format Online
Article
Text
id pubmed-10068191
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-10068191 2023-04-03 Confounding factors in profiling of locus-specific human endogenous retrovirus (HERV) transcript signatures in primary T cells using multi-study-derived datasets Hamann, Martin V. Adiba, Maisha Lange, Ulrike C. BMC Med Genomics Research BioMed Central 2023-04-03 /pmc/articles/PMC10068191/ /pubmed/37013607 http://dx.doi.org/10.1186/s12920-023-01486-y Text en
© The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Confounding factors in profiling of locus-specific human endogenous retrovirus (HERV) transcript signatures in primary T cells using multi-study-derived datasets
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10068191/
https://www.ncbi.nlm.nih.gov/pubmed/37013607
http://dx.doi.org/10.1186/s12920-023-01486-y