An analytical pipeline for identifying and mapping the integration sites of HIV and other retroviruses

Bibliographic Details
Main Authors: Wells, Daria W., Guo, Shuang, Shao, Wei, Bale, Michael J., Coffin, John M., Hughes, Stephen H., Wu, Xiaolin
Format: Online Article Text
Language: English
Published: BioMed Central 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7063773/
https://www.ncbi.nlm.nih.gov/pubmed/32151239
http://dx.doi.org/10.1186/s12864-020-6647-4
author Wells, Daria W.
Guo, Shuang
Shao, Wei
Bale, Michael J.
Coffin, John M.
Hughes, Stephen H.
Wu, Xiaolin
collection PubMed
description BACKGROUND: All retroviruses, including human immunodeficiency virus (HIV), must integrate a DNA copy of their genomes into the genome of the infected host cell to replicate. Although integrated retroviral DNA, known as a provirus, can be found at many sites in the host genome, integration is not random. The adaptation of linker-mediated PCR (LM-PCR) protocols for high-throughput integration site mapping, using randomly sheared genomic DNA and Illumina paired-end sequencing, has dramatically increased the number of mapped integration sites. Analysis of samples from human donors has shown that HIV-infected cells undergo clonal expansion and that clonal expansion makes an important contribution to HIV persistence. However, analysis of HIV integration sites in samples taken from patients requires extensive PCR amplification and high-throughput sequencing, which makes the methodology prone to specific artifacts.
RESULTS: To address the problems with artifacts, we use a comprehensive approach involving experimental procedures linked to a bioinformatics analysis pipeline. Using this combined approach, we are able to reduce the number of PCR/sequencing artifacts that arise and identify the ones that remain. Our streamlined workflow combines random cleavage of the DNA in the samples, end repair, and linker ligation in a single step. We provide guidance on primer and linker design that reduces some of the common artifacts. We also discuss how to identify and remove some of the common artifacts, including the products of PCR mispriming and PCR recombination, that have appeared in some published studies. Our improved bioinformatics pipeline rapidly parses the sequencing data and identifies bona fide integration sites in clonally expanded cells, producing an Excel-formatted report that can be used for additional data processing.
CONCLUSIONS: We provide a detailed protocol that reduces the prevalence of artifacts that arise in the analysis of retroviral integration site data generated from in vivo samples and a bioinformatics pipeline that is able to remove the artifacts that remain.
format Online
Article
Text
id pubmed-7063773
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-7063773 2020-03-13 An analytical pipeline for identifying and mapping the integration sites of HIV and other retroviruses Wells, Daria W. Guo, Shuang Shao, Wei Bale, Michael J. Coffin, John M. Hughes, Stephen H. Wu, Xiaolin BMC Genomics Methodology Article BioMed Central 2020-03-09 /pmc/articles/PMC7063773/ /pubmed/32151239 http://dx.doi.org/10.1186/s12864-020-6647-4 Text en © The Author(s). 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title An analytical pipeline for identifying and mapping the integration sites of HIV and other retroviruses
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7063773/
https://www.ncbi.nlm.nih.gov/pubmed/32151239
http://dx.doi.org/10.1186/s12864-020-6647-4