HGT-ID: an efficient and sensitive workflow to detect human-viral insertion sites using next-generation sequencing data

Bibliographic Details
Main authors: Baheti, Saurabh; Tang, Xiaojia; O’Brien, Daniel R.; Chia, Nicholas; Roberts, Lewis R.; Nelson, Heidi; Boughey, Judy C.; Wang, Liewei; Goetz, Matthew P.; Kocher, Jean-Pierre A.; Kalari, Krishna R.
Format: Online Article Text
Language: English
Published: BioMed Central 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6050683/
https://www.ncbi.nlm.nih.gov/pubmed/30016933
http://dx.doi.org/10.1186/s12859-018-2260-9
author Baheti, Saurabh
Tang, Xiaojia
O’Brien, Daniel R.
Chia, Nicholas
Roberts, Lewis R.
Nelson, Heidi
Boughey, Judy C.
Wang, Liewei
Goetz, Matthew P.
Kocher, Jean-Pierre A.
Kalari, Krishna R.
author_sort Baheti, Saurabh
collection PubMed
description BACKGROUND: Transfer of genetic material from microbes or viruses into the host genome is known as horizontal gene transfer (HGT). The integration of viruses into the human genome is associated with multiple cancers, and these integration events can now be detected using next-generation sequencing methods such as whole-genome sequencing and RNA sequencing. RESULTS: We designed a novel computational workflow, HGT-ID, to identify the integration of viruses into the human genome from sequencing data. The HGT-ID workflow follows a four-step procedure: i) pre-processing of unaligned reads, ii) virus detection using a subtraction approach, iii) identification of virus integration sites using discordant and soft-clipped reads, and iv) prioritization of HGT candidates through a scoring function. Annotation and visualization of the events, as well as primer design for experimental validation, are also provided in the final report. We evaluated the tool's performance on well-characterized cervical cancer samples. The HGT-ID workflow accurately detected known human papillomavirus (HPV) integration sites, with high sensitivity and specificity compared to previous HGT methods. We applied HGT-ID to The Cancer Genome Atlas (TCGA) whole-genome sequencing (WGS) data from liver tumor-normal pairs. Multiple hepatitis B virus (HBV) integration sites were identified in TCGA liver samples and confirmed by HGT-ID using the RNA-Seq data from the matched liver pairs. This demonstrates that the method is applicable to both data types and cross-validates the HGT events in liver samples. We also processed 220 breast tumor WGS datasets through the workflow; however, no HGT events were detected in those samples. CONCLUSIONS: HGT-ID is a novel computational workflow to detect the integration of viruses into the human genome from sequencing data. It is fast and accurate, with functions such as prioritization, annotation, visualization and primer design for future validation of HGTs. The HGT-ID workflow is released under the MIT License and available at http://kalarikrlab.org/Software/HGT-ID.html.
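The description names discordant and soft-clipped reads as the evidence used in step iii. As a rough illustration of what those two read classes mean, the sketch below tags a simplified alignment record using its CIGAR string and the references of the read and its mate. This is not HGT-ID's code: the `Alignment` record, the `classify` function, the "one mate on a human reference, the other on a viral reference" reading of "discordant", and the 20-base `min_clip` threshold are all assumptions made for the example.

```python
import re
from dataclasses import dataclass

# Hypothetical, simplified alignment record (an assumption, not HGT-ID's data model).
@dataclass
class Alignment:
    read_id: str
    ref: str       # reference the read aligned to
    cigar: str     # SAM CIGAR string, e.g. "60M40S"
    mate_ref: str  # reference the mate aligned to

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def soft_clip_lengths(cigar):
    """Return (leading, trailing) soft-clipped base counts for a CIGAR string."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    # Hard clips can sit outside soft clips, so drop them before checking the ends.
    ops = [(n, op) for n, op in ops if op != "H"]
    lead = ops[0][0] if ops and ops[0][1] == "S" else 0
    trail = ops[-1][0] if len(ops) > 1 and ops[-1][1] == "S" else 0
    return lead, trail

def classify(aln, viral_refs, min_clip=20):
    """Label a read as 'discordant', 'soft-clipped', or 'other'.

    Assumed definitions for this sketch:
      discordant   = read and mate map to different genomes (human vs. viral)
      soft-clipped = at least min_clip bases soft-clipped at either end
    """
    if (aln.ref in viral_refs) != (aln.mate_ref in viral_refs):
        return "discordant"
    if max(soft_clip_lengths(aln.cigar)) >= min_clip:
        return "soft-clipped"
    return "other"

viral = {"HPV16"}
reads = [
    Alignment("r1", "chr8", "100M", "HPV16"),   # mate on the viral reference
    Alignment("r2", "chr8", "60M40S", "chr8"),  # 40 bases clipped at the 3' end
    Alignment("r3", "chr8", "100M", "chr8"),    # ordinary human pair
]
labels = [classify(r, viral) for r in reads]
```

In this toy input, `r1` is labeled discordant, `r2` soft-clipped, and `r3` other. A real pipeline would read these fields from a BAM file and would also consider mapping quality and the clipped sequence itself, none of which is modeled here.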
format Online
Article
Text
id pubmed-6050683
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6050683 2018-07-19 HGT-ID: an efficient and sensitive workflow to detect human-viral insertion sites using next-generation sequencing data. BMC Bioinformatics. Software. BioMed Central 2018-07-17 /pmc/articles/PMC6050683/ /pubmed/30016933 http://dx.doi.org/10.1186/s12859-018-2260-9 Text en © The Author(s). 2018 Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title HGT-ID: an efficient and sensitive workflow to detect human-viral insertion sites using next-generation sequencing data
topic Software