HGT-ID: an efficient and sensitive workflow to detect human-viral insertion sites using next-generation sequencing data
BACKGROUND: Transfer of genetic material from microbes or viruses into the host genome is known as horizontal gene transfer (HGT). The integration of viruses into the human genome is associated with multiple cancers, and these can now be detected using next-generation sequencing methods such as whol...
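Main Authors: | Baheti, Saurabh; Tang, Xiaojia; O’Brien, Daniel R.; Chia, Nicholas; Roberts, Lewis R.; Nelson, Heidi; Boughey, Judy C.; Wang, Liewei; Goetz, Matthew P.; Kocher, Jean-Pierre A.; Kalari, Krishna R. |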
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2018 |
Subjects: | Software |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6050683/ https://www.ncbi.nlm.nih.gov/pubmed/30016933 http://dx.doi.org/10.1186/s12859-018-2260-9 |
_version_ | 1783340388644290560 |
---|---|
author | Baheti, Saurabh Tang, Xiaojia O’Brien, Daniel R. Chia, Nicholas Roberts, Lewis R. Nelson, Heidi Boughey, Judy C. Wang, Liewei Goetz, Matthew P. Kocher, Jean-Pierre A. Kalari, Krishna R. |
author_facet | Baheti, Saurabh Tang, Xiaojia O’Brien, Daniel R. Chia, Nicholas Roberts, Lewis R. Nelson, Heidi Boughey, Judy C. Wang, Liewei Goetz, Matthew P. Kocher, Jean-Pierre A. Kalari, Krishna R. |
author_sort | Baheti, Saurabh |
collection | PubMed |
description | BACKGROUND: Transfer of genetic material from microbes or viruses into the host genome is known as horizontal gene transfer (HGT). The integration of viruses into the human genome is associated with multiple cancers, and these can now be detected using next-generation sequencing methods such as whole genome sequencing and RNA-sequencing. RESULTS: We designed a novel computational workflow, HGT-ID, to identify the integration of viruses into the human genome using the sequencing data. The HGT-ID workflow primarily follows a four-step procedure: i) pre-processing of unaligned reads, ii) virus detection using subtraction approach, iii) identification of virus integration site using discordant and soft-clipped reads and iv) HGT candidates prioritization through a scoring function. Annotation and visualization of the events, as well as primer design for experimental validation, are also provided in the final report. We evaluated the tool performance with the well-understood cervical cancer samples. The HGT-ID workflow accurately detected known human papillomavirus (HPV) integration sites with high sensitivity and specificity compared to previous HGT methods. We applied HGT-ID to The Cancer Genome Atlas (TCGA) whole-genome sequencing data (WGS) from liver tumor-normal pairs. Multiple hepatitis B virus (HBV) integration sites were identified in TCGA liver samples and confirmed by HGT-ID using the RNA-Seq data from the matched liver pairs. This shows the applicability of the method in both the data types and cross-validation of the HGT events in liver samples. We also processed 220 breast tumor WGS data through the workflow; however, there were no HGT events detected in those samples. CONCLUSIONS: HGT-ID is a novel computational workflow to detect the integration of viruses in the human genome using the sequencing data. It is fast and accurate with functions such as prioritization, annotation, visualization and primer design for future validation of HGTs. The HGT-ID workflow is released under the MIT License and available at http://kalarikrlab.org/Software/HGT-ID.html. |
format | Online Article Text |
id | pubmed-6050683 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-6050683 2018-07-19 HGT-ID: an efficient and sensitive workflow to detect human-viral insertion sites using next-generation sequencing data Baheti, Saurabh Tang, Xiaojia O’Brien, Daniel R. Chia, Nicholas Roberts, Lewis R. Nelson, Heidi Boughey, Judy C. Wang, Liewei Goetz, Matthew P. Kocher, Jean-Pierre A. Kalari, Krishna R. BMC Bioinformatics Software BACKGROUND: Transfer of genetic material from microbes or viruses into the host genome is known as horizontal gene transfer (HGT). The integration of viruses into the human genome is associated with multiple cancers, and these can now be detected using next-generation sequencing methods such as whole genome sequencing and RNA-sequencing. RESULTS: We designed a novel computational workflow, HGT-ID, to identify the integration of viruses into the human genome using the sequencing data. The HGT-ID workflow primarily follows a four-step procedure: i) pre-processing of unaligned reads, ii) virus detection using subtraction approach, iii) identification of virus integration site using discordant and soft-clipped reads and iv) HGT candidates prioritization through a scoring function. Annotation and visualization of the events, as well as primer design for experimental validation, are also provided in the final report. We evaluated the tool performance with the well-understood cervical cancer samples. The HGT-ID workflow accurately detected known human papillomavirus (HPV) integration sites with high sensitivity and specificity compared to previous HGT methods. We applied HGT-ID to The Cancer Genome Atlas (TCGA) whole-genome sequencing data (WGS) from liver tumor-normal pairs. Multiple hepatitis B virus (HBV) integration sites were identified in TCGA liver samples and confirmed by HGT-ID using the RNA-Seq data from the matched liver pairs. This shows the applicability of the method in both the data types and cross-validation of the HGT events in liver samples. We also processed 220 breast tumor WGS data through the workflow; however, there were no HGT events detected in those samples. CONCLUSIONS: HGT-ID is a novel computational workflow to detect the integration of viruses in the human genome using the sequencing data. It is fast and accurate with functions such as prioritization, annotation, visualization and primer design for future validation of HGTs. The HGT-ID workflow is released under the MIT License and available at http://kalarikrlab.org/Software/HGT-ID.html. BioMed Central 2018-07-17 /pmc/articles/PMC6050683/ /pubmed/30016933 http://dx.doi.org/10.1186/s12859-018-2260-9 Text en © The Author(s). 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Software Baheti, Saurabh Tang, Xiaojia O’Brien, Daniel R. Chia, Nicholas Roberts, Lewis R. Nelson, Heidi Boughey, Judy C. Wang, Liewei Goetz, Matthew P. Kocher, Jean-Pierre A. Kalari, Krishna R. HGT-ID: an efficient and sensitive workflow to detect human-viral insertion sites using next-generation sequencing data |
title | HGT-ID: an efficient and sensitive workflow to detect human-viral insertion sites using next-generation sequencing data |
title_full | HGT-ID: an efficient and sensitive workflow to detect human-viral insertion sites using next-generation sequencing data |
title_fullStr | HGT-ID: an efficient and sensitive workflow to detect human-viral insertion sites using next-generation sequencing data |
title_full_unstemmed | HGT-ID: an efficient and sensitive workflow to detect human-viral insertion sites using next-generation sequencing data |
title_short | HGT-ID: an efficient and sensitive workflow to detect human-viral insertion sites using next-generation sequencing data |
title_sort | hgt-id: an efficient and sensitive workflow to detect human-viral insertion sites using next-generation sequencing data |
topic | Software |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6050683/ https://www.ncbi.nlm.nih.gov/pubmed/30016933 http://dx.doi.org/10.1186/s12859-018-2260-9 |
work_keys_str_mv | AT bahetisaurabh hgtidanefficientandsensitiveworkflowtodetecthumanviralinsertionsitesusingnextgenerationsequencingdata AT tangxiaojia hgtidanefficientandsensitiveworkflowtodetecthumanviralinsertionsitesusingnextgenerationsequencingdata AT obriendanielr hgtidanefficientandsensitiveworkflowtodetecthumanviralinsertionsitesusingnextgenerationsequencingdata AT chianicholas hgtidanefficientandsensitiveworkflowtodetecthumanviralinsertionsitesusingnextgenerationsequencingdata AT robertslewisr hgtidanefficientandsensitiveworkflowtodetecthumanviralinsertionsitesusingnextgenerationsequencingdata AT nelsonheidi hgtidanefficientandsensitiveworkflowtodetecthumanviralinsertionsitesusingnextgenerationsequencingdata AT bougheyjudyc hgtidanefficientandsensitiveworkflowtodetecthumanviralinsertionsitesusingnextgenerationsequencingdata AT wangliewei hgtidanefficientandsensitiveworkflowtodetecthumanviralinsertionsitesusingnextgenerationsequencingdata AT goetzmatthewp hgtidanefficientandsensitiveworkflowtodetecthumanviralinsertionsitesusingnextgenerationsequencingdata AT kocherjeanpierrea hgtidanefficientandsensitiveworkflowtodetecthumanviralinsertionsitesusingnextgenerationsequencingdata AT kalarikrishnar hgtidanefficientandsensitiveworkflowtodetecthumanviralinsertionsitesusingnextgenerationsequencingdata |
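
Step iii of the workflow described in the abstract relies on discordant and soft-clipped reads. The sketch below illustrates, in general terms, how such reads might be flagged from SAM-style alignment fields. It is not the HGT-ID implementation: the `Alignment` record, the `classify` function, the `min_clip` threshold of 20 bases, and the reference names are all hypothetical and chosen only for illustration.

```python
import re
from typing import NamedTuple, Set, Tuple

# Matches one CIGAR operation, e.g. "30S" or "70M" (SAM specification operators).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


class Alignment(NamedTuple):
    """Hypothetical minimal alignment record (not an HGT-ID data structure)."""
    read_name: str
    ref_name: str       # reference sequence this read aligned to
    mate_ref_name: str  # reference sequence the mate aligned to
    cigar: str          # SAM CIGAR string for this read


def soft_clip_lengths(cigar: str) -> Tuple[int, int]:
    """Return (leading, trailing) soft-clipped base counts for a CIGAR string."""
    # Hard clips (H) can flank soft clips, so drop them before inspecting the ends.
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar) if op != "H"]
    if not ops:
        return (0, 0)
    lead = ops[0][0] if ops[0][1] == "S" else 0
    # Require more than one operation so a lone "S" is not counted twice.
    trail = ops[-1][0] if len(ops) > 1 and ops[-1][1] == "S" else 0
    return (lead, trail)


def classify(aln: Alignment, viral_refs: Set[str], min_clip: int = 20) -> str:
    """Label a read as 'discordant', 'soft_clipped', or 'other'.

    'discordant' here means one mate maps to a viral reference and the
    other to a non-viral (human) reference. min_clip is an assumed cutoff.
    """
    on_virus = aln.ref_name in viral_refs
    mate_on_virus = aln.mate_ref_name in viral_refs
    if on_virus != mate_on_virus:
        return "discordant"
    lead, trail = soft_clip_lengths(aln.cigar)
    if max(lead, trail) >= min_clip:
        return "soft_clipped"
    return "other"


if __name__ == "__main__":
    viral = {"HPV16"}  # hypothetical viral reference name
    reads = [
        Alignment("r1", "chr8", "HPV16", "100M"),
        Alignment("r2", "chr8", "chr8", "60M40S"),
        Alignment("r3", "chr8", "chr8", "100M"),
    ]
    for r in reads:
        print(r.read_name, classify(r, viral))
```

In this sketch, a read pair spanning a human and a viral reference suggests an integration event, while a long soft-clipped tail on a single read suggests that the read crosses a breakpoint. A real pipeline would also use mapping quality, alignment positions, and the clipped sequence itself.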