An assessment of bioinformatics tools for the detection of human endogenous retroviral insertions in short-read genome sequencing data
There is a growing interest in the study of human endogenous retroviruses (HERVs) given the substantial body of evidence that implicates them in many human diseases. Although their genomic characterization presents numerous technical challenges, next-generation sequencing (NGS) has shown potential t...
Main authors: | Bowles, Harry; Kabiljo, Renata; Al Khleifat, Ahmad; Jones, Ashley; Quinn, John P.; Dobson, Richard J. B.; Swanson, Chad M.; Al-Chalabi, Ammar; Iacoangeli, Alfredo |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A. 2023 |
Subjects: | Bioinformatics |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9945273/ https://www.ncbi.nlm.nih.gov/pubmed/36845320 http://dx.doi.org/10.3389/fbinf.2022.1062328 |
_version_ | 1784892104940453888 |
---|---|
author | Bowles, Harry; Kabiljo, Renata; Al Khleifat, Ahmad; Jones, Ashley; Quinn, John P.; Dobson, Richard J. B.; Swanson, Chad M.; Al-Chalabi, Ammar; Iacoangeli, Alfredo |
author_sort | Bowles, Harry |
collection | PubMed |
description | There is a growing interest in the study of human endogenous retroviruses (HERVs) given the substantial body of evidence that implicates them in many human diseases. Although their genomic characterization presents numerous technical challenges, next-generation sequencing (NGS) has shown potential to detect HERV insertions and their polymorphisms in humans. Currently, a number of computational tools to detect them in short-read NGS data exist. In order to design optimal analysis pipelines, an independent evaluation of the available tools is required. We evaluated the performance of a set of such tools using a variety of experimental designs and datasets. These included 50 human short-read whole-genome sequencing samples, matching long and short-read sequencing data, and simulated short-read NGS data. Our results highlight a great performance variability of the tools across the datasets and suggest that different tools might be suitable for different study designs. However, specialized tools designed to detect exclusively human endogenous retroviruses consistently outperformed generalist tools that detect a wider range of transposable elements. We suggest that, if sufficient computing resources are available, using multiple HERV detection tools to obtain a consensus set of insertion loci may be ideal. Furthermore, given that the false positive discovery rate of the tools varied between 8% and 55% across tools and datasets, we recommend the wet lab validation of predicted insertions if DNA samples are available. |
format | Online Article Text |
id | pubmed-9945273 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-9945273 2023-02-23 An assessment of bioinformatics tools for the detection of human endogenous retroviral insertions in short-read genome sequencing data Bowles, Harry; Kabiljo, Renata; Al Khleifat, Ahmad; Jones, Ashley; Quinn, John P.; Dobson, Richard J. B.; Swanson, Chad M.; Al-Chalabi, Ammar; Iacoangeli, Alfredo Front Bioinform Bioinformatics There is a growing interest in the study of human endogenous retroviruses (HERVs) given the substantial body of evidence that implicates them in many human diseases. Although their genomic characterization presents numerous technical challenges, next-generation sequencing (NGS) has shown potential to detect HERV insertions and their polymorphisms in humans. Currently, a number of computational tools to detect them in short-read NGS data exist. In order to design optimal analysis pipelines, an independent evaluation of the available tools is required. We evaluated the performance of a set of such tools using a variety of experimental designs and datasets. These included 50 human short-read whole-genome sequencing samples, matching long and short-read sequencing data, and simulated short-read NGS data. Our results highlight a great performance variability of the tools across the datasets and suggest that different tools might be suitable for different study designs. However, specialized tools designed to detect exclusively human endogenous retroviruses consistently outperformed generalist tools that detect a wider range of transposable elements. We suggest that, if sufficient computing resources are available, using multiple HERV detection tools to obtain a consensus set of insertion loci may be ideal. Furthermore, given that the false positive discovery rate of the tools varied between 8% and 55% across tools and datasets, we recommend the wet lab validation of predicted insertions if DNA samples are available. Frontiers Media S.A. 2023-02-08 /pmc/articles/PMC9945273/ /pubmed/36845320 http://dx.doi.org/10.3389/fbinf.2022.1062328 Text en Copyright © 2023 Bowles, Kabiljo, Al Khleifat, Jones, Quinn, Dobson, Swanson, Al-Chalabi and Iacoangeli. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
title | An assessment of bioinformatics tools for the detection of human endogenous retroviral insertions in short-read genome sequencing data |
topic | Bioinformatics |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9945273/ https://www.ncbi.nlm.nih.gov/pubmed/36845320 http://dx.doi.org/10.3389/fbinf.2022.1062328 |