An assessment of bioinformatics tools for the detection of human endogenous retroviral insertions in short-read genome sequencing data

There is a growing interest in the study of human endogenous retroviruses (HERVs) given the substantial body of evidence that implicates them in many human diseases. Although their genomic characterization presents numerous technical challenges, next-generation sequencing (NGS) has shown potential t...

Bibliographic Details
Main authors: Bowles, Harry, Kabiljo, Renata, Al Khleifat, Ahmad, Jones, Ashley, Quinn, John P., Dobson, Richard J. B., Swanson, Chad M., Al-Chalabi, Ammar, Iacoangeli, Alfredo
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2023
Subjects: Bioinformatics
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9945273/
https://www.ncbi.nlm.nih.gov/pubmed/36845320
http://dx.doi.org/10.3389/fbinf.2022.1062328
_version_ 1784892104940453888
author Bowles, Harry
Kabiljo, Renata
Al Khleifat, Ahmad
Jones, Ashley
Quinn, John P.
Dobson, Richard J. B.
Swanson, Chad M.
Al-Chalabi, Ammar
Iacoangeli, Alfredo
collection PubMed
description There is a growing interest in the study of human endogenous retroviruses (HERVs) given the substantial body of evidence that implicates them in many human diseases. Although their genomic characterization presents numerous technical challenges, next-generation sequencing (NGS) has shown potential to detect HERV insertions and their polymorphisms in humans. Currently, a number of computational tools to detect them in short-read NGS data exist. In order to design optimal analysis pipelines, an independent evaluation of the available tools is required. We evaluated the performance of a set of such tools using a variety of experimental designs and datasets. These included 50 human short-read whole-genome sequencing samples, matching long and short-read sequencing data, and simulated short-read NGS data. Our results highlight a great performance variability of the tools across the datasets and suggest that different tools might be suitable for different study designs. However, specialized tools designed to detect exclusively human endogenous retroviruses consistently outperformed generalist tools that detect a wider range of transposable elements. We suggest that, if sufficient computing resources are available, using multiple HERV detection tools to obtain a consensus set of insertion loci may be ideal. Furthermore, given that the false positive discovery rate of the tools varied between 8% and 55% across tools and datasets, we recommend the wet lab validation of predicted insertions if DNA samples are available.
format Online
Article
Text
id pubmed-9945273
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-9945273 2023-02-23
Front Bioinform
Bioinformatics
Frontiers Media S.A. 2023-02-08
/pmc/articles/PMC9945273/ /pubmed/36845320 http://dx.doi.org/10.3389/fbinf.2022.1062328
Text en
Copyright © 2023 Bowles, Kabiljo, Al Khleifat, Jones, Quinn, Dobson, Swanson, Al-Chalabi and Iacoangeli.
https://creativecommons.org/licenses/by/4.0/
This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title An assessment of bioinformatics tools for the detection of human endogenous retroviral insertions in short-read genome sequencing data
topic Bioinformatics