A benchmark of transposon insertion detection tools using real data

BACKGROUND: Transposable elements (TEs) are an important source of genomic variability in eukaryotic genomes. Their activity impacts genome architecture and gene expression and can lead to drastic phenotypic changes. Therefore, identifying TE polymorphisms is key to better understanding the link betwee...

Bibliographic Details
Main authors: Vendrell-Mir, Pol, Barteri, Fabio, Merenciano, Miriam, González, Josefa, Casacuberta, Josep M., Castanera, Raúl
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6937713/
https://www.ncbi.nlm.nih.gov/pubmed/31892957
http://dx.doi.org/10.1186/s13100-019-0197-9
_version_ 1783483919486681088
author Vendrell-Mir, Pol
Barteri, Fabio
Merenciano, Miriam
González, Josefa
Casacuberta, Josep M.
Castanera, Raúl
author_facet Vendrell-Mir, Pol
Barteri, Fabio
Merenciano, Miriam
González, Josefa
Casacuberta, Josep M.
Castanera, Raúl
author_sort Vendrell-Mir, Pol
collection PubMed
description BACKGROUND: Transposable elements (TEs) are an important source of genomic variability in eukaryotic genomes. Their activity impacts genome architecture and gene expression and can lead to drastic phenotypic changes. Therefore, identifying TE polymorphisms is key to better understanding the link between genotype and phenotype. However, most genotype-to-phenotype analyses have concentrated on single nucleotide polymorphisms as they are easier to reliably detect using short-read data. Many bioinformatic tools have been developed to identify transposon insertions from resequencing data using short reads. Nevertheless, the performance of most of these tools has been tested using simulated insertions, which do not accurately reproduce the complexity of natural insertions. RESULTS: We have overcome this limitation by building a dataset of insertions from the comparison of two high-quality rice genomes, followed by extensive manual curation. This dataset contains validated insertions of two very different types of TEs, LTR-retrotransposons and MITEs. Using this dataset, we have benchmarked the sensitivity and precision of 12 commonly used tools, and our results suggest that in general their sensitivity was previously overestimated when using simulated data. Our results also show that increasing coverage leads to better sensitivity but at a cost in precision. Moreover, we found important differences in tool performance, with some tools performing better on a specific type of TE. We have also used two sets of experimentally validated insertions in Drosophila and humans and show that this trend is maintained in genomes of different size and complexity. CONCLUSIONS: We discuss the possible choice of tools depending on the goals of the study and show that the appropriate combination of tools could be an option for most approaches, increasing the sensitivity while maintaining a good precision.
format Online
Article
Text
id pubmed-6937713
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6937713 2019-12-31 A benchmark of transposon insertion detection tools using real data Vendrell-Mir, Pol Barteri, Fabio Merenciano, Miriam González, Josefa Casacuberta, Josep M. Castanera, Raúl Mob DNA Methodology BACKGROUND: Transposable elements (TEs) are an important source of genomic variability in eukaryotic genomes. Their activity impacts genome architecture and gene expression and can lead to drastic phenotypic changes. Therefore, identifying TE polymorphisms is key to better understanding the link between genotype and phenotype. However, most genotype-to-phenotype analyses have concentrated on single nucleotide polymorphisms as they are easier to reliably detect using short-read data. Many bioinformatic tools have been developed to identify transposon insertions from resequencing data using short reads. Nevertheless, the performance of most of these tools has been tested using simulated insertions, which do not accurately reproduce the complexity of natural insertions. RESULTS: We have overcome this limitation by building a dataset of insertions from the comparison of two high-quality rice genomes, followed by extensive manual curation. This dataset contains validated insertions of two very different types of TEs, LTR-retrotransposons and MITEs. Using this dataset, we have benchmarked the sensitivity and precision of 12 commonly used tools, and our results suggest that in general their sensitivity was previously overestimated when using simulated data. Our results also show that increasing coverage leads to better sensitivity but at a cost in precision. Moreover, we found important differences in tool performance, with some tools performing better on a specific type of TE. We have also used two sets of experimentally validated insertions in Drosophila and humans and show that this trend is maintained in genomes of different size and complexity. CONCLUSIONS: We discuss the possible choice of tools depending on the goals of the study and show that the appropriate combination of tools could be an option for most approaches, increasing the sensitivity while maintaining a good precision. BioMed Central 2019-12-30 /pmc/articles/PMC6937713/ /pubmed/31892957 http://dx.doi.org/10.1186/s13100-019-0197-9 Text en © The Author(s). 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Methodology
Vendrell-Mir, Pol
Barteri, Fabio
Merenciano, Miriam
González, Josefa
Casacuberta, Josep M.
Castanera, Raúl
A benchmark of transposon insertion detection tools using real data
title A benchmark of transposon insertion detection tools using real data
title_full A benchmark of transposon insertion detection tools using real data
title_fullStr A benchmark of transposon insertion detection tools using real data
title_full_unstemmed A benchmark of transposon insertion detection tools using real data
title_short A benchmark of transposon insertion detection tools using real data
title_sort benchmark of transposon insertion detection tools using real data
topic Methodology
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6937713/
https://www.ncbi.nlm.nih.gov/pubmed/31892957
http://dx.doi.org/10.1186/s13100-019-0197-9
work_keys_str_mv AT vendrellmirpol abenchmarkoftransposoninsertiondetectiontoolsusingrealdata
AT barterifabio abenchmarkoftransposoninsertiondetectiontoolsusingrealdata
AT merencianomiriam abenchmarkoftransposoninsertiondetectiontoolsusingrealdata
AT gonzalezjosefa abenchmarkoftransposoninsertiondetectiontoolsusingrealdata
AT casacubertajosepm abenchmarkoftransposoninsertiondetectiontoolsusingrealdata
AT castaneraraul abenchmarkoftransposoninsertiondetectiontoolsusingrealdata
AT vendrellmirpol benchmarkoftransposoninsertiondetectiontoolsusingrealdata
AT barterifabio benchmarkoftransposoninsertiondetectiontoolsusingrealdata
AT merencianomiriam benchmarkoftransposoninsertiondetectiontoolsusingrealdata
AT gonzalezjosefa benchmarkoftransposoninsertiondetectiontoolsusingrealdata
AT casacubertajosepm benchmarkoftransposoninsertiondetectiontoolsusingrealdata
AT castaneraraul benchmarkoftransposoninsertiondetectiontoolsusingrealdata