Hecaton: reliably detecting copy number variation in plant genomes using short read sequencing data

BACKGROUND: Copy number variation (CNV) is thought to actively contribute to adaptive evolution of plant species. While many computational algorithms are available to detect copy number variation from whole genome sequencing datasets, the typical complexity of plant data likely introduces false posi...

Bibliographic Details
Main Authors: Wijfjes, Raúl Y., Smit, Sandra, de Ridder, Dick
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects: Software
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6836508/
https://www.ncbi.nlm.nih.gov/pubmed/31699036
http://dx.doi.org/10.1186/s12864-019-6153-8
_version_ 1783466922315087872
author Wijfjes, Raúl Y.
Smit, Sandra
de Ridder, Dick
collection PubMed
description BACKGROUND: Copy number variation (CNV) is thought to actively contribute to adaptive evolution of plant species. While many computational algorithms are available to detect copy number variation from whole genome sequencing datasets, the typical complexity of plant data likely introduces false positive calls. RESULTS: To enable reliable and comprehensive detection of CNV in plant genomes, we developed Hecaton, a novel computational workflow tailored to plants, that integrates calls from multiple state-of-the-art algorithms through a machine-learning approach. In this paper, we demonstrate that Hecaton outperforms current methods when applied to short read sequencing data of Arabidopsis thaliana, rice, maize, and tomato. Moreover, it correctly detects dispersed duplications, a type of CNV commonly found in plant species, in contrast to several state-of-the-art tools that erroneously represent this type of CNV as overlapping deletions and tandem duplications. Finally, Hecaton scales well in terms of memory usage and running time when applied to short read datasets of domesticated and wild tomato accessions. CONCLUSIONS: Hecaton provides a robust method to detect CNV in plants. We expect it to be of immediate interest to both applied and fundamental research on the relationship between genotype and phenotype in plants.
format Online
Article
Text
id pubmed-6836508
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6836508 2019-11-12 Hecaton: reliably detecting copy number variation in plant genomes using short read sequencing data Wijfjes, Raúl Y. Smit, Sandra de Ridder, Dick BMC Genomics Software
BioMed Central 2019-11-07 /pmc/articles/PMC6836508/ /pubmed/31699036 http://dx.doi.org/10.1186/s12864-019-6153-8 Text en © The Author(s) 2019. Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Software
Wijfjes, Raúl Y.
Smit, Sandra
de Ridder, Dick
Hecaton: reliably detecting copy number variation in plant genomes using short read sequencing data
title Hecaton: reliably detecting copy number variation in plant genomes using short read sequencing data
topic Software
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6836508/
https://www.ncbi.nlm.nih.gov/pubmed/31699036
http://dx.doi.org/10.1186/s12864-019-6153-8