Evaluation of tools for identifying large copy number variations from ultra-low-coverage whole-genome sequencing data

BACKGROUND: Detection of copy number variations (CNVs) from high-throughput next-generation whole-genome sequencing (WGS) data has become a widely used research method in recent years. However, little is known about the applicability of the developed algorithms to ultra-low-coverage (...

Bibliographic Details
Main authors: Smolander, Johannes; Khan, Sofia; Singaravelu, Kalaimathy; Kauko, Leni; Lund, Riikka J.; Laiho, Asta; Elo, Laura L.
Format: Online Article Text
Language: English
Published: BioMed Central 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8130438/
https://www.ncbi.nlm.nih.gov/pubmed/34000988
http://dx.doi.org/10.1186/s12864-021-07686-z
author Smolander, Johannes
Khan, Sofia
Singaravelu, Kalaimathy
Kauko, Leni
Lund, Riikka J.
Laiho, Asta
Elo, Laura L.
collection PubMed
description BACKGROUND: Detection of copy number variations (CNVs) from high-throughput next-generation whole-genome sequencing (WGS) data has become a widely used research method in recent years. However, little is known about the applicability of the developed algorithms to ultra-low-coverage (0.0005–0.8×) data that is used in various research and clinical applications, such as digital karyotyping and single-cell CNV detection. RESULTS: Here, the performance of six popular read-depth-based CNV detection algorithms (BIC-seq2, Canvas, CNVnator, FREEC, HMMcopy, and QDNAseq) was studied using ultra-low-coverage WGS data. Real-world array-based and karyotyping kit-based validation was used as a benchmark in the evaluation. Additionally, ultra-low-coverage WGS data was simulated to investigate the ability of the algorithms to identify CNVs in the sex chromosomes and the theoretical minimum coverage at which these tools can function accurately. Our results suggest that while all the methods were able to detect large CNVs, many methods were susceptible to producing false positives when smaller CNVs (< 2 Mbp) were detected. There was also significant variability in their ability to identify CNVs in the sex chromosomes. Overall, BIC-seq2 was found to be the best method in terms of statistical performance. However, its significant drawback was that it had by far the slowest runtime among the methods (> 3 h), compared with ~ 3 min for FREEC, which we considered the second-best method. CONCLUSIONS: Our comparative analysis demonstrates that CNV detection from ultra-low-coverage WGS data can be a highly accurate method for the detection of large CNVs whose length is in the millions of base pairs. These findings facilitate applications that utilize ultra-low-coverage CNV detection. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12864-021-07686-z.
format Online
Article
Text
id pubmed-8130438
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-8130438 2021-05-19 Evaluation of tools for identifying large copy number variations from ultra-low-coverage whole-genome sequencing data Smolander, Johannes Khan, Sofia Singaravelu, Kalaimathy Kauko, Leni Lund, Riikka J. Laiho, Asta Elo, Laura L. BMC Genomics Research Article BACKGROUND: Detection of copy number variations (CNVs) from high-throughput next-generation whole-genome sequencing (WGS) data has become a widely used research method in recent years. However, little is known about the applicability of the developed algorithms to ultra-low-coverage (0.0005–0.8×) data that is used in various research and clinical applications, such as digital karyotyping and single-cell CNV detection. RESULTS: Here, the performance of six popular read-depth-based CNV detection algorithms (BIC-seq2, Canvas, CNVnator, FREEC, HMMcopy, and QDNAseq) was studied using ultra-low-coverage WGS data. Real-world array-based and karyotyping kit-based validation was used as a benchmark in the evaluation. Additionally, ultra-low-coverage WGS data was simulated to investigate the ability of the algorithms to identify CNVs in the sex chromosomes and the theoretical minimum coverage at which these tools can function accurately. Our results suggest that while all the methods were able to detect large CNVs, many methods were susceptible to producing false positives when smaller CNVs (< 2 Mbp) were detected. There was also significant variability in their ability to identify CNVs in the sex chromosomes. Overall, BIC-seq2 was found to be the best method in terms of statistical performance. However, its significant drawback was that it had by far the slowest runtime among the methods (> 3 h), compared with ~ 3 min for FREEC, which we considered the second-best method. CONCLUSIONS: Our comparative analysis demonstrates that CNV detection from ultra-low-coverage WGS data can be a highly accurate method for the detection of large CNVs whose length is in the millions of base pairs. These findings facilitate applications that utilize ultra-low-coverage CNV detection. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12864-021-07686-z. BioMed Central 2021-05-17 /pmc/articles/PMC8130438/ /pubmed/34000988 http://dx.doi.org/10.1186/s12864-021-07686-z Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Evaluation of tools for identifying large copy number variations from ultra-low-coverage whole-genome sequencing data
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8130438/
https://www.ncbi.nlm.nih.gov/pubmed/34000988
http://dx.doi.org/10.1186/s12864-021-07686-z