
Comprehensive characterization of copy number variation (CNV) called from array, long- and short-read data

Bibliographic Details
Main authors: Lavrichenko, Ksenia; Johansson, Stefan; Jonassen, Inge
Format: Online Article Text
Language: English
Published: BioMed Central, 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8596897/
https://www.ncbi.nlm.nih.gov/pubmed/34789167
http://dx.doi.org/10.1186/s12864-021-08082-3
collection PubMed
description BACKGROUND: SNP arrays and short- and long-read genome sequencing are genome-wide, high-throughput technologies that can be used to assay copy number variants (CNVs) in a personal genome. Each of these technologies has its own limitations and biases; many are well known, but not all have been thoroughly quantified.

RESULTS: We assembled a collection of public datasets of published CNV calls and raw data for the well-studied Genome in a Bottle individual NA12878. This collection represents a variety of methods and pipelines used to call CNVs from array, short-read and long-read technologies. We then compared the technologies in their ability to call CNVs. Unlike other studies, we did not rely on a gold standard; instead, we attempted to validate the CNV calls using the raw data of each technology.

CONCLUSIONS: Our study confirms that long-read platforms enable calling CNVs in genomic regions inaccessible to arrays or short reads. We also found that the reproducibility of a CNV across different pipelines within each technology is strongly linked to other measures of CNV evidence. Importantly, CNVs called by the three technologies show distinct frequency profiles in public databases, and these profiles differ depending on which technology each database was built on.

SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12864-021-08082-3.
id pubmed-8596897
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
journal BMC Genomics
BioMed Central, 2021-11-17. © The Author(s) 2021. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
topic Research