Comprehensive characterization of copy number variation (CNV) called from array, long- and short-read data
Main authors: | Lavrichenko, Ksenia; Johansson, Stefan; Jonassen, Inge |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2021 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8596897/ https://www.ncbi.nlm.nih.gov/pubmed/34789167 http://dx.doi.org/10.1186/s12864-021-08082-3 |
author | Lavrichenko, Ksenia; Johansson, Stefan; Jonassen, Inge |
---|---|
collection | PubMed |
description | BACKGROUND: SNP arrays and short- and long-read genome sequencing are genome-wide, high-throughput technologies that can be used to assay copy number variants (CNVs) in a personal genome. Each of these technologies has its own limitations and biases; many of these are well known, but not all have been thoroughly quantified. RESULTS: We assembled an ensemble of public datasets of published CNV calls and raw data for the well-studied Genome in a Bottle individual NA12878. This assembly represents a variety of methods and pipelines used for CNV calling from array, short-read and long-read technologies. We then compared the technologies with respect to their ability to call CNVs. Unlike other studies, we refrained from using a gold standard. Instead, we attempted to validate the CNV calls against the raw data of each technology. CONCLUSIONS: Our study confirms that long-read platforms enable calling CNVs in genomic regions inaccessible to arrays or short reads. We also found that the reproducibility of a CNV across different pipelines within each technology is strongly linked to other measures of CNV evidence. Importantly, the three technologies show distinct frequency profiles in public databases, and these profiles differ depending on which technology the database was built on. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12864-021-08082-3. |
format | Online Article Text |
id | pubmed-8596897 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-8596897. BMC Genomics (Research), BioMed Central, 2021-11-17. /pmc/articles/PMC8596897/ /pubmed/34789167 http://dx.doi.org/10.1186/s12864-021-08082-3. Text, en. © The Author(s) 2021. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
title | Comprehensive characterization of copy number variation (CNV) called from array, long- and short-read data |
topic | Research |