Benchmarking datasets for assembly-based variant calling using high-fidelity long reads

Bibliographic Details
Main Authors:	Lee, Hyunji; Kim, Jun; Lee, Junho
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2023
Subjects:	Research
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10045170/ https://www.ncbi.nlm.nih.gov/pubmed/36973656 http://dx.doi.org/10.1186/s12864-023-09255-y

_version_	1784913535813287936
author	Lee, Hyunji; Kim, Jun; Lee, Junho
collection	PubMed
description	BACKGROUND: Recent advances in long-read sequencing technologies have enabled accurate identification of all genetic variants in individuals or cells; this procedure is known as variant calling. However, benchmarking studies on variant calling using different long-read sequencing technologies are still lacking. RESULTS: We used two Caenorhabditis elegans strains to measure several variant calling metrics. These two strains shared true-positive genetic variants that were introduced during strain generation. In addition, both strains contained common and distinguishable variants induced by DNA damage, possibly leading to false-positive estimation. We obtained accurate and noisy long reads from both strains using high-fidelity (HiFi) and continuous long-read (CLR) sequencing platforms, and compared the variant calling performance of the two platforms. HiFi identified a 1.65-fold higher number of true-positive variants on average, with 60% fewer false-positive variants, than CLR did. We also compared read-based and assembly-based variant calling methods in combination with subsampling of various sequencing depths and demonstrated that variant calling after genome assembly was particularly effective for detection of large insertions, even with 10 × sequencing depth of accurate long-read sequencing data. CONCLUSIONS: By directly comparing the two long-read sequencing technologies, we demonstrated that variant calling after genome assembly with 10 × or more depth of accurate long-read sequencing data allowed reliable detection of true-positive variants. Considering the high cost of HiFi sequencing, we herein propose appropriate methodologies for performing cost-effective and high-quality variant calling: 10 × assembly-based variant calling. The results of the present study may facilitate the development of methods for identifying all genetic variants at the population level. 
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12864-023-09255-y.
format	Online Article Text
id	pubmed-10045170
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-10045170 2023-03-29 Benchmarking datasets for assembly-based variant calling using high-fidelity long reads Lee, Hyunji; Kim, Jun; Lee, Junho BMC Genomics Research BioMed Central 2023-03-27 /pmc/articles/PMC10045170/ /pubmed/36973656 http://dx.doi.org/10.1186/s12864-023-09255-y Text en © The Author(s) 2023. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title	Benchmarking datasets for assembly-based variant calling using high-fidelity long reads
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10045170/ https://www.ncbi.nlm.nih.gov/pubmed/36973656 http://dx.doi.org/10.1186/s12864-023-09255-y
