
Boosting variant-calling performance with multi-platform sequencing data using Clair3-MP

Bibliographic Details
Main Authors: Yu, Huijing; Zheng, Zhenxian; Su, Junhao; Lam, Tak-Wah; Luo, Ruibang
Format: Online Article Text
Language: English
Published: BioMed Central 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10401749/
https://www.ncbi.nlm.nih.gov/pubmed/37537536
http://dx.doi.org/10.1186/s12859-023-05434-6
author Yu, Huijing
Zheng, Zhenxian
Su, Junhao
Lam, Tak-Wah
Luo, Ruibang
collection PubMed
description BACKGROUND: With the continuous advances in third-generation sequencing technology and the increasing affordability of next-generation sequencing technology, sequencing data from different sequencing technology platforms is becoming more common. While numerous benchmarking studies have been conducted to compare variant-calling performance across different platforms and approaches, little attention has been paid to the potential of leveraging the strengths of different platforms to optimize overall performance, especially integrating Oxford Nanopore and Illumina sequencing data. RESULTS: We investigated the impact of multi-platform data on the performance of variant calling through carefully designed experiments with a deep learning-based variant caller named Clair3-MP (Multi-Platform). Through our research, we not only demonstrated the capability of ONT-Illumina data for improved variant calling, but also identified the optimal scenarios for utilizing ONT-Illumina data. In addition, we revealed that the improvement in variant calling using ONT-Illumina data comes from an improvement in difficult genomic regions, such as the large low-complexity regions and segmental and collapse duplication regions. Moreover, Clair3-MP can incorporate reference genome stratification information to achieve a small but measurable improvement in variant calling. Clair3-MP is accessible as an open-source project at: https://github.com/HKU-BAL/Clair3-MP. CONCLUSIONS: These insights have important implications for researchers and practitioners alike, providing valuable guidance for improving the reliability and efficiency of genomic analysis in diverse applications. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-023-05434-6.
format Online
Article
Text
id pubmed-10401749
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-10401749 2023-08-05 Boosting variant-calling performance with multi-platform sequencing data using Clair3-MP Yu, Huijing Zheng, Zhenxian Su, Junhao Lam, Tak-Wah Luo, Ruibang BMC Bioinformatics Software BioMed Central 2023-08-03 /pmc/articles/PMC10401749/ /pubmed/37537536 http://dx.doi.org/10.1186/s12859-023-05434-6 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Boosting variant-calling performance with multi-platform sequencing data using Clair3-MP
topic Software