DelInsCaller: An Efficient Algorithm for Identifying Delins and Estimating Haplotypes from Long Reads with High Level of Sequencing Errors
Delins, also known as complex indel, is a combined genomic structural variation formed by deleting and inserting DNA fragments at a common genomic location. Recent studies have emphasized the importance of delins in cancer diagnosis and treatment. Although the long reads from PacBio CLR sequencing signific...
Main authors: | Wang, Shenjie; Zhang, Xuanping; Qiang, Geng; Wang, Jiayin |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9858578/ https://www.ncbi.nlm.nih.gov/pubmed/36672745 http://dx.doi.org/10.3390/genes14010004 |
_version_ | 1784874136749735936 |
---|---|
author | Wang, Shenjie; Zhang, Xuanping; Qiang, Geng; Wang, Jiayin |
author_sort | Wang, Shenjie |
collection | PubMed |
description | Delins, also known as complex indel, is a combined genomic structural variation formed by deleting and inserting DNA fragments at a common genomic location. Recent studies have emphasized the importance of delins in cancer diagnosis and treatment. Although the long reads from PacBio CLR sequencing significantly facilitate delins calling, existing approaches still face computational challenges from the high level of sequencing errors and often introduce errors in genotyping and phasing delins. In this paper, we propose an efficient algorithmic pipeline, named delInsCaller, to identify delins at haplotype resolution from PacBio CLR sequencing data. delInsCaller uses a fault-tolerant method that calculates a variation density score, which helps to locate candidate mutational regions under a high level of sequencing errors. It adopts a base association-based contig splicing method, which facilitates contig splicing in the presence of false-positive interference. We conducted a series of experiments on simulated datasets, and the results showed that delInsCaller outperformed several state-of-the-art approaches, e.g., SVseq3, across a wide range of parameter settings, such as read depth and sequencing error rate. delInsCaller often obtained higher f-measures than other approaches; specifically, it was able to maintain its advantages at ~15% sequencing errors. delInsCaller was also able to significantly improve the N50 values with almost no loss of haplotype accuracy compared with the existing approach. |
format | Online Article Text |
id | pubmed-9858578 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-9858578 2023-01-21 DelInsCaller: An Efficient Algorithm for Identifying Delins and Estimating Haplotypes from Long Reads with High Level of Sequencing Errors. Wang, Shenjie; Zhang, Xuanping; Qiang, Geng; Wang, Jiayin. Genes (Basel). Article. MDPI 2022-12-20 /pmc/articles/PMC9858578/ /pubmed/36672745 http://dx.doi.org/10.3390/genes14010004 Text en © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
title | DelInsCaller: An Efficient Algorithm for Identifying Delins and Estimating Haplotypes from Long Reads with High Level of Sequencing Errors |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9858578/ https://www.ncbi.nlm.nih.gov/pubmed/36672745 http://dx.doi.org/10.3390/genes14010004 |
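
The description in this record reports results as f-measures and N50 values. As a minimal sketch of how these two standard metrics are usually computed (this is not the authors' evaluation code; the function names `n50` and `f_measure` and all example numbers are illustrative assumptions):

```python
def n50(lengths):
    """Return the N50 of a collection of block lengths.

    N50 is the length L such that blocks of length >= L together
    cover at least half of the total length.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return 0  # empty input


def f_measure(true_pos, false_pos, false_neg):
    """Return the F1 score (harmonic mean of precision and recall)."""
    if true_pos == 0:
        return 0.0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical haplotype block lengths and call counts, for illustration only.
    print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
    print(f_measure(true_pos=8, false_pos=2, false_neg=2))
```

A larger N50 indicates longer contiguous haplotype blocks, while the f-measure balances missed calls against false calls, which is why the two are reported side by side.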