NanoSNP: a progressive and haplotype-aware SNP caller on low-coverage nanopore sequencing data

Bibliographic Details
Main authors: Huang, Neng; Xu, Minghua; Nie, Fan; Ni, Peng; Xiao, Chuan-Le; Luo, Feng; Wang, Jianxin
Format: Online, Article, Text
Language: English
Published: Oxford University Press 2022
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9822538/
https://www.ncbi.nlm.nih.gov/pubmed/36548365
http://dx.doi.org/10.1093/bioinformatics/btac824
Collection: PubMed
Abstract:
MOTIVATION: Oxford Nanopore sequencing has great potential and advantages in population-scale studies. Because of the cost of sequencing, the whole-genome sequencing depth for each individual sample must be low. However, existing single nucleotide polymorphism (SNP) callers are aimed at high-coverage Nanopore sequencing reads, and detecting SNPs on low-coverage Nanopore sequencing data is still a challenging problem.
RESULTS: We developed a novel deep learning-based SNP calling method, NanoSNP, to identify SNP sites (excluding short indels) from low-coverage Nanopore sequencing reads. In this method, we design a multi-step, multi-scale and haplotype-aware SNP detection pipeline. First, the pileup model in NanoSNP uses the naive pileup feature to predict a subset of SNP sites with a bidirectional long short-term memory (Bi-LSTM) network. These SNP sites are phased and used to divide the low-coverage Nanopore reads into different haplotypes. Then, the long-range haplotype feature and the short-range pileup feature are extracted from each haplotype. The haplotype model combines the two features and predicts the genotype of each candidate site using a Bi-LSTM network. To evaluate the performance of NanoSNP, we compared it with Clair, Clair3, Pepper-DeepVariant and NanoCaller on low-coverage (∼16×) Nanopore sequencing reads. We also performed cross-genome testing on six human genomes (HG002–HG007). Comprehensive experiments demonstrate that NanoSNP outperforms Clair, Pepper-DeepVariant and NanoCaller in identifying SNPs on low-coverage Nanopore sequencing data, including the difficult-to-map regions and the major histocompatibility complex regions of the human genome. NanoSNP is comparable to Clair3 when the coverage exceeds 16×.
AVAILABILITY AND IMPLEMENTATION: https://github.com/huangnengCSU/NanoSNP.git.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
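
The read-partitioning step described in the abstract (phasing a first set of SNP calls and using them to split the reads into haplotypes before extracting per-haplotype pileup features) can be illustrated with a minimal Python sketch. It is not NanoSNP's implementation: it assumes a simplified read representation (a dict from reference position to observed base) and assigns each read by a plain majority vote over phased heterozygous SNPs. The names `assign_haplotype`, `haplotype_pileup` and `min_margin` are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical, simplified representations (not NanoSNP's data structures):
# a read maps reference position -> observed base;
# phased heterozygous SNPs map position -> (haplotype-1 allele, haplotype-2 allele).
Read = Dict[int, str]
PhasedSNPs = Dict[int, Tuple[str, str]]


def assign_haplotype(read: Read, phased: PhasedSNPs, min_margin: int = 1) -> int:
    """Return 1 or 2 for the haplotype whose phased alleles the read matches
    more often, or 0 if the vote margin is below `min_margin` (ambiguous)."""
    votes = [0, 0]
    for pos, (allele1, allele2) in phased.items():
        base = read.get(pos)
        # Assumes heterozygous sites (allele1 != allele2). Positions the read
        # does not cover, or bases matching neither allele, cast no vote.
        if base == allele1:
            votes[0] += 1
        elif base == allele2:
            votes[1] += 1
    if votes[0] - votes[1] >= min_margin:
        return 1
    if votes[1] - votes[0] >= min_margin:
        return 2
    return 0


def haplotype_pileup(reads: List[Read], phased: PhasedSNPs, pos: int) -> Dict[int, Counter]:
    """Count the bases observed at `pos` separately for each haplotype group
    (0 = unassigned, 1 and 2 = the two haplotypes)."""
    pileups: Dict[int, Counter] = {0: Counter(), 1: Counter(), 2: Counter()}
    for read in reads:
        if pos in read:
            pileups[assign_haplotype(read, phased)][read[pos]] += 1
    return pileups


if __name__ == "__main__":
    phased = {100: ("A", "G"), 250: ("C", "T")}
    reads = [
        {100: "A", 250: "C", 400: "T"},
        {100: "A", 250: "C", 400: "T"},
        {100: "G", 250: "T", 400: "C"},
        {100: "G", 400: "C"},
        {400: "C"},  # covers no phased SNP, so it stays unassigned
    ]
    for hap, counts in haplotype_pileup(reads, phased, 400).items():
        print(hap, dict(counts))
```

In this toy input, the two reads assigned to haplotype 1 carry T at position 400, the two assigned to haplotype 2 carry C, and the read that covers no phased SNP stays in group 0. A mixed pileup would show only a T/C split, whereas the per-haplotype pileups make the heterozygous genotype explicit. A real caller would also weigh base qualities and phase-block boundaries, which this sketch ignores.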
Record ID: pubmed-9822538
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: Bioinformatics
Publication date: 2022-12-22
Copyright: © The Author(s) 2022. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
Topic: Original Paper