NanoSNP: a progressive and haplotype-aware SNP caller on low-coverage nanopore sequencing data
MOTIVATION: Oxford Nanopore sequencing has great potential and advantages in population-scale studies. Due to the cost of sequencing, the depth of whole-genome sequencing per individual sample must be small. However, the existing single nucleotide polymorphism (SNP) callers are aimed at high-cov...
Main Authors: | Huang, Neng; Xu, Minghua; Nie, Fan; Ni, Peng; Xiao, Chuan-Le; Luo, Feng; Wang, Jianxin |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2022 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9822538/ https://www.ncbi.nlm.nih.gov/pubmed/36548365 http://dx.doi.org/10.1093/bioinformatics/btac824 |
_version_ | 1784865970340233216 |
---|---|
author | Huang, Neng Xu, Minghua Nie, Fan Ni, Peng Xiao, Chuan-Le Luo, Feng Wang, Jianxin |
author_facet | Huang, Neng Xu, Minghua Nie, Fan Ni, Peng Xiao, Chuan-Le Luo, Feng Wang, Jianxin |
author_sort | Huang, Neng |
collection | PubMed |
description | MOTIVATION: Oxford Nanopore sequencing has great potential and advantages in population-scale studies. Due to the cost of sequencing, the depth of whole-genome sequencing per individual sample must be small. However, the existing single nucleotide polymorphism (SNP) callers are aimed at high-coverage Nanopore sequencing reads. Detecting the SNP variants on low-coverage Nanopore sequencing data is still a challenging problem. RESULTS: We developed a novel deep learning-based SNP calling method, NanoSNP, to identify the SNP sites (excluding short indels) based on low-coverage Nanopore sequencing reads. In this method, we design a multi-step, multi-scale and haplotype-aware SNP detection pipeline. First, the pileup model in NanoSNP utilizes the naive pileup feature to predict a subset of SNP sites with a bidirectional long short-term memory (Bi-LSTM) network. These SNP sites are phased and used to divide the low-coverage Nanopore reads into different haplotypes. Finally, the long-range haplotype feature and short-range pileup feature are extracted from each haplotype. The haplotype model combines two features and predicts the genotype for the candidate site using a Bi-LSTM network. To evaluate the performance of NanoSNP, we compared NanoSNP with Clair, Clair3, Pepper-DeepVariant and NanoCaller on the low-coverage (∼16×) Nanopore sequencing reads. We also performed cross-genome testing on six human genomes, HG002–HG007. Comprehensive experiments demonstrate that NanoSNP outperforms Clair, Pepper-DeepVariant and NanoCaller in identifying SNPs on low-coverage Nanopore sequencing data, including the difficult-to-map regions and major histocompatibility complex regions in the human genome. NanoSNP is comparable to Clair3 when the coverage exceeds 16×. AVAILABILITY AND IMPLEMENTATION: https://github.com/huangnengCSU/NanoSNP.git. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. |
format | Online Article Text |
id | pubmed-9822538 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-9822538 2023-01-09 NanoSNP: a progressive and haplotype-aware SNP caller on low-coverage nanopore sequencing data Huang, Neng Xu, Minghua Nie, Fan Ni, Peng Xiao, Chuan-Le Luo, Feng Wang, Jianxin Bioinformatics Original Paper MOTIVATION: Oxford Nanopore sequencing has great potential and advantages in population-scale studies. Due to the cost of sequencing, the depth of whole-genome sequencing per individual sample must be small. However, the existing single nucleotide polymorphism (SNP) callers are aimed at high-coverage Nanopore sequencing reads. Detecting the SNP variants on low-coverage Nanopore sequencing data is still a challenging problem. RESULTS: We developed a novel deep learning-based SNP calling method, NanoSNP, to identify the SNP sites (excluding short indels) based on low-coverage Nanopore sequencing reads. In this method, we design a multi-step, multi-scale and haplotype-aware SNP detection pipeline. First, the pileup model in NanoSNP utilizes the naive pileup feature to predict a subset of SNP sites with a bidirectional long short-term memory (Bi-LSTM) network. These SNP sites are phased and used to divide the low-coverage Nanopore reads into different haplotypes. Finally, the long-range haplotype feature and short-range pileup feature are extracted from each haplotype. The haplotype model combines two features and predicts the genotype for the candidate site using a Bi-LSTM network. To evaluate the performance of NanoSNP, we compared NanoSNP with Clair, Clair3, Pepper-DeepVariant and NanoCaller on the low-coverage (∼16×) Nanopore sequencing reads. We also performed cross-genome testing on six human genomes, HG002–HG007. Comprehensive experiments demonstrate that NanoSNP outperforms Clair, Pepper-DeepVariant and NanoCaller in identifying SNPs on low-coverage Nanopore sequencing data, including the difficult-to-map regions and major histocompatibility complex regions in the human genome. NanoSNP is comparable to Clair3 when the coverage exceeds 16×. AVAILABILITY AND IMPLEMENTATION: https://github.com/huangnengCSU/NanoSNP.git. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. Oxford University Press 2022-12-22 /pmc/articles/PMC9822538/ /pubmed/36548365 http://dx.doi.org/10.1093/bioinformatics/btac824 Text en © The Author(s) 2022. Published by Oxford University Press. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Original Paper Huang, Neng Xu, Minghua Nie, Fan Ni, Peng Xiao, Chuan-Le Luo, Feng Wang, Jianxin NanoSNP: a progressive and haplotype-aware SNP caller on low-coverage nanopore sequencing data |
title | NanoSNP: a progressive and haplotype-aware SNP caller on low-coverage nanopore sequencing data |
title_full | NanoSNP: a progressive and haplotype-aware SNP caller on low-coverage nanopore sequencing data |
title_fullStr | NanoSNP: a progressive and haplotype-aware SNP caller on low-coverage nanopore sequencing data |
title_full_unstemmed | NanoSNP: a progressive and haplotype-aware SNP caller on low-coverage nanopore sequencing data |
title_short | NanoSNP: a progressive and haplotype-aware SNP caller on low-coverage nanopore sequencing data |
title_sort | nanosnp: a progressive and haplotype-aware snp caller on low-coverage nanopore sequencing data |
topic | Original Paper |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9822538/ https://www.ncbi.nlm.nih.gov/pubmed/36548365 http://dx.doi.org/10.1093/bioinformatics/btac824 |
work_keys_str_mv | AT huangneng nanosnpaprogressiveandhaplotypeawaresnpcalleronlowcoveragenanoporesequencingdata AT xuminghua nanosnpaprogressiveandhaplotypeawaresnpcalleronlowcoveragenanoporesequencingdata AT niefan nanosnpaprogressiveandhaplotypeawaresnpcalleronlowcoveragenanoporesequencingdata AT nipeng nanosnpaprogressiveandhaplotypeawaresnpcalleronlowcoveragenanoporesequencingdata AT xiaochuanle nanosnpaprogressiveandhaplotypeawaresnpcalleronlowcoveragenanoporesequencingdata AT luofeng nanosnpaprogressiveandhaplotypeawaresnpcalleronlowcoveragenanoporesequencingdata AT wangjianxin nanosnpaprogressiveandhaplotypeawaresnpcalleronlowcoveragenanoporesequencingdata |