Haplotype-aware variant calling with PEPPER-Margin-DeepVariant enables high accuracy in nanopore long-reads


Bibliographic Details
Main authors: Shafin, Kishwar, Pesout, Trevor, Chang, Pi-Chuan, Nattestad, Maria, Kolesnikov, Alexey, Goel, Sidharth, Baid, Gunjan, Kolmogorov, Mikhail, Eizenga, Jordan M., Miga, Karen H., Carnevali, Paolo, Jain, Miten, Carroll, Andrew, Paten, Benedict
Format: Online Article Text
Language: English
Published: 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8571015/
https://www.ncbi.nlm.nih.gov/pubmed/34725481
http://dx.doi.org/10.1038/s41592-021-01299-w
_version_ 1784594935191699456
author Shafin, Kishwar
Pesout, Trevor
Chang, Pi-Chuan
Nattestad, Maria
Kolesnikov, Alexey
Goel, Sidharth
Baid, Gunjan
Kolmogorov, Mikhail
Eizenga, Jordan M.
Miga, Karen H.
Carnevali, Paolo
Jain, Miten
Carroll, Andrew
Paten, Benedict
author_sort Shafin, Kishwar
collection PubMed
description Long-read sequencing has the potential to transform variant detection by reaching currently difficult-to-map regions and routinely linking together adjacent variations to enable read-based phasing. Third-generation nanopore sequence data has demonstrated a long read length, but current interpretation methods for its novel pore-based signal have unique error profiles, making accurate analysis challenging. Here, we introduce a haplotype-aware variant calling pipeline, PEPPER-Margin-DeepVariant, that produces state-of-the-art variant calling results with nanopore data. We show that our nanopore-based method outperforms the short-read-based single nucleotide variant identification method at the whole-genome scale and produces high-quality single nucleotide variants in segmental duplications and low-mappability regions where short-read-based genotyping fails. We show that our pipeline can provide highly contiguous phase blocks across the genome with nanopore reads, contiguously spanning between 85% and 92% of annotated genes across six samples. We also extend PEPPER-Margin-DeepVariant to PacBio HiFi data, providing an efficient solution with better performance than the current WhatsHap-DeepVariant standard. Finally, we demonstrate de novo assembly polishing methods that use nanopore and PacBio HiFi reads to produce diploid assemblies with high accuracy (Q35+ nanopore-polished and Q40+ PacBio-HiFi-polished).
format Online
Article
Text
id pubmed-8571015
institution National Center for Biotechnology Information
language English
publishDate 2021
record_format MEDLINE/PubMed
spelling pubmed-8571015 2022-05-01 Nat Methods Article
2021-11-01 2021-11 /pmc/articles/PMC8571015/ /pubmed/34725481 http://dx.doi.org/10.1038/s41592-021-01299-w Text en Users may view, print, copy, and download text and data-mine the content in such documents, for the purposes of academic research, subject always to the full Conditions of use: https://www.springernature.com/gp/open-research/policies/accepted-manuscript-terms
title Haplotype-aware variant calling with PEPPER-Margin-DeepVariant enables high accuracy in nanopore long-reads
topic Article