NPSV: A simulation-driven approach to genotyping structural variants in whole-genome sequencing data

Bibliographic Details
Main Authors: Linderman, Michael D; Paudyal, Crystal; Shakeel, Musab; Kelley, William; Bashir, Ali; Gelb, Bruce D
Format: Online Article Text
Language: English
Published: Oxford University Press 2021
Subjects: Technical Note
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8246072/
https://www.ncbi.nlm.nih.gov/pubmed/34195837
http://dx.doi.org/10.1093/gigascience/giab046
author Linderman, Michael D
Paudyal, Crystal
Shakeel, Musab
Kelley, William
Bashir, Ali
Gelb, Bruce D
collection PubMed
description BACKGROUND: Structural variants (SVs) play a causal role in numerous diseases but are difficult to detect and accurately genotype (determine zygosity) in whole-genome next-generation sequencing data. SV genotypers that assume that the aligned sequencing data uniformly reflect the underlying SV or use existing SV call sets as training data can only partially account for variant and sample-specific biases. RESULTS: We introduce NPSV, a machine learning–based approach for genotyping previously discovered SVs that uses next-generation sequencing simulation to model the combined effects of the genomic region, sequencer, and alignment pipeline on the observed SV evidence. We evaluate NPSV alongside existing SV genotypers on multiple benchmark call sets. We show that NPSV consistently achieves or exceeds state-of-the-art genotyping accuracy across SV call sets, samples, and variant types. NPSV can specifically identify putative de novo SVs in a trio context and is robust to offset SV breakpoints. CONCLUSIONS: Growing SV databases and the increasing availability of SV calls from long-read sequencing make stand-alone genotyping of previously identified SVs an increasingly important component of genome analyses. By treating potential biases as a “black box” that can be simulated, NPSV provides a framework for accurately genotyping a broad range of SVs in both targeted and genome-scale applications.
format Online
Article
Text
id pubmed-8246072
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-8246072 2021-07-02 NPSV: A simulation-driven approach to genotyping structural variants in whole-genome sequencing data Linderman, Michael D; Paudyal, Crystal; Shakeel, Musab; Kelley, William; Bashir, Ali; Gelb, Bruce D Gigascience Technical Note Oxford University Press 2021-07-01 /pmc/articles/PMC8246072/ /pubmed/34195837 http://dx.doi.org/10.1093/gigascience/giab046 Text en © The Author(s) 2021. Published by Oxford University Press GigaScience.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title NPSV: A simulation-driven approach to genotyping structural variants in whole-genome sequencing data
topic Technical Note