
xAtlas: scalable small variant calling across heterogeneous next-generation sequencing experiments

Bibliographic Details
Main authors: Farek, Jesse; Hughes, Daniel; Salerno, William; Zhu, Yiming; Pisupati, Aishwarya; Mansfield, Adam; Krasheninina, Olga; English, Adam C; Metcalf, Ginger; Boerwinkle, Eric; Muzny, Donna M; Gibbs, Richard; Khan, Ziad; Sedlazeck, Fritz J
Format: Online article (text)
Language: English
Published: Oxford University Press, 2023
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9841152/
https://www.ncbi.nlm.nih.gov/pubmed/36644891
http://dx.doi.org/10.1093/gigascience/giac125
Collection: PubMed
Abstract:
BACKGROUND: The growing volume and heterogeneity of next-generation sequencing (NGS) data complicate the further optimization of identifying DNA variation, especially considering that curated high-confidence variant call sets frequently used to validate these methods are generally developed from the analysis of comparatively small and homogeneous sample sets.
FINDINGS: We have developed xAtlas, a single-sample variant caller for single-nucleotide variants (SNVs) and small insertions and deletions (indels) in NGS data. xAtlas features rapid runtimes, support for CRAM and gVCF file formats, and retraining capabilities. xAtlas reports SNVs with 99.11% recall and 98.43% precision across a reference HG002 sample at 60× whole-genome coverage in less than 2 CPU hours. Applying xAtlas to 3,202 samples at 30× whole-genome coverage from the 1000 Genomes Project achieves an average runtime of 1.7 hours per sample and a clear separation of the individual populations in principal component analysis across called SNVs.
CONCLUSIONS: xAtlas is a fast, lightweight, and accurate SNV and small indel calling method. Source code for xAtlas is available under a BSD 3-clause license at https://github.com/jfarek/xatlas.
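The HG002 SNV figures reported above (99.11% recall, 98.43% precision) can be combined into an F1 score, the harmonic mean of precision and recall that small-variant benchmarks often report alongside them. The Python sketch below assumes the standard definitions precision = TP / (TP + FP) and recall = TP / (TP + FN). The function names are hypothetical illustrations, not part of xAtlas, and the resulting F1 (about 0.9877) is derived here rather than taken from the paper. It also multiplies out the 1000 Genomes figures, assuming per-sample runtimes simply add; the abstract does not say whether the 1.7 hours are CPU or wall-clock time.

```python
"""Sketch: benchmarking metrics for a variant call set compared against a
truth set. All function names here are hypothetical, not part of xAtlas."""


def precision(tp: int, fp: int) -> float:
    """Fraction of reported calls that match the truth set: TP / (TP + FP)."""
    return tp / (tp + fp)


def recall(tp: int, fn: int) -> float:
    """Fraction of truth-set variants that were called: TP / (TP + FN)."""
    return tp / (tp + fn)


def f1_score(p: float, r: float) -> float:
    """Harmonic mean of precision and recall: 2PR / (P + R)."""
    if p + r == 0:
        return 0.0
    return 2 * p * r / (p + r)


def total_hours(n_samples: int, mean_hours_per_sample: float) -> float:
    """Aggregate runtime, assuming per-sample jobs simply add up."""
    return n_samples * mean_hours_per_sample


if __name__ == "__main__":
    # SNV recall and precision for HG002 at 60x, as stated in the abstract.
    reported_recall = 0.9911
    reported_precision = 0.9843
    f1 = f1_score(reported_precision, reported_recall)
    print(f"Derived SNV F1: {f1:.4f}")

    # 3,202 1000 Genomes samples at an average of 1.7 hours each.
    print(f"Aggregate runtime: {total_hours(3202, 1.7):,.1f} hours")
```

Run as a script, it prints `Derived SNV F1: 0.9877` and `Aggregate runtime: 5,443.4 hours`. The harmonic mean is used because it penalizes an imbalance between precision and recall more strongly than an arithmetic mean would.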
Record ID: pubmed-9841152
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: GigaScience, Technical Note (published 2023-01-16)
Copyright: © The Author(s) 2023. Published by Oxford University Press GigaScience.
License: This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.