xAtlas: scalable small variant calling across heterogeneous next-generation sequencing experiments
BACKGROUND: The growing volume and heterogeneity of next-generation sequencing (NGS) data complicate the further optimization of identifying DNA variation, especially considering that curated high-confidence variant call sets frequently used to validate these methods are generally developed from the...
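Main Authors: | Farek, Jesse; Hughes, Daniel; Salerno, William; Zhu, Yiming; Pisupati, Aishwarya; Mansfield, Adam; Krasheninina, Olga; English, Adam C; Metcalf, Ginger; Boerwinkle, Eric; Muzny, Donna M; Gibbs, Richard; Khan, Ziad; Sedlazeck, Fritz J |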
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2023 |
Subjects: | Technical Note |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9841152/ https://www.ncbi.nlm.nih.gov/pubmed/36644891 http://dx.doi.org/10.1093/gigascience/giac125 |
_version_ | 1784869770340859904 |
---|---|
author | Farek, Jesse; Hughes, Daniel; Salerno, William; Zhu, Yiming; Pisupati, Aishwarya; Mansfield, Adam; Krasheninina, Olga; English, Adam C; Metcalf, Ginger; Boerwinkle, Eric; Muzny, Donna M; Gibbs, Richard; Khan, Ziad; Sedlazeck, Fritz J |
author_facet | Farek, Jesse; Hughes, Daniel; Salerno, William; Zhu, Yiming; Pisupati, Aishwarya; Mansfield, Adam; Krasheninina, Olga; English, Adam C; Metcalf, Ginger; Boerwinkle, Eric; Muzny, Donna M; Gibbs, Richard; Khan, Ziad; Sedlazeck, Fritz J |
author_sort | Farek, Jesse |
collection | PubMed |
description | BACKGROUND: The growing volume and heterogeneity of next-generation sequencing (NGS) data complicate the further optimization of identifying DNA variation, especially considering that curated high-confidence variant call sets frequently used to validate these methods are generally developed from the analysis of comparatively small and homogeneous sample sets. FINDINGS: We have developed xAtlas, a single-sample variant caller for single-nucleotide variants (SNVs) and small insertions and deletions (indels) in NGS data. xAtlas features rapid runtimes, support for CRAM and gVCF file formats, and retraining capabilities. xAtlas reports SNVs with 99.11% recall and 98.43% precision across a reference HG002 sample at 60× whole-genome coverage in less than 2 CPU hours. Applying xAtlas to 3,202 samples at 30× whole-genome coverage from the 1000 Genomes Project achieves an average runtime of 1.7 hours per sample and a clear separation of the individual populations in principal component analysis across called SNVs. CONCLUSIONS: xAtlas is a fast, lightweight, and accurate SNV and small indel calling method. Source code for xAtlas is available under a BSD 3-clause license at https://github.com/jfarek/xatlas. |
format | Online Article Text |
id | pubmed-9841152 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-9841152 2023-01-18 xAtlas: scalable small variant calling across heterogeneous next-generation sequencing experiments Farek, Jesse Hughes, Daniel Salerno, William Zhu, Yiming Pisupati, Aishwarya Mansfield, Adam Krasheninina, Olga English, Adam C Metcalf, Ginger Boerwinkle, Eric Muzny, Donna M Gibbs, Richard Khan, Ziad Sedlazeck, Fritz J Gigascience Technical Note BACKGROUND: The growing volume and heterogeneity of next-generation sequencing (NGS) data complicate the further optimization of identifying DNA variation, especially considering that curated high-confidence variant call sets frequently used to validate these methods are generally developed from the analysis of comparatively small and homogeneous sample sets. FINDINGS: We have developed xAtlas, a single-sample variant caller for single-nucleotide variants (SNVs) and small insertions and deletions (indels) in NGS data. xAtlas features rapid runtimes, support for CRAM and gVCF file formats, and retraining capabilities. xAtlas reports SNVs with 99.11% recall and 98.43% precision across a reference HG002 sample at 60× whole-genome coverage in less than 2 CPU hours. Applying xAtlas to 3,202 samples at 30× whole-genome coverage from the 1000 Genomes Project achieves an average runtime of 1.7 hours per sample and a clear separation of the individual populations in principal component analysis across called SNVs. CONCLUSIONS: xAtlas is a fast, lightweight, and accurate SNV and small indel calling method. Source code for xAtlas is available under a BSD 3-clause license at https://github.com/jfarek/xatlas. Oxford University Press 2023-01-16 /pmc/articles/PMC9841152/ /pubmed/36644891 http://dx.doi.org/10.1093/gigascience/giac125 Text en © The Author(s) 2023. Published by Oxford University Press GigaScience.
https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Technical Note Farek, Jesse Hughes, Daniel Salerno, William Zhu, Yiming Pisupati, Aishwarya Mansfield, Adam Krasheninina, Olga English, Adam C Metcalf, Ginger Boerwinkle, Eric Muzny, Donna M Gibbs, Richard Khan, Ziad Sedlazeck, Fritz J xAtlas: scalable small variant calling across heterogeneous next-generation sequencing experiments |
title | xAtlas: scalable small variant calling across heterogeneous next-generation sequencing experiments |
title_full | xAtlas: scalable small variant calling across heterogeneous next-generation sequencing experiments |
title_fullStr | xAtlas: scalable small variant calling across heterogeneous next-generation sequencing experiments |
title_full_unstemmed | xAtlas: scalable small variant calling across heterogeneous next-generation sequencing experiments |
title_short | xAtlas: scalable small variant calling across heterogeneous next-generation sequencing experiments |
title_sort | xatlas: scalable small variant calling across heterogeneous next-generation sequencing experiments |
topic | Technical Note |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9841152/ https://www.ncbi.nlm.nih.gov/pubmed/36644891 http://dx.doi.org/10.1093/gigascience/giac125 |
work_keys_str_mv | AT farekjesse xatlasscalablesmallvariantcallingacrossheterogeneousnextgenerationsequencingexperiments AT hughesdaniel xatlasscalablesmallvariantcallingacrossheterogeneousnextgenerationsequencingexperiments AT salernowilliam xatlasscalablesmallvariantcallingacrossheterogeneousnextgenerationsequencingexperiments AT zhuyiming xatlasscalablesmallvariantcallingacrossheterogeneousnextgenerationsequencingexperiments AT pisupatiaishwarya xatlasscalablesmallvariantcallingacrossheterogeneousnextgenerationsequencingexperiments AT mansfieldadam xatlasscalablesmallvariantcallingacrossheterogeneousnextgenerationsequencingexperiments AT krashenininaolga xatlasscalablesmallvariantcallingacrossheterogeneousnextgenerationsequencingexperiments AT englishadamc xatlasscalablesmallvariantcallingacrossheterogeneousnextgenerationsequencingexperiments AT metcalfginger xatlasscalablesmallvariantcallingacrossheterogeneousnextgenerationsequencingexperiments AT boerwinkleeric xatlasscalablesmallvariantcallingacrossheterogeneousnextgenerationsequencingexperiments AT muznydonnam xatlasscalablesmallvariantcallingacrossheterogeneousnextgenerationsequencingexperiments AT gibbsrichard xatlasscalablesmallvariantcallingacrossheterogeneousnextgenerationsequencingexperiments AT khanziad xatlasscalablesmallvariantcallingacrossheterogeneousnextgenerationsequencingexperiments AT sedlazeckfritzj xatlasscalablesmallvariantcallingacrossheterogeneousnextgenerationsequencingexperiments |