
Discovering single nucleotide variants and indels from bulk and single-cell ATAC-seq

Bibliographic Details
Main Authors: Massarat, Arya R; Sen, Arko; Jaureguy, Jeff; Tyndale, Sélène T; Fu, Yi; Erikson, Galina; McVicker, Graham
Format: Online Article Text
Language: English
Published: Oxford University Press 2021
Subjects: Computational Biology
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8373110/
https://www.ncbi.nlm.nih.gov/pubmed/34313779
http://dx.doi.org/10.1093/nar/gkab621
author Massarat, Arya R
Sen, Arko
Jaureguy, Jeff
Tyndale, Sélène T
Fu, Yi
Erikson, Galina
McVicker, Graham
collection PubMed
description Genetic variants and de novo mutations in regulatory regions of the genome are typically discovered by whole-genome sequencing (WGS); however, WGS is expensive and most WGS reads come from non-regulatory regions. The Assay for Transposase-Accessible Chromatin (ATAC-seq) generates reads from regulatory sequences and could potentially be used as a low-cost ‘capture’ method for regulatory variant discovery, but its use for this purpose has not been systematically evaluated. Here we apply seven variant callers to bulk and single-cell ATAC-seq data and evaluate their ability to identify single nucleotide variants (SNVs) and insertions/deletions (indels). In addition, we develop an ensemble classifier, VarCA, which combines features from individual variant callers to predict variants. The Genome Analysis Toolkit (GATK) is the best-performing individual caller, with precision/recall on a bulk ATAC test dataset of 0.92/0.97 for SNVs and 0.87/0.82 for indels within ATAC-seq peak regions with at least 10 reads. On bulk ATAC-seq reads, VarCA achieves superior performance, with precision/recall of 0.99/0.95 for SNVs and 0.93/0.80 for indels. On single-cell ATAC-seq reads, VarCA attains precision/recall of 0.98/0.94 for SNVs and 0.82/0.82 for indels. In summary, ATAC-seq reads can be used to accurately discover non-coding regulatory variants in the absence of whole-genome sequencing data, and our ensemble method, VarCA, has the best overall performance.
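The description reports precision and recall separately, and on bulk data VarCA has higher precision but slightly lower recall than GATK. One way to compare the pairs on a single scale is the F1 score (the harmonic mean of precision and recall). The sketch below is illustrative only: the F1 values are derived here and are not reported in the description, the dictionary layout and names are hypothetical, and it assumes the bulk GATK and VarCA figures refer to comparable test conditions, which the description does not state explicitly.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs copied from the description above.
# Keys are (caller, data type, variant type); this layout is an
# illustrative assumption, not part of the catalog record.
reported = {
    ("GATK", "bulk", "SNV"): (0.92, 0.97),
    ("GATK", "bulk", "indel"): (0.87, 0.82),
    ("VarCA", "bulk", "SNV"): (0.99, 0.95),
    ("VarCA", "bulk", "indel"): (0.93, 0.80),
    ("VarCA", "single-cell", "SNV"): (0.98, 0.94),
    ("VarCA", "single-cell", "indel"): (0.82, 0.82),
}

# Derived F1 values (not stated in the source record).
f1 = {key: f1_score(p, r) for key, (p, r) in reported.items()}

for (caller, data, variant), score in f1.items():
    print(f"{caller:<6}{data:<12}{variant:<6}F1 = {score:.3f}")
```

Under these assumptions, the derived F1 for VarCA on bulk data is higher than for GATK for both SNVs and indels, which is consistent with the description's claim of better overall performance despite the small drop in recall.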
format Online
Article
Text
id pubmed-8373110
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-8373110 2021-08-19 Discovering single nucleotide variants and indels from bulk and single-cell ATAC-seq. Massarat, Arya R; Sen, Arko; Jaureguy, Jeff; Tyndale, Sélène T; Fu, Yi; Erikson, Galina; McVicker, Graham. Nucleic Acids Res. Computational Biology. Oxford University Press 2021-07-27 /pmc/articles/PMC8373110/ /pubmed/34313779 http://dx.doi.org/10.1093/nar/gkab621 Text en © The Author(s) 2021. Published by Oxford University Press on behalf of Nucleic Acids Research.
This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title Discovering single nucleotide variants and indels from bulk and single-cell ATAC-seq
topic Computational Biology
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8373110/
https://www.ncbi.nlm.nih.gov/pubmed/34313779
http://dx.doi.org/10.1093/nar/gkab621