Discovering single nucleotide variants and indels from bulk and single-cell ATAC-seq
Genetic variants and de novo mutations in regulatory regions of the genome are typically discovered by whole-genome sequencing (WGS); however, WGS is expensive and most WGS reads come from non-regulatory regions. The Assay for Transposase-Accessible Chromatin (ATAC-seq) generates reads from regulator...
Main authors: | Massarat, Arya R; Sen, Arko; Jaureguy, Jeff; Tyndale, Sélène T; Fu, Yi; Erikson, Galina; McVicker, Graham |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2021 |
Subjects: | Computational Biology |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8373110/ https://www.ncbi.nlm.nih.gov/pubmed/34313779 http://dx.doi.org/10.1093/nar/gkab621 |
_version_ | 1783739887787180032 |
---|---|
author | Massarat, Arya R; Sen, Arko; Jaureguy, Jeff; Tyndale, Sélène T; Fu, Yi; Erikson, Galina; McVicker, Graham |
author_facet | Massarat, Arya R; Sen, Arko; Jaureguy, Jeff; Tyndale, Sélène T; Fu, Yi; Erikson, Galina; McVicker, Graham |
author_sort | Massarat, Arya R |
collection | PubMed |
description | Genetic variants and de novo mutations in regulatory regions of the genome are typically discovered by whole-genome sequencing (WGS); however, WGS is expensive and most WGS reads come from non-regulatory regions. The Assay for Transposase-Accessible Chromatin (ATAC-seq) generates reads from regulatory sequences and could potentially be used as a low-cost ‘capture’ method for regulatory variant discovery, but its use for this purpose has not been systematically evaluated. Here we apply seven variant callers to bulk and single-cell ATAC-seq data and evaluate their ability to identify single nucleotide variants (SNVs) and insertions/deletions (indels). In addition, we develop an ensemble classifier, VarCA, which combines features from individual variant callers to predict variants. The Genome Analysis Toolkit (GATK) is the best-performing individual caller with precision/recall on a bulk ATAC test dataset of 0.92/0.97 for SNVs and 0.87/0.82 for indels within ATAC-seq peak regions with at least 10 reads. On bulk ATAC-seq reads, VarCA achieves superior performance with precision/recall of 0.99/0.95 for SNVs and 0.93/0.80 for indels. On single-cell ATAC-seq reads, VarCA attains precision/recall of 0.98/0.94 for SNVs and 0.82/0.82 for indels. In summary, ATAC-seq reads can be used to accurately discover non-coding regulatory variants in the absence of whole-genome sequencing data and our ensemble method, VarCA, has the best overall performance. |
format | Online Article Text |
id | pubmed-8373110 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-8373110 2021-08-19 Nucleic Acids Res Computational Biology Oxford University Press 2021-07-27 /pmc/articles/PMC8373110/ /pubmed/34313779 http://dx.doi.org/10.1093/nar/gkab621 Text en © The Author(s) 2021. Published by Oxford University Press on behalf of Nucleic Acids Research. This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
title | Discovering single nucleotide variants and indels from bulk and single-cell ATAC-seq |
title_sort | discovering single nucleotide variants and indels from bulk and single-cell atac-seq |
topic | Computational Biology |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8373110/ https://www.ncbi.nlm.nih.gov/pubmed/34313779 http://dx.doi.org/10.1093/nar/gkab621 |