Phylovar: toward scalable phylogeny-aware inference of single-nucleotide variations from single-cell DNA sequencing data

MOTIVATION: Single-nucleotide variants (SNVs) are the most common variations in the human genome. Recently developed methods for SNV detection from single-cell DNA sequencing data, such as SCIΦ and scVILP, leverage the evolutionary history of the cells to overcome the technical er...


Bibliographic Details
Main authors: Edrisi, Mohammadamin; Valecha, Monica V; Chowdary, Sunkara B V; Robledo, Sergio; Ogilvie, Huw A; Posada, David; Zafar, Hamim; Nakhleh, Luay
Format: Online Article Text
Language: English
Published: Oxford University Press 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9235480/
https://www.ncbi.nlm.nih.gov/pubmed/35758771
http://dx.doi.org/10.1093/bioinformatics/btac254
author Edrisi, Mohammadamin
Valecha, Monica V
Chowdary, Sunkara B V
Robledo, Sergio
Ogilvie, Huw A
Posada, David
Zafar, Hamim
Nakhleh, Luay
collection PubMed
description MOTIVATION: Single-nucleotide variants (SNVs) are the most common variations in the human genome. Recently developed methods for SNV detection from single-cell DNA sequencing data, such as SCIΦ and scVILP, leverage the evolutionary history of the cells to overcome the technical errors associated with single-cell sequencing protocols. Despite being accurate, these methods do not scale to the extensive genomic breadth of single-cell whole-genome (scWGS) and whole-exome sequencing (scWES) data. RESULTS: Here, we report on a new scalable method, Phylovar, which extends the phylogeny-guided variant calling approach to sequencing datasets containing millions of loci. Through benchmarking on simulated datasets under different settings, we show that Phylovar outperforms SCIΦ in terms of running time while being more accurate than Monovar (which is not phylogeny-aware) in terms of SNV detection. Furthermore, we applied Phylovar to two real biological datasets: an scWES triple-negative breast cancer dataset consisting of 32 cells and 3375 loci, and an scWGS dataset of neuron cells from a normal human brain containing 16 cells and approximately 2.5 million loci. For the cancer dataset, Phylovar detected somatic SNVs with high or moderate functional impact that were also supported by a bulk sequencing dataset. For the neuron dataset, Phylovar identified 5745 SNVs with non-synonymous effects, some of which were associated with neurodegenerative diseases. AVAILABILITY AND IMPLEMENTATION: Phylovar is implemented in Python and is publicly available at https://github.com/NakhlehLab/Phylovar.
format Online
Article
Text
id pubmed-9235480
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-9235480 2022-06-29 Phylovar: toward scalable phylogeny-aware inference of single-nucleotide variations from single-cell DNA sequencing data. Edrisi, Mohammadamin; Valecha, Monica V; Chowdary, Sunkara B V; Robledo, Sergio; Ogilvie, Huw A; Posada, David; Zafar, Hamim; Nakhleh, Luay. Bioinformatics, ISCB/Ismb 2022.
Oxford University Press 2022-06-27 /pmc/articles/PMC9235480/ /pubmed/35758771 http://dx.doi.org/10.1093/bioinformatics/btac254 Text en © The Author(s) 2022. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Phylovar: toward scalable phylogeny-aware inference of single-nucleotide variations from single-cell DNA sequencing data
topic ISCB/Ismb 2022