SNVMix: predicting single nucleotide variants from next-generation sequencing of tumors

Bibliographic Details
Main authors: Goya, Rodrigo; Sun, Mark G.F.; Morin, Ryan D.; Leung, Gillian; Ha, Gavin; Wiegand, Kimberley C.; Senz, Janine; Crisan, Anamaria; Marra, Marco A.; Hirst, Martin; Huntsman, David; Murphy, Kevin P.; Aparicio, Sam; Shah, Sohrab P.
Format: Text
Language: English
Published: Oxford University Press 2010
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2832826/
https://www.ncbi.nlm.nih.gov/pubmed/20130035
http://dx.doi.org/10.1093/bioinformatics/btq040
collection PubMed
description Motivation: Next-generation sequencing (NGS) has enabled whole genome and transcriptome single nucleotide variant (SNV) discovery in cancer. NGS produces millions of short sequence reads that, once aligned to a reference genome sequence, can be interpreted for the presence of SNVs. Although tools exist for SNV discovery from NGS data, none are specifically suited to work with data from tumors, where altered ploidy and tumor cellularity impact the statistical expectations of SNV discovery.
Results: To address this problem, we developed three implementations of a probabilistic Binomial mixture model, called SNVMix, designed to infer SNVs from NGS data from tumors. The first models allelic counts as observations and infers SNVs and model parameters using an expectation maximization (EM) algorithm; it is therefore capable of adjusting to the deviations of allelic frequencies inherent in genomically unstable tumor genomes. The second models the nucleotide and mapping qualities of the reads by probabilistically weighting the contribution of each read/nucleotide to the inference of an SNV, based on the confidence we have in the base call and the read alignment. The third combines the filtering of low-quality data with probabilistic weighting of the qualities. We quantitatively evaluated these approaches on 16 ovarian cancer RNASeq datasets with matched genotyping arrays and on a human breast cancer genome sequenced to >40× (haploid) coverage with ground truth data, and we show systematically that the SNVMix models outperform competing approaches.
Availability: Software and data are available at http://compbio.bccrc.ca
Contact: sshah@bccrc.ca
Supplementary information: Supplementary data are available at Bioinformatics online.
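The first, count-based model described under Results can be illustrated as a three-component Binomial mixture fit by EM. This is a minimal sketch under stated assumptions, not the authors' SNVMix software: each position is summarized as a pair (a, n), where a is the number of reads matching the reference base and n is the depth; the three components are assumed to be homozygous reference, heterozygous, and homozygous variant; the function names `log_binom_pmf`, `posterior` and `fit_em`, the starting values, and the iteration count are hypothetical; any priors the published model may place on the parameters are omitted (plain maximum likelihood); and the quality weighting of the second and third models is not shown.

```python
import math

# Component order (an assumption for this sketch):
# 0 = homozygous reference, 1 = heterozygous, 2 = homozygous variant.
# mu[k] is the probability that a read shows the reference base under component k.


def log_binom_pmf(a, n, mu):
    """log P(a | n, mu) for a ~ Binomial(n, mu)."""
    return (math.lgamma(n + 1) - math.lgamma(a + 1) - math.lgamma(n - a + 1)
            + a * math.log(mu) + (n - a) * math.log(1.0 - mu))


def posterior(a, n, pi, mu):
    """E-step for one position: p(component k | a, n), normalized via log-sum-exp."""
    logs = [math.log(pi[k]) + log_binom_pmf(a, n, mu[k]) for k in range(3)]
    top = max(logs)
    w = [math.exp(x - top) for x in logs]
    total = sum(w)
    return [x / total for x in w]


def fit_em(counts, pi=(0.99, 0.009, 0.001), mu=(0.99, 0.5, 0.01),
           iters=50, eps=1e-6):
    """Maximum-likelihood EM for a 3-component Binomial mixture (hypothetical helper).

    counts: list of (a, n) pairs, one per genomic position.
    Returns the fitted mixing weights pi and Binomial parameters mu.
    """
    pi, mu = list(pi), list(mu)
    for _ in range(iters):
        # E-step: responsibility of each component at every position.
        gammas = [posterior(a, n, pi, mu) for a, n in counts]
        # M-step: re-estimate mixing weights and per-component reference-read rates.
        for k in range(3):
            weight = sum(g[k] for g in gammas)
            ref_reads = sum(g[k] * a for g, (a, n) in zip(gammas, counts))
            depth = sum(g[k] * n for g, (a, n) in zip(gammas, counts))
            pi[k] = max(weight / len(counts), eps)
            if depth > 0:
                # Clamp away from 0 and 1 so the logarithms stay finite.
                mu[k] = min(max(ref_reads / depth, eps), 1.0 - eps)
        norm = sum(pi)
        pi = [p / norm for p in pi]
    return pi, mu


if __name__ == "__main__":
    # Toy (a, n) data: mostly reference, a few heterozygous, two homozygous variant.
    counts = ([(30, 30)] * 40 + [(29, 30)] * 10
              + [(14, 30), (16, 30), (12, 25), (20, 38)]
              + [(0, 30), (1, 28)])
    pi, mu = fit_em(counts)
    post = posterior(15, 30, pi, mu)
    print("mu:", [round(m, 3) for m in mu])
    print("P(variant) at (15, 30):", round(post[1] + post[2], 3))
```

Because mu is re-estimated rather than fixed at 1, 0.5 and 0, the heterozygous component is free to settle away from 0.5 when cellularity or ploidy shifts allelic frequencies, which is the adaptivity the abstract attributes to EM. A position could then be flagged as a candidate SNV when `post[1] + post[2]` exceeds a chosen threshold; that decision rule is also an assumption of this sketch.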
id pubmed-2832826
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-2832826 2010-03-08 Bioinformatics Original Papers Oxford University Press 2010-03-15 2010-02-03
© The Author(s) 2010. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/2.0/uk/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Original Papers