
svclassify: a method to establish benchmark structural variant calls

BACKGROUND: The human genome contains variants ranging in size from small single nucleotide polymorphisms (SNPs) to large structural variants (SVs). High-quality benchmark small variant calls for the pilot National Institute of Standards and Technology (NIST) Reference Material (NA12878) have been d...


Bibliographic Details
Main authors:	Parikh, Hemang; Mohiyuddin, Marghoob; Lam, Hugo Y. K.; Iyer, Hariharan; Chen, Desu; Pratt, Mark; Bartha, Gabor; Spies, Noah; Losert, Wolfgang; Zook, Justin M.; Salit, Marc
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2016
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4715349/ https://www.ncbi.nlm.nih.gov/pubmed/26772178 http://dx.doi.org/10.1186/s12864-016-2366-2

collection	PubMed
description	BACKGROUND: The human genome contains variants ranging in size from small single nucleotide polymorphisms (SNPs) to large structural variants (SVs). High-quality benchmark small variant calls for the pilot National Institute of Standards and Technology (NIST) Reference Material (NA12878) have been developed by the Genome in a Bottle Consortium, but no similar high-quality benchmark SV calls exist for this genome. Since SV callers output highly discordant results, we developed methods to combine multiple forms of evidence from multiple sequencing technologies to classify candidate SVs into likely true or false positives. Our method (svclassify) calculates annotations from one or more aligned BAM files from many high-throughput sequencing technologies, and then builds a one-class model using these annotations to classify candidate SVs as likely true or false positives.
RESULTS: We first used pedigree analysis to develop a set of high-confidence breakpoint-resolved large deletions. We then used svclassify to cluster and classify these deletions as well as a set of high-confidence deletions from the 1000 Genomes Project and a set of breakpoint-resolved complex insertions from Spiral Genetics. We find that likely SVs cluster separately from likely non-SVs based on our annotations, and that the SVs cluster into different types of deletions. We then developed a supervised one-class classification method that uses a training set of random non-SV regions to determine whether candidate SVs have abnormal annotations different from most of the genome. To test this classification method, we use our pedigree-based breakpoint-resolved SVs, SVs validated by the 1000 Genomes Project, and assembly-based breakpoint-resolved insertions, along with semi-automated visualization using svviz.
CONCLUSIONS: We find that candidate SVs with high scores from multiple technologies have high concordance with PCR validation and an orthogonal consensus method MetaSV (99.7 % concordant), and candidate SVs with low scores are questionable. We distribute a set of 2676 high-confidence deletions and 68 high-confidence insertions with high svclassify scores from these call sets for benchmarking SV callers. We expect these methods to be particularly useful for establishing high-confidence SV calls for benchmark samples that have been characterized by multiple technologies.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12864-016-2366-2) contains supplementary material, which is available to authorized users.
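The one-class idea in the abstract (train only on random non-SV regions, then flag candidates whose annotations look unlike most of the genome) can be illustrated with a minimal sketch. This is not the svclassify implementation: the scoring rule (an empirical two-sided tail fraction per annotation, combined by taking the most extreme one) is a simplified stand-in, and the annotation names `coverage_ratio` and `discordant_fraction`, as well as all values, are hypothetical and synthetic.

```python
from bisect import bisect_left, bisect_right


def fit_one_class(train_rows):
    """Store the sorted training values of each annotation.

    train_rows: list of dicts (annotation name -> value), one per random
    non-SV region. The annotation names are hypothetical.
    """
    names = sorted(train_rows[0])
    return {name: sorted(row[name] for row in train_rows) for name in names}


def tail_fraction(sorted_values, value):
    """Two-sided empirical tail fraction of `value` among the training values."""
    n = len(sorted_values)
    at_or_below = bisect_right(sorted_values, value) / n
    at_or_above = (n - bisect_left(sorted_values, value)) / n
    return min(1.0, 2.0 * min(at_or_below, at_or_above))


def abnormality_score(model, candidate):
    """Return 1 minus the smallest tail fraction over all annotations.

    Scores near 1 mean at least one annotation is unlike the training regions.
    """
    smallest = min(tail_fraction(model[name], candidate[name]) for name in model)
    return 1.0 - smallest


# Synthetic, evenly spaced values standing in for random non-SV regions.
train = [
    {"coverage_ratio": 0.9 + 0.2 * i / 999, "discordant_fraction": 0.05 * i / 999}
    for i in range(1000)
]
model = fit_one_class(train)

# A candidate with halved coverage and many discordant read pairs (deletion-like),
# and a candidate that sits in the middle of the training distribution.
deletion_like = {"coverage_ratio": 0.5, "discordant_fraction": 0.40}
typical = {"coverage_ratio": 1.0, "discordant_fraction": 0.025}

print(abnormality_score(model, deletion_like))
print(abnormality_score(model, typical))
```

Because the model is fitted only on non-SV regions, no labeled true SVs are needed for training; a candidate is judged solely by how far its annotations fall outside what is typical for the rest of the genome.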
id	pubmed-4715349
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
spelling	pubmed-4715349 2016-01-17 BMC Genomics Research Article BioMed Central 2016-01-16 /pmc/articles/PMC4715349/ /pubmed/26772178 http://dx.doi.org/10.1186/s12864-016-2366-2 Text en © Parikh et al. 2016 Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title	svclassify: a method to establish benchmark structural variant calls
