
PureCN: copy number calling and SNV classification using targeted short read sequencing

BACKGROUND: Matched sequencing of both tumor and normal tissue is routinely used to classify variants of uncertain significance (VUS) into somatic vs. germline. However, assays used in molecular diagnostics focus on known somatic alterations in cancer genes and often only sequence tumors. Therefore,...


Bibliographic Details
Main Authors: Riester, Markus, Singh, Angad P., Brannon, A. Rose, Yu, Kun, Campbell, Catarina D., Chiang, Derek Y., Morrissey, Michael P.
Format: Online Article Text
Language: English
Published: BioMed Central 2016
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5157099/
https://www.ncbi.nlm.nih.gov/pubmed/27999612
http://dx.doi.org/10.1186/s13029-016-0060-z
_version_ 1782481384828305408
author Riester, Markus
Singh, Angad P.
Brannon, A. Rose
Yu, Kun
Campbell, Catarina D.
Chiang, Derek Y.
Morrissey, Michael P.
author_sort Riester, Markus
collection PubMed
description BACKGROUND: Matched sequencing of both tumor and normal tissue is routinely used to classify variants of uncertain significance (VUS) into somatic vs. germline. However, assays used in molecular diagnostics focus on known somatic alterations in cancer genes and often only sequence tumors. Therefore, an algorithm that reliably classifies variants would be helpful for retrospective exploratory analyses. Contamination of tumor samples with normal cells results in differences in expected allelic fractions of germline and somatic variants, which can be exploited to accurately infer genotypes after adjusting for local copy number. However, existing algorithms for determining tumor purity, ploidy and copy number are not designed for unmatched short read sequencing data. RESULTS: We describe a methodology and corresponding open source software for estimating tumor purity, copy number, loss of heterozygosity (LOH), and contamination, and for classification of single nucleotide variants (SNVs) by somatic status and clonality. This R package, PureCN, is optimized for targeted short read sequencing data, integrates well with standard somatic variant detection pipelines, and has support for matched and unmatched tumor samples. Accuracy is demonstrated on simulated data and on real whole exome sequencing data. CONCLUSIONS: Our algorithm provides accurate estimates of tumor purity and ploidy, even if matched normal samples are not available. This in turn allows accurate classification of SNVs. The software is provided as open source (Artistic License 2.0) R/Bioconductor package PureCN (http://bioconductor.org/packages/PureCN/). ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13029-016-0060-z) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-5157099
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-51570992016-12-20 PureCN: copy number calling and SNV classification using targeted short read sequencing Riester, Markus Singh, Angad P. Brannon, A. Rose Yu, Kun Campbell, Catarina D. Chiang, Derek Y. Morrissey, Michael P. Source Code Biol Med Software BACKGROUND: Matched sequencing of both tumor and normal tissue is routinely used to classify variants of uncertain significance (VUS) into somatic vs. germline. However, assays used in molecular diagnostics focus on known somatic alterations in cancer genes and often only sequence tumors. Therefore, an algorithm that reliably classifies variants would be helpful for retrospective exploratory analyses. Contamination of tumor samples with normal cells results in differences in expected allelic fractions of germline and somatic variants, which can be exploited to accurately infer genotypes after adjusting for local copy number. However, existing algorithms for determining tumor purity, ploidy and copy number are not designed for unmatched short read sequencing data. RESULTS: We describe a methodology and corresponding open source software for estimating tumor purity, copy number, loss of heterozygosity (LOH), and contamination, and for classification of single nucleotide variants (SNVs) by somatic status and clonality. This R package, PureCN, is optimized for targeted short read sequencing data, integrates well with standard somatic variant detection pipelines, and has support for matched and unmatched tumor samples. Accuracy is demonstrated on simulated data and on real whole exome sequencing data. CONCLUSIONS: Our algorithm provides accurate estimates of tumor purity and ploidy, even if matched normal samples are not available. This in turn allows accurate classification of SNVs. The software is provided as open source (Artistic License 2.0) R/Bioconductor package PureCN (http://bioconductor.org/packages/PureCN/). 
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13029-016-0060-z) contains supplementary material, which is available to authorized users. BioMed Central 2016-12-15 /pmc/articles/PMC5157099/ /pubmed/27999612 http://dx.doi.org/10.1186/s13029-016-0060-z Text en © The Author(s). 2016 Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Software
Riester, Markus
Singh, Angad P.
Brannon, A. Rose
Yu, Kun
Campbell, Catarina D.
Chiang, Derek Y.
Morrissey, Michael P.
PureCN: copy number calling and SNV classification using targeted short read sequencing
title PureCN: copy number calling and SNV classification using targeted short read sequencing
topic Software
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5157099/
https://www.ncbi.nlm.nih.gov/pubmed/27999612
http://dx.doi.org/10.1186/s13029-016-0060-z
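
The abstract notes that contamination with normal cells shifts the expected allelic fractions of germline and somatic variants, and that this difference can be used to infer genotypes once local copy number is accounted for. The arithmetic behind that observation can be sketched as below. This is a minimal illustration of the standard mixture calculation, not PureCN's actual model or API: the function name and parameters are hypothetical, and it assumes a diploid normal genome, a heterozygous germline variant, and an autosomal locus.

```python
def expected_allelic_fraction(purity, tumor_copies, mutated_copies, germline):
    """Expected fraction of reads carrying the variant allele.

    Hypothetical helper for illustration only (not part of PureCN).
    purity         -- fraction of tumor cells in the sample, in [0, 1]
    tumor_copies   -- total copy number of the locus in tumor cells
    mutated_copies -- tumor copies carrying the variant (multiplicity)
    germline       -- True for a heterozygous germline variant,
                      False for a somatic variant
    """
    if not 0.0 <= purity <= 1.0:
        raise ValueError("purity must be between 0 and 1")
    if not 0 <= mutated_copies <= tumor_copies:
        raise ValueError("mutated_copies must be between 0 and tumor_copies")

    normal_copies = 2  # assumption: normal cells are diploid at this locus
    # A heterozygous germline variant sits on one of the two normal copies;
    # a somatic variant is absent from normal cells.
    normal_variant_copies = 1 if germline else 0

    variant = purity * mutated_copies + (1 - purity) * normal_variant_copies
    total = purity * tumor_copies + (1 - purity) * normal_copies
    if total == 0:
        raise ValueError("locus has no copies in the sample")
    return variant / total


if __name__ == "__main__":
    # Copy-neutral locus, 60% purity: germline and somatic fractions differ.
    print(expected_allelic_fraction(0.6, 2, 1, germline=True))
    print(expected_allelic_fraction(0.6, 2, 1, germline=False))
    # Single-copy loss (LOH) at the same purity shifts both expectations.
    print(expected_allelic_fraction(0.6, 1, 1, germline=True))
    print(expected_allelic_fraction(0.6, 1, 1, germline=False))
```

Under these assumptions, a copy-neutral heterozygous germline variant stays at 0.5 regardless of purity, while a somatic variant at the same locus is expected at half the purity. The gap between the two shrinks as purity approaches 1, which is why very pure samples are harder to classify without a matched normal.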