PureCN: copy number calling and SNV classification using targeted short read sequencing
BACKGROUND: Matched sequencing of both tumor and normal tissue is routinely used to classify variants of uncertain significance (VUS) into somatic vs. germline. However, assays used in molecular diagnostics focus on known somatic alterations in cancer genes and often only sequence tumors. Therefore,...
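Main authors: | Riester, Markus; Singh, Angad P.; Brannon, A. Rose; Yu, Kun; Campbell, Catarina D.; Chiang, Derek Y.; Morrissey, Michael P. |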
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2016 |
Subjects: | Software |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5157099/ https://www.ncbi.nlm.nih.gov/pubmed/27999612 http://dx.doi.org/10.1186/s13029-016-0060-z |
_version_ | 1782481384828305408 |
---|---|
author | Riester, Markus Singh, Angad P. Brannon, A. Rose Yu, Kun Campbell, Catarina D. Chiang, Derek Y. Morrissey, Michael P. |
author_facet | Riester, Markus Singh, Angad P. Brannon, A. Rose Yu, Kun Campbell, Catarina D. Chiang, Derek Y. Morrissey, Michael P. |
author_sort | Riester, Markus |
collection | PubMed |
description | BACKGROUND: Matched sequencing of both tumor and normal tissue is routinely used to classify variants of uncertain significance (VUS) into somatic vs. germline. However, assays used in molecular diagnostics focus on known somatic alterations in cancer genes and often only sequence tumors. Therefore, an algorithm that reliably classifies variants would be helpful for retrospective exploratory analyses. Contamination of tumor samples with normal cells results in differences in expected allelic fractions of germline and somatic variants, which can be exploited to accurately infer genotypes after adjusting for local copy number. However, existing algorithms for determining tumor purity, ploidy and copy number are not designed for unmatched short read sequencing data. RESULTS: We describe a methodology and corresponding open source software for estimating tumor purity, copy number, loss of heterozygosity (LOH), and contamination, and for classification of single nucleotide variants (SNVs) by somatic status and clonality. This R package, PureCN, is optimized for targeted short read sequencing data, integrates well with standard somatic variant detection pipelines, and has support for matched and unmatched tumor samples. Accuracy is demonstrated on simulated data and on real whole exome sequencing data. CONCLUSIONS: Our algorithm provides accurate estimates of tumor purity and ploidy, even if matched normal samples are not available. This in turn allows accurate classification of SNVs. The software is provided as open source (Artistic License 2.0) R/Bioconductor package PureCN (http://bioconductor.org/packages/PureCN/). ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13029-016-0060-z) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-5157099 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-51570992016-12-20 PureCN: copy number calling and SNV classification using targeted short read sequencing Riester, Markus Singh, Angad P. Brannon, A. Rose Yu, Kun Campbell, Catarina D. Chiang, Derek Y. Morrissey, Michael P. Source Code Biol Med Software BACKGROUND: Matched sequencing of both tumor and normal tissue is routinely used to classify variants of uncertain significance (VUS) into somatic vs. germline. However, assays used in molecular diagnostics focus on known somatic alterations in cancer genes and often only sequence tumors. Therefore, an algorithm that reliably classifies variants would be helpful for retrospective exploratory analyses. Contamination of tumor samples with normal cells results in differences in expected allelic fractions of germline and somatic variants, which can be exploited to accurately infer genotypes after adjusting for local copy number. However, existing algorithms for determining tumor purity, ploidy and copy number are not designed for unmatched short read sequencing data. RESULTS: We describe a methodology and corresponding open source software for estimating tumor purity, copy number, loss of heterozygosity (LOH), and contamination, and for classification of single nucleotide variants (SNVs) by somatic status and clonality. This R package, PureCN, is optimized for targeted short read sequencing data, integrates well with standard somatic variant detection pipelines, and has support for matched and unmatched tumor samples. Accuracy is demonstrated on simulated data and on real whole exome sequencing data. CONCLUSIONS: Our algorithm provides accurate estimates of tumor purity and ploidy, even if matched normal samples are not available. This in turn allows accurate classification of SNVs. The software is provided as open source (Artistic License 2.0) R/Bioconductor package PureCN (http://bioconductor.org/packages/PureCN/). ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13029-016-0060-z) contains supplementary material, which is available to authorized users. BioMed Central 2016-12-15 /pmc/articles/PMC5157099/ /pubmed/27999612 http://dx.doi.org/10.1186/s13029-016-0060-z Text en © The Author(s). 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Software Riester, Markus Singh, Angad P. Brannon, A. Rose Yu, Kun Campbell, Catarina D. Chiang, Derek Y. Morrissey, Michael P. PureCN: copy number calling and SNV classification using targeted short read sequencing |
title | PureCN: copy number calling and SNV classification using targeted short read sequencing |
title_full | PureCN: copy number calling and SNV classification using targeted short read sequencing |
title_fullStr | PureCN: copy number calling and SNV classification using targeted short read sequencing |
title_full_unstemmed | PureCN: copy number calling and SNV classification using targeted short read sequencing |
title_short | PureCN: copy number calling and SNV classification using targeted short read sequencing |
title_sort | purecn: copy number calling and snv classification using targeted short read sequencing |
topic | Software |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5157099/ https://www.ncbi.nlm.nih.gov/pubmed/27999612 http://dx.doi.org/10.1186/s13029-016-0060-z |
work_keys_str_mv | AT riestermarkus purecncopynumbercallingandsnvclassificationusingtargetedshortreadsequencing AT singhangadp purecncopynumbercallingandsnvclassificationusingtargetedshortreadsequencing AT brannonarose purecncopynumbercallingandsnvclassificationusingtargetedshortreadsequencing AT yukun purecncopynumbercallingandsnvclassificationusingtargetedshortreadsequencing AT campbellcatarinad purecncopynumbercallingandsnvclassificationusingtargetedshortreadsequencing AT chiangdereky purecncopynumbercallingandsnvclassificationusingtargetedshortreadsequencing AT morrisseymichaelp purecncopynumbercallingandsnvclassificationusingtargetedshortreadsequencing |
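
The abstract notes that contamination with normal cells shifts the expected allelic fractions of germline and somatic variants, and that this shift becomes informative once local copy number is accounted for. The sketch below illustrates that idea with the standard tumor/normal mixture formula. It is not taken from the PureCN source, and the package's actual likelihood model may differ. The function name `expected_allelic_fraction` and its parameters are hypothetical, and the normal cells are assumed to be diploid, with a germline variant assumed to be heterozygous.

```python
def expected_allelic_fraction(purity, tumor_copies, multiplicity, germline):
    """Expected allelic fraction of a variant in a tumor/normal mixture.

    purity        -- fraction of tumor cells in the sample (0..1)
    tumor_copies  -- total copy number at the locus in tumor cells
    multiplicity  -- number of tumor copies carrying the variant
    germline      -- True for a heterozygous germline variant (assumption:
                     normal cells are diploid and carry it on 1 of 2 copies),
                     False for a somatic variant (absent from normal cells)
    """
    if not 0.0 <= purity <= 1.0:
        raise ValueError("purity must be between 0 and 1")
    if not 0 <= multiplicity <= tumor_copies:
        raise ValueError("multiplicity must be between 0 and tumor_copies")

    normal_variant_copies = 1 if germline else 0
    # Variant-carrying copies, weighted by each cell population's share.
    variant = purity * multiplicity + (1.0 - purity) * normal_variant_copies
    # All copies at the locus, weighted the same way (normal cells: 2 copies).
    total = purity * tumor_copies + (1.0 - purity) * 2
    if total == 0:
        raise ValueError("locus has no copies in the mixture")
    return variant / total


if __name__ == "__main__":
    # Same locus (3 tumor copies, 60% purity), different somatic status.
    for label, mult, germ in [("somatic, 1 copy", 1, False),
                              ("germline, 2 copies", 2, True)]:
        af = expected_allelic_fraction(0.6, 3, mult, germ)
        print(f"{label}: {af:.3f}")
```

Under these assumptions, a somatic and a germline variant at the same locus have distinct expected fractions whenever the sample contains normal cells, which is the signal the abstract describes for inferring somatic status without a matched normal.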