
ISOWN: accurate somatic mutation identification in the absence of normal tissue controls

Bibliographic Details
Main authors: Kalatskaya, Irina, Trinh, Quang M., Spears, Melanie, McPherson, John D., Bartlett, John M. S., Stein, Lincoln
Format: Online Article Text
Language: English
Published: BioMed Central 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5490163/
https://www.ncbi.nlm.nih.gov/pubmed/28659176
http://dx.doi.org/10.1186/s13073-017-0446-9
collection PubMed
description BACKGROUND: A key step in cancer genome analysis is the identification of somatic mutations in the tumor. This is typically done by comparing the tumor genome to a reference genome sequence derived from normal tissue taken from the same donor. However, in a variety of common scenarios matched normal tissue is not available for comparison. RESULTS: In this work, we describe an algorithm that uses a machine learning approach to distinguish somatic single nucleotide variants (SNVs) in next-generation sequencing data from germline polymorphisms in the absence of normal samples. Our algorithm was evaluated using a family of supervised learning classifiers across six different cancer types and ~1600 samples, including cell lines, fresh frozen tissues, and formalin-fixed paraffin-embedded tissues; we tested our algorithm with both deep targeted and whole-exome sequencing data. Our algorithm correctly classified between 95 and 98% of somatic mutations, with an F1-measure ranging from 75.9 to 98.6% depending on the tumor type. We have released the algorithm as a software package called ISOWN (Identification of SOmatic mutations Without matching Normal tissues). CONCLUSIONS: In this work, we describe the development, implementation, and validation of ISOWN, an accurate algorithm for predicting somatic mutations in cancer tissues in the absence of matching normal tissues. ISOWN is available as open source under the Apache License 2.0 from https://github.com/ikalatskaya/ISOWN. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13073-017-0446-9) contains supplementary material, which is available to authorized users.
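The abstract reports two figures: the share of somatic mutations correctly classified (read here as recall on the somatic class) and the F1-measure. The sketch below is not ISOWN code; it only shows how these standard metrics relate for a binary somatic-vs-germline classifier. The function name `somatic_metrics` and the toy labels are hypothetical and are not data from the paper.

```python
from typing import Sequence


def somatic_metrics(y_true: Sequence[str], y_pred: Sequence[str],
                    positive: str = "somatic") -> dict[str, float]:
    """Precision, recall and F1 for the positive ('somatic') class.

    Hypothetical helper for illustration; not part of ISOWN.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # somatic called somatic
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # germline called somatic
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # somatic missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Toy example (invented labels): 10 somatic and 90 germline SNVs.
truth = ["somatic"] * 10 + ["germline"] * 90
# 9 of 10 somatic SNVs found; 3 germline SNVs wrongly called somatic.
pred = ["somatic"] * 9 + ["germline"] + ["somatic"] * 3 + ["germline"] * 87

metrics = somatic_metrics(truth, pred)
print(metrics)  # precision 0.75, recall 0.9, F1 about 0.818
```

In the toy numbers, recall stays high while a few false positives among the far more numerous germline variants pull precision, and therefore F1, down. This shows how a high fraction of correctly classified somatic mutations can coexist with a lower F1-measure.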
id pubmed-5490163
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
Published in Genome Med (Software), BioMed Central, 2017-06-29.
© The Author(s). 2017 Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
topic Software