A novel machine learning approach (svmSomatic) to distinguish somatic and germline mutations using next-generation sequencing data

Bibliographic Details
Main Authors: Mao, Yu-Fang; Yuan, Xi-Guo; Cun, Yu-Peng
Format: Online Article Text
Language: English
Published: Science Press 2021
Subjects: Letters to the Editor
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7995270/
https://www.ncbi.nlm.nih.gov/pubmed/33709636
http://dx.doi.org/10.24272/j.issn.2095-8137.2021.014
author Mao, Yu-Fang
Yuan, Xi-Guo
Cun, Yu-Peng
collection PubMed
description Somatic mutations are a large category of genetic variation and play an essential role in tumorigenesis. Detecting somatic single nucleotide variants (SNVs) can facilitate downstream analysis of tumorigenesis. Many computational methods have been developed to detect SNVs, but most require matched normal samples to differentiate somatic SNVs from the normal state, and such samples can be difficult to obtain. Therefore, developing new approaches for detecting somatic SNVs without matched samples is crucial. In this work, we detected somatic mutations from individual tumor samples using a novel machine learning approach, svmSomatic, applied to next-generation sequencing (NGS) data. In addition, because somatic SNV detection can be affected by multiple types of mutation, with germline mutations and co-occurring copy number variations (CNVs) being common in organisms, we used the approach to distinguish somatic from germline mutations based on NGS data from individual tumor samples. In summary, svmSomatic: (1) considers the influence of CNV co-occurrence when detecting somatic mutations; and (2) trains a support vector machine algorithm to distinguish between somatic and germline mutations without requiring matched normal samples. We further tested svmSomatic and compared it with other common methods. Results showed that svmSomatic performance, as measured by F1-score, was significantly better than that of the other methods on both simulated and real NGS data.
format Online
Article
Text
id pubmed-7995270
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Science Press
record_format MEDLINE/PubMed
spelling pubmed-7995270 2021-04-01 A novel machine learning approach (svmSomatic) to distinguish somatic and germline mutations using next-generation sequencing data Mao, Yu-Fang Yuan, Xi-Guo Cun, Yu-Peng Zool Res Letters to the Editor
Science Press 2021-03-18 /pmc/articles/PMC7995270/ /pubmed/33709636 http://dx.doi.org/10.24272/j.issn.2095-8137.2021.014 Text en Editorial Office of Zoological Research, Kunming Institute of Zoology, Chinese Academy of Sciences http://creativecommons.org/licenses/by-nc/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
title A novel machine learning approach (svmSomatic) to distinguish somatic and germline mutations using next-generation sequencing data
topic Letters to the Editor