Improving somatic variant identification through integration of genome and exome data

Bibliographic Details
Main authors: Vijayan, Vinaya; Yiu, Siu-Ming; Zhang, Liqing
Format: Online, Article, Text
Language: English
Published: BioMed Central, 2017
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5657037/
https://www.ncbi.nlm.nih.gov/pubmed/29513195
http://dx.doi.org/10.1186/s12864-017-4134-3
Collection: PubMed
Description:
BACKGROUND: Cost-effective high-throughput sequencing technologies, together with efficient mapping and variant calling tools, have made it possible to identify somatic variants in cancer studies. However, integrating somatic variants from whole exome and whole genome studies poses a challenge, because variants identified by whole genome analysis may not be identified by whole exome analysis, and vice versa. Simply taking the union or the intersection of the results may lead to too many false positives or too many false negatives, respectively.
RESULTS: To tackle this problem, we use machine learning models to integrate whole exome and whole genome calling results from two representative tools: VCMM (with the highest sensitivity but very low precision) and MuTect (with the highest precision). The evaluation results, based on both simulated and real data, show that our framework improves somatic variant calling. It identifies somatic variants more accurately than either calling method used alone, and more accurately than using variants identified from whole genome data only or from whole exome data only.
CONCLUSION: Using a machine learning approach to combine results from multiple calling methods on multiple data platforms (e.g., genome and exome) enables more accurate identification of somatic variants.
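The abstract's central claim, that a learned combiner can outperform both the union (high recall, low precision) and the intersection (high precision, low recall) of several call sets, can be illustrated with a minimal sketch. Everything in it is an assumption for illustration only: the variant IDs and call sets are invented toy data, the four source labels (`VCMM_WGS`, `VCMM_WES`, `MuTect_WGS`, `MuTect_WES`) are hypothetical names, and the "model" is a simple lookup-table classifier over caller-agreement patterns standing in for the authors' machine learning models, whose features and algorithms this record does not describe.

```python
from collections import Counter

# Hypothetical toy call sets: IDs stand in for (chrom, pos, ref, alt) keys.
# Two callers x two platforms, as in the abstract; all data here is invented.
CALLS = {
    "VCMM_WGS": {"t1", "t2", "t3", "t4", "t5", "t6", "t7", "f1", "f2", "f3", "f4"},
    "VCMM_WES": {"t1", "t2", "t3", "t4", "t5", "t6", "t8", "f5", "f6", "f7", "f8"},
    "MuTect_WGS": {"t1", "t2", "t3"},
    "MuTect_WES": {"t1", "t2", "t8"},
}
TRUTH = {f"t{i}" for i in range(1, 9)}  # hypothetical true somatic variants
SOURCES = tuple(CALLS)  # fixed feature order (dict insertion order)


def precision_recall(called, truth):
    """Precision and recall of a call set against a truth set."""
    tp = len(called & truth)
    precision = tp / len(called) if called else 0.0
    recall = tp / len(truth) if truth else 0.0
    return precision, recall


def pattern(variant):
    """Binary feature vector: which caller/platform sources report the variant."""
    return tuple(int(variant in CALLS[s]) for s in SOURCES)


def train_pattern_classifier(candidates, truth):
    """Lookup-table classifier: majority label for each agreement pattern."""
    votes = {}
    for v in candidates:
        counts = votes.setdefault(pattern(v), Counter())
        counts[v in truth] += 1
    # Label a pattern somatic only when true calls strictly outnumber false ones.
    return {p: c[True] > c[False] for p, c in votes.items()}


def classify(model, candidates):
    """Keep candidates whose pattern is labelled somatic; unseen patterns are rejected."""
    return {v for v in candidates if model.get(pattern(v), False)}


candidates = set().union(*CALLS.values())          # union of all four call sets
intersection = set.intersection(*CALLS.values())  # called by every source
model = train_pattern_classifier(candidates, TRUTH)
learned = classify(model, candidates)

for name, called in [("union", candidates), ("intersection", intersection), ("learned", learned)]:
    p, r = precision_recall(called, TRUTH)
    print(f"{name:12s} precision={p:.2f} recall={r:.2f}")
```

On this toy data the union reaches full recall at 0.50 precision, the intersection keeps only two variants, and the pattern classifier recovers seven of the eight true variants with no false positives, because it learns that a call supported by VCMM on a single platform and nothing else is usually spurious. The classifier is trained and scored on the same calls, so this shows only the mechanism; a real evaluation would score held-out data.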
ID: pubmed-5657037
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: BMC Genomics (section: Research)
Publication date: 2017-10-16
License: © The Author(s) 2017. Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
Topic: Research