
A study on fast calling variants from next-generation sequencing data using decision tree


Bibliographic Details
Main authors: Li, Zhentang; Wang, Yi; Wang, Fei
Format: Online, Article, Text
Language: English
Published: BioMed Central 2018
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5907718/
https://www.ncbi.nlm.nih.gov/pubmed/29673316
http://dx.doi.org/10.1186/s12859-018-2147-9
collection PubMed
description BACKGROUND: The rapid development of next-generation sequencing (NGS) technology has continually increased the throughput of sequencing data. However, due to the lack of a tool that is both fast and accurate, analyzing NGS data, especially low-coverage data, remains challenging.
RESULTS: We proposed a decision-tree based variant calling algorithm. Experiments on a set of real data indicate that our algorithm achieves high accuracy and sensitivity for SNVs and indels and adapts well to low-coverage data. In particular, our algorithm was clearly faster than 3 widely used tools in our experiments.
CONCLUSIONS: We implemented our algorithm in a software tool named Fuwa and applied it, together with 4 well-known variant callers (Platypus, GATK-UnifiedGenotyper, GATK-HaplotypeCaller and SAMtools), to three sequencing data sets of the well-studied sample NA12878, produced by whole-genome, whole-exome and low-coverage whole-genome sequencing, respectively. We also conducted additional experiments on the WGS data of 4 newly released samples that have not been used to populate dbSNP.
id pubmed-5907718
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
journal BMC Bioinformatics
published 2018-04-19
record_date 2018-04-30
license © The Author(s). 2018. Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
topic Methodology Article