A study on fast calling variants from next-generation sequencing data using decision tree
BACKGROUND: The rapid development of next-generation sequencing (NGS) technology has continuously been refreshing the throughput of sequencing data. However, due to the lack of a smart tool that is both fast and accurate, the analysis task for NGS data, especially those with low-coverage, remains ch...
Main authors: | Li, Zhentang; Wang, Yi; Wang, Fei |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2018 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5907718/ https://www.ncbi.nlm.nih.gov/pubmed/29673316 http://dx.doi.org/10.1186/s12859-018-2147-9 |
_version_ | 1783315591795310592 |
---|---|
author | Li, Zhentang; Wang, Yi; Wang, Fei |
author_facet | Li, Zhentang; Wang, Yi; Wang, Fei |
author_sort | Li, Zhentang |
collection | PubMed |
description | BACKGROUND: The rapid development of next-generation sequencing (NGS) technology has continuously been refreshing the throughput of sequencing data. However, due to the lack of a smart tool that is both fast and accurate, the analysis task for NGS data, especially those with low-coverage, remains challenging. RESULTS: We proposed a decision-tree based variant calling algorithm. Experiments on a set of real data indicate that our algorithm achieves high accuracy and sensitivity for SNVs and indels and shows good adaptability on low-coverage data. In particular, our algorithm is obviously faster than 3 widely used tools in our experiments. CONCLUSIONS: We implemented our algorithm in a software named Fuwa and applied it together with 4 well-known variant callers, i.e., Platypus, GATK-UnifiedGenotyper, GATK-HaplotypeCaller and SAMtools, to three sequencing data sets of a well-studied sample NA12878, which were produced by whole-genome, whole-exome and low-coverage whole-genome sequencing technology respectively. We also conducted additional experiments on the WGS data of 4 newly released samples that have not been used to populate dbSNP. |
format | Online Article Text |
id | pubmed-5907718 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-5907718 2018-04-30 A study on fast calling variants from next-generation sequencing data using decision tree Li, Zhentang; Wang, Yi; Wang, Fei BMC Bioinformatics Methodology Article BACKGROUND: The rapid development of next-generation sequencing (NGS) technology has continuously been refreshing the throughput of sequencing data. However, due to the lack of a smart tool that is both fast and accurate, the analysis task for NGS data, especially those with low-coverage, remains challenging. RESULTS: We proposed a decision-tree based variant calling algorithm. Experiments on a set of real data indicate that our algorithm achieves high accuracy and sensitivity for SNVs and indels and shows good adaptability on low-coverage data. In particular, our algorithm is obviously faster than 3 widely used tools in our experiments. CONCLUSIONS: We implemented our algorithm in a software named Fuwa and applied it together with 4 well-known variant callers, i.e., Platypus, GATK-UnifiedGenotyper, GATK-HaplotypeCaller and SAMtools, to three sequencing data sets of a well-studied sample NA12878, which were produced by whole-genome, whole-exome and low-coverage whole-genome sequencing technology respectively. We also conducted additional experiments on the WGS data of 4 newly released samples that have not been used to populate dbSNP. BioMed Central 2018-04-19 /pmc/articles/PMC5907718/ /pubmed/29673316 http://dx.doi.org/10.1186/s12859-018-2147-9 Text en © The Author(s). 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
title | A study on fast calling variants from next-generation sequencing data using decision tree |
topic | Methodology Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5907718/ https://www.ncbi.nlm.nih.gov/pubmed/29673316 http://dx.doi.org/10.1186/s12859-018-2147-9 |