
GTX.Digest.VCF: an online NGS data interpretation system based on intelligent gene ranking and large-scale text mining


Bibliographic Details
Main authors: Jiang, Yanhuang; Wu, Chengkun; Zhang, Yanghui; Zhang, Shaowei; Yu, Shuojun; Lei, Peng; Lu, Qin; Xi, Yanwei; Wang, Hua; Song, Zhuo
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects: Software
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6923899/
https://www.ncbi.nlm.nih.gov/pubmed/31856831
http://dx.doi.org/10.1186/s12920-019-0637-x
author Jiang, Yanhuang
Wu, Chengkun
Zhang, Yanghui
Zhang, Shaowei
Yu, Shuojun
Lei, Peng
Lu, Qin
Xi, Yanwei
Wang, Hua
Song, Zhuo
collection PubMed
description BACKGROUND: An important task in the interpretation of sequencing data for Mendelian diseases is to highlight pathogenic genes (or detrimental variants). This remains challenging despite the recent rapid development of genomics and bioinformatics. A typical interpretation workflow includes annotation, filtration, manual inspection and literature review. These steps are time-consuming and error-prone in the absence of systematic support. Therefore, we developed GTX.Digest.VCF, an online DNA sequencing interpretation system, which prioritizes genes and variants for novel disease-gene relation discovery and integrates text mining results to provide literature evidence for the discovery. Its phenotype-driven ranking and biological data mining approach significantly speeds up the whole interpretation process. RESULTS: The GTX.Digest.VCF system is freely available as a web portal at http://vcf.gtxlab.com for academic research. Evaluation on the DDD project dataset demonstrates an accuracy of 77% (235 out of 305 cases) for top-50 genes and an accuracy of 41.6% (127 out of 305 cases) for top-5 genes. CONCLUSIONS: GTX.Digest.VCF provides an intelligent web portal for genomics data interpretation via the integration of bioinformatics tools, distributed parallel computing, and biomedical text mining. It can facilitate the application of genomic analytics in clinical research and practice.
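The top-k accuracy figures quoted in the abstract are simple hit rates: a case counts as a hit when its causal gene appears among the first k ranked genes. The following is a minimal sketch of that arithmetic, assuming each case has exactly one known causal gene. The function name `top_k_accuracy`, the gene labels, and the toy data are hypothetical illustrations and are not taken from the GTX.Digest.VCF system or the DDD dataset.

```python
from typing import Sequence


def top_k_accuracy(
    ranked_lists: Sequence[Sequence[str]],
    causal_genes: Sequence[str],
    k: int,
) -> float:
    """Fraction of cases whose causal gene is within the top k ranked genes.

    Hypothetical helper: assumes one causal gene per case and that each
    ranked list is ordered from most to least likely.
    """
    if len(ranked_lists) != len(causal_genes):
        raise ValueError("need one causal gene per ranked list")
    if not ranked_lists:
        raise ValueError("at least one case is required")
    hits = sum(
        1
        for ranked, gene in zip(ranked_lists, causal_genes)
        if gene in ranked[:k]
    )
    return hits / len(ranked_lists)


# Toy data (made up): three cases with three ranked genes each.
ranked = [
    ["GENE_A", "GENE_B", "GENE_C"],
    ["GENE_D", "GENE_A", "GENE_E"],
    ["GENE_F", "GENE_G", "GENE_H"],
]
causal = ["GENE_A", "GENE_A", "GENE_Z"]

print(top_k_accuracy(ranked, causal, k=1))  # 1 hit out of 3 cases
print(top_k_accuracy(ranked, causal, k=2))  # 2 hits out of 3 cases

# The counts reported in the abstract, expressed as percentages.
print(f"top-50: {235 / 305:.1%}")  # 77.0%
print(f"top-5:  {127 / 305:.1%}")  # 41.6%
```

The last two lines only recompute the percentages from the counts given in the abstract (235 and 127 hits out of 305 cases); they do not reproduce the evaluation itself.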
format Online
Article
Text
id pubmed-6923899
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6923899 2019-12-30 GTX.Digest.VCF: an online NGS data interpretation system based on intelligent gene ranking and large-scale text mining Jiang, Yanhuang; Wu, Chengkun; Zhang, Yanghui; Zhang, Shaowei; Yu, Shuojun; Lei, Peng; Lu, Qin; Xi, Yanwei; Wang, Hua; Song, Zhuo BMC Med Genomics Software BioMed Central 2019-12-20 /pmc/articles/PMC6923899/ /pubmed/31856831 http://dx.doi.org/10.1186/s12920-019-0637-x Text en
© The Author(s). 2019 Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title GTX.Digest.VCF: an online NGS data interpretation system based on intelligent gene ranking and large-scale text mining
topic Software