
DisVar: an R library for identifying variants associated with diseases using large-scale personal genetic information

BACKGROUND: Genetic variants may be a contributing factor in the development of diseases. Several genetic disease databases are used in medical research and diagnosis, but the web applications used to search these databases for disease-associated variants have limitations. The applicati...


Bibliographic Details
Main authors:	Chanasongkhram, Khunanon; Damkliang, Kasikrit; Sangket, Unitsa
Format:	Online Article Text
Language:	English
Published:	PeerJ Inc. 2023
Subjects:	Bioinformatics
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10542659/ https://www.ncbi.nlm.nih.gov/pubmed/37790633 http://dx.doi.org/10.7717/peerj.16086

_version_	1785114138536574976
author	Chanasongkhram, Khunanon; Damkliang, Kasikrit; Sangket, Unitsa
collection	PubMed
description	BACKGROUND: Genetic variants may be a contributing factor in the development of diseases. Several genetic disease databases are used in medical research and diagnosis, but the web applications used to search these databases for disease-associated variants have limitations. The applications may not be able to search large-scale sets of genetic variants, the search results may be difficult to interpret, and variants mapped to the latest reference genome (GRCh38/hg38) may not be supported. METHODS: In this study, we developed a novel R library called “DisVar” to identify disease-associated genetic variants in large-scale individual genomic data. This R library is compatible with variants from the latest reference genome version. DisVar uses five databases of disease-associated variants. Over 100 million variants can be searched simultaneously for associated diseases. RESULTS: The package was evaluated using 24 Variant Call Format (VCF) files (215,054 to 11,346,899 sites) from the 1000 Genomes Project. Across all the VCF files, 298,227 disease-associated hits were detected, taking a total of 63.58 min to complete. The package was also tested on ClinVar’s VCF file (2,120,558 variants), where 20,657 disease-associated hits were identified with an estimated elapsed time of 45.98 s. CONCLUSIONS: DisVar can overcome the limitations of existing tools and is a fast and effective diagnostic and preventive tool that identifies disease-associated variants in large-scale genetic data mapped to the latest reference genome.
format	Online Article Text
id	pubmed-10542659
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	PeerJ Inc.
record_format	MEDLINE/PubMed
spelling	pubmed-10542659 2023-10-03 PeerJ Bioinformatics PeerJ Inc. 2023-09-28 /pmc/articles/PMC10542659/ /pubmed/37790633 http://dx.doi.org/10.7717/peerj.16086 Text en ©2023 Chanasongkhram et al.
https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.
title	DisVar: an R library for identifying variants associated with diseases using large-scale personal genetic information
topic	Bioinformatics
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10542659/ https://www.ncbi.nlm.nih.gov/pubmed/37790633 http://dx.doi.org/10.7717/peerj.16086