DivBrowse—interactive visualization and exploratory data analysis of variant call matrices

BACKGROUND: The sequencing of whole genomes is becoming increasingly affordable. In this context, large-scale sequencing projects are generating ever larger datasets of species-specific genomic diversity. As a consequence, more and more genomic data need to be made easily accessible and analyzable to the scientific community.

Bibliographic Details
Main Authors: König, Patrick; Beier, Sebastian; Mascher, Martin; Stein, Nils; Lange, Matthias; Scholz, Uwe
Format: Online Article Text
Language: English
Published: Oxford University Press 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10120423/
https://www.ncbi.nlm.nih.gov/pubmed/37083938
http://dx.doi.org/10.1093/gigascience/giad025
_version_ 1785029177513082880
author König, Patrick
Beier, Sebastian
Mascher, Martin
Stein, Nils
Lange, Matthias
Scholz, Uwe
author_facet König, Patrick
Beier, Sebastian
Mascher, Martin
Stein, Nils
Lange, Matthias
Scholz, Uwe
author_sort König, Patrick
collection PubMed
description BACKGROUND: The sequencing of whole genomes is becoming increasingly affordable. In this context, large-scale sequencing projects are generating ever larger datasets of species-specific genomic diversity. As a consequence, more and more genomic data need to be made easily accessible and analyzable to the scientific community. FINDINGS: We present DivBrowse, a web application for interactive visualization and exploratory analysis of genomic diversity data stored in Variant Call Format (VCF) files of any size. By seamlessly combining BLAST as an entry point together with interactive data analysis features such as principal component analysis in one graphical user interface, DivBrowse provides a novel and unique set of exploratory data analysis capabilities for genomic biodiversity datasets. The capability to integrate DivBrowse into existing web applications supports interoperability between different web applications. Built-in interactive computation of principal component analysis allows users to perform ad hoc analysis of the population structure based on specific genetic elements such as genes and exons. Data interoperability is supported by the ability to export genomic diversity data in VCF and General Feature Format 3 files. CONCLUSION: DivBrowse offers a novel approach for interactive visualization and analysis of genomic diversity data and optionally also gene annotation data by including features like interactive calculation of variant frequencies and principal component analysis. The use of established standard file formats for data input supports interoperability and seamless deployment of application instances based on the data output of established bioinformatics pipelines.
format Online
Article
Text
id pubmed-10120423
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-101204232023-04-22 DivBrowse—interactive visualization and exploratory data analysis of variant call matrices König, Patrick Beier, Sebastian Mascher, Martin Stein, Nils Lange, Matthias Scholz, Uwe Gigascience Technical Note BACKGROUND: The sequencing of whole genomes is becoming increasingly affordable. In this context, large-scale sequencing projects are generating ever larger datasets of species-specific genomic diversity. As a consequence, more and more genomic data need to be made easily accessible and analyzable to the scientific community. FINDINGS: We present DivBrowse, a web application for interactive visualization and exploratory analysis of genomic diversity data stored in Variant Call Format (VCF) files of any size. By seamlessly combining BLAST as an entry point together with interactive data analysis features such as principal component analysis in one graphical user interface, DivBrowse provides a novel and unique set of exploratory data analysis capabilities for genomic biodiversity datasets. The capability to integrate DivBrowse into existing web applications supports interoperability between different web applications. Built-in interactive computation of principal component analysis allows users to perform ad hoc analysis of the population structure based on specific genetic elements such as genes and exons. Data interoperability is supported by the ability to export genomic diversity data in VCF and General Feature Format 3 files. CONCLUSION: DivBrowse offers a novel approach for interactive visualization and analysis of genomic diversity data and optionally also gene annotation data by including features like interactive calculation of variant frequencies and principal component analysis. The use of established standard file formats for data input supports interoperability and seamless deployment of application instances based on the data output of established bioinformatics pipelines. 
Oxford University Press 2023-04-21 /pmc/articles/PMC10120423/ /pubmed/37083938 http://dx.doi.org/10.1093/gigascience/giad025 Text en © The Author(s) 2023. Published by Oxford University Press GigaScience. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Technical Note
König, Patrick
Beier, Sebastian
Mascher, Martin
Stein, Nils
Lange, Matthias
Scholz, Uwe
DivBrowse—interactive visualization and exploratory data analysis of variant call matrices
title DivBrowse—interactive visualization and exploratory data analysis of variant call matrices
title_full DivBrowse—interactive visualization and exploratory data analysis of variant call matrices
title_fullStr DivBrowse—interactive visualization and exploratory data analysis of variant call matrices
title_full_unstemmed DivBrowse—interactive visualization and exploratory data analysis of variant call matrices
title_short DivBrowse—interactive visualization and exploratory data analysis of variant call matrices
title_sort divbrowse—interactive visualization and exploratory data analysis of variant call matrices
topic Technical Note
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10120423/
https://www.ncbi.nlm.nih.gov/pubmed/37083938
http://dx.doi.org/10.1093/gigascience/giad025