Computational graph pangenomics: a tutorial on data structures and their applications

Computational pangenomics is an emerging research field that is changing the way computer scientists are facing challenges in biological sequence analysis. In past decades, contributions from combinatorics, stringology, graph theory and data structures were essential in the development of a plethora...

Bibliographic Details
Main authors:	Baaijens, Jasmijn A., Bonizzoni, Paola, Boucher, Christina, Della Vedova, Gianluca, Pirola, Yuri, Rizzi, Raffaella, Sirén, Jouni
Format:	Online Article Text
Language:	English
Published:	2022
Subjects:	Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10038355/ https://www.ncbi.nlm.nih.gov/pubmed/36969737 http://dx.doi.org/10.1007/s11047-022-09882-6

collection	PubMed
description	Computational pangenomics is an emerging research field that is changing the way computer scientists are facing challenges in biological sequence analysis. In past decades, contributions from combinatorics, stringology, graph theory and data structures were essential in the development of a plethora of software tools for the analysis of the human genome. These tools allowed computational biologists to approach ambitious projects at population scale, such as the 1000 Genomes Project. A major contribution of the 1000 Genomes Project is the characterization of a broad spectrum of genetic variations in the human genome, including the discovery of novel variations in the South Asian, African and European populations—thus enhancing the catalogue of variability within the reference genome. Currently, the need to take into account the high variability in population genomes as well as the specificity of an individual genome in a personalized approach to medicine is rapidly pushing the abandonment of the traditional paradigm of using a single reference genome. A graph-based representation of multiple genomes, or a graph pangenome, is replacing the linear reference genome. This means completely rethinking well-established procedures to analyze, store, and access information from genome representations. Properly addressing these challenges is crucial to face the computational tasks of ambitious healthcare projects aiming to characterize human diversity by sequencing 1M individuals (Stark et al. 2019). This tutorial aims to introduce readers to the most recent advances in the theory of data structures for the representation of graph pangenomes. We discuss efficient representations of haplotypes and the variability of genotypes in graph pangenomes, and highlight applications in solving computational problems in human and microbial (viral) pangenomes.
format	Online Article Text
id	pubmed-10038355
institution	National Center for Biotechnology Information
language	English
publishDate	2022
record_format	MEDLINE/PubMed
spelling	pubmed-10038355 2023-03-24 Nat Comput Article 2022-03 2022-03-04 /pmc/articles/PMC10038355/ /pubmed/36969737 http://dx.doi.org/10.1007/s11047-022-09882-6 Text en https://creativecommons.org/licenses/by/4.0/
Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
title	Computational graph pangenomics: a tutorial on data structures and their applications

Similar items