Pan-African genome demonstrates how population-specific genome graphs improve high-throughput sequencing data analysis

Graph-based genome reference representations have seen significant development, motivated by the inadequacy of the current human genome reference to represent the diverse genetic information from different human populations and its inability to maintain the same level of accuracy for non-European an...

Bibliographic Details
Main authors: Tetikol, H. Serhat, Turgut, Deniz, Narci, Kubra, Budak, Gungor, Kalay, Ozem, Arslan, Elif, Demirkaya-Budak, Sinem, Dolgoborodov, Alexey, Kabakci-Zorlu, Duygu, Semenyuk, Vladimir, Jain, Amit, Davis-Dusenbery, Brandi N.
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9352875/
https://www.ncbi.nlm.nih.gov/pubmed/35927245
http://dx.doi.org/10.1038/s41467-022-31724-3
_version_ 1784762744491212800
author Tetikol, H. Serhat
Turgut, Deniz
Narci, Kubra
Budak, Gungor
Kalay, Ozem
Arslan, Elif
Demirkaya-Budak, Sinem
Dolgoborodov, Alexey
Kabakci-Zorlu, Duygu
Semenyuk, Vladimir
Jain, Amit
Davis-Dusenbery, Brandi N.
author_sort Tetikol, H. Serhat
collection PubMed
description Graph-based genome reference representations have seen significant development, motivated by the inadequacy of the current human genome reference to represent the diverse genetic information from different human populations and its inability to maintain the same level of accuracy for non-European ancestries. While there have been many efforts to develop computationally efficient graph-based toolkits for NGS read alignment and variant calling, methods to curate genomic variants and subsequently construct genome graphs remain an understudied problem that inevitably determines the effectiveness of the overall bioinformatics pipeline. In this study, we discuss obstacles encountered during graph construction and propose methods for sample selection based on population diversity, graph augmentation with structural variants and resolution of graph reference ambiguity caused by information overload. Moreover, we present the case for iteratively augmenting tailored genome graphs for targeted populations and demonstrate this approach on the whole-genome samples of African ancestry. Our results show that population-specific graphs, as more representative alternatives to linear or generic graph references, can achieve significantly lower read mapping errors and enhanced variant calling sensitivity, in addition to providing the improvements of joint variant calling without the need of computationally intensive post-processing steps.
format Online
Article
Text
id pubmed-9352875
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-93528752022-08-06 Pan-African genome demonstrates how population-specific genome graphs improve high-throughput sequencing data analysis Tetikol, H. Serhat Turgut, Deniz Narci, Kubra Budak, Gungor Kalay, Ozem Arslan, Elif Demirkaya-Budak, Sinem Dolgoborodov, Alexey Kabakci-Zorlu, Duygu Semenyuk, Vladimir Jain, Amit Davis-Dusenbery, Brandi N. Nat Commun Article Graph-based genome reference representations have seen significant development, motivated by the inadequacy of the current human genome reference to represent the diverse genetic information from different human populations and its inability to maintain the same level of accuracy for non-European ancestries. While there have been many efforts to develop computationally efficient graph-based toolkits for NGS read alignment and variant calling, methods to curate genomic variants and subsequently construct genome graphs remain an understudied problem that inevitably determines the effectiveness of the overall bioinformatics pipeline. In this study, we discuss obstacles encountered during graph construction and propose methods for sample selection based on population diversity, graph augmentation with structural variants and resolution of graph reference ambiguity caused by information overload. Moreover, we present the case for iteratively augmenting tailored genome graphs for targeted populations and demonstrate this approach on the whole-genome samples of African ancestry. Our results show that population-specific graphs, as more representative alternatives to linear or generic graph references, can achieve significantly lower read mapping errors and enhanced variant calling sensitivity, in addition to providing the improvements of joint variant calling without the need of computationally intensive post-processing steps. 
Nature Publishing Group UK 2022-08-04 /pmc/articles/PMC9352875/ /pubmed/35927245 http://dx.doi.org/10.1038/s41467-022-31724-3 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
spellingShingle Article
Tetikol, H. Serhat
Turgut, Deniz
Narci, Kubra
Budak, Gungor
Kalay, Ozem
Arslan, Elif
Demirkaya-Budak, Sinem
Dolgoborodov, Alexey
Kabakci-Zorlu, Duygu
Semenyuk, Vladimir
Jain, Amit
Davis-Dusenbery, Brandi N.
Pan-African genome demonstrates how population-specific genome graphs improve high-throughput sequencing data analysis
title Pan-African genome demonstrates how population-specific genome graphs improve high-throughput sequencing data analysis
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9352875/
https://www.ncbi.nlm.nih.gov/pubmed/35927245
http://dx.doi.org/10.1038/s41467-022-31724-3