Pan-African genome demonstrates how population-specific genome graphs improve high-throughput sequencing data analysis
Graph-based genome reference representations have seen significant development, motivated by the inadequacy of the current human genome reference to represent the diverse genetic information from different human populations and its inability to maintain the same level of accuracy for non-European an...
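Main authors: | Tetikol, H. Serhat; Turgut, Deniz; Narci, Kubra; Budak, Gungor; Kalay, Ozem; Arslan, Elif; Demirkaya-Budak, Sinem; Dolgoborodov, Alexey; Kabakci-Zorlu, Duygu; Semenyuk, Vladimir; Jain, Amit; Davis-Dusenbery, Brandi N. |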
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK, 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9352875/ https://www.ncbi.nlm.nih.gov/pubmed/35927245 http://dx.doi.org/10.1038/s41467-022-31724-3 |
_version_ | 1784762744491212800 |
---|---|
author | Tetikol, H. Serhat Turgut, Deniz Narci, Kubra Budak, Gungor Kalay, Ozem Arslan, Elif Demirkaya-Budak, Sinem Dolgoborodov, Alexey Kabakci-Zorlu, Duygu Semenyuk, Vladimir Jain, Amit Davis-Dusenbery, Brandi N. |
author_sort | Tetikol, H. Serhat |
collection | PubMed |
description | Graph-based genome reference representations have seen significant development, motivated by the inadequacy of the current human genome reference to represent the diverse genetic information from different human populations and its inability to maintain the same level of accuracy for non-European ancestries. While there have been many efforts to develop computationally efficient graph-based toolkits for NGS read alignment and variant calling, methods to curate genomic variants and subsequently construct genome graphs remain an understudied problem that inevitably determines the effectiveness of the overall bioinformatics pipeline. In this study, we discuss obstacles encountered during graph construction and propose methods for sample selection based on population diversity, graph augmentation with structural variants and resolution of graph reference ambiguity caused by information overload. Moreover, we present the case for iteratively augmenting tailored genome graphs for targeted populations and demonstrate this approach on the whole-genome samples of African ancestry. Our results show that population-specific graphs, as more representative alternatives to linear or generic graph references, can achieve significantly lower read mapping errors and enhanced variant calling sensitivity, in addition to providing the improvements of joint variant calling without the need of computationally intensive post-processing steps. |
format | Online Article Text |
id | pubmed-9352875 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-9352875 2022-08-06 Pan-African genome demonstrates how population-specific genome graphs improve high-throughput sequencing data analysis Tetikol, H. Serhat Turgut, Deniz Narci, Kubra Budak, Gungor Kalay, Ozem Arslan, Elif Demirkaya-Budak, Sinem Dolgoborodov, Alexey Kabakci-Zorlu, Duygu Semenyuk, Vladimir Jain, Amit Davis-Dusenbery, Brandi N. Nat Commun Article
Nature Publishing Group UK 2022-08-04 /pmc/articles/PMC9352875/ /pubmed/35927245 http://dx.doi.org/10.1038/s41467-022-31724-3 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/ (https://creativecommons.org/licenses/by/4.0/). |
title | Pan-African genome demonstrates how population-specific genome graphs improve high-throughput sequencing data analysis |
title_sort | pan-african genome demonstrates how population-specific genome graphs improve high-throughput sequencing data analysis |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9352875/ https://www.ncbi.nlm.nih.gov/pubmed/35927245 http://dx.doi.org/10.1038/s41467-022-31724-3 |