
Cataloguing experimentally confirmed 80.7 kb-long ACKR1 haplotypes from the 1000 Genomes Project database

BACKGROUND: Clinically effective and safe genotyping relies on correct reference sequences, often represented by haplotypes. The 1000 Genomes Project recorded individual genotypes across 26 different populations and, using computerized genotype phasing, reported haplotype data. In contrast, we ident...


Bibliographic Details
Main Authors: Srivastava, Kshitij, Fratzscher, Anne-Sophie, Lan, Bo, Flegel, Willy Albert
Format: Online Article Text
Language: English
Published: BioMed Central 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8150616/
https://www.ncbi.nlm.nih.gov/pubmed/34039276
http://dx.doi.org/10.1186/s12859-021-04169-6
_version_ 1783698190645592064
author Srivastava, Kshitij
Fratzscher, Anne-Sophie
Lan, Bo
Flegel, Willy Albert
author_facet Srivastava, Kshitij
Fratzscher, Anne-Sophie
Lan, Bo
Flegel, Willy Albert
author_sort Srivastava, Kshitij
collection PubMed
description BACKGROUND: Clinically effective and safe genotyping relies on correct reference sequences, often represented by haplotypes. The 1000 Genomes Project recorded individual genotypes across 26 different populations and, using computerized genotype phasing, reported haplotype data. In contrast, we identified long reference sequences by analyzing the homozygous genomic regions in this online database, a concept that has rarely been reported since next-generation sequencing data became available. STUDY DESIGN AND METHODS: Phased genotype data for an 80.6 kb region of chromosome 1 was downloaded for all 2,504 unrelated individuals of the 1000 Genomes Project Phase 3 cohort. The data was centered on the ACKR1 gene and bordered by the CADM3 and FCER1A genes. Individuals with heterozygosity at a single site or with complete homozygosity allowed unambiguous assignment of an ACKR1 haplotype. A computer algorithm was developed for extracting these haplotypes from the 1000 Genomes Project in an automated fashion. A manual analysis validated the data extracted by the algorithm. RESULTS: We confirmed 902 ACKR1 haplotypes of varying lengths, the longest at 80,584 nucleotides and shortest at 1,901 nucleotides. The combined length of haplotype sequences comprised 19,895,388 nucleotides with a median of 16,014 nucleotides. Based on our approach, all haplotypes can be considered experimentally confirmed and not affected by the known errors of computerized genotype phasing. CONCLUSIONS: Tracts of homozygosity can provide definitive reference sequences for any gene. They are particularly useful when observed in unrelated individuals of large-scale sequence databases. As a proof of principle, we explored the 1000 Genomes Project database for ACKR1 gene data and mined long haplotypes. These haplotypes are useful for high-throughput analysis with next-generation sequencing. Our approach is scalable, using automated bioinformatics tools, and can be applied to any gene.
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-021-04169-6.
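The selection rule stated in the methods (complete homozygosity, or heterozygosity at a single site, permits unambiguous haplotype assignment) can be illustrated with a minimal sketch. This is not the authors' published algorithm: the function name, the input layout (one pair of single-nucleotide alleles per variant site) and the string output are assumptions made only for illustration.

```python
from typing import List, Optional, Tuple

# Hypothetical input type: the two alleles observed at one variant site.
Genotype = Tuple[str, str]


def unambiguous_haplotypes(genotypes: List[Genotype]) -> Optional[Tuple[str, ...]]:
    """Return the haplotypes implied by the genotypes alone, or None.

    With zero heterozygous sites both chromosomes carry the same haplotype.
    With exactly one heterozygous site the two haplotypes differ only there,
    so no statistical phasing is needed. With two or more heterozygous sites
    the phase is ambiguous and the individual is skipped.
    """
    het_sites = [i for i, (a, b) in enumerate(genotypes) if a != b]
    if len(het_sites) > 1:
        return None

    first = "".join(a for a, _ in genotypes)
    second = "".join(b for _, b in genotypes)
    if not het_sites:
        return (first,)
    # Sort so the result does not depend on the order of alleles in the input.
    return tuple(sorted((first, second)))
```

Under these assumptions, an individual with genotypes A/A and G/T at two sites would yield the two haplotypes AG and AT, while an individual heterozygous at both sites would be excluded.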
format Online
Article
Text
id pubmed-8150616
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-8150616 2021-05-26 Cataloguing experimentally confirmed 80.7 kb-long ACKR1 haplotypes from the 1000 Genomes Project database Srivastava, Kshitij Fratzscher, Anne-Sophie Lan, Bo Flegel, Willy Albert BMC Bioinformatics Research BioMed Central 2021-05-26 /pmc/articles/PMC8150616/ /pubmed/34039276 http://dx.doi.org/10.1186/s12859-021-04169-6 Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
spellingShingle Research
Srivastava, Kshitij
Fratzscher, Anne-Sophie
Lan, Bo
Flegel, Willy Albert
Cataloguing experimentally confirmed 80.7 kb-long ACKR1 haplotypes from the 1000 Genomes Project database
title Cataloguing experimentally confirmed 80.7 kb-long ACKR1 haplotypes from the 1000 Genomes Project database
title_sort cataloguing experimentally confirmed 80.7 kb-long ackr1 haplotypes from the 1000 genomes project database
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8150616/
https://www.ncbi.nlm.nih.gov/pubmed/34039276
http://dx.doi.org/10.1186/s12859-021-04169-6