Cargando…
Cataloguing experimentally confirmed 80.7 kb-long ACKR1 haplotypes from the 1000 Genomes Project database
BACKGROUND: Clinically effective and safe genotyping relies on correct reference sequences, often represented by haplotypes. The 1000 Genomes Project recorded individual genotypes across 26 different populations and, using computerized genotype phasing, reported haplotype data. In contrast, we ident...
Autores principales: | , , , |
---|---|
Formato: | Online Artículo Texto |
Lenguaje: | English |
Publicado: |
BioMed Central
2021
|
Materias: | |
Acceso en línea: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8150616/ https://www.ncbi.nlm.nih.gov/pubmed/34039276 http://dx.doi.org/10.1186/s12859-021-04169-6 |
_version_ | 1783698190645592064 |
---|---|
author | Srivastava, Kshitij Fratzscher, Anne-Sophie Lan, Bo Flegel, Willy Albert |
author_facet | Srivastava, Kshitij Fratzscher, Anne-Sophie Lan, Bo Flegel, Willy Albert |
author_sort | Srivastava, Kshitij |
collection | PubMed |
description | BACKGROUND: Clinically effective and safe genotyping relies on correct reference sequences, often represented by haplotypes. The 1000 Genomes Project recorded individual genotypes across 26 different populations and, using computerized genotype phasing, reported haplotype data. In contrast, we identified long reference sequences by analyzing the homozygous genomic regions in this online database, a concept that has rarely been reported since next generation sequencing data became available. STUDY DESIGN AND METHODS: Phased genotype data for a 80.6 kb region of chromosome 1 was downloaded for all 2,504 unrelated individuals of the 1000 Genome Project Phase 3 cohort. The data was centered on the ACKR1 gene and bordered by the CADM3 and FCER1A genes. Individuals with heterozygosity at a single site or with complete homozygosity allowed unambiguous assignment of an ACKR1 haplotype. A computer algorithm was developed for extracting these haplotypes from the 1000 Genome Project in an automated fashion. A manual analysis validated the data extracted by the algorithm. RESULTS: We confirmed 902 ACKR1 haplotypes of varying lengths, the longest at 80,584 nucleotides and shortest at 1,901 nucleotides. The combined length of haplotype sequences comprised 19,895,388 nucleotides with a median of 16,014 nucleotides. Based on our approach, all haplotypes can be considered experimentally confirmed and not affected by the known errors of computerized genotype phasing. CONCLUSIONS: Tracts of homozygosity can provide definitive reference sequences for any gene. They are particularly useful when observed in unrelated individuals of large scale sequence databases. As a proof of principle, we explored the 1000 Genomes Project database for ACKR1 gene data and mined long haplotypes. These haplotypes are useful for high throughput analysis with next generation sequencing. Our approach is scalable, using automated bioinformatics tools, and can be applied to any gene. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-021-04169-6. |
format | Online Article Text |
id | pubmed-8150616 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-81506162021-05-26 Cataloguing experimentally confirmed 80.7 kb-long ACKR1 haplotypes from the 1000 Genomes Project database Srivastava, Kshitij Fratzscher, Anne-Sophie Lan, Bo Flegel, Willy Albert BMC Bioinformatics Research BACKGROUND: Clinically effective and safe genotyping relies on correct reference sequences, often represented by haplotypes. The 1000 Genomes Project recorded individual genotypes across 26 different populations and, using computerized genotype phasing, reported haplotype data. In contrast, we identified long reference sequences by analyzing the homozygous genomic regions in this online database, a concept that has rarely been reported since next generation sequencing data became available. STUDY DESIGN AND METHODS: Phased genotype data for a 80.6 kb region of chromosome 1 was downloaded for all 2,504 unrelated individuals of the 1000 Genome Project Phase 3 cohort. The data was centered on the ACKR1 gene and bordered by the CADM3 and FCER1A genes. Individuals with heterozygosity at a single site or with complete homozygosity allowed unambiguous assignment of an ACKR1 haplotype. A computer algorithm was developed for extracting these haplotypes from the 1000 Genome Project in an automated fashion. A manual analysis validated the data extracted by the algorithm. RESULTS: We confirmed 902 ACKR1 haplotypes of varying lengths, the longest at 80,584 nucleotides and shortest at 1,901 nucleotides. The combined length of haplotype sequences comprised 19,895,388 nucleotides with a median of 16,014 nucleotides. Based on our approach, all haplotypes can be considered experimentally confirmed and not affected by the known errors of computerized genotype phasing. CONCLUSIONS: Tracts of homozygosity can provide definitive reference sequences for any gene. They are particularly useful when observed in unrelated individuals of large scale sequence databases. As a proof of principle, we explored the 1000 Genomes Project database for ACKR1 gene data and mined long haplotypes. These haplotypes are useful for high throughput analysis with next generation sequencing. Our approach is scalable, using automated bioinformatics tools, and can be applied to any gene. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-021-04169-6. BioMed Central 2021-05-26 /pmc/articles/PMC8150616/ /pubmed/34039276 http://dx.doi.org/10.1186/s12859-021-04169-6 Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/Open AccessThis article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ (https://creativecommons.org/licenses/by/4.0/) . The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/ (https://creativecommons.org/publicdomain/zero/1.0/) ) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
spellingShingle | Research Srivastava, Kshitij Fratzscher, Anne-Sophie Lan, Bo Flegel, Willy Albert Cataloguing experimentally confirmed 80.7 kb-long ACKR1 haplotypes from the 1000 Genomes Project database |
title | Cataloguing experimentally confirmed 80.7 kb-long ACKR1 haplotypes from the 1000 Genomes Project database |
title_full | Cataloguing experimentally confirmed 80.7 kb-long ACKR1 haplotypes from the 1000 Genomes Project database |
title_fullStr | Cataloguing experimentally confirmed 80.7 kb-long ACKR1 haplotypes from the 1000 Genomes Project database |
title_full_unstemmed | Cataloguing experimentally confirmed 80.7 kb-long ACKR1 haplotypes from the 1000 Genomes Project database |
title_short | Cataloguing experimentally confirmed 80.7 kb-long ACKR1 haplotypes from the 1000 Genomes Project database |
title_sort | cataloguing experimentally confirmed 80.7 kb-long ackr1 haplotypes from the 1000 genomes project database |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8150616/ https://www.ncbi.nlm.nih.gov/pubmed/34039276 http://dx.doi.org/10.1186/s12859-021-04169-6 |
work_keys_str_mv | AT srivastavakshitij cataloguingexperimentallyconfirmed807kblongackr1haplotypesfromthe1000genomesprojectdatabase AT fratzscherannesophie cataloguingexperimentallyconfirmed807kblongackr1haplotypesfromthe1000genomesprojectdatabase AT lanbo cataloguingexperimentallyconfirmed807kblongackr1haplotypesfromthe1000genomesprojectdatabase AT flegelwillyalbert cataloguingexperimentallyconfirmed807kblongackr1haplotypesfromthe1000genomesprojectdatabase |