
The phylogeny of 48 alleles, experimentally verified at 21 kb, and its application to clinical allele detection

BACKGROUND: Sequence information generated from next generation sequencing is often computationally phased using haplotype-phasing algorithms. Utilizing experimentally derived allele or haplotype information improves this prediction, as routinely used in HLA typing. We recently established a large d...


Bibliographic Details
Main authors: Srivastava, Kshitij, Wollenberg, Kurt R., Flegel, Willy A.
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6371619/
https://www.ncbi.nlm.nih.gov/pubmed/30744658
http://dx.doi.org/10.1186/s12967-019-1791-9
_version_ 1783394593351401472
author Srivastava, Kshitij
Wollenberg, Kurt R.
Flegel, Willy A.
author_facet Srivastava, Kshitij
Wollenberg, Kurt R.
Flegel, Willy A.
author_sort Srivastava, Kshitij
collection PubMed
description BACKGROUND: Sequence information generated from next generation sequencing is often computationally phased using haplotype-phasing algorithms. Utilizing experimentally derived allele or haplotype information improves this prediction, as routinely used in HLA typing. We recently established a large dataset of long ERMAP alleles, which code for protein variants in the Scianna blood group system. We propose the phylogeny of this set of 48 alleles and identify evolutionary steps to derive the observed alleles. METHODS: The nucleotide sequence of > 21 kb each was used for all physically confirmed 48 ERMAP alleles that we previously published. Full-length sequences were aligned and variant sites were extracted manually. The Bayesian coalescent algorithm implemented in BEAST v1.8.3 was used to estimate a coalescent phylogeny for these variants and the allelic ancestral states at the internal nodes of the phylogeny. RESULTS: The phylogenetic analysis allowed us to identify the evolutionary relationships among the 48 ERMAP alleles, predict 4243 potential ancestral alleles and calculate a posterior probability for each of these unobserved alleles. Some of them coincide with observed alleles that are extant in the population. CONCLUSIONS: Our proposed strategy places known alleles in a phylogenetic framework, allowing us to describe as-yet-undiscovered alleles. In this new approach, which relies heavily on the accuracy of the alleles used for the phylogenetic analysis, an expanded set of predicted alleles can be used to infer alleles when large genotype data are analyzed, as typically generated by high-throughput sequencing. The alleles identified by studies like ours may be utilized in designing of microarray technologies, imputing of genotypes and mapping of next generation sequencing data. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12967-019-1791-9) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-6371619
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6371619 2019-02-25 The phylogeny of 48 alleles, experimentally verified at 21 kb, and its application to clinical allele detection Srivastava, Kshitij Wollenberg, Kurt R. Flegel, Willy A. J Transl Med Research BioMed Central 2019-02-11 /pmc/articles/PMC6371619/ /pubmed/30744658 http://dx.doi.org/10.1186/s12967-019-1791-9 Text en © The Author(s) 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Research
Srivastava, Kshitij
Wollenberg, Kurt R.
Flegel, Willy A.
The phylogeny of 48 alleles, experimentally verified at 21 kb, and its application to clinical allele detection
title The phylogeny of 48 alleles, experimentally verified at 21 kb, and its application to clinical allele detection
title_full The phylogeny of 48 alleles, experimentally verified at 21 kb, and its application to clinical allele detection
title_fullStr The phylogeny of 48 alleles, experimentally verified at 21 kb, and its application to clinical allele detection
title_full_unstemmed The phylogeny of 48 alleles, experimentally verified at 21 kb, and its application to clinical allele detection
title_short The phylogeny of 48 alleles, experimentally verified at 21 kb, and its application to clinical allele detection
title_sort phylogeny of 48 alleles, experimentally verified at 21 kb, and its application to clinical allele detection
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6371619/
https://www.ncbi.nlm.nih.gov/pubmed/30744658
http://dx.doi.org/10.1186/s12967-019-1791-9