Inference of high resolution HLA types using genome-wide RNA or DNA sequencing reads

BACKGROUND: Accurate HLA typing at amino acid level (four-digit resolution) is critical in hematopoietic and organ transplantations, pathogenesis studies of autoimmune and infectious diseases, as well as the development of immuno-oncology therapies. With the rapid adoption of genome-wide sequencing in...

Bibliographic Details
Main Authors: Bai, Yu; Ni, Min; Cooper, Blerta; Wei, Yi; Fury, Wen
Format: Online Article Text
Language: English
Published: BioMed Central 2014
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4035057/
https://www.ncbi.nlm.nih.gov/pubmed/24884790
http://dx.doi.org/10.1186/1471-2164-15-325
_version_ 1782318014002102272
author Bai, Yu
Ni, Min
Cooper, Blerta
Wei, Yi
Fury, Wen
author_sort Bai, Yu
collection PubMed
description BACKGROUND: Accurate HLA typing at amino acid level (four-digit resolution) is critical in hematopoietic and organ transplantations, pathogenesis studies of autoimmune and infectious diseases, as well as the development of immuno-oncology therapies. With the rapid adoption of genome-wide sequencing in biomedical research, HLA typing based on transcriptome and whole exome/genome sequencing data becomes increasingly attractive due to its high throughput and convenience. However, unlike targeted amplicon sequencing, genome-wide sequencing often employs a reduced read length and coverage that impose great challenges in resolving the highly homologous HLA alleles. Though several algorithms exist and have been applied to four-digit typing, some deliver low to moderate accuracy, while others output ambiguous predictions. Moreover, few methods suit diverse read lengths and depths, and both RNA and DNA sequencing inputs. New algorithms are therefore needed to leverage the accuracy and flexibility of HLA typing at high resolution using genome-wide sequencing data. RESULTS: We have developed a new algorithm named PHLAT to discover the most probable pair of HLA alleles at four-digit resolution or higher, via a unique integration of a candidate allele selection and a likelihood scoring. Over a comprehensive set of benchmarking data (a total of 768 HLA alleles) from both RNA and DNA sequencing and with a broad range of read lengths and coverage, PHLAT consistently achieves a high accuracy at four-digit (92%-95%) and two-digit resolutions (96%-99%), outcompeting most of the existing methods. It also supports targeted amplicon sequencing data from Illumina MiSeq. CONCLUSIONS: PHLAT significantly leverages the accuracy and flexibility of high resolution HLA typing based on genome-wide sequencing data. It may benefit both basic and applied research in immunology and related fields as well as numerous clinical applications.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/1471-2164-15-325) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-4035057
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4035057 2014-06-06 Inference of high resolution HLA types using genome-wide RNA or DNA sequencing reads Bai, Yu Ni, Min Cooper, Blerta Wei, Yi Fury, Wen BMC Genomics Methodology Article BACKGROUND: Accurate HLA typing at amino acid level (four-digit resolution) is critical in hematopoietic and organ transplantations, pathogenesis studies of autoimmune and infectious diseases, as well as the development of immuno-oncology therapies. With the rapid adoption of genome-wide sequencing in biomedical research, HLA typing based on transcriptome and whole exome/genome sequencing data becomes increasingly attractive due to its high throughput and convenience. However, unlike targeted amplicon sequencing, genome-wide sequencing often employs a reduced read length and coverage that impose great challenges in resolving the highly homologous HLA alleles. Though several algorithms exist and have been applied to four-digit typing, some deliver low to moderate accuracy, while others output ambiguous predictions. Moreover, few methods suit diverse read lengths and depths, and both RNA and DNA sequencing inputs. New algorithms are therefore needed to leverage the accuracy and flexibility of HLA typing at high resolution using genome-wide sequencing data. RESULTS: We have developed a new algorithm named PHLAT to discover the most probable pair of HLA alleles at four-digit resolution or higher, via a unique integration of a candidate allele selection and a likelihood scoring. Over a comprehensive set of benchmarking data (a total of 768 HLA alleles) from both RNA and DNA sequencing and with a broad range of read lengths and coverage, PHLAT consistently achieves a high accuracy at four-digit (92%-95%) and two-digit resolutions (96%-99%), outcompeting most of the existing methods. It also supports targeted amplicon sequencing data from Illumina MiSeq.
CONCLUSIONS: PHLAT significantly leverages the accuracy and flexibility of high resolution HLA typing based on genome-wide sequencing data. It may benefit both basic and applied research in immunology and related fields as well as numerous clinical applications. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/1471-2164-15-325) contains supplementary material, which is available to authorized users. BioMed Central 2014-05-01 /pmc/articles/PMC4035057/ /pubmed/24884790 http://dx.doi.org/10.1186/1471-2164-15-325 Text en © Bai et al.; licensee BioMed Central Ltd. 2014 This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Methodology Article
Bai, Yu
Ni, Min
Cooper, Blerta
Wei, Yi
Fury, Wen
Inference of high resolution HLA types using genome-wide RNA or DNA sequencing reads
title Inference of high resolution HLA types using genome-wide RNA or DNA sequencing reads
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4035057/
https://www.ncbi.nlm.nih.gov/pubmed/24884790
http://dx.doi.org/10.1186/1471-2164-15-325