HLAProfiler utilizes k-mer profiles to improve HLA calling accuracy for rare and common alleles in RNA-seq data

BACKGROUND: The human leukocyte antigen (HLA) system is a genomic region involved in regulating the human immune system by encoding cell membrane major histocompatibility complex (MHC) proteins that are responsible for self-recognition. Understanding the variation in this region provides important i...

Bibliographic Details
Main authors: Buchkovich, Martin L., Brown, Chad C., Robasky, Kimberly, Chai, Shengjie, Westfall, Sharon, Vincent, Benjamin G., Weimer, Eric T., Powers, Jason G.
Format: Online Article Text
Language: English
Published: BioMed Central 2017
Subjects: Software
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5618726/
https://www.ncbi.nlm.nih.gov/pubmed/28954626
http://dx.doi.org/10.1186/s13073-017-0473-6
_version_ 1783267251421446144
author Buchkovich, Martin L.
Brown, Chad C.
Robasky, Kimberly
Chai, Shengjie
Westfall, Sharon
Vincent, Benjamin G.
Weimer, Eric T.
Powers, Jason G.
collection PubMed
description BACKGROUND: The human leukocyte antigen (HLA) system is a genomic region involved in regulating the human immune system by encoding cell membrane major histocompatibility complex (MHC) proteins that are responsible for self-recognition. Understanding the variation in this region provides important insights into autoimmune disorders, disease susceptibility, oncological immunotherapy, regenerative medicine, transplant rejection, and toxicogenomics. Traditional approaches to HLA typing are low throughput, target only a few genes, are labor intensive and costly, or require specialized protocols. RNA sequencing promises a relatively inexpensive, high-throughput solution for HLA calling across all genes, with the bonus of complete transcriptome information and widespread availability of historical data. Existing tools have been limited in their ability to accurately and comprehensively call HLA genes from RNA-seq data. RESULTS: We created HLAProfiler (https://github.com/ExpressionAnalysis/HLAProfiler), a k-mer profile-based method for HLA calling in RNA-seq data which can identify rare and common HLA alleles with > 99% accuracy at two-field precision in both biological and simulated data. For 68% of novel alleles not present in the reference database, HLAProfiler can correctly identify the two-field precision or exact coding sequence, a significant advance over existing algorithms. CONCLUSIONS: HLAProfiler allows for accurate HLA calls in RNA-seq data, reliably expanding the utility of these data in HLA-related research and enabling advances across a broad range of disciplines. Additionally, by using the observed data to identify potential novel alleles and update partial alleles, HLAProfiler will facilitate further improvements to the existing database of reference HLA alleles. HLAProfiler is available at https://expressionanalysis.github.io/HLAProfiler/. 
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13073-017-0473-6) contains supplementary material, which is available to authorized users.
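The abstract describes HLAProfiler as a k-mer profile-based method but does not spell out what a k-mer profile is. The following is a minimal sketch of the general idea only, not HLAProfiler's actual algorithm: it counts overlapping k-mers in a sequence and scores a read against reference profiles by the fraction of its k-mers found in each. The function names, the toy sequences, the value of k, and the overlap score are all assumptions made for illustration.

```python
from collections import Counter


def kmer_profile(sequence, k):
    """Count every overlapping substring of length k in a sequence."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def profile_overlap(observed, reference):
    """Fraction of observed k-mer occurrences that also appear in the reference profile."""
    total = sum(observed.values())
    if total == 0:
        return 0.0
    shared = sum(count for kmer, count in observed.items() if kmer in reference)
    return shared / total


# Toy sequences invented for illustration; these are not real HLA alleles.
references = {
    "allele_A": "ACGTACGTGACC",
    "allele_B": "ACGTTCGTGACC",
}
read = "TACGTGA"
k = 4  # hypothetical k-mer length chosen to suit the short toy sequences

reference_profiles = {name: kmer_profile(seq, k) for name, seq in references.items()}
read_profile = kmer_profile(read, k)

# Score the read against each reference and keep the best-scoring name.
scores = {
    name: profile_overlap(read_profile, profile)
    for name, profile in reference_profiles.items()
}
best_match = max(scores, key=scores.get)
print(scores, best_match)
```

In this toy case the read shares all four of its 4-mers with allele_A but only three with allele_B, so allele_A scores higher. A real caller would have to handle sequencing errors, paired reads, and alleles that differ by very few k-mers, none of which this sketch attempts.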
format Online
Article
Text
id pubmed-5618726
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-56187262017-10-03 HLAProfiler utilizes k-mer profiles to improve HLA calling accuracy for rare and common alleles in RNA-seq data Buchkovich, Martin L. Brown, Chad C. Robasky, Kimberly Chai, Shengjie Westfall, Sharon Vincent, Benjamin G. Weimer, Eric T. Powers, Jason G. Genome Med Software BACKGROUND: The human leukocyte antigen (HLA) system is a genomic region involved in regulating the human immune system by encoding cell membrane major histocompatibility complex (MHC) proteins that are responsible for self-recognition. Understanding the variation in this region provides important insights into autoimmune disorders, disease susceptibility, oncological immunotherapy, regenerative medicine, transplant rejection, and toxicogenomics. Traditional approaches to HLA typing are low throughput, target only a few genes, are labor intensive and costly, or require specialized protocols. RNA sequencing promises a relatively inexpensive, high-throughput solution for HLA calling across all genes, with the bonus of complete transcriptome information and widespread availability of historical data. Existing tools have been limited in their ability to accurately and comprehensively call HLA genes from RNA-seq data. RESULTS: We created HLAProfiler (https://github.com/ExpressionAnalysis/HLAProfiler), a k-mer profile-based method for HLA calling in RNA-seq data which can identify rare and common HLA alleles with > 99% accuracy at two-field precision in both biological and simulated data. For 68% of novel alleles not present in the reference database, HLAProfiler can correctly identify the two-field precision or exact coding sequence, a significant advance over existing algorithms. CONCLUSIONS: HLAProfiler allows for accurate HLA calls in RNA-seq data, reliably expanding the utility of these data in HLA-related research and enabling advances across a broad range of disciplines. 
Additionally, by using the observed data to identify potential novel alleles and update partial alleles, HLAProfiler will facilitate further improvements to the existing database of reference HLA alleles. HLAProfiler is available at https://expressionanalysis.github.io/HLAProfiler/. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13073-017-0473-6) contains supplementary material, which is available to authorized users. BioMed Central 2017-09-27 /pmc/articles/PMC5618726/ /pubmed/28954626 http://dx.doi.org/10.1186/s13073-017-0473-6 Text en © The Author(s). 2017 Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title HLAProfiler utilizes k-mer profiles to improve HLA calling accuracy for rare and common alleles in RNA-seq data
topic Software
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5618726/
https://www.ncbi.nlm.nih.gov/pubmed/28954626
http://dx.doi.org/10.1186/s13073-017-0473-6