HLAProfiler utilizes k-mer profiles to improve HLA calling accuracy for rare and common alleles in RNA-seq data
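Main authors: | Buchkovich, Martin L.; Brown, Chad C.; Robasky, Kimberly; Chai, Shengjie; Westfall, Sharon; Vincent, Benjamin G.; Weimer, Eric T.; Powers, Jason G. |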
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2017 |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5618726/ ; https://www.ncbi.nlm.nih.gov/pubmed/28954626 ; http://dx.doi.org/10.1186/s13073-017-0473-6 |
field | value |
---|---|
author | Buchkovich, Martin L.; Brown, Chad C.; Robasky, Kimberly; Chai, Shengjie; Westfall, Sharon; Vincent, Benjamin G.; Weimer, Eric T.; Powers, Jason G. |
collection | PubMed |
description | BACKGROUND: The human leukocyte antigen (HLA) system is a genomic region involved in regulating the human immune system by encoding cell membrane major histocompatibility complex (MHC) proteins that are responsible for self-recognition. Understanding the variation in this region provides important insights into autoimmune disorders, disease susceptibility, oncological immunotherapy, regenerative medicine, transplant rejection, and toxicogenomics. Traditional approaches to HLA typing are low throughput, target only a few genes, are labor intensive and costly, or require specialized protocols. RNA sequencing promises a relatively inexpensive, high-throughput solution for HLA calling across all genes, with the bonus of complete transcriptome information and widespread availability of historical data. Existing tools have been limited in their ability to accurately and comprehensively call HLA genes from RNA-seq data. RESULTS: We created HLAProfiler (https://github.com/ExpressionAnalysis/HLAProfiler), a k-mer profile-based method for HLA calling in RNA-seq data which can identify rare and common HLA alleles with > 99% accuracy at two-field precision in both biological and simulated data. For 68% of novel alleles not present in the reference database, HLAProfiler can correctly identify the allele at two-field precision or the exact coding sequence, a significant advance over existing algorithms. CONCLUSIONS: HLAProfiler allows for accurate HLA calls in RNA-seq data, reliably expanding the utility of these data in HLA-related research and enabling advances across a broad range of disciplines. Additionally, by using the observed data to identify potential novel alleles and update partial alleles, HLAProfiler will facilitate further improvements to the existing database of reference HLA alleles. HLAProfiler is available at https://expressionanalysis.github.io/HLAProfiler/. 
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13073-017-0473-6) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-5618726 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
journal | Genome Med (Software article), BioMed Central, published online 2017-09-27 |
rights | © The Author(s). 2017. Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
title | HLAProfiler utilizes k-mer profiles to improve HLA calling accuracy for rare and common alleles in RNA-seq data |
topic | Software |
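The abstract describes HLAProfiler only as a "k-mer profile-based method" whose accuracy is reported "at two-field precision". The following minimal Python sketch illustrates those two generic ideas. It is not HLAProfiler's actual algorithm. It counts overlapping k-mers in reads, scores every candidate allele pair by cosine similarity between the observed profile and the summed reference profiles, and truncates allele names to two fields. The function names (`kmer_profile`, `cosine_similarity`, `best_allele_pair`, `two_field`), the cosine-similarity scoring, and the toy sequences are hypothetical choices made for illustration.

```python
from collections import Counter
from itertools import combinations_with_replacement
import math


def kmer_profile(seqs, k):
    """Count every overlapping k-mer across a collection of sequences."""
    counts = Counter()
    for s in seqs:
        s = s.upper()
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts


def cosine_similarity(a, b):
    """Cosine similarity between two sparse k-mer count vectors (Counters)."""
    dot = sum(v * b[kmer] for kmer, v in a.items() if kmer in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_allele_pair(reads, references, k):
    """Hypothetical scorer: compare the reads' k-mer profile with the summed
    profile of every unordered allele pair (homozygous pairs included) and
    return the best-scoring pair with its score."""
    observed = kmer_profile(reads, k)
    ref_profiles = {name: kmer_profile([seq], k) for name, seq in references.items()}
    best, best_score = None, -1.0
    for a, b in combinations_with_replacement(sorted(ref_profiles), 2):
        expected = ref_profiles[a] + ref_profiles[b]  # diploid expectation
        score = cosine_similarity(observed, expected)
        if score > best_score:  # strict '>' keeps the first pair on ties
            best, best_score = (a, b), score
    return best, best_score


def two_field(allele):
    """Truncate an HLA allele name to its first two fields,
    e.g. 'A*02:01:01:01' -> 'A*02:01'. Expression suffixes are not
    handled specially in this sketch."""
    gene, _, fields = allele.partition("*")
    return gene + "*" + ":".join(fields.split(":")[:2])
```

With the toy references `{"A*01:01:01:01": "AAAAAAAA", "A*02:01:01:01": "CCCCCCCC", "A*03:01:01:01": "GGGGGGGG"}`, reads `["AAAAA", "CCCCC"]`, and `k=3`, `best_allele_pair` returns the pair `("A*01:01:01:01", "A*02:01:01:01")` with a score of 1.0 (up to floating-point rounding). Applying `two_field` to both names gives `A*01:01` and `A*02:01`. Cosine similarity was chosen here because it ignores overall scale, so total read depth does not change the ranking. A real caller must also cope with k-mers shared between alleles, unequal allele expression, and sequencing errors, none of which this sketch models.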