ALPHLARD: a Bayesian method for analyzing HLA genes from whole genome sequence data

BACKGROUND: Although human leukocyte antigen (HLA) genotyping based on amplicon, whole exome sequence (WES), and RNA sequence data has been achieved in recent years, accurate genotyping from whole genome sequence (WGS) data remains a challenge due to the low depth. Furthermore, there is no method to...

Bibliographic Details
Main Authors: Hayashi, Shuto, Yamaguchi, Rui, Mizuno, Shinichi, Komura, Mitsuhiro, Miyano, Satoru, Nakagawa, Hidewaki, Imoto, Seiya
Format: Online Article Text
Language: English
Published: BioMed Central 2018
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6211482/
https://www.ncbi.nlm.nih.gov/pubmed/30384854
http://dx.doi.org/10.1186/s12864-018-5169-9
_version_ 1783367343030665216
author Hayashi, Shuto
Yamaguchi, Rui
Mizuno, Shinichi
Komura, Mitsuhiro
Miyano, Satoru
Nakagawa, Hidewaki
Imoto, Seiya
author_sort Hayashi, Shuto
collection PubMed
description BACKGROUND: Although human leukocyte antigen (HLA) genotyping based on amplicon, whole exome sequence (WES), and RNA sequence data has been achieved in recent years, accurate genotyping from whole genome sequence (WGS) data remains a challenge due to the low depth. Furthermore, there is no method to identify the sequences of unknown HLA types not registered in HLA databases. RESULTS: We developed a Bayesian model, called ALPHLARD, that collects reads potentially generated from HLA genes and accurately determines a pair of HLA types for each of the HLA-A, -B, -C, -DPA1, -DPB1, -DQA1, -DQB1, and -DRB1 genes at 3rd field resolution. Furthermore, ALPHLARD can detect rare germline variants not stored in HLA databases and call somatic mutations from paired normal and tumor sequence data. We illustrate the capability of ALPHLARD using 253 WES datasets and 25 WGS datasets from Illumina platforms. Compared against HLA genotypes obtained by sequence-based typing (SBT) and amplicon sequencing, ALPHLARD achieved accuracies of 98.8% for WES data and 98.5% for WGS data at 2nd field resolution. We also detected three somatic point mutations and one case of loss of heterozygosity in the HLA genes from the WGS data. CONCLUSIONS: ALPHLARD showed good performance for HLA genotyping even from low-coverage data. It also has the potential to detect rare germline variants and somatic mutations in HLA genes. It would help to fill in the current gaps in HLA reference databases and unveil the immunological significance of somatic mutations identified in HLA genes. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12864-018-5169-9) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-6211482
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6211482 2018-11-08 ALPHLARD: a Bayesian method for analyzing HLA genes from whole genome sequence data Hayashi, Shuto Yamaguchi, Rui Mizuno, Shinichi Komura, Mitsuhiro Miyano, Satoru Nakagawa, Hidewaki Imoto, Seiya BMC Genomics Methodology Article
BioMed Central 2018-11-01 /pmc/articles/PMC6211482/ /pubmed/30384854 http://dx.doi.org/10.1186/s12864-018-5169-9 Text en © The Author(s) 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver(http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title ALPHLARD: a Bayesian method for analyzing HLA genes from whole genome sequence data
topic Methodology Article