Improved genome inference in the MHC using a population reference graph
While much is known about human genetic variation, such information is typically ignored in assembling novel genomes. Instead, reads are mapped to a single reference, which can lead to poor characterization of regions of high sequence or structural diversity. We introduce a population reference grap...
Main authors: | Dilthey, Alexander; Cox, Charles; Iqbal, Zamin; Nelson, Matthew R.; McVean, Gil |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | 2015 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4449272/ https://www.ncbi.nlm.nih.gov/pubmed/25915597 http://dx.doi.org/10.1038/ng.3257 |
_version_ | 1782373835161468928 |
---|---|
author | Dilthey, Alexander Cox, Charles Iqbal, Zamin Nelson, Matthew R. McVean, Gil |
author_facet | Dilthey, Alexander Cox, Charles Iqbal, Zamin Nelson, Matthew R. McVean, Gil |
author_sort | Dilthey, Alexander |
collection | PubMed |
description | While much is known about human genetic variation, such information is typically ignored in assembling novel genomes. Instead, reads are mapped to a single reference, which can lead to poor characterization of regions of high sequence or structural diversity. We introduce a population reference graph, which combines multiple reference sequences and catalogues of variation. The genomes of novel samples are reconstructed as paths through the graph using an efficient hidden Markov model, allowing for recombination between different haplotypes and additional variants. By applying the method to the 4.5Mb extended MHC region on human chromosome 6, combining eight assembled haplotypes, sequences of known classical HLA alleles and 87,640 SNP variants from the 1000 Genomes Project, we demonstrate, using simulations, SNP genotyping, short-read and long-read data, how the method improves the accuracy of genome inference and reveals regions where the current set of reference sequences is substantially incomplete. |
format | Online Article Text |
id | pubmed-4449272 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
record_format | MEDLINE/PubMed |
spelling | pubmed-4449272 2015-12-01 Improved genome inference in the MHC using a population reference graph Dilthey, Alexander Cox, Charles Iqbal, Zamin Nelson, Matthew R. McVean, Gil Nat Genet Article While much is known about human genetic variation, such information is typically ignored in assembling novel genomes. Instead, reads are mapped to a single reference, which can lead to poor characterization of regions of high sequence or structural diversity. We introduce a population reference graph, which combines multiple reference sequences and catalogues of variation. The genomes of novel samples are reconstructed as paths through the graph using an efficient hidden Markov model, allowing for recombination between different haplotypes and additional variants. By applying the method to the 4.5Mb extended MHC region on human chromosome 6, combining eight assembled haplotypes, sequences of known classical HLA alleles and 87,640 SNP variants from the 1000 Genomes Project, we demonstrate, using simulations, SNP genotyping, short-read and long-read data, how the method improves the accuracy of genome inference and reveals regions where the current set of reference sequences is substantially incomplete. 2015-04-27 2015-06 /pmc/articles/PMC4449272/ /pubmed/25915597 http://dx.doi.org/10.1038/ng.3257 Text en Users may view, print, copy, and download text and data-mine the content in such documents, for the purposes of academic research, subject always to the full Conditions of use: http://www.nature.com/authors/editorial_policies/license.html#terms |
spellingShingle | Article Dilthey, Alexander Cox, Charles Iqbal, Zamin Nelson, Matthew R. McVean, Gil Improved genome inference in the MHC using a population reference graph |
title | Improved genome inference in the MHC using a population reference graph |
title_full | Improved genome inference in the MHC using a population reference graph |
title_fullStr | Improved genome inference in the MHC using a population reference graph |
title_full_unstemmed | Improved genome inference in the MHC using a population reference graph |
title_short | Improved genome inference in the MHC using a population reference graph |
title_sort | improved genome inference in the mhc using a population reference graph |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4449272/ https://www.ncbi.nlm.nih.gov/pubmed/25915597 http://dx.doi.org/10.1038/ng.3257 |
work_keys_str_mv | AT diltheyalexander improvedgenomeinferenceinthemhcusingapopulationreferencegraph AT coxcharles improvedgenomeinferenceinthemhcusingapopulationreferencegraph AT iqbalzamin improvedgenomeinferenceinthemhcusingapopulationreferencegraph AT nelsonmatthewr improvedgenomeinferenceinthemhcusingapopulationreferencegraph AT mcveangil improvedgenomeinferenceinthemhcusingapopulationreferencegraph |