Improved haplotype resolution of highly duplicated MHC genes in a long-read genome assembly using MiSeq amplicons

Bibliographic Details
Main authors: Mellinger, Samantha; Stervander, Martin; Lundberg, Max; Drews, Anna; Westerdahl, Helena
Format: Online Article Text
Language: English
Published: PeerJ Inc. 2023
Subjects: Computational Biology
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10349553/
https://www.ncbi.nlm.nih.gov/pubmed/37456901
http://dx.doi.org/10.7717/peerj.15480
author Mellinger, Samantha
Stervander, Martin
Lundberg, Max
Drews, Anna
Westerdahl, Helena
collection PubMed
description Long-read sequencing offers a great improvement in the assembly of complex genomic regions, such as the major histocompatibility complex (MHC) region, which can contain both tandemly duplicated MHC genes (paralogs) and high repeat content. The MHC genes have expanded in passerine birds, resulting in numerous MHC paralogs, with relatively high sequence similarity, making the assembly of the MHC region challenging even with long-read sequencing. In addition, MHC genes show rather high sequence divergence between alleles, making diploid-aware assemblers incorrectly classify haplotypes from the same locus as sequences originating from different genomic regions. Consequently, the number of MHC paralogs can easily be over- or underestimated in long-read assemblies. We therefore set out to verify the MHC diversity in an original and a haplotype-purged long-read assembly of one great reed warbler Acrocephalus arundinaceus individual (the focal individual) by using Illumina MiSeq amplicon sequencing. Single exons, representing MHC class I (MHC-I) and class IIB (MHC-IIB) alleles, were sequenced in the focal individual and mapped to the annotated MHC alleles in the original long-read genome assembly. Eighty-four percent of the annotated MHC-I alleles in the original long-read genome assembly were detected using 55% of the amplicon alleles and likewise, 78% of the annotated MHC-IIB alleles were detected using 61% of the amplicon alleles, indicating an incomplete annotation of MHC genes. In the haploid genome assembly, each MHC-IIB gene should be represented by one allele. The parental origin of the MHC-IIB amplicon alleles in the focal individual was determined by sequencing MHC-IIB in its parents. 
Two of five larger scaffolds, containing 6–19 MHC-IIB paralogs, had a maternal and paternal origin, respectively, as well as a high nucleotide similarity, which suggests that these scaffolds had been incorrectly assigned as belonging to different loci in the genome rather than as alternate haplotypes of the same locus. Therefore, the number of MHC-IIB paralogs was overestimated in the haploid genome assembly. Based on our findings we propose amplicon sequencing as a suitable complement to long-read sequencing for independent validation of the number of paralogs in general and for haplotype inference in multigene families in particular.
format Online
Article
Text
id pubmed-10349553
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher PeerJ Inc.
record_format MEDLINE/PubMed
spelling pubmed-10349553 2023-07-16 Improved haplotype resolution of highly duplicated MHC genes in a long-read genome assembly using MiSeq amplicons Mellinger, Samantha Stervander, Martin Lundberg, Max Drews, Anna Westerdahl, Helena PeerJ Computational Biology PeerJ Inc. 2023-07-12 /pmc/articles/PMC10349553/ /pubmed/37456901 http://dx.doi.org/10.7717/peerj.15480 Text en ©2023 Mellinger et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.
title Improved haplotype resolution of highly duplicated MHC genes in a long-read genome assembly using MiSeq amplicons
topic Computational Biology
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10349553/
https://www.ncbi.nlm.nih.gov/pubmed/37456901
http://dx.doi.org/10.7717/peerj.15480