PB-Motif—A Method for Identifying Gene/Pseudogene Rearrangements With Long Reads: An Application to CYP21A2 Genotyping

Long-read sequencing technologies have the potential to accurately detect and phase variation in genomic regions that are difficult to fully characterize with conventional short-read methods. These difficult-to-sequence regions include several clinically relevant genes with highly homologous pseudog...

Bibliographic Details
Main authors: Stephens, Zachary, Milosevic, Dragana, Kipp, Benjamin, Grebe, Stefan, Iyer, Ravishankar K., Kocher, Jean-Pierre A.
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2021
Subjects: Genetics
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8355628/
https://www.ncbi.nlm.nih.gov/pubmed/34394200
http://dx.doi.org/10.3389/fgene.2021.716586
_version_ 1783736801446330368
author Stephens, Zachary
Milosevic, Dragana
Kipp, Benjamin
Grebe, Stefan
Iyer, Ravishankar K.
Kocher, Jean-Pierre A.
author_facet Stephens, Zachary
Milosevic, Dragana
Kipp, Benjamin
Grebe, Stefan
Iyer, Ravishankar K.
Kocher, Jean-Pierre A.
author_sort Stephens, Zachary
collection PubMed
description Long-read sequencing technologies have the potential to accurately detect and phase variation in genomic regions that are difficult to fully characterize with conventional short-read methods. These difficult-to-sequence regions include several clinically relevant genes with highly homologous pseudogenes, many of which are prone to gene conversions or other types of complex structural rearrangements. We present PB-Motif, a new method for identifying rearrangements between two highly homologous genomic regions using PacBio long reads. PB-Motif leverages clustering and filtering techniques to efficiently report rearrangements in the presence of sequencing errors and other systematic artifacts. Supporting reads for each high-confidence rearrangement can then be used for copy number estimation and phased variant calling. First, we demonstrate PB-Motif's accuracy with simulated sequence rearrangements of PMS2 and its pseudogene PMS2CL using simulated reads sweeping over a range of sequencing error rates. We then apply PB-Motif to 26 clinical samples, characterizing CYP21A2 and its pseudogene CYP21A1P as part of a diagnostic assay for congenital adrenal hyperplasia. We successfully identify damaging variation and patient carrier status concordant with clinical diagnosis obtained from multiplex ligation-dependent probe amplification (MLPA) and Sanger sequencing. The source code is available at: github.com/zstephens/pb-motif.
format Online
Article
Text
id pubmed-8355628
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-8355628 2021-08-12 PB-Motif—A Method for Identifying Gene/Pseudogene Rearrangements With Long Reads: An Application to CYP21A2 Genotyping Stephens, Zachary Milosevic, Dragana Kipp, Benjamin Grebe, Stefan Iyer, Ravishankar K. Kocher, Jean-Pierre A. Front Genet Genetics Long-read sequencing technologies have the potential to accurately detect and phase variation in genomic regions that are difficult to fully characterize with conventional short-read methods. These difficult-to-sequence regions include several clinically relevant genes with highly homologous pseudogenes, many of which are prone to gene conversions or other types of complex structural rearrangements. We present PB-Motif, a new method for identifying rearrangements between two highly homologous genomic regions using PacBio long reads. PB-Motif leverages clustering and filtering techniques to efficiently report rearrangements in the presence of sequencing errors and other systematic artifacts. Supporting reads for each high-confidence rearrangement can then be used for copy number estimation and phased variant calling. First, we demonstrate PB-Motif's accuracy with simulated sequence rearrangements of PMS2 and its pseudogene PMS2CL using simulated reads sweeping over a range of sequencing error rates. We then apply PB-Motif to 26 clinical samples, characterizing CYP21A2 and its pseudogene CYP21A1P as part of a diagnostic assay for congenital adrenal hyperplasia. We successfully identify damaging variation and patient carrier status concordant with clinical diagnosis obtained from multiplex ligation-dependent probe amplification (MLPA) and Sanger sequencing. The source code is available at: github.com/zstephens/pb-motif. Frontiers Media S.A. 2021-07-28 /pmc/articles/PMC8355628/ /pubmed/34394200 http://dx.doi.org/10.3389/fgene.2021.716586 Text en Copyright © 2021 Stephens, Milosevic, Kipp, Grebe, Iyer and Kocher.
https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
spellingShingle Genetics
Stephens, Zachary
Milosevic, Dragana
Kipp, Benjamin
Grebe, Stefan
Iyer, Ravishankar K.
Kocher, Jean-Pierre A.
PB-Motif—A Method for Identifying Gene/Pseudogene Rearrangements With Long Reads: An Application to CYP21A2 Genotyping
title PB-Motif—A Method for Identifying Gene/Pseudogene Rearrangements With Long Reads: An Application to CYP21A2 Genotyping
title_full PB-Motif—A Method for Identifying Gene/Pseudogene Rearrangements With Long Reads: An Application to CYP21A2 Genotyping
title_fullStr PB-Motif—A Method for Identifying Gene/Pseudogene Rearrangements With Long Reads: An Application to CYP21A2 Genotyping
title_full_unstemmed PB-Motif—A Method for Identifying Gene/Pseudogene Rearrangements With Long Reads: An Application to CYP21A2 Genotyping
title_short PB-Motif—A Method for Identifying Gene/Pseudogene Rearrangements With Long Reads: An Application to CYP21A2 Genotyping
title_sort pb-motif—a method for identifying gene/pseudogene rearrangements with long reads: an application to cyp21a2 genotyping
topic Genetics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8355628/
https://www.ncbi.nlm.nih.gov/pubmed/34394200
http://dx.doi.org/10.3389/fgene.2021.716586
work_keys_str_mv AT stephenszachary pbmotifamethodforidentifyinggenepseudogenerearrangementswithlongreadsanapplicationtocyp21a2genotyping
AT milosevicdragana pbmotifamethodforidentifyinggenepseudogenerearrangementswithlongreadsanapplicationtocyp21a2genotyping
AT kippbenjamin pbmotifamethodforidentifyinggenepseudogenerearrangementswithlongreadsanapplicationtocyp21a2genotyping
AT grebestefan pbmotifamethodforidentifyinggenepseudogenerearrangementswithlongreadsanapplicationtocyp21a2genotyping
AT iyerravishankark pbmotifamethodforidentifyinggenepseudogenerearrangementswithlongreadsanapplicationtocyp21a2genotyping
AT kocherjeanpierrea pbmotifamethodforidentifyinggenepseudogenerearrangementswithlongreadsanapplicationtocyp21a2genotyping
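
Illustrative note (not part of the catalog record): the description above says that rearrangements between a gene and its pseudogene are reported by clustering and filtering reads. The Python sketch below is a toy illustration of that general idea only. It is not PB-Motif's implementation, which is available at github.com/zstephens/pb-motif. All function names are hypothetical, and the sketch assumes reads and both references are already aligned to the same length with no insertions or deletions, which real PacBio data would not satisfy. Each read is labeled at the positions where the two references differ, the labels are collapsed into a gene/pseudogene switch pattern, and patterns seen in too few reads are discarded.

```python
from collections import Counter
from itertools import groupby


def informative_sites(gene, pseudo):
    """Positions where the two equal-length, pre-aligned references differ."""
    if len(gene) != len(pseudo):
        raise ValueError("references must be pre-aligned to equal length")
    return [i for i, (g, p) in enumerate(zip(gene, pseudo)) if g != p]


def signature(read, gene, pseudo, sites):
    """Label each informative site: G (gene), P (pseudogene), N (neither)."""
    labels = []
    for i in sites:
        if read[i] == gene[i]:
            labels.append("G")
        elif read[i] == pseudo[i]:
            labels.append("P")
        else:
            labels.append("N")  # matches neither reference, likely an error
    return "".join(labels)


def collapse(sig):
    """Drop N labels and merge runs, e.g. 'GGNPP' -> 'GP' (one switch)."""
    return "".join(k for k, _ in groupby(c for c in sig if c != "N"))


def call_rearrangements(reads, gene, pseudo, min_support=2):
    """Group reads by collapsed pattern; keep patterns with enough reads."""
    sites = informative_sites(gene, pseudo)
    counts = Counter(collapse(signature(r, gene, pseudo, sites)) for r in reads)
    return {pattern: n for pattern, n in counts.items() if n >= min_support}


if __name__ == "__main__":
    gene = "ACGTACGTAC"    # made-up toy sequences, not real CYP21A2 data
    pseudo = "AGGTACCTAT"
    reads = [
        gene, gene,        # unrearranged gene copies
        pseudo, pseudo,    # unrearranged pseudogene copies
        "ACGTACCTAT",      # gene-to-pseudogene switch
        "ACGTACATAT",      # same switch, with an error at one informative site
        "AGGTACGTAT",      # isolated pattern, removed by the support filter
    ]
    print(call_rearrangements(reads, gene, pseudo))
    # {'G': 2, 'P': 2, 'GP': 2}
```

Exact-match grouping on the collapsed pattern stands in for clustering here, and `min_support` stands in for filtering; both are simplifications chosen to keep the example short.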