
Small allelic variants are a source of ancestral bias in structural variant breakpoint placement

High-quality genome assemblies and sophisticated algorithms have increased sensitivity for a wide range of variant types, and breakpoint accuracy for structural variants (SVs, ≥ 50 bp) has improved to near basepair precision. Despite these advances, many SVs in unique regions of the genome are subje...

Bibliographic Details
Main authors: Audano, Peter A., Beck, Christine R.
Format: Online Article Text
Language: English
Published: Cold Spring Harbor Laboratory 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10327140/
https://www.ncbi.nlm.nih.gov/pubmed/37425850
http://dx.doi.org/10.1101/2023.06.25.546295
author Audano, Peter A.
Beck, Christine R.
collection PubMed
description High-quality genome assemblies and sophisticated algorithms have increased sensitivity for a wide range of variant types, and breakpoint accuracy for structural variants (SVs, ≥ 50 bp) has improved to near basepair precision. Despite these advances, many SVs in unique regions of the genome are subject to systematic bias that affects breakpoint location. This ambiguity leads to less accurate variant comparisons across samples, and it obscures true breakpoint features needed for mechanistic inferences. To understand why SVs are not consistently placed, we reanalyzed 64 phased haplotypes constructed from long-read assemblies released by the Human Genome Structural Variation Consortium (HGSVC). We identified variable breakpoints for 882 SV insertions and 180 SV deletions not anchored in tandem repeats (TRs) or segmental duplications (SDs). While this is unexpectedly high for genome assemblies in unique loci, we find read-based callsets from the same sequencing data yielded 1,566 insertions and 986 deletions with inconsistent breakpoints also not anchored in TRs or SDs. When we investigated causes for breakpoint inaccuracy, we found sequence and assembly errors had minimal impact, but we observed a strong effect of ancestry. We confirmed that polymorphic mismatches and small indels are enriched at shifted breakpoints and that these polymorphisms are generally lost when breakpoints shift. Long tracts of homology, such as SVs mediated by transposable elements, increase the likelihood of imprecise SV calls and the distance they are shifted. Tandem Duplication (TD) breakpoints are the most heavily affected SV class with 14% of TDs placed at different locations across haplotypes. While graph genome methods normalize SV calls across many samples, the resulting breakpoints are sometimes incorrect, highlighting a need to tune graph methods for breakpoint accuracy. 
The breakpoint inconsistencies we characterize collectively affect ~5% of the SVs called in a human genome and underscore a need for algorithm development to improve SV databases, mitigate the impact of ancestry on breakpoint placement, and increase the value of callsets for investigating mutational processes.
format Online
Article
Text
id pubmed-10327140
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Cold Spring Harbor Laboratory
record_format MEDLINE/PubMed
spelling pubmed-10327140 2023-07-08 Small allelic variants are a source of ancestral bias in structural variant breakpoint placement Audano, Peter A. Beck, Christine R. bioRxiv Article Cold Spring Harbor Laboratory 2023-06-26 /pmc/articles/PMC10327140/ /pubmed/37425850 http://dx.doi.org/10.1101/2023.06.25.546295 Text en This work is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International License (https://creativecommons.org/licenses/by-nd/4.0/), which allows reusers to copy and distribute the material in any medium or format in unadapted form only, and only so long as attribution is given to the creator. The license allows for commercial use.
title Small allelic variants are a source of ancestral bias in structural variant breakpoint placement
topic Article