Small allelic variants are a source of ancestral bias in structural variant breakpoint placement
High-quality genome assemblies and sophisticated algorithms have increased sensitivity for a wide range of variant types, and breakpoint accuracy for structural variants (SVs, ≥ 50 bp) has improved to near basepair precision. Despite these advances, many SVs in unique regions of the genome are subje...
Main authors: | Audano, Peter A., Beck, Christine R. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Cold Spring Harbor Laboratory 2023 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10327140/ https://www.ncbi.nlm.nih.gov/pubmed/37425850 http://dx.doi.org/10.1101/2023.06.25.546295 |
_version_ | 1785069564442181632 |
---|---|
author | Audano, Peter A. Beck, Christine R. |
author_facet | Audano, Peter A. Beck, Christine R. |
author_sort | Audano, Peter A. |
collection | PubMed |
description | High-quality genome assemblies and sophisticated algorithms have increased sensitivity for a wide range of variant types, and breakpoint accuracy for structural variants (SVs, ≥ 50 bp) has improved to near basepair precision. Despite these advances, many SVs in unique regions of the genome are subject to systematic bias that affects breakpoint location. This ambiguity leads to less accurate variant comparisons across samples, and it obscures true breakpoint features needed for mechanistic inferences. To understand why SVs are not consistently placed, we reanalyzed 64 phased haplotypes constructed from long-read assemblies released by the Human Genome Structural Variation Consortium (HGSVC). We identified variable breakpoints for 882 SV insertions and 180 SV deletions not anchored in tandem repeats (TRs) or segmental duplications (SDs). While this is unexpectedly high for genome assemblies in unique loci, we find read-based callsets from the same sequencing data yielded 1,566 insertions and 986 deletions with inconsistent breakpoints also not anchored in TRs or SDs. When we investigated causes for breakpoint inaccuracy, we found sequence and assembly errors had minimal impact, but we observed a strong effect of ancestry. We confirmed that polymorphic mismatches and small indels are enriched at shifted breakpoints and that these polymorphisms are generally lost when breakpoints shift. Long tracts of homology, such as SVs mediated by transposable elements, increase the likelihood of imprecise SV calls and the distance they are shifted. Tandem Duplication (TD) breakpoints are the most heavily affected SV class with 14% of TDs placed at different locations across haplotypes. While graph genome methods normalize SV calls across many samples, the resulting breakpoints are sometimes incorrect, highlighting a need to tune graph methods for breakpoint accuracy. The breakpoint inconsistencies we characterize collectively affect ~5% of the SVs called in a human genome and underscore a need for algorithm development to improve SV databases, mitigate the impact of ancestry on breakpoint placement, and increase the value of callsets for investigating mutational processes. |
format | Online Article Text |
id | pubmed-10327140 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Cold Spring Harbor Laboratory |
record_format | MEDLINE/PubMed |
spelling | pubmed-10327140 2023-07-08 Small allelic variants are a source of ancestral bias in structural variant breakpoint placement Audano, Peter A. Beck, Christine R. bioRxiv Article High-quality genome assemblies and sophisticated algorithms have increased sensitivity for a wide range of variant types, and breakpoint accuracy for structural variants (SVs, ≥ 50 bp) has improved to near basepair precision. Despite these advances, many SVs in unique regions of the genome are subject to systematic bias that affects breakpoint location. This ambiguity leads to less accurate variant comparisons across samples, and it obscures true breakpoint features needed for mechanistic inferences. To understand why SVs are not consistently placed, we reanalyzed 64 phased haplotypes constructed from long-read assemblies released by the Human Genome Structural Variation Consortium (HGSVC). We identified variable breakpoints for 882 SV insertions and 180 SV deletions not anchored in tandem repeats (TRs) or segmental duplications (SDs). While this is unexpectedly high for genome assemblies in unique loci, we find read-based callsets from the same sequencing data yielded 1,566 insertions and 986 deletions with inconsistent breakpoints also not anchored in TRs or SDs. When we investigated causes for breakpoint inaccuracy, we found sequence and assembly errors had minimal impact, but we observed a strong effect of ancestry. We confirmed that polymorphic mismatches and small indels are enriched at shifted breakpoints and that these polymorphisms are generally lost when breakpoints shift. Long tracts of homology, such as SVs mediated by transposable elements, increase the likelihood of imprecise SV calls and the distance they are shifted. Tandem Duplication (TD) breakpoints are the most heavily affected SV class with 14% of TDs placed at different locations across haplotypes. While graph genome methods normalize SV calls across many samples, the resulting breakpoints are sometimes incorrect, highlighting a need to tune graph methods for breakpoint accuracy. The breakpoint inconsistencies we characterize collectively affect ~5% of the SVs called in a human genome and underscore a need for algorithm development to improve SV databases, mitigate the impact of ancestry on breakpoint placement, and increase the value of callsets for investigating mutational processes. Cold Spring Harbor Laboratory 2023-06-26 /pmc/articles/PMC10327140/ /pubmed/37425850 http://dx.doi.org/10.1101/2023.06.25.546295 Text en https://creativecommons.org/licenses/by-nd/4.0/ This work is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International License (https://creativecommons.org/licenses/by-nd/4.0/), which allows reusers to copy and distribute the material in any medium or format in unadapted form only, and only so long as attribution is given to the creator. The license allows for commercial use. |
spellingShingle | Article Audano, Peter A. Beck, Christine R. Small allelic variants are a source of ancestral bias in structural variant breakpoint placement |
title | Small allelic variants are a source of ancestral bias in structural variant breakpoint placement |
title_full | Small allelic variants are a source of ancestral bias in structural variant breakpoint placement |
title_fullStr | Small allelic variants are a source of ancestral bias in structural variant breakpoint placement |
title_full_unstemmed | Small allelic variants are a source of ancestral bias in structural variant breakpoint placement |
title_short | Small allelic variants are a source of ancestral bias in structural variant breakpoint placement |
title_sort | small allelic variants are a source of ancestral bias in structural variant breakpoint placement |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10327140/ https://www.ncbi.nlm.nih.gov/pubmed/37425850 http://dx.doi.org/10.1101/2023.06.25.546295 |
work_keys_str_mv | AT audanopetera smallallelicvariantsareasourceofancestralbiasinstructuralvariantbreakpointplacement AT beckchristiner smallallelicvariantsareasourceofancestralbiasinstructuralvariantbreakpointplacement |