
Improving structural variant clustering to reduce the negative effect of the breakpoint uncertainty problem

BACKGROUND: Structural variants (SVs) represent an important source of genetic variation. One of the most critical problems in their detection is breakpoint uncertainty associated with the inability to determine their exact genomic position. Breakpoint uncertainty is a characteristic issue of struct...


Bibliographic Details
Main authors: Geryk, Jan, Zinkova, Alzbeta, Zedníková, Iveta, Simková, Halina, Stenzl, Vlastimil, Korabecna, Marie
Format: Online Article Text
Language: English
Published: BioMed Central 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8474851/
https://www.ncbi.nlm.nih.gov/pubmed/34579642
http://dx.doi.org/10.1186/s12859-021-04374-3
_version_ 1784575312128901120
author Geryk, Jan
Zinkova, Alzbeta
Zedníková, Iveta
Simková, Halina
Stenzl, Vlastimil
Korabecna, Marie
collection PubMed
description BACKGROUND: Structural variants (SVs) represent an important source of genetic variation. One of the most critical problems in their detection is breakpoint uncertainty, that is, the inability to determine their exact genomic position. Breakpoint uncertainty is characteristic of structural variants detected via short-read sequencing methods and complicates subsequent population analyses. The commonly used heuristic strategy reduces this issue by clustering/merging nearby structural variants of the same type before the data from individual samples are merged.
RESULTS: We compared the two most commonly used dissimilarity measures for SV clustering in terms of Mendelian inheritance errors (MIE), kinship prediction, and deviation from Hardy–Weinberg equilibrium. We analyzed the occurrence of Mendelian-inconsistent SV clusters that can be collapsed into one Mendelian-consistent SV as a new measure of dataset consistency. We also developed a new method based on constrained clustering that explicitly identifies these types of clusters.
CONCLUSIONS: We found that the dissimilarity measure based on the distance between SV breakpoints produces slightly better results than the measure based on SV overlap. This difference is evident in the trivial and corrected clustering strategies, but not in the constrained clustering strategy. However, the constrained clustering strategy provided the best results in all aspects, regardless of the dissimilarity measure used.
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-021-04374-3.
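The description contrasts two kinds of dissimilarity measure for SV clustering, one based on breakpoint distance and one based on overlap, but the record does not give their formulas. The following is a minimal Python sketch of what such measures might look like. The `SV` type, both function names, the choice of the larger of the two breakpoint offsets, and the use of overlap relative to the longer variant are all assumptions made for illustration, not the definitions used in the article.

```python
from typing import NamedTuple


class SV(NamedTuple):
    """Hypothetical interval representation of one structural variant."""
    start: int  # left breakpoint (inclusive)
    end: int    # right breakpoint (exclusive)


def breakpoint_distance(a: SV, b: SV) -> int:
    """Assumed measure: the larger of the two breakpoint offsets, in bp."""
    return max(abs(a.start - b.start), abs(a.end - b.end))


def overlap_dissimilarity(a: SV, b: SV) -> float:
    """Assumed measure: 1 minus the overlap relative to the longer SV.

    Returns 0.0 for identical intervals and 1.0 for disjoint ones.
    """
    overlap = max(0, min(a.end, b.end) - max(a.start, b.start))
    longest = max(a.end - a.start, b.end - b.start)
    if longest == 0:
        # Both intervals are empty; treat them as identical only if equal.
        return 0.0 if a == b else 1.0
    return 1.0 - overlap / longest


if __name__ == "__main__":
    x, y = SV(100, 200), SV(110, 220)
    print(breakpoint_distance(x, y))
    print(round(overlap_dissimilarity(x, y), 4))
```

Either function could feed a pairwise dissimilarity matrix for a standard clustering routine. Note that the breakpoint measure is expressed in base pairs, while the overlap measure is scale-free, so clustering thresholds for the two are not directly comparable.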
format Online
Article
Text
id pubmed-8474851
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-8474851 2021-09-28 BMC Bioinformatics Research
BioMed Central 2021-09-27 /pmc/articles/PMC8474851/ /pubmed/34579642 http://dx.doi.org/10.1186/s12859-021-04374-3 Text en © The Author(s) 2021. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Improving structural variant clustering to reduce the negative effect of the breakpoint uncertainty problem
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8474851/
https://www.ncbi.nlm.nih.gov/pubmed/34579642
http://dx.doi.org/10.1186/s12859-021-04374-3