Improving structural variant clustering to reduce the negative effect of the breakpoint uncertainty problem
BACKGROUND: Structural variants (SVs) represent an important source of genetic variation. One of the most critical problems in their detection is breakpoint uncertainty associated with the inability to determine their exact genomic position. Breakpoint uncertainty is a characteristic issue of struct...
Main Authors: | Geryk, Jan; Zinkova, Alzbeta; Zedníková, Iveta; Simková, Halina; Stenzl, Vlastimil; Korabecna, Marie |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2021 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8474851/ https://www.ncbi.nlm.nih.gov/pubmed/34579642 http://dx.doi.org/10.1186/s12859-021-04374-3 |
_version_ | 1784575312128901120 |
---|---|
author | Geryk, Jan Zinkova, Alzbeta Zedníková, Iveta Simková, Halina Stenzl, Vlastimil Korabecna, Marie |
author_sort | Geryk, Jan |
collection | PubMed |
description | BACKGROUND: Structural variants (SVs) represent an important source of genetic variation. One of the most critical problems in their detection is breakpoint uncertainty associated with the inability to determine their exact genomic position. Breakpoint uncertainty is a characteristic issue of structural variants detected via short-read sequencing methods and complicates subsequent population analyses. The commonly used heuristic strategy reduces this issue by clustering/merging nearby structural variants of the same type before the data from individual samples are merged. RESULTS: We compared the two most used dissimilarity measures for SV clustering in terms of Mendelian inheritance errors (MIE), kinship prediction, and deviation from Hardy–Weinberg equilibrium. We analyzed the occurrence of Mendelian-inconsistent SV clusters that can be collapsed into one Mendelian-consistent SV as a new measure of dataset consistency. We also developed a new method based on constrained clustering that explicitly identifies these types of clusters. CONCLUSIONS: We found that the dissimilarity measure based on the distance between SVs breakpoints produces slightly better results than the measure based on SVs overlap. This difference is evident in trivial and corrected clustering strategy, but not in constrained clustering strategy. However, constrained clustering strategy provided the best results in all aspects, regardless of the dissimilarity measure used. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-021-04374-3. |
format | Online Article Text |
id | pubmed-8474851 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-8474851 2021-09-28 Improving structural variant clustering to reduce the negative effect of the breakpoint uncertainty problem Geryk, Jan Zinkova, Alzbeta Zedníková, Iveta Simková, Halina Stenzl, Vlastimil Korabecna, Marie BMC Bioinformatics Research BioMed Central 2021-09-27 /pmc/articles/PMC8474851/ /pubmed/34579642 http://dx.doi.org/10.1186/s12859-021-04374-3 Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
title | Improving structural variant clustering to reduce the negative effect of the breakpoint uncertainty problem |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8474851/ https://www.ncbi.nlm.nih.gov/pubmed/34579642 http://dx.doi.org/10.1186/s12859-021-04374-3 |
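The abstract above contrasts two kinds of dissimilarity measure for SV clustering: one based on the distance between breakpoints and one based on overlap. The record does not give the exact formulas, so the following is only a minimal sketch under stated assumptions: the `SV` record, the choice of the larger breakpoint offset as the distance, and the use of one minus reciprocal overlap are all illustrative choices, not the authors' definitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SV:
    """Hypothetical minimal SV record on a single chromosome."""
    start: int  # left breakpoint, 0-based inclusive
    end: int    # right breakpoint, exclusive
    svtype: str = "DEL"


def breakpoint_distance(a: SV, b: SV) -> int:
    """Assumed distance-based measure: the larger of the two breakpoint offsets (bp)."""
    return max(abs(a.start - b.start), abs(a.end - b.end))


def overlap_dissimilarity(a: SV, b: SV) -> float:
    """Assumed overlap-based measure: 1 minus shared length over the longer SV."""
    shared = max(0, min(a.end, b.end) - max(a.start, b.start))
    longest = max(a.end - a.start, b.end - b.start)
    return 1.0 - shared / longest if longest > 0 else 1.0


# Two deletion calls whose breakpoints differ by 50 bp and 100 bp.
a = SV(1000, 2000)
b = SV(1050, 2100)
print(breakpoint_distance(a, b))
print(overlap_dissimilarity(a, b))
```

Under these assumptions, the distance-based measure is in base pairs and does not depend on SV length, while the overlap-based measure is a fraction in [0, 1], so the same 100 bp offset counts for less on a long SV than on a short one.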