Mako: A Graph-based Pattern Growth Approach to Detect Complex Structural Variants

Bibliographic Details
Main authors: Lin, Jiadong, Yang, Xiaofei, Kosters, Walter, Xu, Tun, Jia, Yanyan, Wang, Songbo, Zhu, Qihui, Ryan, Mallory, Guo, Li, Zhang, Chengsheng, Lee, Charles, Devine, Scott E., Eichler, Evan E., Ye, Kai
Format: Online article (text)
Language: English
Published: Elsevier, 2022
Subjects: Method
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9510932/
https://www.ncbi.nlm.nih.gov/pubmed/34224879
http://dx.doi.org/10.1016/j.gpb.2021.03.007

Abstract

Complex structural variants (CSVs) are genomic alterations that have more than two breakpoints and are considered the simultaneous occurrence of simple structural variants. However, the compounded mutational signals of CSVs are difficult to detect with the commonly used model-match strategy. As a result, CSV discovery has made limited progress compared with that of simple structural variants. Here, we systematically analyzed the multi-breakpoint connection feature of CSVs and proposed Mako, which uses a bottom-up guided, model-free strategy to detect CSVs from paired-end short-read sequencing data. Specifically, we implemented a graph-based pattern growth approach, in which the graph depicts potential breakpoint connections and pattern growth enables CSV detection without predefined models. Comprehensive evaluations on both simulated and real datasets revealed that Mako outperformed other algorithms. Notably, validation rates of CSVs on real data, based on experimental and computational validations as well as manual inspection, were around 70%, with median breakpoint shifts of 13 bp for experimental and 26 bp for computational validation. Moreover, the Mako CSV subgraph effectively characterized the breakpoint connections of a CSV event and uncovered a total of 15 CSV types, including two novel types: adjacent segment swap and tandem dispersed duplication. Further analysis of these CSVs also revealed the impact of sequence homology on CSV formation. Mako is publicly available at https://github.com/xjtu-omics/Mako.
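The graph-based pattern growth idea summarized above can be illustrated with a minimal Python sketch. This is not Mako's implementation (the actual code is at https://github.com/xjtu-omics/Mako). Every name here (`Breakpoint`, `build_breakpoint_graph`, `grow_pattern`, `find_csv_candidates`, `min_support`) is hypothetical, and the assumption that each edge simply carries a count of supporting reads is a simplification. Nodes are breakpoints, edges are potential breakpoint connections, and a pattern is grown from each seed breakpoint by adding well-supported neighbours; under these assumptions the growth step reduces to finding connected components of the support-filtered graph. Any grown pattern with more than two breakpoints is reported as a CSV candidate without matching against a predefined SV model.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Breakpoint:
    """Hypothetical breakpoint node: a chromosome and an approximate position."""
    chrom: str
    pos: int


def build_breakpoint_graph(links):
    """Build an undirected, weighted adjacency map from breakpoint pairs.

    `links` holds (Breakpoint, Breakpoint, support) tuples; `support` is an
    assumed count of abnormally aligned read pairs or split reads linking the
    two breakpoints. Repeated pairs have their support summed.
    """
    graph = defaultdict(dict)
    for a, b, support in links:
        graph[a][b] = graph[a].get(b, 0) + support
        graph[b][a] = graph[b].get(a, 0) + support
    return graph


def grow_pattern(graph, seed, min_support):
    """Grow a pattern from `seed`, repeatedly adding neighbours whose
    connecting edge has at least `min_support` supporting reads."""
    pattern = {seed}
    frontier = [seed]
    while frontier:
        node = frontier.pop()
        for neighbour, support in graph[node].items():
            if support >= min_support and neighbour not in pattern:
                pattern.add(neighbour)
                frontier.append(neighbour)
    return pattern


def _order(bp):
    # Sort key: chromosome name, then position.
    return (bp.chrom, bp.pos)


def find_csv_candidates(graph, min_support=3):
    """Return grown patterns with more than two breakpoints (the CSV
    criterion from the abstract), each sorted by chromosome and position."""
    seen = set()
    candidates = []
    for seed in sorted(graph, key=_order):
        if seed in seen:
            continue  # already absorbed into an earlier pattern
        pattern = grow_pattern(graph, seed, min_support)
        seen |= pattern
        if len(pattern) > 2:
            candidates.append(sorted(pattern, key=_order))
    return candidates


if __name__ == "__main__":
    demo_links = [
        (Breakpoint("chr1", 1000), Breakpoint("chr1", 5000), 8),
        (Breakpoint("chr1", 5000), Breakpoint("chr1", 9000), 6),
        (Breakpoint("chr2", 100), Breakpoint("chr2", 800), 10),
    ]
    for candidate in find_csv_candidates(build_breakpoint_graph(demo_links)):
        print(candidate)
```

In this toy input, the three linked chr1 breakpoints form one candidate, while the two-breakpoint chr2 pattern is treated as a simple SV and skipped. A practical caller would likely also need to cluster nearby breakpoint positions and weigh different read signals, which this sketch omits.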

Journal: Genomics, Proteomics & Bioinformatics (Method article), Elsevier. Issue date: 2022-02; published online: 2021-07-03.
© 2022 The Authors. This is an open access article under the CC BY license (https://creativecommons.org/licenses/by/4.0/).