stLFRsv: A Germline Structural Variant Analysis Pipeline Using Co-barcoded Reads

Co-barcoded reads originating from long DNA fragments (mean length >30 kbp) maintain both single base level accuracy and long-range genomic information. We propose a pipeline, stLFRsv, to detect structural variation using co-barcoded reads. stLFRsv identifies abnormal large gaps between co-barcod...

Bibliographic Details
Main authors: Guo, Junfu; Shi, Chang; Chen, Xi; Wang, Ou; Liu, Ping; Yang, Huanming; Xu, Xun; Zhang, Wenwei; Zhu, Hongmei
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2021
Subjects: Genetics
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8012683/
https://www.ncbi.nlm.nih.gov/pubmed/33815469
http://dx.doi.org/10.3389/fgene.2021.636239
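
The abstract says that stLFRsv looks for abnormally large gaps between co-barcoded reads to find potential breakpoints. The following is a minimal sketch of that general idea only, not the stLFRsv implementation: the function name `find_large_gaps`, the `(barcode, chrom, pos)` input tuples, and the 50 kbp default threshold are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def find_large_gaps(reads, max_gap=50_000):
    """Flag large gaps between consecutive reads that share a barcode.

    reads   -- iterable of (barcode, chrom, pos) tuples (hypothetical input shape)
    max_gap -- gap size in bp above which a gap is reported (assumed value)

    Returns a list of (barcode, chrom, left_pos, right_pos) tuples.
    """
    # Group read positions by barcode and chromosome.
    positions_by_key = defaultdict(list)
    for barcode, chrom, pos in reads:
        positions_by_key[(barcode, chrom)].append(pos)

    gaps = []
    for (barcode, chrom), positions in sorted(positions_by_key.items()):
        positions.sort()
        # Compare each read position with the next one on the same barcode.
        for left, right in zip(positions, positions[1:]):
            if right - left > max_gap:
                gaps.append((barcode, chrom, left, right))
    return gaps


# Toy input: BC1 has one jump far larger than the threshold, BC2 has none.
example_reads = [
    ("BC1", "chr1", 1_000),
    ("BC1", "chr1", 6_000),
    ("BC1", "chr1", 250_000),
    ("BC2", "chr1", 2_000),
    ("BC2", "chr1", 9_000),
]
candidate_gaps = find_large_gaps(example_reads)
```

A real caller would also need the steps the abstract mentions but this sketch omits, such as haplotype phasing and filtering by barcode sharing profiles.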
author Guo, Junfu
Shi, Chang
Chen, Xi
Wang, Ou
Liu, Ping
Yang, Huanming
Xu, Xun
Zhang, Wenwei
Zhu, Hongmei
collection PubMed
description Co-barcoded reads originating from long DNA fragments (mean length >30 kbp) maintain both single base level accuracy and long-range genomic information. We propose a pipeline, stLFRsv, to detect structural variation using co-barcoded reads. stLFRsv identifies abnormal large gaps between co-barcoded reads to detect potential breakpoints and reconstruct complex structural variants (SVs). Haplotype phasing by co-barcoded reads increases the signal to noise ratio, and barcode sharing profiles are used to filter out false positives. We integrate the short read SV caller smoove for smaller variants with stLFRsv. The integrated pipeline was evaluated on the well-characterized genome HG002/NA24385, and 74.5% precision and a 22.4% recall rate were obtained for deletions. stLFRsv revealed some large variants not included in the benchmark set that were verified by long reads or assembly. For the HG001/NA12878 genome, stLFRsv also achieved the best performance for both resource usage and the detection of large variants. Our work indicates that co-barcoded read technology has the potential to improve genome completeness.
format Online
Article
Text
id pubmed-8012683
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-8012683
2021-04-02
stLFRsv: A Germline Structural Variant Analysis Pipeline Using Co-barcoded Reads
Front Genet
Genetics
Frontiers Media S.A. 2021-03-18
/pmc/articles/PMC8012683/
/pubmed/33815469
http://dx.doi.org/10.3389/fgene.2021.636239
Text en
Copyright © 2021 Guo, Shi, Chen, Wang, Liu, Yang, Xu, Zhang and Zhu.
http://creativecommons.org/licenses/by/4.0/
This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title stLFRsv: A Germline Structural Variant Analysis Pipeline Using Co-barcoded Reads
topic Genetics