Comprehensive evaluation of structural variation detection algorithms for whole genome sequencing

BACKGROUND: Structural variations (SVs) or copy number variations (CNVs) greatly impact the functions of the genes encoded in the genome and are responsible for diverse human diseases. Although a number of existing SV detection algorithms can detect many types of SVs using whole genome sequencing (W...

Bibliographic Details
Main authors: Kosugi, Shunichi; Momozawa, Yukihide; Liu, Xiaoxi; Terao, Chikashi; Kubo, Michiaki; Kamatani, Yoichiro
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6547561/
https://www.ncbi.nlm.nih.gov/pubmed/31159850
http://dx.doi.org/10.1186/s13059-019-1720-5
author Kosugi, Shunichi
Momozawa, Yukihide
Liu, Xiaoxi
Terao, Chikashi
Kubo, Michiaki
Kamatani, Yoichiro
collection PubMed
description BACKGROUND: Structural variations (SVs) or copy number variations (CNVs) greatly impact the functions of the genes encoded in the genome and are responsible for diverse human diseases. Although a number of existing SV detection algorithms can detect many types of SVs using whole genome sequencing (WGS) data, no single algorithm can call every type of SVs with high precision and high recall. RESULTS: We comprehensively evaluate the performance of 69 existing SV detection algorithms using multiple simulated and real WGS datasets. The results highlight a subset of algorithms that accurately call SVs depending on specific types and size ranges of the SVs and that accurately determine breakpoints, sizes, and genotypes of the SVs. We enumerate potential good algorithms for each SV category, among which GRIDSS, Lumpy, SVseq2, SoftSV, Manta, and Wham are better algorithms in deletion or duplication categories. To improve the accuracy of SV calling, we systematically evaluate the accuracy of overlapping calls between possible combinations of algorithms for every type and size range of SVs. The results demonstrate that both the precision and recall for overlapping calls vary depending on the combinations of specific algorithms rather than the combinations of methods used in the algorithms. CONCLUSION: These results suggest that careful selection of the algorithms for each type and size range of SVs is required for accurate calling of SVs. The selection of specific pairs of algorithms for overlapping calls promises to effectively improve the SV detection accuracy. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s13059-019-1720-5) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-6547561
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6547561 2019-06-06 Comprehensive evaluation of structural variation detection algorithms for whole genome sequencing Kosugi, Shunichi Momozawa, Yukihide Liu, Xiaoxi Terao, Chikashi Kubo, Michiaki Kamatani, Yoichiro Genome Biol Research BACKGROUND: Structural variations (SVs) or copy number variations (CNVs) greatly impact the functions of the genes encoded in the genome and are responsible for diverse human diseases. Although a number of existing SV detection algorithms can detect many types of SVs using whole genome sequencing (WGS) data, no single algorithm can call every type of SVs with high precision and high recall. RESULTS: We comprehensively evaluate the performance of 69 existing SV detection algorithms using multiple simulated and real WGS datasets. The results highlight a subset of algorithms that accurately call SVs depending on specific types and size ranges of the SVs and that accurately determine breakpoints, sizes, and genotypes of the SVs. We enumerate potential good algorithms for each SV category, among which GRIDSS, Lumpy, SVseq2, SoftSV, Manta, and Wham are better algorithms in deletion or duplication categories. To improve the accuracy of SV calling, we systematically evaluate the accuracy of overlapping calls between possible combinations of algorithms for every type and size range of SVs. The results demonstrate that both the precision and recall for overlapping calls vary depending on the combinations of specific algorithms rather than the combinations of methods used in the algorithms. CONCLUSION: These results suggest that careful selection of the algorithms for each type and size range of SVs is required for accurate calling of SVs. The selection of specific pairs of algorithms for overlapping calls promises to effectively improve the SV detection accuracy.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s13059-019-1720-5) contains supplementary material, which is available to authorized users. BioMed Central 2019-06-03 /pmc/articles/PMC6547561/ /pubmed/31159850 http://dx.doi.org/10.1186/s13059-019-1720-5 Text en © The Author(s). 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Comprehensive evaluation of structural variation detection algorithms for whole genome sequencing
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6547561/
https://www.ncbi.nlm.nih.gov/pubmed/31159850
http://dx.doi.org/10.1186/s13059-019-1720-5