iSVP: an integrated structural variant calling pipeline from high-throughput sequencing data

Bibliographic Details
Main authors: Mimori, Takahiro; Nariai, Naoki; Kojima, Kaname; Takahashi, Mamoru; Ono, Akira; Sato, Yukuto; Yamaguchi-Kabata, Yumi; Nagasaki, Masao
Format: Online Article Text
Language: English
Published: BioMed Central 2013
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4029547/
https://www.ncbi.nlm.nih.gov/pubmed/24564972
http://dx.doi.org/10.1186/1752-0509-7-S6-S8
collection PubMed
description BACKGROUND: Structural variations (SVs), such as insertions, deletions, inversions, and duplications, are common in human genomes, and a number of studies have reported associations between SVs and human diseases. Although advances in next-generation sequencing (NGS) technologies have led to the discovery of a large number of SVs, accurate genome-wide detection of SVs remains challenging. Various NGS-based calling algorithms have been proposed, but their strategies are diverse and no single tool accurately detects the full range of SVs. RESULTS: We evaluated the performance of existing deletion calling algorithms across a range of deletion sizes, using simulation data with low to high coverage. The simulation data were generated from a whole-genome sequence containing artificial SVs constructed based on the distribution of variants obtained from the 1000 Genomes Project. In the simulation analysis, the performance of the callers differed considerably depending on the type of algorithm and the size of the targeted deletions. Based on these results, we propose an integrated structural variant calling pipeline (iSVP) that combines existing methods with newly devised filtering and merging processes. iSVP achieved highly accurate deletion calling, with >90% precision and >90% recall on 30× read data across a broad range of deletion sizes. We applied iSVP to whole-genome sequence data from a CEU HapMap sample and detected a large number of deletions, including notable peaks around 300 bp and 6,000 bp, which corresponded to Alu elements and long interspersed nuclear elements (LINEs), respectively. In addition, many of the predicted deletions were highly consistent with deletions experimentally validated in other studies. CONCLUSIONS: We present iSVP, a new deletion calling pipeline that provides a highly accurate, genome-wide landscape of deletions.
From simulation and real-data analyses, we show that iSVP is broadly applicable to human whole-genome sequencing data, which will help elucidate relationships between SVs across genomes and associated diseases or biological functions.
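The >90% precision and recall figures in the abstract come from comparing predicted deletions with the deletions planted in the simulated genome. The sketch below shows one common way to score such a comparison. It assumes deletions are (chromosome, start, end) intervals and that a call counts as correct when it reciprocally overlaps a true deletion by at least 50%; the matching rule, the threshold, the toy data, and the function names are illustrative assumptions, not the evaluation criteria used in the iSVP paper.

```python
"""Sketch: precision and recall of deletion calls against a simulated truth set.

Assumption (not taken from the iSVP paper): a call matches a true deletion when
both lie on the same chromosome and reciprocally overlap by >= 50%.
"""
from typing import List, Tuple

# (chromosome, start, end) with half-open coordinates and end > start
Deletion = Tuple[str, int, int]


def reciprocal_overlap(a: Deletion, b: Deletion) -> float:
    """Smaller of the two overlap fractions, i.e. overlap / longer length; 0.0 if disjoint."""
    if a[0] != b[0]:
        return 0.0
    overlap = min(a[2], b[2]) - max(a[1], b[1])
    if overlap <= 0:
        return 0.0
    return overlap / max(a[2] - a[1], b[2] - b[1])


def precision_recall(calls: List[Deletion], truth: List[Deletion],
                     min_ro: float = 0.5) -> Tuple[float, float]:
    """Precision = matched calls / all calls; recall = matched true deletions / all true deletions."""
    def matches(x: Deletion, pool: List[Deletion]) -> bool:
        return any(reciprocal_overlap(x, y) >= min_ro for y in pool)

    tp_calls = sum(matches(c, truth) for c in calls)
    tp_truth = sum(matches(t, calls) for t in truth)
    precision = tp_calls / len(calls) if calls else 0.0
    recall = tp_truth / len(truth) if truth else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical toy data: an Alu-sized (300 bp) and a LINE-sized (6,000 bp) deletion
    # that are both recovered, plus one missed true deletion and one false call on chr2.
    truth = [("chr1", 1_000, 1_300), ("chr1", 50_000, 56_000), ("chr2", 2_000, 2_100)]
    calls = [("chr1", 1_010, 1_295), ("chr1", 50_500, 56_200), ("chr2", 9_000, 9_100)]
    p, r = precision_recall(calls, truth)
    print(f"precision={p:.3f} recall={r:.3f}")  # precision=0.667 recall=0.667
```

Reciprocal overlap is used here instead of "any overlap" so that a short call lying inside a large true deletion (or a large call swallowing a short one) is not counted as a hit; the pipeline's own matching criteria may differ.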
id pubmed-4029547
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-4029547 2014-06-06; BMC Syst Biol, Research; BioMed Central 2013-12-13; Text, en. Copyright © 2013 Mimori et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
topic Research