iSVP: an integrated structural variant calling pipeline from high-throughput sequencing data
BACKGROUND: Structural variations (SVs), such as insertions, deletions, inversions, and duplications, are a common feature in human genomes, and a number of studies have reported that such SVs are associated with human diseases. Although the progress of next generation sequencing (NGS) technologies...
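Main authors: | Mimori, Takahiro; Nariai, Naoki; Kojima, Kaname; Takahashi, Mamoru; Ono, Akira; Sato, Yukuto; Yamaguchi-Kabata, Yumi; Nagasaki, Masao |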
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2013 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4029547/ https://www.ncbi.nlm.nih.gov/pubmed/24564972 http://dx.doi.org/10.1186/1752-0509-7-S6-S8 |
_version_ | 1782317228266356736 |
---|---|
author | Mimori, Takahiro; Nariai, Naoki; Kojima, Kaname; Takahashi, Mamoru; Ono, Akira; Sato, Yukuto; Yamaguchi-Kabata, Yumi; Nagasaki, Masao |
author_facet | Mimori, Takahiro; Nariai, Naoki; Kojima, Kaname; Takahashi, Mamoru; Ono, Akira; Sato, Yukuto; Yamaguchi-Kabata, Yumi; Nagasaki, Masao |
author_sort | Mimori, Takahiro |
collection | PubMed |
description | BACKGROUND: Structural variations (SVs), such as insertions, deletions, inversions, and duplications, are a common feature of human genomes, and a number of studies have reported that such SVs are associated with human diseases. Although progress in next-generation sequencing (NGS) technologies has led to the discovery of a large number of SVs, accurate, genome-wide detection of SVs remains challenging. Various calling algorithms based on NGS data have been proposed so far. However, their strategies are diverse, and no tool is able to detect the full range of SVs accurately. RESULTS: We focused on evaluating the performance of existing deletion calling algorithms for deletions of various sizes on simulation data ranging from low to high coverage. The simulation data were generated from a whole-genome sequence with artificial SVs constructed based on the distribution of variants obtained from the 1000 Genomes Project. From the simulation analysis, deletion calls of various sizes were obtained with each caller, and performance differed considerably depending on the type of algorithm and the targeted deletion size. Based on these results, we propose an integrated structural variant calling pipeline (iSVP) that combines existing methods with newly devised filtering and merging processes. It achieved highly accurate deletion calling, with >90% precision and >90% recall on the 30× read data across a broad range of deletion sizes. We applied iSVP to the whole-genome sequence data of a CEU HapMap sample and detected a large number of deletions, including notable peaks around 300 bp and 6,000 bp, which corresponded to Alu elements and long interspersed nuclear elements, respectively. In addition, many of the predicted deletions were highly consistent with deletions experimentally validated in other studies. CONCLUSIONS: We present iSVP, a new deletion calling pipeline for obtaining a genome-wide landscape of deletions in a highly accurate manner. From simulation and real data analysis, we show that iSVP is broadly applicable to human whole-genome sequencing data, which will elucidate relationships between SVs across genomes and associated diseases or biological functions. |
format | Online Article Text |
id | pubmed-4029547 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2013 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-4029547 2014-06-06 iSVP: an integrated structural variant calling pipeline from high-throughput sequencing data Mimori, Takahiro; Nariai, Naoki; Kojima, Kaname; Takahashi, Mamoru; Ono, Akira; Sato, Yukuto; Yamaguchi-Kabata, Yumi; Nagasaki, Masao BMC Syst Biol Research BioMed Central 2013-12-13 /pmc/articles/PMC4029547/ /pubmed/24564972 http://dx.doi.org/10.1186/1752-0509-7-S6-S8 Text en Copyright © 2013 Mimori et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
title | iSVP: an integrated structural variant calling pipeline from high-throughput sequencing data |
title_sort | isvp: an integrated structural variant calling pipeline from high-throughput sequencing data |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4029547/ https://www.ncbi.nlm.nih.gov/pubmed/24564972 http://dx.doi.org/10.1186/1752-0509-7-S6-S8 |