
Evaluating nanopore sequencing data processing pipelines for structural variation identification

BACKGROUND: Structural variations (SVs) account for about 1% of the differences among human genomes and play a significant role in phenotypic variation and disease susceptibility. The emerging nanopore sequencing technology can generate long sequence reads and can potentially provide accurate SV ide...


Bibliographic Details
Main Authors: Zhou, Anbo; Lin, Timothy; Xing, Jinchuan
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6857234/
https://www.ncbi.nlm.nih.gov/pubmed/31727126
http://dx.doi.org/10.1186/s13059-019-1858-1
author Zhou, Anbo
Lin, Timothy
Xing, Jinchuan
collection PubMed
description BACKGROUND: Structural variations (SVs) account for about 1% of the differences among human genomes and play a significant role in phenotypic variation and disease susceptibility. The emerging nanopore sequencing technology can generate long sequence reads and can potentially provide accurate SV identification. However, the tools for aligning long-read data and detecting SVs have not been thoroughly evaluated. RESULTS: Using four nanopore datasets, including both empirical and simulated reads, we evaluate four alignment tools and three SV detection tools. We also evaluate the impact of sequencing depth on SV detection. Finally, we develop a machine learning approach to integrate call sets from multiple pipelines. Overall, SV caller performance varies depending on the SV type. For an initial data assessment, we recommend using the aligner minimap2 in combination with the SV caller Sniffles because of their speed and relatively balanced performance. For detailed analysis, we recommend incorporating information from multiple call sets to improve SV calling performance. CONCLUSIONS: We present a workflow for evaluating aligners and SV callers for nanopore sequencing data and approaches for integrating multiple call sets. Our results indicate that additional optimizations are needed to improve SV detection accuracy and sensitivity, and that an integrated call set can provide enhanced performance. Nanopore technology is improving, and the sequencing community is likely to grow accordingly. In turn, better benchmark call sets will become available to assess the performance of available tools more accurately and to facilitate further tool development.
format Online
Article
Text
id pubmed-6857234
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6857234 2019-11-29 Genome Biol Research BioMed Central 2019-11-14 /pmc/articles/PMC6857234/ /pubmed/31727126 http://dx.doi.org/10.1186/s13059-019-1858-1 Text en © The Author(s). 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Evaluating nanopore sequencing data processing pipelines for structural variation identification
topic Research