Evaluation of simulation models to mimic the distortions introduced into squiggles by nanopore sequencers and segmentation algorithms

Nucleotides ratcheted through the biomolecular pores of nanopore sequencers generate raw picoamperage currents, which are segmented into step-current level signals representing the nucleotide sequence. These ‘squiggles’ are a noisy, distorted representation of the underlying true stepped current lev...

Bibliographic Details
Main Authors: Smith, Michael; Chan, Rachel; Gordon, Paul
Format: Online Article Text
Language: English
Published: Public Library of Science, 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6638935/
https://www.ncbi.nlm.nih.gov/pubmed/31318901
http://dx.doi.org/10.1371/journal.pone.0219495
author Smith, Michael
Chan, Rachel
Gordon, Paul
collection PubMed
description Nucleotides ratcheted through the biomolecular pores of nanopore sequencers generate raw picoamperage currents, which are segmented into step-current level signals representing the nucleotide sequence. These ‘squiggles’ are a noisy, distorted representation of the underlying true stepped current levels due to experimental and algorithmic factors. We were interested in developing a simulation model to support a white-box approach to identify common distortions, rather than relying on commonly used black box neural network techniques for basecalling nanopore signals. Dynamic time warped-space averaging (DTWA) techniques can generate a consensus from multiple noisy signals without introducing key feature distortions that occur with standard averaging. As a preprocessing tool, DTWA could provide cleaner and more accurate current signals for direct RNA or DNA analysis tools. However, DTWA approaches need modification to take advantage of the a-priori knowledge regarding a common, underlying gold-standard RNA / DNA sequence. Using experimental data, we derive a simulation model to provide known squiggle distortion signals to assist in validating the performance of analysis tools such as DTWA. Simulation models were evaluated by comparing mocked and experimental squiggle characteristics from one Enolase mRNA squiggle group produced by an Oxford MinION nanopore sequencer, and cross-validated using other Enolase, Sequin R1_71_1 and Sequin R2_55_3 mRNA studies. New techniques identified high inserted but low deleted base rates, generating consistent x1.7 squiggle event to base called ratios. Similar probability density and cumulative distribution functions, PDF and CDF, were found across all studies. Experimental PDFs were not the normal distributions expected if squiggle distortion arose from segmentation algorithm artefacts, or through individual nucleotides randomly interacting with individual nanopores. Matching experimental and mocked CDFs required the assumption that there are unique features associated with individual raw-current data streams. Z-normalized signal-to-noise ratios suggest intrinsic sensor limitations being responsible for half the gold standard and noisy squiggle DTW differences.
format Online
Article
Text
id pubmed-6638935
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-6638935 2019-07-25 PLoS One Research Article Public Library of Science 2019-07-18 /pmc/articles/PMC6638935/ /pubmed/31318901 http://dx.doi.org/10.1371/journal.pone.0219495 Text en © 2019 Smith et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Evaluation of simulation models to mimic the distortions introduced into squiggles by nanopore sequencers and segmentation algorithms
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6638935/
https://www.ncbi.nlm.nih.gov/pubmed/31318901
http://dx.doi.org/10.1371/journal.pone.0219495