Evaluation of simulation models to mimic the distortions introduced into squiggles by nanopore sequencers and segmentation algorithms
Nucleotides ratcheted through the biomolecular pores of nanopore sequencers generate raw picoamperage currents, which are segmented into step-current level signals representing the nucleotide sequence. These ‘squiggles’ are a noisy, distorted representation of the underlying true stepped current lev...
Main Authors: | Smith, Michael; Chan, Rachel; Gordon, Paul |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science 2019 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6638935/ https://www.ncbi.nlm.nih.gov/pubmed/31318901 http://dx.doi.org/10.1371/journal.pone.0219495 |
_version_ | 1783436378223149056 |
---|---|
author | Smith, Michael Chan, Rachel Gordon, Paul |
author_facet | Smith, Michael Chan, Rachel Gordon, Paul |
author_sort | Smith, Michael |
collection | PubMed |
description | Nucleotides ratcheted through the biomolecular pores of nanopore sequencers generate raw picoamperage currents, which are segmented into step-current level signals representing the nucleotide sequence. These ‘squiggles’ are a noisy, distorted representation of the underlying true stepped current levels due to experimental and algorithmic factors. We were interested in developing a simulation model to support a white-box approach to identify common distortions, rather than relying on commonly used black box neural network techniques for basecalling nanopore signals. Dynamic time warped-space averaging (DTWA) techniques can generate a consensus from multiple noisy signals without introducing key feature distortions that occur with standard averaging. As a preprocessing tool, DTWA could provide cleaner and more accurate current signals for direct RNA or DNA analysis tools. However, DTWA approaches need modification to take advantage of the a-priori knowledge regarding a common, underlying gold-standard RNA / DNA sequence. Using experimental data, we derive a simulation model to provide known squiggle distortion signals to assist in validating the performance of analysis tools such as DTWA. Simulation models were evaluated by comparing mocked and experimental squiggle characteristics from one Enolase mRNA squiggle group produced by an Oxford MinION nanopore sequencer, and cross-validated using other Enolase, Sequin R1_71_1 and Sequin R2_55_3 mRNA studies. New techniques identified high inserted but low deleted base rates, generating consistent x1.7 squiggle event to base called ratios. Similar probability density and cumulative distribution functions, PDF and CDF, were found across all studies. Experimental PDFs were not the normal distributions expected if squiggle distortion arose from segmentation algorithm artefacts, or through individual nucleotides randomly interacting with individual nanopores. Matching experimental and mocked CDFs required the assumption that there are unique features associated with individual raw-current data streams. Z-normalized signal-to-noise ratios suggest intrinsic sensor limitations being responsible for half the gold standard and noisy squiggle DTW differences. |
format | Online Article Text |
id | pubmed-6638935 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-6638935 2019-07-25 Evaluation of simulation models to mimic the distortions introduced into squiggles by nanopore sequencers and segmentation algorithms Smith, Michael Chan, Rachel Gordon, Paul PLoS One Research Article Nucleotides ratcheted through the biomolecular pores of nanopore sequencers generate raw picoamperage currents, which are segmented into step-current level signals representing the nucleotide sequence. These ‘squiggles’ are a noisy, distorted representation of the underlying true stepped current levels due to experimental and algorithmic factors. We were interested in developing a simulation model to support a white-box approach to identify common distortions, rather than relying on commonly used black box neural network techniques for basecalling nanopore signals. Dynamic time warped-space averaging (DTWA) techniques can generate a consensus from multiple noisy signals without introducing key feature distortions that occur with standard averaging. As a preprocessing tool, DTWA could provide cleaner and more accurate current signals for direct RNA or DNA analysis tools. However, DTWA approaches need modification to take advantage of the a-priori knowledge regarding a common, underlying gold-standard RNA / DNA sequence. Using experimental data, we derive a simulation model to provide known squiggle distortion signals to assist in validating the performance of analysis tools such as DTWA. Simulation models were evaluated by comparing mocked and experimental squiggle characteristics from one Enolase mRNA squiggle group produced by an Oxford MinION nanopore sequencer, and cross-validated using other Enolase, Sequin R1_71_1 and Sequin R2_55_3 mRNA studies. New techniques identified high inserted but low deleted base rates, generating consistent x1.7 squiggle event to base called ratios. Similar probability density and cumulative distribution functions, PDF and CDF, were found across all studies. Experimental PDFs were not the normal distributions expected if squiggle distortion arose from segmentation algorithm artefacts, or through individual nucleotides randomly interacting with individual nanopores. Matching experimental and mocked CDFs required the assumption that there are unique features associated with individual raw-current data streams. Z-normalized signal-to-noise ratios suggest intrinsic sensor limitations being responsible for half the gold standard and noisy squiggle DTW differences. Public Library of Science 2019-07-18 /pmc/articles/PMC6638935/ /pubmed/31318901 http://dx.doi.org/10.1371/journal.pone.0219495 Text en © 2019 Smith et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
spellingShingle | Research Article Smith, Michael Chan, Rachel Gordon, Paul Evaluation of simulation models to mimic the distortions introduced into squiggles by nanopore sequencers and segmentation algorithms |
title | Evaluation of simulation models to mimic the distortions introduced into squiggles by nanopore sequencers and segmentation algorithms |
title_full | Evaluation of simulation models to mimic the distortions introduced into squiggles by nanopore sequencers and segmentation algorithms |
title_fullStr | Evaluation of simulation models to mimic the distortions introduced into squiggles by nanopore sequencers and segmentation algorithms |
title_full_unstemmed | Evaluation of simulation models to mimic the distortions introduced into squiggles by nanopore sequencers and segmentation algorithms |
title_short | Evaluation of simulation models to mimic the distortions introduced into squiggles by nanopore sequencers and segmentation algorithms |
title_sort | evaluation of simulation models to mimic the distortions introduced into squiggles by nanopore sequencers and segmentation algorithms |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6638935/ https://www.ncbi.nlm.nih.gov/pubmed/31318901 http://dx.doi.org/10.1371/journal.pone.0219495 |
work_keys_str_mv | AT smithmichael evaluationofsimulationmodelstomimicthedistortionsintroducedintosquigglesbynanoporesequencersandsegmentationalgorithms AT chanrachel evaluationofsimulationmodelstomimicthedistortionsintroducedintosquigglesbynanoporesequencersandsegmentationalgorithms AT gordonpaul evaluationofsimulationmodelstomimicthedistortionsintroducedintosquigglesbynanoporesequencersandsegmentationalgorithms |