Long-read amplicon denoising
Long-read next-generation amplicon sequencing shows promise for studying complete genes or genomes from complex and diverse populations. Current long-read sequencing technologies have challenging error profiles, hindering data processing and incorporation into downstream analyses. Here we consider t...
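Main authors: | Kumar, Venkatesh; Vollbrecht, Thomas; Chernyshev, Mark; Mohan, Sanjay; Hanst, Brian; Bavafa, Nicholas; Lorenzo, Antonia; Kumar, Nikesh; Ketteringham, Robert; Eren, Kemal; Golden, Michael; Oliveira, Michelli F; Murrell, Ben |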
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2019 |
Subjects: | Methods Online |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6765106/ https://www.ncbi.nlm.nih.gov/pubmed/31418021 http://dx.doi.org/10.1093/nar/gkz657 |
_version_ | 1783454502611845120 |
---|---|
author | Kumar, Venkatesh; Vollbrecht, Thomas; Chernyshev, Mark; Mohan, Sanjay; Hanst, Brian; Bavafa, Nicholas; Lorenzo, Antonia; Kumar, Nikesh; Ketteringham, Robert; Eren, Kemal; Golden, Michael; Oliveira, Michelli F; Murrell, Ben |
author_facet | Kumar, Venkatesh; Vollbrecht, Thomas; Chernyshev, Mark; Mohan, Sanjay; Hanst, Brian; Bavafa, Nicholas; Lorenzo, Antonia; Kumar, Nikesh; Ketteringham, Robert; Eren, Kemal; Golden, Michael; Oliveira, Michelli F; Murrell, Ben |
author_sort | Kumar, Venkatesh |
collection | PubMed |
description | Long-read next-generation amplicon sequencing shows promise for studying complete genes or genomes from complex and diverse populations. Current long-read sequencing technologies have challenging error profiles, hindering data processing and incorporation into downstream analyses. Here we consider the problem of how to reconstruct, free of sequencing error, the true sequence variants and their associated frequencies from PacBio reads. Called ‘amplicon denoising’, this problem has been extensively studied for short-read sequencing technologies, but current solutions do not always successfully generalize to long reads with high indel error rates. We introduce two methods: one that runs nearly instantly and is very accurate for medium-length reads and high template coverage, and another, slower method that is more robust when reads are very long or coverage is lower. On two Mock Virus Community datasets with ground truth, each sequenced on a different PacBio instrument, and on a number of simulated datasets, we compare our two approaches to each other and to existing algorithms. We outperform all tested methods in accuracy, with competitive run times even for our slower method, successfully discriminating templates that differ by just a single nucleotide. Julia implementations of Fast Amplicon Denoising (FAD) and Robust Amplicon Denoising (RAD), and a webserver interface, are freely available. |
format | Online Article Text |
id | pubmed-6765106 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-6765106 2019-10-02 Long-read amplicon denoising Kumar, Venkatesh Vollbrecht, Thomas Chernyshev, Mark Mohan, Sanjay Hanst, Brian Bavafa, Nicholas Lorenzo, Antonia Kumar, Nikesh Ketteringham, Robert Eren, Kemal Golden, Michael Oliveira, Michelli F Murrell, Ben Nucleic Acids Res Methods Online Long-read next-generation amplicon sequencing shows promise for studying complete genes or genomes from complex and diverse populations. Current long-read sequencing technologies have challenging error profiles, hindering data processing and incorporation into downstream analyses. Here we consider the problem of how to reconstruct, free of sequencing error, the true sequence variants and their associated frequencies from PacBio reads. Called ‘amplicon denoising’, this problem has been extensively studied for short-read sequencing technologies, but current solutions do not always successfully generalize to long reads with high indel error rates. We introduce two methods: one that runs nearly instantly and is very accurate for medium-length reads and high template coverage, and another, slower method that is more robust when reads are very long or coverage is lower. On two Mock Virus Community datasets with ground truth, each sequenced on a different PacBio instrument, and on a number of simulated datasets, we compare our two approaches to each other and to existing algorithms. We outperform all tested methods in accuracy, with competitive run times even for our slower method, successfully discriminating templates that differ by just a single nucleotide. Julia implementations of Fast Amplicon Denoising (FAD) and Robust Amplicon Denoising (RAD), and a webserver interface, are freely available. Oxford University Press 2019-10-10 2019-08-16 /pmc/articles/PMC6765106/ /pubmed/31418021 http://dx.doi.org/10.1093/nar/gkz657 Text en © The Author(s) 2019. Published by Oxford University Press on behalf of Nucleic Acids Research.
http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Methods Online Kumar, Venkatesh Vollbrecht, Thomas Chernyshev, Mark Mohan, Sanjay Hanst, Brian Bavafa, Nicholas Lorenzo, Antonia Kumar, Nikesh Ketteringham, Robert Eren, Kemal Golden, Michael Oliveira, Michelli F Murrell, Ben Long-read amplicon denoising |
title | Long-read amplicon denoising |
title_full | Long-read amplicon denoising |
title_fullStr | Long-read amplicon denoising |
title_full_unstemmed | Long-read amplicon denoising |
title_short | Long-read amplicon denoising |
title_sort | long-read amplicon denoising |
topic | Methods Online |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6765106/ https://www.ncbi.nlm.nih.gov/pubmed/31418021 http://dx.doi.org/10.1093/nar/gkz657 |
work_keys_str_mv | AT kumarvenkatesh longreadamplicondenoising AT vollbrechtthomas longreadamplicondenoising AT chernyshevmark longreadamplicondenoising AT mohansanjay longreadamplicondenoising AT hanstbrian longreadamplicondenoising AT bavafanicholas longreadamplicondenoising AT lorenzoantonia longreadamplicondenoising AT kumarnikesh longreadamplicondenoising AT ketteringhamrobert longreadamplicondenoising AT erenkemal longreadamplicondenoising AT goldenmichael longreadamplicondenoising AT oliveiramichellif longreadamplicondenoising AT murrellben longreadamplicondenoising |