
Long-read amplicon denoising

Bibliographic Details
Main authors: Kumar, Venkatesh; Vollbrecht, Thomas; Chernyshev, Mark; Mohan, Sanjay; Hanst, Brian; Bavafa, Nicholas; Lorenzo, Antonia; Kumar, Nikesh; Ketteringham, Robert; Eren, Kemal; Golden, Michael; Oliveira, Michelli F; Murrell, Ben
Format: Online Article Text
Language: English
Published: Oxford University Press 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6765106/
https://www.ncbi.nlm.nih.gov/pubmed/31418021
http://dx.doi.org/10.1093/nar/gkz657
author Kumar, Venkatesh
Vollbrecht, Thomas
Chernyshev, Mark
Mohan, Sanjay
Hanst, Brian
Bavafa, Nicholas
Lorenzo, Antonia
Kumar, Nikesh
Ketteringham, Robert
Eren, Kemal
Golden, Michael
Oliveira, Michelli F
Murrell, Ben
collection PubMed
description Long-read next-generation amplicon sequencing shows promise for studying complete genes or genomes from complex and diverse populations. Current long-read sequencing technologies have challenging error profiles, hindering data processing and incorporation into downstream analyses. Here we consider the problem of how to reconstruct, free of sequencing error, the true sequence variants and their associated frequencies from PacBio reads. Called ‘amplicon denoising’, this problem has been extensively studied for short-read sequencing technologies, but current solutions do not always successfully generalize to long reads with high indel error rates. We introduce two methods: one that runs nearly instantly and is very accurate for medium-length reads and high template coverage, and another, slower method that is more robust when reads are very long or coverage is lower. On two Mock Virus Community datasets with ground truth, each sequenced on a different PacBio instrument, and on a number of simulated datasets, we compare our two approaches to each other and to existing algorithms. We outperform all tested methods in accuracy, with competitive run times even for our slower method, successfully discriminating templates that differ by just a single nucleotide. Julia implementations of Fast Amplicon Denoising (FAD) and Robust Amplicon Denoising (RAD), and a webserver interface, are freely available.
format Online
Article
Text
id pubmed-6765106
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-6765106 2019-10-02 Long-read amplicon denoising. Kumar, Venkatesh; Vollbrecht, Thomas; Chernyshev, Mark; Mohan, Sanjay; Hanst, Brian; Bavafa, Nicholas; Lorenzo, Antonia; Kumar, Nikesh; Ketteringham, Robert; Eren, Kemal; Golden, Michael; Oliveira, Michelli F; Murrell, Ben. Nucleic Acids Res. Methods Online. Oxford University Press 2019-10-10 2019-08-16 /pmc/articles/PMC6765106/ /pubmed/31418021 http://dx.doi.org/10.1093/nar/gkz657 Text en © The Author(s) 2019. Published by Oxford University Press on behalf of Nucleic Acids Research.
http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Long-read amplicon denoising
topic Methods Online