Testing the advantages and disadvantages of short- and long- read eukaryotic metagenomics using simulated reads
BACKGROUND: The first step in understanding ecological community diversity and dynamics is quantifying community membership. An increasingly common method for doing so is through metagenomics. Because of the rapidly increasing popularity of this approach, a large number of computational tools and pi...
Main authors: | Pearman, William S., Freed, Nikki E., Silander, Olin K. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2020 |
Subjects: | Research Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7257156/ https://www.ncbi.nlm.nih.gov/pubmed/32471343 http://dx.doi.org/10.1186/s12859-020-3528-4 |
_version_ | 1783540035776151552 |
---|---|
author | Pearman, William S. Freed, Nikki E. Silander, Olin K. |
author_facet | Pearman, William S. Freed, Nikki E. Silander, Olin K. |
author_sort | Pearman, William S. |
collection | PubMed |
description | BACKGROUND: The first step in understanding ecological community diversity and dynamics is quantifying community membership. An increasingly common method for doing so is through metagenomics. Because of the rapidly increasing popularity of this approach, a large number of computational tools and pipelines are available for analysing metagenomic data. However, the majority of these tools have been designed and benchmarked using highly accurate short read data (i.e. Illumina), with few studies benchmarking classification accuracy for long error-prone reads (PacBio or Oxford Nanopore). In addition, few tools have been benchmarked for non-microbial communities. RESULTS: Here we compare simulated long reads from Oxford Nanopore and Pacific Biosciences (PacBio) with high accuracy Illumina read sets to systematically investigate the effects of sequence length and taxon type on classification accuracy for metagenomic data from both microbial and non-microbial communities. We show that very generally, classification accuracy is far lower for non-microbial communities, even at low taxonomic resolution (e.g. family rather than genus). We then show that for two popular taxonomic classifiers, long reads can significantly increase classification accuracy, and this is most pronounced for non-microbial communities. CONCLUSIONS: This work provides insight on the expected accuracy for metagenomic analyses for different taxonomic groups, and establishes the point at which read length becomes more important than error rate for assigning the correct taxon. |
format | Online Article Text |
id | pubmed-7257156 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-72571562020-06-07 Testing the advantages and disadvantages of short- and long- read eukaryotic metagenomics using simulated reads Pearman, William S. Freed, Nikki E. Silander, Olin K. BMC Bioinformatics Research Article BACKGROUND: The first step in understanding ecological community diversity and dynamics is quantifying community membership. An increasingly common method for doing so is through metagenomics. Because of the rapidly increasing popularity of this approach, a large number of computational tools and pipelines are available for analysing metagenomic data. However, the majority of these tools have been designed and benchmarked using highly accurate short read data (i.e. Illumina), with few studies benchmarking classification accuracy for long error-prone reads (PacBio or Oxford Nanopore). In addition, few tools have been benchmarked for non-microbial communities. RESULTS: Here we compare simulated long reads from Oxford Nanopore and Pacific Biosciences (PacBio) with high accuracy Illumina read sets to systematically investigate the effects of sequence length and taxon type on classification accuracy for metagenomic data from both microbial and non-microbial communities. We show that very generally, classification accuracy is far lower for non-microbial communities, even at low taxonomic resolution (e.g. family rather than genus). We then show that for two popular taxonomic classifiers, long reads can significantly increase classification accuracy, and this is most pronounced for non-microbial communities. CONCLUSIONS: This work provides insight on the expected accuracy for metagenomic analyses for different taxonomic groups, and establishes the point at which read length becomes more important than error rate for assigning the correct taxon. BioMed Central 2020-05-29 /pmc/articles/PMC7257156/ /pubmed/32471343 http://dx.doi.org/10.1186/s12859-020-3528-4 Text en © The Author(s). 
2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
spellingShingle | Research Article Pearman, William S. Freed, Nikki E. Silander, Olin K. Testing the advantages and disadvantages of short- and long- read eukaryotic metagenomics using simulated reads |
title | Testing the advantages and disadvantages of short- and long- read eukaryotic metagenomics using simulated reads |
title_full | Testing the advantages and disadvantages of short- and long- read eukaryotic metagenomics using simulated reads |
title_fullStr | Testing the advantages and disadvantages of short- and long- read eukaryotic metagenomics using simulated reads |
title_full_unstemmed | Testing the advantages and disadvantages of short- and long- read eukaryotic metagenomics using simulated reads |
title_short | Testing the advantages and disadvantages of short- and long- read eukaryotic metagenomics using simulated reads |
title_sort | testing the advantages and disadvantages of short- and long- read eukaryotic metagenomics using simulated reads |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7257156/ https://www.ncbi.nlm.nih.gov/pubmed/32471343 http://dx.doi.org/10.1186/s12859-020-3528-4 |
work_keys_str_mv | AT pearmanwilliams testingtheadvantagesanddisadvantagesofshortandlongreadeukaryoticmetagenomicsusingsimulatedreads AT freednikkie testingtheadvantagesanddisadvantagesofshortandlongreadeukaryoticmetagenomicsusingsimulatedreads AT silanderolink testingtheadvantagesanddisadvantagesofshortandlongreadeukaryoticmetagenomicsusingsimulatedreads |