
Using pseudoalignment and base quality to accurately quantify microbial community composition

Pooled DNA from multiple unknown organisms arises in a variety of contexts, for example microbial samples from ecological or human health research. Determining the composition of pooled samples can be difficult, especially at the scale of modern sequencing data and reference databases. Here we propo...


Bibliographic Details
Main Authors:	Reppell, Mark; Novembre, John
Format:	Online Article Text
Language:	English
Published:	Public Library of Science 2018
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5945057/ https://www.ncbi.nlm.nih.gov/pubmed/29659582 http://dx.doi.org/10.1371/journal.pcbi.1006096

author	Reppell, Mark; Novembre, John
collection	PubMed
description	Pooled DNA from multiple unknown organisms arises in a variety of contexts, for example microbial samples from ecological or human health research. Determining the composition of pooled samples can be difficult, especially at the scale of modern sequencing data and reference databases. Here we propose a novel method for taxonomic profiling in pooled DNA that combines the speed and low-memory requirements of k-mer based pseudoalignment with a likelihood framework that uses base quality information to better resolve multiply mapped reads. We apply the method to the problem of classifying 16S rRNA reads using a reference database of known organisms, a common challenge in microbiome research. Using simulations, we show the method is accurate across a variety of read lengths, with different length reference sequences, at different sample depths, and when samples contain reads originating from organisms absent from the reference. We also assess performance in real 16S data, where we reanalyze previous genetic association data to show our method discovers a larger number of quantitative trait associations than other widely used methods. We implement our method in the software Karp, for k-mer based analysis of read pools, to provide a novel combination of speed and accuracy that is uniquely suited for enhancing discoveries in microbial studies.
format	Online Article Text
id	pubmed-5945057
institution	National Center for Biotechnology Information
language	English
publishDate	2018
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-5945057 2018-05-25 Using pseudoalignment and base quality to accurately quantify microbial community composition Reppell, Mark; Novembre, John PLoS Comput Biol Research Article
Public Library of Science 2018-04-16 /pmc/articles/PMC5945057/ /pubmed/29659582 http://dx.doi.org/10.1371/journal.pcbi.1006096 Text en © 2018 Reppell, Novembre http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title	Using pseudoalignment and base quality to accurately quantify microbial community composition
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5945057/ https://www.ncbi.nlm.nih.gov/pubmed/29659582 http://dx.doi.org/10.1371/journal.pcbi.1006096
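
The description above outlines a general idea: use k-mers to narrow down candidate references for each read, then use base quality scores in a likelihood to resolve reads that match several references. The following is a minimal illustrative sketch of that idea and is not the Karp implementation. All function names are hypothetical. It makes several simplifying assumptions: the k-mer filter keeps any reference sharing at least one k-mer with the read, reads align to references without gaps at offset 0, a miscalled base is equally likely to be any of the three other bases, and abundances are estimated with a plain EM loop.

```python
from collections import defaultdict


def phred_to_error(q):
    """Convert a Phred quality score to a base-call error probability."""
    return 10 ** (-q / 10)


def kmer_index(refs, k):
    """Map each k-mer to the set of reference names containing it."""
    index = defaultdict(set)
    for name, seq in refs.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(name)
    return index


def candidates(read, index, k):
    """References sharing at least one k-mer with the read (a loose filter,
    assumed here for simplicity)."""
    found = set()
    for i in range(len(read) - k + 1):
        found |= index.get(read[i:i + k], set())
    return found


def read_likelihood(read, quals, ref):
    """P(read | ref) for an ungapped alignment at offset 0 (toy assumption).
    A match contributes 1 - e, a mismatch e / 3, where e comes from the
    base quality at that position."""
    p = 1.0
    for base, q, ref_base in zip(read, quals, ref):
        e = phred_to_error(q)
        p *= (1 - e) if base == ref_base else e / 3
    return p


def estimate_abundances(reads, refs, k=4, iterations=50):
    """Estimate reference proportions with EM over quality-aware likelihoods.
    `reads` is a list of (sequence, list_of_phred_scores) pairs."""
    index = kmer_index(refs, k)
    liks = []
    for seq, quals in reads:
        cands = candidates(seq, index, k)
        if cands:  # reads with no candidate reference are skipped
            liks.append({n: read_likelihood(seq, quals, refs[n]) for n in cands})

    names = sorted(refs)
    theta = {n: 1 / len(names) for n in names}
    for _ in range(iterations):
        counts = dict.fromkeys(names, 0.0)
        for lik in liks:
            total = sum(theta[n] * l for n, l in lik.items())
            if total == 0:
                continue
            # E-step: fractional assignment of the read to each candidate
            for n, l in lik.items():
                counts[n] += theta[n] * l / total
        s = sum(counts.values())
        if s == 0:
            break
        # M-step: renormalise the expected counts into proportions
        theta = {n: c / s for n, c in counts.items()}
    return theta
```

Calling `estimate_abundances(reads, refs)` with `refs` as a dict of name to sequence returns a dict of estimated proportions that sum to 1. A read that differs from a reference only at a low-quality base is penalised less than one that differs at a high-quality base, which is the intuition behind using base quality to resolve multiply mapped reads.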
