Kelpie: generating full-length ‘amplicons’ from whole-metagenome datasets

Bibliographic Details
Main authors: Greenfield, Paul; Tran-Dinh, Nai; Midgley, David
Format: Online Article Text
Language: English
Published: PeerJ Inc. 2019
Subjects: Bioinformatics
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6359901/
https://www.ncbi.nlm.nih.gov/pubmed/30723610
http://dx.doi.org/10.7717/peerj.6174
collection PubMed
description INTRODUCTION: Whole-metagenome sequencing can be a rich source of information about the structure and function of entire metagenomic communities, but getting accurate and reliable results from these datasets can be challenging. Analysis of these datasets is founded on the mapping of sequencing reads onto known genomic regions from known organisms, but short reads will often map equally well to multiple regions, and to multiple reference organisms. Assembling metagenomic datasets prior to mapping can generate much longer and more precisely mappable sequences but the presence of closely related organisms and highly conserved regions makes metagenomic assembly challenging, and some regions of particular interest can assemble poorly. One solution to these problems is to use specialised tools, such as Kelpie, that can accurately extract and assemble full-length sequences for defined genomic regions from whole-metagenome datasets. METHODS: Kelpie is a kMer-based tool that generates full-length amplicon-like sequences from whole-metagenome datasets. It takes a pair of primer sequences and a set of metagenomic reads, and uses a combination of kMer filtering, error correction and assembly techniques to construct sets of full-length inter-primer sequences. RESULTS: The effectiveness of Kelpie is demonstrated here through the extraction and assembly of full-length ribosomal marker gene regions, as this allows comparisons with conventional amplicon sequencing and published metagenomic benchmarks. The results show that the Kelpie-generated sequences and community profiles closely match those produced by amplicon sequencing, down to low abundance levels, and running Kelpie on the synthetic CAMI metagenomic benchmarking datasets shows similar high levels of both precision and recall. CONCLUSIONS: Kelpie can be thought of as being somewhat like an in-silico PCR tool, taking a primer pair and producing the resulting ‘amplicons’ from a whole-metagenome dataset. 
Marker regions from the 16S rRNA gene were used here as an example because this allowed the overall accuracy of Kelpie to be evaluated through comparisons with other datasets, approaches and benchmarks. Kelpie is not limited to this application, though: it can be used to extract and assemble any genomic region present in a whole-metagenome dataset, as long as that region is bounded by a pair of highly conserved primer sequences.
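The METHODS summary describes Kelpie only at a high level: a primer pair and a set of reads go in, k-mer filtering, error correction and assembly are applied, and full-length inter-primer sequences come out. The Python sketch that follows illustrates two of those ideas under simplifying assumptions. It is not Kelpie's implementation. The function names (`recruit_reads`, `in_silico_pcr`, `canonical_kmers`), the default k of 25, the fixed number of recruitment rounds, and exact primer matching (apart from IUPAC ambiguity codes) are all hypothetical choices made for illustration. Error correction and assembly are omitted entirely.

```python
import re
from typing import Optional

# IUPAC ambiguity codes as regex character classes; primers are often degenerate.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
# Complement table that also maps ambiguity codes (e.g. R = A/G pairs with Y = C/T).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string, IUPAC codes included."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def primer_pattern(primer: str) -> re.Pattern:
    """Compile a (possibly degenerate) primer into an exact-match regex."""
    return re.compile("".join(IUPAC[base] for base in primer.upper()))


def canonical_kmers(seq: str, k: int) -> set[str]:
    """Strand-independent k-mers: each k-mer is stored as the lexicographically
    smaller of itself and its reverse complement."""
    seq = seq.upper()
    return {min(seq[i:i + k], revcomp(seq[i:i + k])) for i in range(len(seq) - k + 1)}


def recruit_reads(reads: list[str], fwd: str, rev: str, k: int = 25, rounds: int = 5) -> list[str]:
    """Hypothetical k-mer filter: seed with reads containing either primer on
    either strand, then repeatedly add reads sharing a k-mer with the kept set."""
    patterns = [primer_pattern(fwd), primer_pattern(rev)]

    def has_primer(read: str) -> bool:
        return any(p.search(s) for p in patterns for s in (read.upper(), revcomp(read)))

    kept = {i for i, read in enumerate(reads) if has_primer(read)}
    pool: set[str] = set()
    for i in kept:
        pool |= canonical_kmers(reads[i], k)
    for _ in range(rounds):  # capped so recruitment cannot extend indefinitely
        added = {i for i, read in enumerate(reads)
                 if i not in kept and canonical_kmers(read, k) & pool}
        if not added:
            break
        for i in added:
            pool |= canonical_kmers(reads[i], k)
        kept |= added
    return [reads[i] for i in sorted(kept)]


def in_silico_pcr(seq: str, fwd: str, rev: str) -> Optional[str]:
    """Return the first fwd ... revcomp(rev) region of seq, primers included,
    trying both strands; None if no such 'amplicon' exists."""
    seq = seq.upper()
    fwd_pat = primer_pattern(fwd)
    rev_pat = primer_pattern(revcomp(rev))  # reverse primer binds the opposite strand
    for strand in (seq, revcomp(seq)):
        f = fwd_pat.search(strand)
        if f is None:
            continue
        r = rev_pat.search(strand, f.end())
        if r is not None:
            return strand[f.start():r.end()]
    return None
```

In a fuller pipeline, the recruited reads would be error-corrected and assembled before a step like `in_silico_pcr` is applied to the resulting sequences. Because unbounded k-mer recruitment can keep walking past the primers into flanking regions, this sketch simply caps the number of rounds rather than stopping at the primer sites.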
id pubmed-6359901
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
©2019 Greenfield et al. PeerJ Inc., published 2019-01-30. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.