MGmapper: Reference based mapping and taxonomy annotation of metagenomics sequence reads

An increasing number of species and gene identification studies rely on next generation sequence analysis of either single-isolate or metagenomics samples. Several methods are available to perform taxonomic annotation, and a previous metagenomics benchmark study has shown that a vast numb...

Bibliographic Details
Main Authors: Petersen, Thomas Nordahl, Lukjancenko, Oksana, Thomsen, Martin Christen Frølund, Maddalena Sperotto, Maria, Lund, Ole, Møller Aarestrup, Frank, Sicheritz-Pontén, Thomas
Format: Online Article Text
Language: English
Published: Public Library of Science 2017
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5415185/
https://www.ncbi.nlm.nih.gov/pubmed/28467460
http://dx.doi.org/10.1371/journal.pone.0176469
author Petersen, Thomas Nordahl
Lukjancenko, Oksana
Thomsen, Martin Christen Frølund
Maddalena Sperotto, Maria
Lund, Ole
Møller Aarestrup, Frank
Sicheritz-Pontén, Thomas
collection PubMed
description An increasing number of species and gene identification studies rely on next generation sequence analysis of either single-isolate or metagenomics samples. Several methods are available to perform taxonomic annotation, and a previous metagenomics benchmark study has shown that a vast number of false positive species annotations are a problem unless thresholds or post-processing are applied to differentiate between correct and false annotations. MGmapper is a package to process raw next generation sequence data and perform reference-based sequence assignment, followed by a post-processing analysis to produce reliable taxonomy annotation at species- and strain-level resolution. An in vitro bacterial mock community sample comprising 8 genera, 11 species and 12 strains was previously used to benchmark metagenomics classification methods. After applying a post-processing filter, we obtained 100% correct taxonomy assignments at species and genus level. Sensitivity and precision of 75% were obtained for strain-level annotations. A comparison between MGmapper and Kraken at species level shows that MGmapper assigns taxonomy using 84.8% of the sequence reads, compared to 70.5% for Kraken, and both methods identified all species with no false positives. Extensive read count statistics are provided in plain text and Excel sheets for both rejected and accepted taxonomy annotations. Custom databases can be used with the command-line version of MGmapper, and the complete pipeline is freely available as a Bitbucket package (https://bitbucket.org/genomicepidemiology/mgmapper). A web version (https://cge.cbs.dtu.dk/services/MGmapper) provides the basic functionality for analysis of small FASTQ datasets.
format Online
Article
Text
id pubmed-5415185
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-5415185 2017-05-14 MGmapper: Reference based mapping and taxonomy annotation of metagenomics sequence reads
Petersen, Thomas Nordahl; Lukjancenko, Oksana; Thomsen, Martin Christen Frølund; Maddalena Sperotto, Maria; Lund, Ole; Møller Aarestrup, Frank; Sicheritz-Pontén, Thomas
PLoS One Research Article
Public Library of Science 2017-05-03 /pmc/articles/PMC5415185/ /pubmed/28467460 http://dx.doi.org/10.1371/journal.pone.0176469
Text en © 2017 Petersen et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title MGmapper: Reference based mapping and taxonomy annotation of metagenomics sequence reads
topic Research Article