SPANNER: taxonomic assignment of sequences using pyramid matching of similarity profiles

Background: Homology-based taxonomic assignment is impeded by differences between the unassigned read and reference database, forcing a rank-specific classification to the closest (and possibly incorrect) reference lineage. This assignment may be correct only to a general rank (e.g. order) and incor...

Bibliographic Details
Main Authors: Porter, Michael S., Beiko, Robert G.
Format: Online Article Text
Language: English
Published: Oxford University Press 2013
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3712219/
https://www.ncbi.nlm.nih.gov/pubmed/23732273
http://dx.doi.org/10.1093/bioinformatics/btt313
_version_ 1782277042681675776
author Porter, Michael S.
Beiko, Robert G.
author_facet Porter, Michael S.
Beiko, Robert G.
author_sort Porter, Michael S.
collection PubMed
description Background: Homology-based taxonomic assignment is impeded by differences between the unassigned read and reference database, forcing a rank-specific classification to the closest (and possibly incorrect) reference lineage. This assignment may be correct only to a general rank (e.g. order) and incorrect below that rank (e.g. family and genus). Algorithms like LCA avoid this by varying the predicted taxonomic rank based on matches to a set of taxonomic references. LCA and related approaches can be conservative, especially if best matches are taxonomically widespread because of events such as lateral gene transfer (LGT). Results: Our extension to LCA called SPANNER (similarity profile annotater) uses the set of best homology matches (the LCA Profile) for a given sequence and compares this profile with a set of profiles inferred from taxonomic reference organisms. SPANNER provides an assignment that is less sensitive to LGT and other confounding phenomena. In a series of trials on real and artificial datasets, SPANNER outperformed LCA-style algorithms in terms of taxonomic precision and outperformed best BLAST at certain levels of taxonomic novelty in the dataset. We identify examples where LCA made an overly conservative prediction, but SPANNER produced a more precise and correct prediction. Conclusions: By using profiles of homology matches to represent patterns of genomic similarity that arise because of vertical and lateral inheritance, SPANNER offers an effective compromise between taxonomic assignment based on best BLAST scores, and the conservative approach of LCA and similar approaches. Availability: C++ source code and binaries are freely available at http://kiwi.cs.dal.ca/Software/SPANNER. Contact: beiko@cs.dal.ca Supplementary information: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-3712219
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-37122192013-07-17 SPANNER: taxonomic assignment of sequences using pyramid matching of similarity profiles Porter, Michael S. Beiko, Robert G. Bioinformatics Original Papers Background: Homology-based taxonomic assignment is impeded by differences between the unassigned read and reference database, forcing a rank-specific classification to the closest (and possibly incorrect) reference lineage. This assignment may be correct only to a general rank (e.g. order) and incorrect below that rank (e.g. family and genus). Algorithms like LCA avoid this by varying the predicted taxonomic rank based on matches to a set of taxonomic references. LCA and related approaches can be conservative, especially if best matches are taxonomically widespread because of events such as lateral gene transfer (LGT). Results: Our extension to LCA called SPANNER (similarity profile annotater) uses the set of best homology matches (the LCA Profile) for a given sequence and compares this profile with a set of profiles inferred from taxonomic reference organisms. SPANNER provides an assignment that is less sensitive to LGT and other confounding phenomena. In a series of trials on real and artificial datasets, SPANNER outperformed LCA-style algorithms in terms of taxonomic precision and outperformed best BLAST at certain levels of taxonomic novelty in the dataset. We identify examples where LCA made an overly conservative prediction, but SPANNER produced a more precise and correct prediction. Conclusions: By using profiles of homology matches to represent patterns of genomic similarity that arise because of vertical and lateral inheritance, SPANNER offers an effective compromise between taxonomic assignment based on best BLAST scores, and the conservative approach of LCA and similar approaches. Availability: C++ source code and binaries are freely available at http://kiwi.cs.dal.ca/Software/SPANNER. 
Contact: beiko@cs.dal.ca Supplementary information: Supplementary data are available at Bioinformatics online. Oxford University Press 2013-08-01 2013-06-03 /pmc/articles/PMC3712219/ /pubmed/23732273 http://dx.doi.org/10.1093/bioinformatics/btt313 Text en © The Author(s) 2013. Published by Oxford University Press. http://creativecommons.org/licenses/by/3.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Original Papers
Porter, Michael S.
Beiko, Robert G.
SPANNER: taxonomic assignment of sequences using pyramid matching of similarity profiles
title SPANNER: taxonomic assignment of sequences using pyramid matching of similarity profiles
title_full SPANNER: taxonomic assignment of sequences using pyramid matching of similarity profiles
title_fullStr SPANNER: taxonomic assignment of sequences using pyramid matching of similarity profiles
title_full_unstemmed SPANNER: taxonomic assignment of sequences using pyramid matching of similarity profiles
title_short SPANNER: taxonomic assignment of sequences using pyramid matching of similarity profiles
title_sort spanner: taxonomic assignment of sequences using pyramid matching of similarity profiles
topic Original Papers
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3712219/
https://www.ncbi.nlm.nih.gov/pubmed/23732273
http://dx.doi.org/10.1093/bioinformatics/btt313
work_keys_str_mv AT portermichaels spannertaxonomicassignmentofsequencesusingpyramidmatchingofsimilarityprofiles
AT beikorobertg spannertaxonomicassignmentofsequencesusingpyramidmatchingofsimilarityprofiles