MLgsc: A Maximum-Likelihood General Sequence Classifier

We present a software package for classifying protein or nucleotide sequences to user-specified sets of reference sequences. The software trains a model using a multiple sequence alignment and a phylogenetic tree, both supplied by the user. The latter is used to guide model construction and as a decis...

Bibliographic Details
Main authors: Junier, Thomas; Hervé, Vincent; Wunderlin, Tina; Junier, Pilar
Format: Online Article Text
Language: English
Published: Public Library of Science 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4492669/
https://www.ncbi.nlm.nih.gov/pubmed/26148002
http://dx.doi.org/10.1371/journal.pone.0129384
_version_ 1782379804446687232
author Junier, Thomas
Hervé, Vincent
Wunderlin, Tina
Junier, Pilar
author_facet Junier, Thomas
Hervé, Vincent
Wunderlin, Tina
Junier, Pilar
author_sort Junier, Thomas
collection PubMed
description We present a software package for classifying protein or nucleotide sequences to user-specified sets of reference sequences. The software trains a model using a multiple sequence alignment and a phylogenetic tree, both supplied by the user. The latter is used to guide model construction and as a decision tree to speed up the classification process. The software was evaluated on all the 16S rRNA gene sequences of the reference dataset found in the GreenGenes database. On this dataset, the software was shown to achieve an error rate of around 1% at the genus level. Examples of applications based on the nitrogenase subunit NifH gene and a protein-coding gene found in endospore-forming Firmicutes are also presented. The programs in the package have a simple, straightforward command-line interface for the Unix shell, and are free and open-source. The package has minimal dependencies and thus can be easily integrated into command-line-based classification pipelines.
format Online
Article
Text
id pubmed-4492669
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-44926692015-07-15 MLgsc: A Maximum-Likelihood General Sequence Classifier Junier, Thomas Hervé, Vincent Wunderlin, Tina Junier, Pilar PLoS One Research Article We present a software package for classifying protein or nucleotide sequences to user-specified sets of reference sequences. The software trains a model using a multiple sequence alignment and a phylogenetic tree, both supplied by the user. The latter is used to guide model construction and as a decision tree to speed up the classification process. The software was evaluated on all the 16S rRNA gene sequences of the reference dataset found in the GreenGenes database. On this dataset, the software was shown to achieve an error rate of around 1% at the genus level. Examples of applications based on the nitrogenase subunit NifH gene and a protein-coding gene found in endospore-forming Firmicutes are also presented. The programs in the package have a simple, straightforward command-line interface for the Unix shell, and are free and open-source. The package has minimal dependencies and thus can be easily integrated into command-line-based classification pipelines. Public Library of Science 2015-07-06 /pmc/articles/PMC4492669/ /pubmed/26148002 http://dx.doi.org/10.1371/journal.pone.0129384 Text en © 2015 Junier et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
spellingShingle Research Article
Junier, Thomas
Hervé, Vincent
Wunderlin, Tina
Junier, Pilar
MLgsc: A Maximum-Likelihood General Sequence Classifier
title MLgsc: A Maximum-Likelihood General Sequence Classifier
title_full MLgsc: A Maximum-Likelihood General Sequence Classifier
title_fullStr MLgsc: A Maximum-Likelihood General Sequence Classifier
title_full_unstemmed MLgsc: A Maximum-Likelihood General Sequence Classifier
title_short MLgsc: A Maximum-Likelihood General Sequence Classifier
title_sort mlgsc: a maximum-likelihood general sequence classifier
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4492669/
https://www.ncbi.nlm.nih.gov/pubmed/26148002
http://dx.doi.org/10.1371/journal.pone.0129384
work_keys_str_mv AT junierthomas mlgscamaximumlikelihoodgeneralsequenceclassifier
AT hervevincent mlgscamaximumlikelihoodgeneralsequenceclassifier
AT wunderlintina mlgscamaximumlikelihoodgeneralsequenceclassifier
AT junierpilar mlgscamaximumlikelihoodgeneralsequenceclassifier