
Metagenome fragment classification based on multiple motif-occurrence profiles

A vast amount of metagenomic data has been obtained by extracting multiple genomes simultaneously from microbial communities, including genomes from uncultivable microbes. By analyzing these metagenomic data, novel microbes are discovered and new microbial functions are elucidated. The first step in...


Bibliographic Details
Main authors: Matsushita, Naoki; Seno, Shigeto; Takenaka, Yoichi; Matsuda, Hideo
Format: Online Article Text
Language: English
Published: PeerJ Inc. 2014
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4157293/
https://www.ncbi.nlm.nih.gov/pubmed/25210663
http://dx.doi.org/10.7717/peerj.559
_version_ 1782333855008555008
author Matsushita, Naoki
Seno, Shigeto
Takenaka, Yoichi
Matsuda, Hideo
author_facet Matsushita, Naoki
Seno, Shigeto
Takenaka, Yoichi
Matsuda, Hideo
author_sort Matsushita, Naoki
collection PubMed
description A vast amount of metagenomic data has been obtained by extracting multiple genomes simultaneously from microbial communities, including genomes from uncultivable microbes. By analyzing these metagenomic data, novel microbes are discovered and new microbial functions are elucidated. The first step in analyzing these data is sequenced-read classification into reference genomes from which each read can be derived. The Naïve Bayes Classifier is a method for this classification. To identify the derivation of the reads, this method calculates a score based on the occurrence of a DNA sequence motif in each reference genome. However, large differences in the sizes of the reference genomes can bias the scoring of the reads. This bias might cause erroneous classification and decrease the classification accuracy. To address this issue, we have updated the Naïve Bayes Classifier method using multiple sets of occurrence profiles for each reference genome by normalizing the genome sizes, dividing each genome sequence into a set of subsequences of similar length and generating profiles for each subsequence. This multiple profile strategy improves the accuracy of the results generated by the Naïve Bayes Classifier method for simulated and Sargasso Sea datasets.
format Online
Article
Text
id pubmed-4157293
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher PeerJ Inc.
record_format MEDLINE/PubMed
spelling pubmed-41572932014-09-10 Metagenome fragment classification based on multiple motif-occurrence profiles Matsushita, Naoki Seno, Shigeto Takenaka, Yoichi Matsuda, Hideo PeerJ Bioinformatics A vast amount of metagenomic data has been obtained by extracting multiple genomes simultaneously from microbial communities, including genomes from uncultivable microbes. By analyzing these metagenomic data, novel microbes are discovered and new microbial functions are elucidated. The first step in analyzing these data is sequenced-read classification into reference genomes from which each read can be derived. The Naïve Bayes Classifier is a method for this classification. To identify the derivation of the reads, this method calculates a score based on the occurrence of a DNA sequence motif in each reference genome. However, large differences in the sizes of the reference genomes can bias the scoring of the reads. This bias might cause erroneous classification and decrease the classification accuracy. To address this issue, we have updated the Naïve Bayes Classifier method using multiple sets of occurrence profiles for each reference genome by normalizing the genome sizes, dividing each genome sequence into a set of subsequences of similar length and generating profiles for each subsequence. This multiple profile strategy improves the accuracy of the results generated by the Naïve Bayes Classifier method for simulated and Sargasso Sea datasets. PeerJ Inc. 2014-09-04 /pmc/articles/PMC4157293/ /pubmed/25210663 http://dx.doi.org/10.7717/peerj.559 Text en © 2014 Matsushita et al. http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. 
For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.
spellingShingle Bioinformatics
Matsushita, Naoki
Seno, Shigeto
Takenaka, Yoichi
Matsuda, Hideo
Metagenome fragment classification based on multiple motif-occurrence profiles
title Metagenome fragment classification based on multiple motif-occurrence profiles
title_full Metagenome fragment classification based on multiple motif-occurrence profiles
title_fullStr Metagenome fragment classification based on multiple motif-occurrence profiles
title_full_unstemmed Metagenome fragment classification based on multiple motif-occurrence profiles
title_short Metagenome fragment classification based on multiple motif-occurrence profiles
title_sort metagenome fragment classification based on multiple motif-occurrence profiles
topic Bioinformatics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4157293/
https://www.ncbi.nlm.nih.gov/pubmed/25210663
http://dx.doi.org/10.7717/peerj.559
work_keys_str_mv AT matsushitanaoki metagenomefragmentclassificationbasedonmultiplemotifoccurrenceprofiles
AT senoshigeto metagenomefragmentclassificationbasedonmultiplemotifoccurrenceprofiles
AT takenakayoichi metagenomefragmentclassificationbasedonmultiplemotifoccurrenceprofiles
AT matsudahideo metagenomefragmentclassificationbasedonmultiplemotifoccurrenceprofiles
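
The description above outlines a Naïve Bayes scoring of reads against motif-occurrence profiles, with each reference genome divided into subsequences of similar length and one profile built per subsequence. The following is a minimal sketch of that idea, not the authors' implementation. The motif length `K`, the add-one smoothing, the `chunk_len` parameter, the rule of assigning a read to the genome of its best-scoring subsequence, and all function names are assumptions made for illustration; the record does not specify them.

```python
import math
from collections import Counter
from itertools import product

K = 3  # assumed motif (k-mer) length, chosen only for illustration


def kmers(seq, k=K):
    """Yield every overlapping k-mer of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_profile(seq, k=K):
    """Map each ACGT k-mer to its log-probability in seq (add-one smoothing, assumed)."""
    counts = Counter(m for m in kmers(seq, k) if set(m) <= set("ACGT"))
    total = sum(counts.values()) + 4 ** k
    profile = {}
    for letters in product("ACGT", repeat=k):
        motif = "".join(letters)
        profile[motif] = math.log((counts[motif] + 1) / total)
    return profile


def split_genome(seq, chunk_len):
    """Divide seq into pieces of similar length, each at most chunk_len long."""
    n = max(1, math.ceil(len(seq) / chunk_len))
    bounds = [i * len(seq) // n for i in range(n + 1)]
    return [seq[bounds[i]:bounds[i + 1]] for i in range(n)]


def build_profiles(genomes, chunk_len):
    """Build one profile per subsequence; return a list of (genome_name, profile)."""
    return [(name, build_profile(piece))
            for name, seq in genomes.items()
            for piece in split_genome(seq, chunk_len)]


def score(read, profile, k=K):
    """Naive Bayes log-score: sum of log-probabilities of the read's k-mers."""
    return sum(profile[m] for m in kmers(read, k) if m in profile)


def classify(read, profiles):
    """Assign the read to the genome owning the best-scoring subsequence profile."""
    best_name, _ = max(profiles, key=lambda item: score(read, item[1]))
    return best_name


if __name__ == "__main__":
    # Toy reference genomes of very different sizes (made-up sequences).
    genomes = {"at_rich": "ATATATATAT" * 20, "gc_rich": "GCGCGCGC" * 5}
    profiles = build_profiles(genomes, chunk_len=40)
    print(classify("ATATATAT", profiles))
```

Because every profile is built from a subsequence of roughly the same length, the smoothed probabilities of the large and the small genome are estimated from comparable amounts of sequence, which is the normalization the description refers to.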