Metagenome fragment classification based on multiple motif-occurrence profiles
A vast amount of metagenomic data has been obtained by extracting multiple genomes simultaneously from microbial communities, including genomes from uncultivable microbes. By analyzing these metagenomic data, novel microbes are discovered and new microbial functions are elucidated. The first step in...
Main Authors: | Matsushita, Naoki; Seno, Shigeto; Takenaka, Yoichi; Matsuda, Hideo |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | PeerJ Inc. 2014 |
Subjects: | Bioinformatics |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4157293/ https://www.ncbi.nlm.nih.gov/pubmed/25210663 http://dx.doi.org/10.7717/peerj.559 |
_version_ | 1782333855008555008 |
---|---|
author | Matsushita, Naoki Seno, Shigeto Takenaka, Yoichi Matsuda, Hideo |
author_facet | Matsushita, Naoki Seno, Shigeto Takenaka, Yoichi Matsuda, Hideo |
author_sort | Matsushita, Naoki |
collection | PubMed |
description | A vast amount of metagenomic data has been obtained by extracting multiple genomes simultaneously from microbial communities, including genomes from uncultivable microbes. By analyzing these metagenomic data, novel microbes are discovered and new microbial functions are elucidated. The first step in analyzing these data is sequenced-read classification into reference genomes from which each read can be derived. The Naïve Bayes Classifier is a method for this classification. To identify the derivation of the reads, this method calculates a score based on the occurrence of a DNA sequence motif in each reference genome. However, large differences in the sizes of the reference genomes can bias the scoring of the reads. This bias might cause erroneous classification and decrease the classification accuracy. To address this issue, we have updated the Naïve Bayes Classifier method using multiple sets of occurrence profiles for each reference genome by normalizing the genome sizes, dividing each genome sequence into a set of subsequences of similar length and generating profiles for each subsequence. This multiple profile strategy improves the accuracy of the results generated by the Naïve Bayes Classifier method for simulated and Sargasso Sea datasets. |
format | Online Article Text |
id | pubmed-4157293 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2014 |
publisher | PeerJ Inc. |
record_format | MEDLINE/PubMed |
spelling | pubmed-41572932014-09-10 Metagenome fragment classification based on multiple motif-occurrence profiles Matsushita, Naoki Seno, Shigeto Takenaka, Yoichi Matsuda, Hideo PeerJ Bioinformatics A vast amount of metagenomic data has been obtained by extracting multiple genomes simultaneously from microbial communities, including genomes from uncultivable microbes. By analyzing these metagenomic data, novel microbes are discovered and new microbial functions are elucidated. The first step in analyzing these data is sequenced-read classification into reference genomes from which each read can be derived. The Naïve Bayes Classifier is a method for this classification. To identify the derivation of the reads, this method calculates a score based on the occurrence of a DNA sequence motif in each reference genome. However, large differences in the sizes of the reference genomes can bias the scoring of the reads. This bias might cause erroneous classification and decrease the classification accuracy. To address this issue, we have updated the Naïve Bayes Classifier method using multiple sets of occurrence profiles for each reference genome by normalizing the genome sizes, dividing each genome sequence into a set of subsequences of similar length and generating profiles for each subsequence. This multiple profile strategy improves the accuracy of the results generated by the Naïve Bayes Classifier method for simulated and Sargasso Sea datasets. PeerJ Inc. 2014-09-04 /pmc/articles/PMC4157293/ /pubmed/25210663 http://dx.doi.org/10.7717/peerj.559 Text en © 2014 Matsushita et al. http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited. |
spellingShingle | Bioinformatics Matsushita, Naoki Seno, Shigeto Takenaka, Yoichi Matsuda, Hideo Metagenome fragment classification based on multiple motif-occurrence profiles |
title | Metagenome fragment classification based on multiple motif-occurrence profiles |
title_full | Metagenome fragment classification based on multiple motif-occurrence profiles |
title_fullStr | Metagenome fragment classification based on multiple motif-occurrence profiles |
title_full_unstemmed | Metagenome fragment classification based on multiple motif-occurrence profiles |
title_short | Metagenome fragment classification based on multiple motif-occurrence profiles |
title_sort | metagenome fragment classification based on multiple motif-occurrence profiles |
topic | Bioinformatics |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4157293/ https://www.ncbi.nlm.nih.gov/pubmed/25210663 http://dx.doi.org/10.7717/peerj.559 |
work_keys_str_mv | AT matsushitanaoki metagenomefragmentclassificationbasedonmultiplemotifoccurrenceprofiles AT senoshigeto metagenomefragmentclassificationbasedonmultiplemotifoccurrenceprofiles AT takenakayoichi metagenomefragmentclassificationbasedonmultiplemotifoccurrenceprofiles AT matsudahideo metagenomefragmentclassificationbasedonmultiplemotifoccurrenceprofiles |
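
The description above outlines the approach only in words: score each read against motif-occurrence profiles, and build several profiles per reference genome from subsequences of similar length so that genome size does not bias the score. The following is a minimal sketch of that idea, not the authors' implementation. It assumes that the motifs are fixed-length k-mers, that add-one smoothing is used, and that a read is assigned to the genome owning the best-scoring subsequence profile. The function names (`split_genome`, `build_profiles`, `log_score`, `classify`) and the parameters `k`, `chunk_len` and `pseudo` are hypothetical and do not appear in the record.

```python
import math
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping k-mer in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def split_genome(seq, chunk_len):
    """Split seq into pieces of similar length, each close to chunk_len."""
    if not seq:
        return []
    n_chunks = max(1, round(len(seq) / chunk_len))
    size = math.ceil(len(seq) / n_chunks)
    return [seq[i:i + size] for i in range(0, len(seq), size)]


def build_profiles(genomes, k, chunk_len):
    """Build one k-mer profile per subsequence of each reference genome.

    Returns a list of (genome_name, counts, total_kmers) tuples.
    """
    profiles = []
    for name, seq in genomes.items():
        for chunk in split_genome(seq, chunk_len):
            counts = kmer_counts(chunk, k)
            profiles.append((name, counts, sum(counts.values())))
    return profiles


def log_score(read, counts, total, k, pseudo=1.0):
    """Naive Bayes log-likelihood of the read's k-mers under one profile.

    Add-`pseudo` smoothing over the 4**k possible DNA k-mers is an
    assumption of this sketch, chosen to avoid log(0).
    """
    vocab = 4 ** k
    score = 0.0
    for i in range(len(read) - k + 1):
        kmer = read[i:i + k]
        score += math.log((counts.get(kmer, 0) + pseudo) / (total + pseudo * vocab))
    return score


def classify(read, profiles, k):
    """Return the genome whose best subsequence profile scores the read highest."""
    best = max(profiles, key=lambda p: log_score(read, p[1], p[2], k))
    return best[0]


if __name__ == "__main__":
    # Toy reference genomes of different sizes (illustrative data only).
    genomes = {"A": "ACGT" * 10, "B": "GGCC" * 5}
    profiles = build_profiles(genomes, k=3, chunk_len=20)
    print(classify("ACGTACGT", profiles, k=3))
```

Because every profile is built from a subsequence of roughly `chunk_len` bases, the k-mer totals in the denominators are comparable across genomes. That is how this sketch mirrors the size normalization the description mentions.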