Cargando…

Metagenome fragment classification based on multiple motif-occurrence profiles

A vast amount of metagenomic data has been obtained by extracting multiple genomes simultaneously from microbial communities, including genomes from uncultivable microbes. By analyzing these metagenomic data, novel microbes are discovered and new microbial functions are elucidated. The first step in...

Descripción completa

Detalles Bibliográficos
Autores principales: Matsushita, Naoki, Seno, Shigeto, Takenaka, Yoichi, Matsuda, Hideo
Formato: Online Artículo Texto
Lenguaje:English
Publicado: PeerJ Inc. 2014
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4157293/
https://www.ncbi.nlm.nih.gov/pubmed/25210663
http://dx.doi.org/10.7717/peerj.559
Descripción
Sumario:A vast amount of metagenomic data has been obtained by extracting multiple genomes simultaneously from microbial communities, including genomes from uncultivable microbes. By analyzing these metagenomic data, novel microbes are discovered and new microbial functions are elucidated. The first step in analyzing these data is sequenced-read classification into reference genomes from which each read can be derived. The Naïve Bayes Classifier is a method for this classification. To identify the derivation of the reads, this method calculates a score based on the occurrence of a DNA sequence motif in each reference genome. However, large differences in the sizes of the reference genomes can bias the scoring of the reads. This bias might cause erroneous classification and decrease the classification accuracy. To address this issue, we have updated the Naïve Bayes Classifier method using multiple sets of occurrence profiles for each reference genome by normalizing the genome sizes, dividing each genome sequence into a set of subsequences of similar length and generating profiles for each subsequence. This multiple profile strategy improves the accuracy of the results generated by the Naïve Bayes Classifier method for simulated and Sargasso Sea datasets.