
Technical Aspects of Nominal Partitions on Accuracy of Data Mining Classification of Intestinal Microbiota — Comparison between 7 Restriction Enzymes


Bibliographic Details
Main authors: KOBAYASHI, Toshio, FUJIWARA, Kenji
Format: Online Article Text
Language: English
Published: Bioscience of Microbiota, Food and Health 2014
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4098652/
https://www.ncbi.nlm.nih.gov/pubmed/25032086
http://dx.doi.org/10.12938/bmfh.33.129
Description
Summary: Data mining (DM) analysis is effective for the quantitative classification of human intestinal microbiota (HIM), but several technical problems remain to be overcome. This paper addresses one major problem: the number of nominal partitions (NP) of the target dataset. We used terminal restriction fragment length polymorphism data obtained from the feces of 92 Japanese men. The data comprised operational taxonomic units (OTUs) together with the subjects' smoking and drinking habits, which had previously been classified effectively with two NP (2-NP; Yes or No). Using the same OTU data, we examined 3-NP and 5-NP and compared the prediction accuracy and the reliability of the OTUs selected by DM against the earlier 2-NP results. The choice of restriction enzyme also affected accuracy, and 7 enzymes were compared. Some subjects had HIM that fell in the border zones between partitions, and the greater the number of partitions, the lower the DM accuracy. Applying balance nodes, which boost the data by duplicating records, improved accuracy. More accurate and reliable DM operations can be applied to classifying unknown subjects in order to identify various characteristics, including disease.
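The two ideas in the summary, splitting a target into 2, 3, or 5 nominal partitions and then balancing the classes by duplicating records, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the record does not say how smoking or drinking habits were quantified, so the cigarettes-per-day values and the cut points are hypothetical, and `balance_by_duplication` is a simple stand-in for a balance node, not the tool the authors used.

```python
import random
from collections import Counter


def to_partition(value, cut_points):
    """Return the 0-based nominal partition of `value`.

    `cut_points` are ascending lower bounds; k cut points give k + 1 partitions.
    """
    return sum(value >= c for c in cut_points)


def balance_by_duplication(records, label_of, seed=0):
    """Duplicate randomly chosen records of the smaller classes until
    every class is as large as the largest one."""
    rng = random.Random(seed)
    groups = {}
    for record in records:
        groups.setdefault(label_of(record), []).append(record)
    target = max(len(group) for group in groups.values())
    balanced = []
    for group in groups.values():
        balanced.extend(group)
        balanced.extend(rng.choice(group) for _ in range(target - len(group)))
    return balanced


# Hypothetical target values (cigarettes per day); not data from the paper.
cigarettes = [0, 0, 0, 0, 0, 3, 8, 12, 20, 35]

# Hypothetical cut points for 2-NP (Yes/No), 3-NP and 5-NP.
two_np = [to_partition(c, [1]) for c in cigarettes]
three_np = [to_partition(c, [1, 10]) for c in cigarettes]
five_np = [to_partition(c, [1, 5, 10, 20]) for c in cigarettes]

# Class sizes shrink and become uneven as the number of partitions grows.
print(Counter(two_np), Counter(three_np), Counter(five_np))

# Balance the 3-NP labels by duplicating records of the smaller classes.
labelled = list(zip(cigarettes, three_np))
balanced = balance_by_duplication(labelled, label_of=lambda record: record[1])
print(Counter(label for _, label in balanced))
```

With more partitions, each class holds fewer subjects and values near a cut point (3 or 8 cigarettes here) sit close to a border, which is consistent with the lower accuracy the summary reports for 3-NP and 5-NP. Duplication evens out class sizes but adds no new information, so it is best applied to training data only.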