Feature selection with a genetic algorithm can help improve the distinguishing power of microbiota information in monozygotic twins' identification

INTRODUCTION: Personal identification of monozygotic twins (MZT) has been challenging in forensic genetics. Previous research has demonstrated that microbial markers have potential value due to their specificity and long-term stability. However, those studies used the complete information of de...

Bibliographic Details
Main Authors: Fu, Guangping; Ma, Guanju; Dou, Shujie; Wang, Qian; Fu, Lihong; Zhang, Xiaojing; Lu, Chaolong; Cong, Bin; Li, Shujin
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10406218/
https://www.ncbi.nlm.nih.gov/pubmed/37555059
http://dx.doi.org/10.3389/fmicb.2023.1210638
author Fu, Guangping
Ma, Guanju
Dou, Shujie
Wang, Qian
Fu, Lihong
Zhang, Xiaojing
Lu, Chaolong
Cong, Bin
Li, Shujin
collection PubMed
description INTRODUCTION: Personal identification of monozygotic twins (MZT) has been challenging in forensic genetics. Previous research has demonstrated that microbial markers have potential value due to their specificity and long-term stability. However, those studies used the complete information of the detected microbial communities, and low-value species limited the performance of those models. METHODS: To address this issue, we collected 80 saliva samples from 10 pairs of MZTs at four different time points and used 16S rRNA V3–V4 region sequencing to obtain microbiota information. The data formed 280 intra-individual (Self) or MZT sample pairs, which were divided into four groups based on the individual relationship and time interval, and then randomly split into training and testing sets at an 8:2 ratio. We built 12 identification models based on the time interval (≤ 1 year or ≥ 2 months), data basis (amplicon sequence variants, ASVs, or operational taxonomic units, OTUs), and distance parameter (Jaccard distance, Bray-Curtis distance, or Hellinger distance), and then improved their identification power through genetic algorithm processes. The best combination of data basis and distance parameter was selected as the final model for each of the two types of time intervals. Bayesian theory was introduced to provide a numerical indicator of the evidence's effectiveness in practical cases. RESULTS: From the 80 saliva samples, 369 OTUs and 1130 ASVs were detected. After the feature selection process, ASV-Jaccard distance models were selected as the final models for the two types of time intervals. For short-interval samples, the final model can completely distinguish MZT pairs from Self pairs in both the training and test sets. DISCUSSION: Our findings support the microbiota solution to the challenging MZT identification problem and highlight the importance of feature selection in improving model performance.
format Online
Article
Text
id pubmed-10406218
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-10406218 2023-08-08 Front Microbiol Microbiology Frontiers Media S.A. 2023-07-24 /pmc/articles/PMC10406218/ /pubmed/37555059 http://dx.doi.org/10.3389/fmicb.2023.1210638 Text en Copyright © 2023 Fu, Ma, Dou, Wang, Fu, Zhang, Lu, Cong and Li. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Feature selection with a genetic algorithm can help improve the distinguishing power of microbiota information in monozygotic twins' identification
topic Microbiology
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10406218/
https://www.ncbi.nlm.nih.gov/pubmed/37555059
http://dx.doi.org/10.3389/fmicb.2023.1210638
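
The METHODS summary in the description field combines a pairwise distance (Jaccard, in the final models) with genetic-algorithm feature selection. The following is a minimal sketch of that general idea, not the authors' implementation: the record does not state the fitness function, population size, or operators, so the rank-based fitness, tournament selection, uniform crossover, bit-flip mutation, and every function and parameter name here are illustrative assumptions. Samples are assumed to be presence/absence vectors over ASVs or OTUs.

```python
import random


def jaccard_distance(a, b, mask):
    """Jaccard distance between two presence/absence vectors,
    counting only the features switched on in `mask`."""
    union = inter = 0
    for x, y, keep in zip(a, b, mask):
        if keep and (x or y):
            union += 1
            if x and y:
                inter += 1
    # Assumption: two samples with no selected feature present are treated as identical.
    return 0.0 if union == 0 else 1.0 - inter / union


def fitness(mask, self_pairs, mzt_pairs):
    """Hypothetical fitness: fraction of (Self, MZT) pair combinations in which
    the Self pair is closer than the MZT pair (ties count as half)."""
    d_self = [jaccard_distance(a, b, mask) for a, b in self_pairs]
    d_mzt = [jaccard_distance(a, b, mask) for a, b in mzt_pairs]
    wins = sum(1.0 if s < m else 0.5 if s == m else 0.0
               for s in d_self for m in d_mzt)
    return wins / (len(d_self) * len(d_mzt))


def select_features(self_pairs, mzt_pairs, n_features,
                    pop_size=30, generations=40, mutation_rate=0.05, seed=0):
    """Tiny genetic algorithm over boolean feature masks (illustrative settings)."""
    rng = random.Random(seed)

    def score(mask):
        return fitness(mask, self_pairs, mzt_pairs)

    population = [[rng.random() < 0.5 for _ in range(n_features)]
                  for _ in range(pop_size)]
    best = max(population, key=score)
    for _ in range(generations):
        next_pop = [best[:]]  # elitism: carry the best mask forward unchanged
        while len(next_pop) < pop_size:
            # Tournament selection of two parents (tournament size 3).
            p1 = max(rng.sample(population, 3), key=score)
            p2 = max(rng.sample(population, 3), key=score)
            # Uniform crossover followed by bit-flip mutation.
            child = [x if rng.random() < 0.5 else y for x, y in zip(p1, p2)]
            child = [(not g) if rng.random() < mutation_rate else g for g in child]
            next_pop.append(child)
        population = next_pop
        best = max(population, key=score)
    return best
```

Calling `select_features(self_pairs, mzt_pairs, n_features)` returns a boolean mask; features marked `True` would then be the ones kept when computing distances on held-out pairs. Because the best mask is carried forward each generation, its training fitness never decreases, but that says nothing about test-set performance, which has to be checked separately as in the 8:2 split described in the abstract.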