Feature selection with a genetic algorithm can help improve the distinguishing power of microbiota information in monozygotic twins' identification
INTRODUCTION: Personal identification of monozygotic twins (MZT) has been challenging in forensic genetics. Previous research has demonstrated that microbial markers have potential value due to their specificity and long-term stability. However, those studies used the complete information of de...
Main authors: | Fu, Guangping; Ma, Guanju; Dou, Shujie; Wang, Qian; Fu, Lihong; Zhang, Xiaojing; Lu, Chaolong; Cong, Bin; Li, Shujin |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A. 2023 |
Subjects: | Microbiology |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10406218/ https://www.ncbi.nlm.nih.gov/pubmed/37555059 http://dx.doi.org/10.3389/fmicb.2023.1210638 |
_version_ | 1785085702897139712 |
---|---|
author | Fu, Guangping Ma, Guanju Dou, Shujie Wang, Qian Fu, Lihong Zhang, Xiaojing Lu, Chaolong Cong, Bin Li, Shujin |
author_sort | Fu, Guangping |
collection | PubMed |
description | INTRODUCTION: Personal identification of monozygotic twins (MZT) has been challenging in forensic genetics. Previous research has demonstrated that microbial markers have potential value due to their specificity and long-term stability. However, those studies used the complete information of detected microbial communities, and low-value species limited the performance of previous models. METHODS: To address this issue, we collected 80 saliva samples from 10 pairs of MZTs at four different time points and used 16S rRNA V3–V4 region sequencing to obtain microbiota information. The data formed 280 intra-individual (Self) or MZT sample pairs, divided into four groups based on the individual relationship and time interval, and then randomly divided into training and testing sets with an 8:2 ratio. We built 12 identification models based on the time interval (≤ 1 year or ≥ 2 months), data basis (amplicon sequence variants, ASVs, or operational taxonomic units, OTUs), and distance parameter selection (Jaccard distance, Bray-Curtis distance, or Hellinger distance) and then improved their identification power through genetic algorithm processes. The best combination of databases with distance parameters was selected as the final model for the two types of time intervals. Bayes theory was introduced to provide a numerical indicator of the evidence's effectiveness in practical cases. RESULTS: From the 80 saliva samples, 369 OTUs and 1130 ASVs were detected. After the feature selection process, ASV-Jaccard distance models were selected as the final models for the two types of time intervals. For short-interval samples, the final model can completely distinguish MZT pairs from Self ones in both training and test sets. DISCUSSION: Our findings support the microbiota solution to the challenging MZT identification problem and highlight the importance of feature selection in improving model performance. |
format | Online Article Text |
id | pubmed-10406218 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-10406218 2023-08-08 Feature selection with a genetic algorithm can help improve the distinguishing power of microbiota information in monozygotic twins' identification Fu, Guangping Ma, Guanju Dou, Shujie Wang, Qian Fu, Lihong Zhang, Xiaojing Lu, Chaolong Cong, Bin Li, Shujin Front Microbiol Microbiology INTRODUCTION: Personal identification of monozygotic twins (MZT) has been challenging in forensic genetics. Previous research has demonstrated that microbial markers have potential value due to their specificity and long-term stability. However, those studies used the complete information of detected microbial communities, and low-value species limited the performance of previous models. METHODS: To address this issue, we collected 80 saliva samples from 10 pairs of MZTs at four different time points and used 16S rRNA V3–V4 region sequencing to obtain microbiota information. The data formed 280 intra-individual (Self) or MZT sample pairs, divided into four groups based on the individual relationship and time interval, and then randomly divided into training and testing sets with an 8:2 ratio. We built 12 identification models based on the time interval (≤ 1 year or ≥ 2 months), data basis (amplicon sequence variants, ASVs, or operational taxonomic units, OTUs), and distance parameter selection (Jaccard distance, Bray-Curtis distance, or Hellinger distance) and then improved their identification power through genetic algorithm processes. The best combination of databases with distance parameters was selected as the final model for the two types of time intervals. Bayes theory was introduced to provide a numerical indicator of the evidence's effectiveness in practical cases. RESULTS: From the 80 saliva samples, 369 OTUs and 1130 ASVs were detected. After the feature selection process, ASV-Jaccard distance models were selected as the final models for the two types of time intervals. For short-interval samples, the final model can completely distinguish MZT pairs from Self ones in both training and test sets. DISCUSSION: Our findings support the microbiota solution to the challenging MZT identification problem and highlight the importance of feature selection in improving model performance. Frontiers Media S.A. 2023-07-24 /pmc/articles/PMC10406218/ /pubmed/37555059 http://dx.doi.org/10.3389/fmicb.2023.1210638 Text en Copyright © 2023 Fu, Ma, Dou, Wang, Fu, Zhang, Lu, Cong and Li. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
title | Feature selection with a genetic algorithm can help improve the distinguishing power of microbiota information in monozygotic twins' identification |
topic | Microbiology |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10406218/ https://www.ncbi.nlm.nih.gov/pubmed/37555059 http://dx.doi.org/10.3389/fmicb.2023.1210638 |
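
The description field in this record names three distance measures (Jaccard, Bray-Curtis, Hellinger) and a genetic algorithm for feature selection, but gives no formulas or implementation details. The following is a minimal Python sketch of how such a pipeline could look, not the authors' method. All function names, the toy abundance vectors, the GA settings, and the fitness criterion (the margin between the closest MZT pair and the farthest Self pair) are assumptions made for illustration. The Hellinger distance here uses the convention with a 1/√2 factor, so that it ranges from 0 to 1; other tools omit that factor.

```python
import math
import random

def jaccard(a, b):
    """Jaccard distance on presence/absence of features."""
    pa = {i for i, x in enumerate(a) if x > 0}
    pb = {i for i, x in enumerate(b) if x > 0}
    union = pa | pb
    return 0.0 if not union else 1.0 - len(pa & pb) / len(union)

def bray_curtis(a, b):
    """Bray-Curtis dissimilarity on raw abundances."""
    total = sum(a) + sum(b)
    return 0.0 if total == 0 else sum(abs(x - y) for x, y in zip(a, b)) / total

def hellinger(a, b):
    """Hellinger distance on relative abundances (scaled to [0, 1])."""
    sa, sb = sum(a), sum(b)
    if sa == 0 or sb == 0:
        return 0.0 if sa == sb else 1.0
    sq = sum((math.sqrt(x / sa) - math.sqrt(y / sb)) ** 2 for x, y in zip(a, b))
    return math.sqrt(0.5 * sq)

def masked(v, mask):
    """Keep only the features whose mask bit is 1."""
    return [x for x, keep in zip(v, mask) if keep]

def fitness(mask, self_pairs, twin_pairs, dist=jaccard):
    """Assumed criterion: closest twin pair minus farthest self pair."""
    self_d = [dist(masked(a, mask), masked(b, mask)) for a, b in self_pairs]
    twin_d = [dist(masked(a, mask), masked(b, mask)) for a, b in twin_pairs]
    return min(twin_d) - max(self_d)

def select_features(self_pairs, twin_pairs, n_features,
                    generations=50, pop_size=20, mutation_rate=0.05, seed=0):
    """Toy genetic algorithm over binary feature masks."""
    rng = random.Random(seed)
    score = lambda m: fitness(m, self_pairs, twin_pairs)
    # Seed the population with the "use every feature" baseline.
    population = [[1] * n_features] + [
        [rng.randint(0, 1) for _ in range(n_features)]
        for _ in range(pop_size - 1)
    ]
    for _ in range(generations):
        population.sort(key=score, reverse=True)
        parents = population[: pop_size // 2]  # truncation selection, elites survive
        children = []
        while len(parents) + len(children) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        population = parents + children
    return max(population, key=score)

# Hypothetical abundance vectors: twin A and twin B, two time points each.
A1, A2 = [5, 0, 3, 1, 0, 2], [4, 0, 2, 0, 1, 2]
B1, B2 = [0, 6, 3, 1, 0, 2], [0, 5, 4, 0, 1, 1]
self_pairs = [(A1, A2), (B1, B2)]
twin_pairs = [(A1, B1), (A1, B2), (A2, B1), (A2, B2)]

baseline = fitness([1] * 6, self_pairs, twin_pairs)
best = select_features(self_pairs, twin_pairs, n_features=6)
print("baseline margin:", baseline)
print("selected mask:", best, "margin:", fitness(best, self_pairs, twin_pairs))
```

Because the all-features mask is part of the initial population and the top half of each generation is carried over unchanged, the selected mask can never score below the baseline under this fitness. A real analysis would evaluate the selected mask on a held-out test set, as the 8:2 split in the description implies.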