Machine learning approach to support taxonomic species discrimination based on helminth collections data

Bibliographic Details
Main authors: Borba, Victor Hugo; Martin, Coralie; Machado-Silva, José Roberto; Xavier, Samanta C. C.; de Mello, Flávio L.; Iñiguez, Alena Mayo
Format: Online, Article, Text
Language: English
Published: BioMed Central 2021
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8088700/
https://www.ncbi.nlm.nih.gov/pubmed/33933139
http://dx.doi.org/10.1186/s13071-021-04721-6
collection PubMed
description BACKGROUND: There are more than 300 species of capillariids that parasitize various vertebrate groups worldwide. Species identification is hindered by the few taxonomically informative structures available, which makes the task laborious and the definition of genera controversial. Consequently, capillariid taxonomy is one of the most complex among the Nematoda. Eggs are the parasitic structures most commonly observed in coprological analysis of both modern and ancient samples, and their presence indicates a positive diagnosis of infection. Egg structure could therefore play a role in discriminating genera or species. Institutional biological collections are taxonomic repositories of specimens described and strictly identified by systematics specialists. METHODS: The present work aims to characterize eggs of capillariid species deposited in institutional helminth collections and to process the morphological, morphometric and ecological data using machine learning (ML) as a new approach for taxonomic identification. Specimens of 28 species and 8 genera deposited at the Coleção Helmintológica do Instituto Oswaldo Cruz (CHIOC, IOC/FIOCRUZ/Brazil) and the Collection de Nématodes Zooparasites du Muséum National d’Histoire Naturelle de Paris (MNHN/France) were examined under light microscopy. In the morphological and morphometric (MM) analyses, the total length and width of the eggs, as well as the plugs and shell thickness, were considered. Eggshell ornamentation and the ecological parameters geographical location (GL) and host (H) were also included. RESULTS: The logistic model tree (LMT) algorithm scored highest on all metrics compared with the other algorithms. The J48 and REPTree algorithms produced the most reliable decision trees for species identification. The Majority Voting algorithm showed high metric values, but combining the classifiers did not attenuate the errors made by each algorithm alone. Statistical evaluation of the dataset indicated a significant difference between trees, with GL + H + MM and MM only achieving the best scores. CONCLUSIONS: The present research proposes a novel procedure for taxonomic species identification that integrates data from centenary biological collections with artificial intelligence techniques. This study will support future research on the taxonomic identification and diagnosis of both modern and archaeological capillariids. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s13071-021-04721-6.
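The abstract reports that the Majority Voting combination did not attenuate the errors made by each algorithm alone. A minimal sketch of hard (plurality) majority voting in plain Python shows one way this can happen: when most classifiers share the same misclassification on a sample, the vote inherits it. The function names `majority_vote` and `error_indices`, the first-classifier tie-breaking rule, and the labels `sp_A`, `sp_B`, `sp_C` are hypothetical and chosen for illustration; this is not the authors' implementation and uses no data from the study.

```python
from collections import Counter
from typing import Sequence


def majority_vote(predictions: Sequence[Sequence[str]]) -> list[str]:
    """Combine per-classifier predictions by hard (plurality) majority voting.

    predictions[i][j] is the label classifier i assigns to sample j.
    Ties are broken in favour of the tied label that appears first in
    classifier order; this rule is an assumption made for the sketch.
    """
    if not predictions:
        raise ValueError("need at least one classifier")
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("every classifier must label the same samples")

    combined: list[str] = []
    for j in range(n_samples):
        votes = [p[j] for p in predictions]  # one vote per classifier
        counts = Counter(votes)
        top = max(counts.values())
        # The earliest vote whose label reaches the top count wins.
        combined.append(next(v for v in votes if counts[v] == top))
    return combined


def error_indices(predicted: Sequence[str], truth: Sequence[str]) -> set[int]:
    """Indices of samples whose predicted label differs from the true label."""
    return {j for j, (p, t) in enumerate(zip(predicted, truth)) if p != t}


if __name__ == "__main__":
    # Hypothetical labels for four egg samples; not data from the study.
    truth = ["sp_A", "sp_B", "sp_C", "sp_A"]
    clf1 = ["sp_A", "sp_C", "sp_C", "sp_B"]
    clf2 = ["sp_A", "sp_C", "sp_C", "sp_A"]
    clf3 = ["sp_B", "sp_B", "sp_C", "sp_B"]

    vote = majority_vote([clf1, clf2, clf3])
    print(vote)                        # ['sp_A', 'sp_C', 'sp_C', 'sp_B']
    print(error_indices(vote, truth))  # {1, 3}
    print(error_indices(clf2, truth))  # {1}
```

In this toy run, the second classifier alone misclassifies only sample 1, yet the vote errs on samples 1 and 3, because on each of those samples two of the three classifiers agree on the wrong label. Voting can only correct an error when the classifiers that make it are outvoted.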
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
journal Parasit Vectors
date 2021-05-01
rights © The Author(s) 2021. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
topic Research