Learning to classify species with barcodes

BACKGROUND: According to many field experts, specimen classification based on morphological keys needs to be supported with automated techniques based on the analysis of DNA fragments. The most successful results in this area are those obtained from a particular fragment of mitochondrial DNA, the g...

Bibliographic Details
Main authors: Bertolazzi, Paola; Felici, Giovanni; Weitschek, Emanuel
Format: Text
Language: English
Published: BioMed Central 2009
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2775153/
https://www.ncbi.nlm.nih.gov/pubmed/19900303
http://dx.doi.org/10.1186/1471-2105-10-S14-S7
collection PubMed
description BACKGROUND: According to many field experts, specimen classification based on morphological keys needs to be supported with automated techniques based on the analysis of DNA fragments. The most successful results in this area are those obtained from a particular fragment of mitochondrial DNA, the gene cytochrome c oxidase I (COI) (the "barcode"). Since 2004 the Consortium for the Barcode of Life (CBOL) has promoted the collection of barcode specimens and the development of methods to analyze the barcode for several tasks, among them the identification of rules to correctly classify an individual into its species by reading its barcode. RESULTS: We adopt a Logic Mining method based on two optimization models and present the results obtained on two datasets where a number of COI fragments are used to describe the individuals that belong to different species. The proposed method exhibits high correct recognition rates on a training-testing split of the available data using a small proportion of the information available (e.g., correct recognition of approx. 97% when only 20 of the 648 available sites are used). The method can provide compact formulas on the values (A, C, G, T) at the selected sites that synthesize the characteristics of each species, which is relevant information for taxonomists. CONCLUSION: We have presented a Logic Mining technique designed to analyze barcode data and to provide detailed output of interest to the taxonomists and the barcode community represented in the CBOL Consortium. The method has proven to be effective, efficient and precise.
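To make the idea of "compact formulas on the values (A, C, G, T) at the selected sites" concrete, the following minimal sketch shows how such a formula could be evaluated against a barcode string. It is an illustration only, not the authors' Logic Mining method: the rules, site positions, species names, and the helper names `satisfies` and `classify` are all hypothetical, and the step that would learn the formulas from training data (the two optimization models mentioned in the abstract) is not shown.

```python
from typing import Dict, List, Tuple

# A literal states that a given site (0-based position) holds a given nucleotide.
Literal = Tuple[int, str]
# A clause is a conjunction (AND) of literals; a formula is a disjunction (OR) of clauses.
Formula = List[List[Literal]]

# Hypothetical rules over three selected sites, invented for illustration only.
RULES: Dict[str, Formula] = {
    "species_a": [[(2, "A"), (5, "T")]],
    "species_b": [[(2, "G")], [(5, "C"), (7, "C")]],
}


def satisfies(barcode: str, formula: Formula) -> bool:
    """Return True if at least one clause has all of its literals matched."""
    return any(
        all(barcode[site] == nucleotide for site, nucleotide in clause)
        for clause in formula
    )


def classify(barcode: str, rules: Dict[str, Formula]) -> List[str]:
    """Return every species whose formula the barcode satisfies."""
    return [name for name, formula in rules.items() if satisfies(barcode, formula)]
```

Under these assumptions, `classify("CCACCTCA", RULES)` inspects only sites 2, 5 and 7 of the sequence, which mirrors the abstract's point that a small subset of sites can be enough to separate species. Returning a list rather than a single label leaves room for barcodes that match no rule or more than one.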
id pubmed-2775153
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
journal BMC Bioinformatics
published 2009-11-10
rights Copyright © 2009 Bertolazzi et al; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Research