
An application of kernel methods to variety identification based on SSR markers genetic fingerprinting

BACKGROUND: In crop production systems, genetic markers are increasingly used to distinguish individuals within a larger population based on their genetic make-up. Supervised approaches cannot be applied directly to genotyping data due to the specific nature of those data which are neither continuou...


Bibliographic Details
Main author: Martin, Florian
Format: Online Article Text
Language: English
Published: BioMed Central 2011
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3128031/
https://www.ncbi.nlm.nih.gov/pubmed/21595989
http://dx.doi.org/10.1186/1471-2105-12-177
_version_ 1782207407908192256
author Martin, Florian
collection PubMed
description BACKGROUND: In crop production systems, genetic markers are increasingly used to distinguish individuals within a larger population based on their genetic make-up. Supervised approaches cannot be applied directly to genotyping data due to the specific nature of those data which are neither continuous, nor nominal, nor ordinal but only partially ordered. Therefore, a strategy is needed to encode the polymorphism between samples such that known supervised approaches can be applied. Moreover, finding a minimal set of molecular markers that have optimal ability to discriminate, for example, between given groups of varieties, is important as the genotyping process can be costly in terms of laboratory consumables, labor, and time. This feature selection problem also needs special care due to the specific nature of the data used. RESULTS: An approach encoding SSR polymorphisms in a positive definite kernel is presented, which then allows the usage of any kernel supervised method. The polymorphism between the samples is encoded through the Nei-Li genetic distance, which is shown to define a positive definite kernel between the genotyped samples. Additionally, a greedy feature selection algorithm for selecting SSR marker kits is presented to build economical and efficient prediction models for discrimination. The algorithm is a filter method and outperforms other filter methods adapted to this setting. When combined with kernel linear discriminant analysis or kernel principal component analysis followed by linear discriminant analysis, the approach leads to very satisfactory prediction models. CONCLUSIONS: The main advantage of the approach is to benefit from a flexible way to encode polymorphisms in a kernel and when combined with a feature selection algorithm resulting in a few specific markers, it leads to accurate and economical identification models based on SSR genotyping.
format Online
Article
Text
id pubmed-3128031
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3128031 2011-07-01 An application of kernel methods to variety identification based on SSR markers genetic fingerprinting Martin, Florian BMC Bioinformatics Research Article BioMed Central 2011-05-20 /pmc/articles/PMC3128031/ /pubmed/21595989 http://dx.doi.org/10.1186/1471-2105-12-177 Text en Copyright ©2011 Martin; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Research Article
Martin, Florian
An application of kernel methods to variety identification based on SSR markers genetic fingerprinting
title An application of kernel methods to variety identification based on SSR markers genetic fingerprinting
title_sort application of kernel methods to variety identification based on ssr markers genetic fingerprinting
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3128031/
https://www.ncbi.nlm.nih.gov/pubmed/21595989
http://dx.doi.org/10.1186/1471-2105-12-177
work_keys_str_mv AT martinflorian anapplicationofkernelmethodstovarietyidentificationbasedonssrmarkersgeneticfingerprinting
AT martinflorian applicationofkernelmethodstovarietyidentificationbasedonssrmarkersgeneticfingerprinting
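
The abstract in this record states that polymorphism between samples is encoded through the Nei-Li genetic distance and that this defines a positive definite kernel. The sketch below illustrates that idea only; it is not the article's implementation. It assumes each genotyped sample can be represented as a set of (marker, allele size) pairs pooled over all markers, and the marker names and allele sizes are hypothetical. The Nei-Li similarity is taken in its usual form, twice the number of shared alleles divided by the total number of alleles in the two samples.

```python
import numpy as np


def nei_li_similarity(a, b):
    """Nei-Li (Dice) similarity between two non-empty allele sets."""
    shared = len(a & b)
    return 2.0 * shared / (len(a) + len(b))


def nei_li_kernel(samples):
    """Build the symmetric matrix of pairwise Nei-Li similarities."""
    n = len(samples)
    K = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            K[i, j] = nei_li_similarity(samples[i], samples[j])
    return K


# Hypothetical genotypes (assumption, not data from the article):
# each sample is a set of (marker, allele size) pairs.
samples = [
    frozenset({("M1", 150), ("M1", 154), ("M2", 201)}),
    frozenset({("M1", 150), ("M2", 201), ("M2", 205)}),
    frozenset({("M1", 158), ("M2", 209)}),
]

K = nei_li_kernel(samples)

# The corresponding Nei-Li distance is one minus the similarity.
distance = 1.0 - K

# Numerical sanity check on this toy example: a positive definite
# kernel matrix has no negative eigenvalues.
eigenvalues = np.linalg.eigvalsh(K)
```

A matrix built this way could in principle be passed to any kernel method that accepts a precomputed kernel, which is the flexibility the abstract highlights. Whether the article pools alleles across markers or combines per-marker kernels is not stated in this record.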