An application of kernel methods to variety identification based on SSR markers genetic fingerprinting
BACKGROUND: In crop production systems, genetic markers are increasingly used to distinguish individuals within a larger population based on their genetic make-up. Supervised approaches cannot be applied directly to genotyping data due to the specific nature of those data which are neither continuou...
Main author: | Martin, Florian |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2011 |
Subjects: | Research Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3128031/ https://www.ncbi.nlm.nih.gov/pubmed/21595989 http://dx.doi.org/10.1186/1471-2105-12-177 |
_version_ | 1782207407908192256 |
---|---|
author | Martin, Florian |
collection | PubMed |
description | BACKGROUND: In crop production systems, genetic markers are increasingly used to distinguish individuals within a larger population based on their genetic make-up. Supervised approaches cannot be applied directly to genotyping data due to the specific nature of those data which are neither continuous, nor nominal, nor ordinal but only partially ordered. Therefore, a strategy is needed to encode the polymorphism between samples such that known supervised approaches can be applied. Moreover, finding a minimal set of molecular markers that have optimal ability to discriminate, for example, between given groups of varieties, is important as the genotyping process can be costly in terms of laboratory consumables, labor, and time. This feature selection problem also needs special care due to the specific nature of the data used. RESULTS: An approach encoding SSR polymorphisms in a positive definite kernel is presented, which then allows the usage of any kernel supervised method. The polymorphism between the samples is encoded through the Nei-Li genetic distance, which is shown to define a positive definite kernel between the genotyped samples. Additionally, a greedy feature selection algorithm for selecting SSR marker kits is presented to build economical and efficient prediction models for discrimination. The algorithm is a filter method and outperforms other filter methods adapted to this setting. When combined with kernel linear discriminant analysis or kernel principal component analysis followed by linear discriminant analysis, the approach leads to very satisfactory prediction models. CONCLUSIONS: The main advantage of the approach is to benefit from a flexible way to encode polymorphisms in a kernel and when combined with a feature selection algorithm resulting in a few specific markers, it leads to accurate and economical identification models based on SSR genotyping. |
format | Online Article Text |
id | pubmed-3128031 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2011 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-3128031 2011-07-01 An application of kernel methods to variety identification based on SSR markers genetic fingerprinting Martin, Florian BMC Bioinformatics Research Article BioMed Central 2011-05-20 /pmc/articles/PMC3128031/ /pubmed/21595989 http://dx.doi.org/10.1186/1471-2105-12-177 Text en Copyright ©2011 Martin; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | An application of kernel methods to variety identification based on SSR markers genetic fingerprinting |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3128031/ https://www.ncbi.nlm.nih.gov/pubmed/21595989 http://dx.doi.org/10.1186/1471-2105-12-177 |
work_keys_str_mv | AT martinflorian anapplicationofkernelmethodstovarietyidentificationbasedonssrmarkersgeneticfingerprinting AT martinflorian applicationofkernelmethodstovarietyidentificationbasedonssrmarkersgeneticfingerprinting |
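
The abstract describes encoding SSR polymorphisms through the Nei-Li genetic distance and using the result as a positive definite kernel. The following is a minimal sketch of that idea, not the article's implementation. It assumes that each sample is represented as a set of (marker, allele size) pairs and that the kernel is the Nei-Li similarity, i.e. one minus the distance. The sample data, marker names, and function names are hypothetical.

```python
import numpy as np


def nei_li_similarity(a, b):
    """Nei-Li (Dice) similarity: 2 * shared alleles / total alleles.

    Assumes both allele sets are non-empty.
    """
    return 2.0 * len(a & b) / (len(a) + len(b))


def nei_li_gram(samples):
    """Gram matrix of pairwise Nei-Li similarities between samples."""
    n = len(samples)
    K = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            K[i, j] = nei_li_similarity(samples[i], samples[j])
    return K


# Hypothetical genotypes: each sample is the set of
# (marker name, allele size) pairs observed for it.
samples = [
    frozenset({("ssr1", 120), ("ssr1", 124), ("ssr2", 200)}),
    frozenset({("ssr1", 120), ("ssr2", 200), ("ssr2", 204)}),
    frozenset({("ssr1", 128), ("ssr2", 204)}),
]

K = nei_li_gram(samples)   # similarity (kernel) matrix
D = 1.0 - K                # Nei-Li distance matrix

# A positive definite kernel yields a Gram matrix with no negative eigenvalues.
min_eig = np.linalg.eigvalsh(K).min()
print(K)
print("smallest eigenvalue:", min_eig)
```

A precomputed Gram matrix of this kind is the usual input to kernel methods such as the kernel PCA and kernel linear discriminant analysis mentioned in the abstract.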