
Genotype-phenotype matching analysis of 38 Lactococcus lactis strains using random forest methods

BACKGROUND: Lactococcus lactis is used in dairy food fermentation and for the efficient production of industrially relevant enzymes. The genome content and different phenotypes have been determined for multiple L. lactis strains in order to understand intra-species genotype and phenotype diversity a...


Bibliographic Details
Main authors: Bayjanov, Jumamurat R; Starrenburg, Marjo JC; van der Sijde, Marijke R; Siezen, Roland J; van Hijum, Sacha AFT
Format: Online Article Text
Language: English
Published: BioMed Central 2013
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3637802/
https://www.ncbi.nlm.nih.gov/pubmed/23530958
http://dx.doi.org/10.1186/1471-2180-13-68
author Bayjanov, Jumamurat R
Starrenburg, Marjo JC
van der Sijde, Marijke R
Siezen, Roland J
van Hijum, Sacha AFT
collection PubMed
description BACKGROUND: Lactococcus lactis is used in dairy food fermentation and for the efficient production of industrially relevant enzymes. The genome content and different phenotypes have been determined for multiple L. lactis strains in order to understand intra-species genotype and phenotype diversity and annotate gene functions. In this study, we identified relations between gene presence and a collection of 207 phenotypes across 38 L. lactis strains of dairy and plant origin. Gene occurrence and phenotype data were used in an iterative gene selection procedure, based on the Random Forest algorithm, to identify genotype-phenotype relations. RESULTS: A total of 1388 gene-phenotype relations were found, some of which confirmed known gene-phenotype relations, such as the importance of arabinose utilization genes only for strains of plant origin. We also identified a gene cluster related to growth on melibiose, a plant disaccharide; this cluster is present only in melibiose-positive strains and can be used as a genetic marker in trait improvement. Additionally, several novel gene-phenotype relations were uncovered, for instance, genes related to arsenite resistance or arginine metabolism. CONCLUSIONS: Our results indicate that genotype-phenotype matching by integrating large data sets makes it possible to identify gene-phenotype relations and possibly improve gene function annotation, and that the identified relations can be used for screening bacterial culture collections for desired phenotypes. In addition to all gene-phenotype relations, we also provide coherent phenotype data for 38 Lactococcus strains assessed in 207 different phenotyping experiments, which, to our knowledge, is the largest such data set to date for the Lactococcus lactis species.
format Online
Article
Text
id pubmed-3637802
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3637802 2013-04-28 BMC Microbiol Research Article BioMed Central 2013-03-26 /pmc/articles/PMC3637802/ /pubmed/23530958 http://dx.doi.org/10.1186/1471-2180-13-68 Text en Copyright © 2013 Bayjanov et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Genotype-phenotype matching analysis of 38 Lactococcus lactis strains using random forest methods
topic Research Article