
Machine learning predicts and provides insights into milk acidification rates of Lactococcus lactis

Lactococcus lactis strains are important components in industrial starter cultures for cheese manufacturing. They have many strain-dependent properties, which affect the final product. Here, we explored the use of machine learning to create systematic, high-throughput screening methods for these pro...


Bibliographic Details
Main authors: Karlsen, Signe Tang; Vesth, Tammi Camilla; Oregaard, Gunnar; Poulsen, Vera Kuzina; Lund, Ole; Henderson, Gemma; Bælum, Jacob
Format: Online Article Text
Language: English
Published: Public Library of Science 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7959382/
https://www.ncbi.nlm.nih.gov/pubmed/33720959
http://dx.doi.org/10.1371/journal.pone.0246287
author Karlsen, Signe Tang
Vesth, Tammi Camilla
Oregaard, Gunnar
Poulsen, Vera Kuzina
Lund, Ole
Henderson, Gemma
Bælum, Jacob
author_sort Karlsen, Signe Tang
collection PubMed
description Lactococcus lactis strains are important components in industrial starter cultures for cheese manufacturing. They have many strain-dependent properties, which affect the final product. Here, we explored the use of machine learning to create systematic, high-throughput screening methods for these properties. Fast acidification of milk is such a strain-dependent property. To predict the maximum hourly acidification rate (V(max)), we trained Random Forest (RF) models on four different genomic representations: presence/absence of gene families, counts of Pfam domains, the 8-nucleotide-long subsequences of their DNA (8-mers), and the 9-nucleotide-long subsequences of their DNA (9-mers). V(max) was measured at different temperatures, volumes, and in the presence or absence of yeast extract. These conditions were added as features in each RF model. The four models were trained on 257 strains, and the correlation between the measured V(max) and the predicted V(max) was evaluated with Pearson Correlation Coefficients (PC) on a separate dataset of 85 strains. The models all had high PC scores: 0.83 (gene presence/absence model), 0.84 (Pfam domain model), 0.76 (8-mer model), and 0.85 (9-mer model). The models all based their predictions on relevant genetic features and showed consensus on systems for lactose metabolism, degradation of casein, and pH stress response. Each model also predicted a set of features not found by the other models.
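The k-mer representation and the Pearson evaluation mentioned in the description can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names (kmer_counts, feature_vector, pearson), the toy sequence, and the numeric encoding of the growth conditions are assumptions made for illustration only, and the Random Forest training step itself is left out.

```python
from collections import Counter
from math import sqrt


def kmer_counts(sequence, k):
    """Count every overlapping k-nucleotide subsequence (k-mer) in a DNA string."""
    sequence = sequence.upper()
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def feature_vector(sequence, k, vocabulary, conditions):
    """Build one model input row: k-mer counts in a fixed vocabulary order,
    followed by the experimental conditions as extra features.

    The condition encoding (e.g. temperature, volume, yeast extract as 0/1)
    is a hypothetical choice, not taken from the record.
    """
    counts = kmer_counts(sequence, k)
    return [counts.get(kmer, 0) for kmer in vocabulary] + list(conditions)


def pearson(measured, predicted):
    """Pearson correlation coefficient between measured and predicted values."""
    n = len(measured)
    if n != len(predicted) or n < 2:
        raise ValueError("need two equally long series with at least 2 values")
    mean_m = sum(measured) / n
    mean_p = sum(predicted) / n
    cov = sum((m - mean_m) * (p - mean_p) for m, p in zip(measured, predicted))
    var_m = sum((m - mean_m) ** 2 for m in measured)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_m == 0 or var_p == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_m * var_p)


if __name__ == "__main__":
    # Toy data only; real inputs would be whole genomes with k = 8 or 9.
    row = feature_vector("ACGTACGT", 3, ["ACG", "GTA", "TTT"], [30.0, 1.0])
    print(row)
    print(pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
```

In a setup like the one described, rows of this kind for the training strains would be passed to a Random Forest regressor, and pearson would then be applied to the measured and predicted V(max) values of the held-out strains.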
format Online
Article
Text
id pubmed-7959382
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-7959382 2021-03-25 PLoS One Research Article
Public Library of Science 2021-03-15 /pmc/articles/PMC7959382/ /pubmed/33720959 http://dx.doi.org/10.1371/journal.pone.0246287 Text en © 2021 Karlsen et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Machine learning predicts and provides insights into milk acidification rates of Lactococcus lactis
topic Research Article