
Appropriate Data Quality Checks Improve the Reliability of Values Predicted from Milk Mid-Infrared Spectra

SIMPLE SUMMARY: There is a growing interest in using milk mid-infrared (MIR) spectrometry to obtain new phenotypes to assist in the complex management of dairy farms. These predictive values can be erroneous for many reasons, even if the prediction equations used are accurate. Unfortunately, there i...

Bibliographic Details
Main authors: Zhang, Lei, Li, Chunfang, Dehareng, Frédéric, Grelet, Clément, Colinet, Frédéric, Gengler, Nicolas, Brostaux, Yves, Soyeurt, Hélène
Format: Online Article Text
Language: English
Published: MDPI 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7922538/
https://www.ncbi.nlm.nih.gov/pubmed/33670810
http://dx.doi.org/10.3390/ani11020533
_version_ 1783658713488293888
author Zhang, Lei
Li, Chunfang
Dehareng, Frédéric
Grelet, Clément
Colinet, Frédéric
Gengler, Nicolas
Brostaux, Yves
Soyeurt, Hélène
author_sort Zhang, Lei
collection PubMed
description SIMPLE SUMMARY: There is growing interest in using milk mid-infrared (MIR) spectrometry to obtain new phenotypes that assist in the complex management of dairy farms. These predicted values can be erroneous for many reasons, even if the prediction equations used are accurate. Unfortunately, no quality protocol is routinely implemented to detect abnormal predicted values in the databases recorded by dairy herd improvement (DHI) organizations, except for fat and protein contents. For financial and practical reasons, it is unfeasible to adapt the quality protocol commonly used in milk laboratories to improve the accuracy of those traits. This study therefore proposes three statistical methods that would be easy for DHI organizations to implement to detect abnormal values and limit spectral extrapolation, in order to improve the accuracy of MIR-based predicted values.
ABSTRACT: The use of an abnormal milk mid-infrared (MIR) spectrum strongly affects prediction quality, even if the prediction equations used are accurate. Such records must therefore be detected before or after the prediction process to avoid erroneous spectral extrapolation or the use of poor-quality spectral data by dairy herd improvement (DHI) organizations. For financial or practical reasons, adapting the quality protocol currently used to improve the accuracy of fat and protein contents is unfeasible. This study proposed three statistical methods that would be easy for DHI organizations to implement to solve this issue: the deletion of 1% of the extreme high and low predicted values (M1), the deletion of records based on the Global-H (GH) distance (M2), and the deletion of records based on the absolute fat residual value (M3). Combinations of these three methods were also investigated. A total of 346,818 milk samples were analyzed by MIR spectrometry to predict the contents of fat, protein, and fatty acids. The same traits were then predicted externally using the corresponding standardized MIR spectra. The value of the cleaning procedures was assessed by estimating the root mean square differences (RMSDs) between the internal and external predicted phenotypes. All methods decreased the RMSD, with a gain ranging from 0.32% to 41.39%. Based on these results, the “M1 and M2” combination should be preferred when data loss must be kept low, as it had the highest ratio of RMSD gain to data loss. This combination deleted records based on the 2% extreme predictions and a GH threshold set at 5. However, to ensure the lowest RMSD, the “M2 or M3” combination, with a GH threshold of 5 and an absolute fat residual difference set at 0.30 g/dL of milk, was the most relevant. Both combinations involved M2, confirming the value of calculating the GH distance for all samples to be predicted. However, if the GH distance cannot be estimated because the information needed to compute it is lacking, the results recommend using M1 combined with M3. The threshold used in M3 must be adapted by the DHI organization, as it depends on the spectral data and the equation used. The methodology proposed in this study can be generalized to other MIR-based phenotypes.
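The three cleaning rules and the RMSD criterion described in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the GH distance is taken as an already computed input, M1 is read as trimming 1% in each tail (which would match the "2% extreme predictions" mentioned above), the fat residual in M3 is assumed to be the internal minus the external fat prediction, and "and"/"or" are assumed to mean that a record is deleted when flagged by both rules or by either rule. The demo data are synthetic.

```python
import numpy as np


def rmsd(internal, external):
    """Root mean square difference between internal and external predictions."""
    internal = np.asarray(internal, dtype=float)
    external = np.asarray(external, dtype=float)
    return float(np.sqrt(np.mean((internal - external) ** 2)))


def flag_m1(pred, tail_pct=1.0):
    """M1 (assumed reading): flag the lowest and highest tail_pct percent."""
    pred = np.asarray(pred, dtype=float)
    low, high = np.percentile(pred, [tail_pct, 100.0 - tail_pct])
    return (pred < low) | (pred > high)


def flag_m2(gh, threshold=5.0):
    """M2: flag records whose precomputed GH distance exceeds the threshold."""
    return np.asarray(gh, dtype=float) > threshold


def flag_m3(fat_internal, fat_external, threshold=0.30):
    """M3 (assumed reading): flag large absolute fat residuals, in g/dL."""
    diff = np.asarray(fat_internal, dtype=float) - np.asarray(fat_external, dtype=float)
    return np.abs(diff) > threshold


def gain_and_loss(internal, external, flagged):
    """Return (RMSD gain in %, data loss in %) after dropping flagged records."""
    internal = np.asarray(internal, dtype=float)
    external = np.asarray(external, dtype=float)
    flagged = np.asarray(flagged, dtype=bool)
    before = rmsd(internal, external)
    after = rmsd(internal[~flagged], external[~flagged])
    gain_pct = 100.0 * (before - after) / before
    loss_pct = 100.0 * float(flagged.mean())
    return gain_pct, loss_pct


if __name__ == "__main__":
    # Synthetic data for illustration only; not the study's data.
    rng = np.random.default_rng(0)
    n = 10_000
    fat_ext = rng.normal(4.0, 0.7, n)             # external predictions
    fat_int = fat_ext + rng.normal(0.0, 0.05, n)  # internal predictions
    gh = rng.uniform(0.0, 3.0, n)                 # stand-in GH distances
    bad = rng.choice(n, size=200, replace=False)  # simulated abnormal spectra
    fat_int[bad] += rng.normal(0.0, 0.8, bad.size)
    gh[bad] += 4.0

    m1, m2, m3 = flag_m1(fat_int), flag_m2(gh), flag_m3(fat_int, fat_ext)
    for name, flagged in {"M1 and M2": m1 & m2, "M2 or M3": m2 | m3}.items():
        gain, loss = gain_and_loss(fat_int, fat_ext, flagged)
        print(f"{name}: RMSD gain {gain:.2f}%, data loss {loss:.2f}%")
```

Dividing the RMSD gain by the data loss gives the ratio the abstract uses to compare combinations; the thresholds are left as parameters because, as noted above, the M3 limit depends on the spectral data and the equation used.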
format Online
Article
Text
id pubmed-7922538
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-7922538 2021-03-03 Appropriate Data Quality Checks Improve the Reliability of Values Predicted from Milk Mid-Infrared Spectra Zhang, Lei Li, Chunfang Dehareng, Frédéric Grelet, Clément Colinet, Frédéric Gengler, Nicolas Brostaux, Yves Soyeurt, Hélène Animals (Basel) Article
MDPI 2021-02-18 /pmc/articles/PMC7922538/ /pubmed/33670810 http://dx.doi.org/10.3390/ani11020533 Text en © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
title Appropriate Data Quality Checks Improve the Reliability of Values Predicted from Milk Mid-Infrared Spectra
topic Article