SpaTemHTP: A Data Analysis Pipeline for Efficient Processing and Utilization of Temporal High-Throughput Phenotyping Data

Bibliographic Details
Main authors:	Kar, Soumyashree; Garin, Vincent; Kholová, Jana; Vadez, Vincent; Durbha, Surya S.; Tanaka, Ryokei; Iwata, Hiroyoshi; Urban, Milan O.; Adinarayana, J.
Format:	Online article, text
Language:	English
Published:	Frontiers Media S.A., 2020
Subjects:	Plant Science
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7714717/ (PMC); https://www.ncbi.nlm.nih.gov/pubmed/33329623 (PubMed); http://dx.doi.org/10.3389/fpls.2020.552509 (DOI)

Abstract

The rapid development of phenotyping technologies in recent years has made it possible to study plant development over time. However, processing the massive amount of data collected by high-throughput phenotyping (HTP) platforms remains an important challenge for the plant science community. A key issue is to accurately estimate, over time, the genotypic component of the plant phenotype. On outdoor and field-based HTP platforms, phenotype measurements can be substantially affected by data-generation inaccuracies or failures, leading to erroneous or missing data. To address this problem, we developed an analytical pipeline composed of three modules: detection of outliers, imputation of missing values, and computation of mixed-model genotype adjusted means with spatial adjustment. The pipeline was tested on three traits (3D leaf area, projected leaf area, and plant height) in two crops (chickpea and sorghum), measured over two seasons. Using real-data analyses and simulations, we showed that applying the three pipeline steps sequentially was particularly useful for estimating smooth genotype growth curves from raw data containing a large amount of noise, a situation that may be frequent in data generated on outdoor HTP platforms. The proposed procedure can handle up to 50% missing values and is robust to contamination of 20 to 30% of the data.

The pipeline was further extended to model the genotype time series data. A change-point analysis allowed the determination of growth phases and of the optimal time window in which genotypic differences were largest. The estimated genotypic values were used to cluster the genotypes during the optimal growth phase, and a two-way analysis of variance (ANOVA) showed that the clusters were consistently defined throughout the growth period. Across a wide range of scenarios, we therefore showed that the pipeline enables efficient extraction of useful information from outdoor HTP platform data. It also provides high-quality plant growth time series data to support breeding decisions. The R code of the pipeline is available at https://github.com/ICRISAT-GEMS/SpaTemHTP.
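
The abstract names the first two modules (outlier detection, then imputation of missing values) without describing how they work. The following is a minimal Python sketch of that ordering for a single plant's time series, assuming a robust rolling-median rule for outliers and linear interpolation for gaps. These rules, the function names `flag_outliers`, `interpolate_missing`, and `clean_series`, and the `window`, `k`, and `max_missing` parameters are hypothetical illustrations, not the SpaTemHTP R implementation; the default `max_missing=0.5` only mirrors the 50% tolerance reported above, and the mixed-model spatial adjustment step is not shown.

```python
from statistics import median
from typing import List, Optional

# One plant's trait values ordered by measurement date; None marks a missing reading.
Series = List[Optional[float]]


def flag_outliers(values: Series, window: int = 5, k: float = 3.0) -> Series:
    """Set points far from their local median to None (hypothetical rule).

    Each observed point is compared with the median of the observed values in a
    centred window; the spread is 1.4826 * median(|deviation|), a robust
    stand-in for a standard deviation. Points deviating by more than k times
    that spread are treated as outliers.
    """
    half = window // 2
    deviations: List[Optional[float]] = []
    for i, x in enumerate(values):
        if x is None:
            deviations.append(None)
            continue
        neighbours = [v for v in values[max(0, i - half): i + half + 1] if v is not None]
        deviations.append(x - median(neighbours))

    observed = [abs(d) for d in deviations if d is not None]
    if not observed:
        return list(values)
    scale = 1.4826 * median(observed)
    if scale == 0:
        # No measurable spread: skip flagging, since any nonzero deviation would exceed k * 0.
        return list(values)
    return [None if d is not None and abs(d) > k * scale else v
            for v, d in zip(values, deviations)]


def interpolate_missing(values: Series, max_missing: float = 0.5) -> List[float]:
    """Fill gaps by linear interpolation; carry the nearest value at the ends."""
    n = len(values)
    observed = [i for i, v in enumerate(values) if v is not None]
    if not observed or (n - len(observed)) / n > max_missing:
        raise ValueError("too many missing values to impute reliably")
    filled: List[float] = []
    for i, v in enumerate(values):
        if v is not None:
            filled.append(float(v))
            continue
        left = max((j for j in observed if j < i), default=None)
        right = min((j for j in observed if j > i), default=None)
        if left is None:
            filled.append(float(values[right]))
        elif right is None:
            filled.append(float(values[left]))
        else:
            w = (i - left) / (right - left)
            filled.append(values[left] + w * (values[right] - values[left]))
    return filled


def clean_series(values: Series, window: int = 5, k: float = 3.0,
                 max_missing: float = 0.5) -> List[float]:
    """Outlier detection first, then imputation, in the order the abstract gives."""
    return interpolate_missing(flag_outliers(values, window, k), max_missing)


if __name__ == "__main__":
    raw = [1.0, 2.0, 3.0, 4.0, 50.0, 6.0, 7.0, 8.0, None, 10.0]
    print(clean_series(raw))  # [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
```

Flagging outliers before imputing means a spike such as the 50.0 above is turned into a gap and filled from its neighbours instead of distorting the interpolation; flagged points also count toward the missing-value limit.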

Journal:	Frontiers in Plant Science (Front Plant Sci)
Published online:	2020-11-20
Rights:	Copyright © 2020 Kar, Garin, Kholová, Vadez, Durbha, Tanaka, Iwata, Urban and Adinarayana. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY), http://creativecommons.org/licenses/by/4.0/. The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
