
PhytoOracle: Scalable, modular phenomics data processing pipelines

As phenomics data volume and dimensionality increase due to advancements in sensor technology, there is an urgent need to develop and implement scalable data processing pipelines. Current phenomics data processing pipelines lack modularity, extensibility, and processing distribution across sensor mo...


Bibliographic Details
Main authors: Gonzalez, Emmanuel M., Zarei, Ariyan, Hendler, Nathanial, Simmons, Travis, Zarei, Arman, Demieville, Jeffrey, Strand, Robert, Rozzi, Bruno, Calleja, Sebastian, Ellingson, Holly, Cosi, Michele, Davey, Sean, Lavelle, Dean O., Truco, Maria José, Swetnam, Tyson L., Merchant, Nirav, Michelmore, Richard W., Lyons, Eric, Pauli, Duke
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10025408/
https://www.ncbi.nlm.nih.gov/pubmed/36950362
http://dx.doi.org/10.3389/fpls.2023.1112973
_version_ 1784909324955418624
author Gonzalez, Emmanuel M.
Zarei, Ariyan
Hendler, Nathanial
Simmons, Travis
Zarei, Arman
Demieville, Jeffrey
Strand, Robert
Rozzi, Bruno
Calleja, Sebastian
Ellingson, Holly
Cosi, Michele
Davey, Sean
Lavelle, Dean O.
Truco, Maria José
Swetnam, Tyson L.
Merchant, Nirav
Michelmore, Richard W.
Lyons, Eric
Pauli, Duke
author_facet Gonzalez, Emmanuel M.
Zarei, Ariyan
Hendler, Nathanial
Simmons, Travis
Zarei, Arman
Demieville, Jeffrey
Strand, Robert
Rozzi, Bruno
Calleja, Sebastian
Ellingson, Holly
Cosi, Michele
Davey, Sean
Lavelle, Dean O.
Truco, Maria José
Swetnam, Tyson L.
Merchant, Nirav
Michelmore, Richard W.
Lyons, Eric
Pauli, Duke
author_sort Gonzalez, Emmanuel M.
collection PubMed
description As phenomics data volume and dimensionality increase due to advancements in sensor technology, there is an urgent need to develop and implement scalable data processing pipelines. Current phenomics data processing pipelines lack modularity, extensibility, and processing distribution across sensor modalities and phenotyping platforms. To address these challenges, we developed PhytoOracle (PO), a suite of modular, scalable pipelines for processing large volumes of field phenomics RGB, thermal, PSII chlorophyll fluorescence 2D images, and 3D point clouds. PhytoOracle aims to (i) improve data processing efficiency; (ii) provide an extensible, reproducible computing framework; and (iii) enable data fusion of multi-modal phenomics data. PhytoOracle integrates open-source distributed computing frameworks for parallel processing on high-performance computing, cloud, and local computing environments. Each pipeline component is available as a standalone container, providing transferability, extensibility, and reproducibility. The PO pipeline extracts and associates individual plant traits across sensor modalities and collection time points, representing a unique multi-system approach to addressing the genotype-phenotype gap. To date, PO supports lettuce and sorghum phenotypic trait extraction, with a goal of widening the range of supported species in the future. At the maximum number of cores tested in this study (1,024 cores), PO processing times were: 235 minutes for 9,270 RGB images (140.7 GB), 235 minutes for 9,270 thermal images (5.4 GB), and 13 minutes for 39,678 PSII images (86.2 GB). These processing times represent end-to-end processing, from raw data to fully processed numerical phenotypic trait data. Repeatability values of 0.39-0.95 (bounding area), 0.81-0.95 (axis-aligned bounding volume), 0.79-0.94 (oriented bounding volume), 0.83-0.95 (plant height), and 0.81-0.95 (number of points) were observed in Field Scanalyzer data. We also show the ability of PO to process drone data with a repeatability of 0.55-0.95 (bounding area).
format Online
Article
Text
id pubmed-10025408
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-100254082023-03-21 PhytoOracle: Scalable, modular phenomics data processing pipelines Gonzalez, Emmanuel M. Zarei, Ariyan Hendler, Nathanial Simmons, Travis Zarei, Arman Demieville, Jeffrey Strand, Robert Rozzi, Bruno Calleja, Sebastian Ellingson, Holly Cosi, Michele Davey, Sean Lavelle, Dean O. Truco, Maria José Swetnam, Tyson L. Merchant, Nirav Michelmore, Richard W. Lyons, Eric Pauli, Duke Front Plant Sci Plant Science As phenomics data volume and dimensionality increase due to advancements in sensor technology, there is an urgent need to develop and implement scalable data processing pipelines. Current phenomics data processing pipelines lack modularity, extensibility, and processing distribution across sensor modalities and phenotyping platforms. To address these challenges, we developed PhytoOracle (PO), a suite of modular, scalable pipelines for processing large volumes of field phenomics RGB, thermal, PSII chlorophyll fluorescence 2D images, and 3D point clouds. PhytoOracle aims to (i) improve data processing efficiency; (ii) provide an extensible, reproducible computing framework; and (iii) enable data fusion of multi-modal phenomics data. PhytoOracle integrates open-source distributed computing frameworks for parallel processing on high-performance computing, cloud, and local computing environments. Each pipeline component is available as a standalone container, providing transferability, extensibility, and reproducibility. The PO pipeline extracts and associates individual plant traits across sensor modalities and collection time points, representing a unique multi-system approach to addressing the genotype-phenotype gap. To date, PO supports lettuce and sorghum phenotypic trait extraction, with a goal of widening the range of supported species in the future. 
At the maximum number of cores tested in this study (1,024 cores), PO processing times were: 235 minutes for 9,270 RGB images (140.7 GB), 235 minutes for 9,270 thermal images (5.4 GB), and 13 minutes for 39,678 PSII images (86.2 GB). These processing times represent end-to-end processing, from raw data to fully processed numerical phenotypic trait data. Repeatability values of 0.39-0.95 (bounding area), 0.81-0.95 (axis-aligned bounding volume), 0.79-0.94 (oriented bounding volume), 0.83-0.95 (plant height), and 0.81-0.95 (number of points) were observed in Field Scanalyzer data. We also show the ability of PO to process drone data with a repeatability of 0.55-0.95 (bounding area). Frontiers Media S.A. 2023-03-06 /pmc/articles/PMC10025408/ /pubmed/36950362 http://dx.doi.org/10.3389/fpls.2023.1112973 Text en Copyright © 2023 Gonzalez, Zarei, Hendler, Simmons, Zarei, Demieville, Strand, Rozzi, Calleja, Ellingson, Cosi, Davey, Lavelle, Truco, Swetnam, Merchant, Michelmore, Lyons and Pauli https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
spellingShingle Plant Science
Gonzalez, Emmanuel M.
Zarei, Ariyan
Hendler, Nathanial
Simmons, Travis
Zarei, Arman
Demieville, Jeffrey
Strand, Robert
Rozzi, Bruno
Calleja, Sebastian
Ellingson, Holly
Cosi, Michele
Davey, Sean
Lavelle, Dean O.
Truco, Maria José
Swetnam, Tyson L.
Merchant, Nirav
Michelmore, Richard W.
Lyons, Eric
Pauli, Duke
PhytoOracle: Scalable, modular phenomics data processing pipelines
title PhytoOracle: Scalable, modular phenomics data processing pipelines
title_full PhytoOracle: Scalable, modular phenomics data processing pipelines
title_fullStr PhytoOracle: Scalable, modular phenomics data processing pipelines
title_full_unstemmed PhytoOracle: Scalable, modular phenomics data processing pipelines
title_short PhytoOracle: Scalable, modular phenomics data processing pipelines
title_sort phytooracle: scalable, modular phenomics data processing pipelines
topic Plant Science
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10025408/
https://www.ncbi.nlm.nih.gov/pubmed/36950362
http://dx.doi.org/10.3389/fpls.2023.1112973
work_keys_str_mv AT gonzalezemmanuelm phytooraclescalablemodularphenomicsdataprocessingpipelines
AT zareiariyan phytooraclescalablemodularphenomicsdataprocessingpipelines
AT hendlernathanial phytooraclescalablemodularphenomicsdataprocessingpipelines
AT simmonstravis phytooraclescalablemodularphenomicsdataprocessingpipelines
AT zareiarman phytooraclescalablemodularphenomicsdataprocessingpipelines
AT demievillejeffrey phytooraclescalablemodularphenomicsdataprocessingpipelines
AT strandrobert phytooraclescalablemodularphenomicsdataprocessingpipelines
AT rozzibruno phytooraclescalablemodularphenomicsdataprocessingpipelines
AT callejasebastian phytooraclescalablemodularphenomicsdataprocessingpipelines
AT ellingsonholly phytooraclescalablemodularphenomicsdataprocessingpipelines
AT cosimichele phytooraclescalablemodularphenomicsdataprocessingpipelines
AT daveysean phytooraclescalablemodularphenomicsdataprocessingpipelines
AT lavelledeano phytooraclescalablemodularphenomicsdataprocessingpipelines
AT trucomariajose phytooraclescalablemodularphenomicsdataprocessingpipelines
AT swetnamtysonl phytooraclescalablemodularphenomicsdataprocessingpipelines
AT merchantnirav phytooraclescalablemodularphenomicsdataprocessingpipelines
AT michelmorerichardw phytooraclescalablemodularphenomicsdataprocessingpipelines
AT lyonseric phytooraclescalablemodularphenomicsdataprocessingpipelines
AT pauliduke phytooraclescalablemodularphenomicsdataprocessingpipelines