Computing on Phenotypic Descriptions for Candidate Gene Discovery and Crop Improvement

Many newly observed phenotypes are first described, then experimentally manipulated. These language-based descriptions appear in both the literature and in community datastores. To standardize phenotypic descriptions and enable simple data aggregation and analysis, controlled vocabularies and specif...

Bibliographic Details
Main authors:	Braun, Ian R., Yanarella, Colleen F., Lawrence-Dill, Carolyn J.
Format:	Online Article Text
Language:	English
Published:	AAAS 2020
Subjects:	Perspective
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7706311/ https://www.ncbi.nlm.nih.gov/pubmed/33313544 http://dx.doi.org/10.34133/2020/1963251

_version_	1783617128235008000
author	Braun, Ian R. Yanarella, Colleen F. Lawrence-Dill, Carolyn J.
author_facet	Braun, Ian R. Yanarella, Colleen F. Lawrence-Dill, Carolyn J.
author_sort	Braun, Ian R.
collection	PubMed
description	Many newly observed phenotypes are first described, then experimentally manipulated. These language-based descriptions appear in both the literature and in community datastores. To standardize phenotypic descriptions and enable simple data aggregation and analysis, controlled vocabularies and specific data architectures have been developed. Such simplified descriptions have several advantages over natural language: they can be rigorously defined for a particular context or problem, they can be assigned and interpreted programmatically, and they can be organized in a way that allows for semantic reasoning (inference of implicit facts). Because researchers generally report phenotypes in the literature using natural language, curators have been translating phenotypic descriptions into controlled vocabularies for decades to make the information computable. Unfortunately, this methodology is highly dependent on human curation, which does not scale to the scope of all publications available across all of plant biology. Simultaneously, researchers in other domains have been working to enable computation on natural language. This has resulted in new, automated methods for computing on language that are now available, with early analyses showing great promise. Natural language processing (NLP) coupled with machine learning (ML) allows for the use of unstructured language for direct analysis of phenotypic descriptions. Indeed, we have found that these automated methods can be used to create data structures that perform as well or better than those generated by human curators on tasks such as predicting gene function and biochemical pathway membership. Here, we describe current and ongoing efforts to provide tools for the plant phenomics community to explore novel predictions that can be generated using these techniques. We also describe how these methods could be used along with mobile speech-to-text tools to collect and analyze in-field spoken phenotypic descriptions for association genetics and breeding applications.
format	Online Article Text
id	pubmed-7706311
institution	National Center for Biotechnology Information
language	English
publishDate	2020
publisher	AAAS
record_format	MEDLINE/PubMed
spelling	pubmed-7706311 2020-12-10 Computing on Phenotypic Descriptions for Candidate Gene Discovery and Crop Improvement Braun, Ian R. Yanarella, Colleen F. Lawrence-Dill, Carolyn J. Plant Phenomics Perspective AAAS 2020-05-20 /pmc/articles/PMC7706311/ /pubmed/33313544 http://dx.doi.org/10.34133/2020/1963251 Text en Copyright © 2020 Ian R. Braun et al. http://creativecommons.org/licenses/by/4.0/ Exclusive Licensee Nanjing Agricultural University. Distributed under a Creative Commons Attribution License (CC BY 4.0).
spellingShingle	Perspective Braun, Ian R. Yanarella, Colleen F. Lawrence-Dill, Carolyn J. Computing on Phenotypic Descriptions for Candidate Gene Discovery and Crop Improvement
title	Computing on Phenotypic Descriptions for Candidate Gene Discovery and Crop Improvement
title_full	Computing on Phenotypic Descriptions for Candidate Gene Discovery and Crop Improvement
title_fullStr	Computing on Phenotypic Descriptions for Candidate Gene Discovery and Crop Improvement
title_full_unstemmed	Computing on Phenotypic Descriptions for Candidate Gene Discovery and Crop Improvement
title_short	Computing on Phenotypic Descriptions for Candidate Gene Discovery and Crop Improvement
title_sort	computing on phenotypic descriptions for candidate gene discovery and crop improvement
topic	Perspective
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7706311/ https://www.ncbi.nlm.nih.gov/pubmed/33313544 http://dx.doi.org/10.34133/2020/1963251
work_keys_str_mv	AT braunianr computingonphenotypicdescriptionsforcandidategenediscoveryandcropimprovement AT yanarellacolleenf computingonphenotypicdescriptionsforcandidategenediscoveryandcropimprovement AT lawrencedillcarolynj computingonphenotypicdescriptionsforcandidategenediscoveryandcropimprovement
