Moving the mountain: analysis of the effort required to transform comparative anatomy into computable anatomy

Bibliographic Details
Main authors:	Dahdul, Wasila; Dececchi, T. Alexander; Ibrahim, Nizar; Lapp, Hilmar; Mabee, Paula
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2015
Subjects:	Original Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4429748/ https://www.ncbi.nlm.nih.gov/pubmed/25972520 http://dx.doi.org/10.1093/database/bav040

Collection:	PubMed
Abstract:	The diverse phenotypes of living organisms have been described for centuries, and though they may be digitized, they are not readily available in a computable form. Using over 100 morphological studies, the Phenoscape project has demonstrated that by annotating characters with community ontology terms, links between novel species anatomy and the genes that may underlie them can be made. But given the enormity of the legacy literature, how can this largely unexploited wealth of descriptive data be rendered amenable to large-scale computation? To identify the bottlenecks, we quantified the time involved in the major aspects of phenotype curation as we annotated characters from the vertebrate phylogenetic systematics literature. This involves attaching fully computable logical expressions consisting of ontology terms to the descriptions in character-by-taxon matrices. The workflow consists of: (i) data preparation, (ii) phenotype annotation, (iii) ontology development and (iv) curation team discussions and software development feedback. Our results showed that the completion of this work required two person-years by a team of two post-docs, a lead data curator, and students. Manual data preparation required close to 13% of the effort. This part in particular could be reduced substantially with better community data practices, such as depositing fully populated matrices in public repositories. Phenotype annotation required ∼40% of the effort. We are working to make this more efficient with Natural Language Processing tools. Ontology development (40%), however, remains a highly manual task requiring domain (anatomical) expertise and use of specialized software. The large overhead required for data preparation and ontology development contributed to a low annotation rate of approximately two characters per hour, compared with 14 characters per hour when activity was restricted to character annotation. Unlocking the potential of the vast stores of morphological descriptions requires better tools for efficiently processing natural language, and better community practices towards a born-digital morphology.
Database URL:	http://kb.phenoscape.org
Record ID:	pubmed-4429748
Institution:	National Center for Biotechnology Information
Record format:	MEDLINE/PubMed
Journal:	Database (Oxford)
Publication date:	2015-05-13
Rights:	© The Author(s) 2015. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
