Fast and accurate semantic annotation of bioassays exploiting a hybrid of machine learning and user confirmation

Bioinformatics and computer aided drug design rely on the curation of a large number of protocols for biological assays that measure the ability of potential drugs to achieve a therapeutic effect. These assay protocols are generally published by scientists in the form of plain text, which needs to b...

Bibliographic Details
Main Authors:	Clark, Alex M.; Bunin, Barry A.; Litterman, Nadia K.; Schürer, Stephan C.; Visser, Ubbo
Format:	Online Article Text
Language:	English
Published:	PeerJ Inc. 2014
Subjects:	Bioinformatics
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4137659/ https://www.ncbi.nlm.nih.gov/pubmed/25165633 http://dx.doi.org/10.7717/peerj.524

_version_	1782331138813984768
author	Clark, Alex M.; Bunin, Barry A.; Litterman, Nadia K.; Schürer, Stephan C.; Visser, Ubbo
author_sort	Clark, Alex M.
collection	PubMed
description	Bioinformatics and computer aided drug design rely on the curation of a large number of protocols for biological assays that measure the ability of potential drugs to achieve a therapeutic effect. These assay protocols are generally published by scientists in the form of plain text, which needs to be more precisely annotated in order to be useful to software methods. We have developed a pragmatic approach to describing assays according to the semantic definitions of the BioAssay Ontology (BAO) project, using a hybrid of machine learning based on natural language processing, and a simplified user interface designed to help scientists curate their data with minimum effort. We have carried out this work based on the premise that pure machine learning is insufficiently accurate, and that expecting scientists to find the time to annotate their protocols manually is unrealistic. By combining these approaches, we have created an effective prototype for which annotation of bioassay text within the domain of the training set can be accomplished very quickly. Well-trained annotations require single-click user approval, while annotations from outside the training set domain can be identified using the search feature of a well-designed user interface, and subsequently used to improve the underlying models. By drastically reducing the time required for scientists to annotate their assays, we can realistically advocate for semantic annotation to become a standard part of the publication process. Once even a small proportion of the public body of bioassay data is marked up, bioinformatics researchers can begin to construct sophisticated and useful searching and analysis algorithms that will provide a diverse and powerful set of tools for drug discovery researchers.
format	Online Article Text
id	pubmed-4137659
institution	National Center for Biotechnology Information
language	English
publishDate	2014
publisher	PeerJ Inc.
record_format	MEDLINE/PubMed
spelling	pubmed-4137659 2014-08-27 PeerJ Bioinformatics PeerJ Inc. 2014-08-14 /pmc/articles/PMC4137659/ /pubmed/25165633 http://dx.doi.org/10.7717/peerj.524 Text en © 2014 Clark et al. http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.
title	Fast and accurate semantic annotation of bioassays exploiting a hybrid of machine learning and user confirmation
topic	Bioinformatics
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4137659/ https://www.ncbi.nlm.nih.gov/pubmed/25165633 http://dx.doi.org/10.7717/peerj.524
