
A hybrid human and machine resource curation pipeline for the Neuroscience Information Framework

The breadth of information resources available to researchers on the Internet continues to expand, particularly in light of recently implemented data-sharing policies required by funding agencies. However, the nature of dense, multifaceted neuroscience data and the design of contemporary search engi...


Bibliographic Details
Main Authors: Bandrowski, A. E., Cachat, J., Li, Y., Müller, H. M., Sternberg, P. W., Ciccarese, P., Clark, T., Marenco, L., Wang, R., Astakhov, V., Grethe, J. S., Martone, M. E.
Format: Online Article Text
Language: English
Published: Oxford University Press 2012
Subjects: Original Articles
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3308161/
https://www.ncbi.nlm.nih.gov/pubmed/22434839
http://dx.doi.org/10.1093/database/bas005
_version_ 1782227405279068160
author Bandrowski, A. E.
Cachat, J.
Li, Y.
Müller, H. M.
Sternberg, P. W.
Ciccarese, P.
Clark, T.
Marenco, L.
Wang, R.
Astakhov, V.
Grethe, J. S.
Martone, M. E.
author_sort Bandrowski, A. E.
collection PubMed
description The breadth of information resources available to researchers on the Internet continues to expand, particularly in light of recently implemented data-sharing policies required by funding agencies. However, the nature of dense, multifaceted neuroscience data and the design of contemporary search engine systems make efficient, reliable and relevant discovery of such information a significant challenge. This challenge is particularly pertinent for online databases, whose dynamic content is ‘hidden’ from search engines. The Neuroscience Information Framework (NIF; http://www.neuinfo.org) was funded by the NIH Blueprint for Neuroscience Research to address the problem of finding and utilizing neuroscience-relevant resources such as software tools, data sets, experimental animals and antibodies across the Internet. From the outset, NIF sought to provide an accounting of available resources while developing technical solutions for finding, accessing and utilizing them. The curators, therefore, are tasked with identifying and registering resources, examining data, writing configuration files to index and display data, and keeping the contents current. In the initial phases of the project, all aspects of the registration and curation processes were manual. However, as the number of resources grew, manual curation became impractical. This report describes our experiences and successes with developing automated resource discovery and semi-automated type characterization with text-mining scripts that facilitate curation team efforts to discover, integrate and display new content. We also describe the DISCO framework, a suite of automated web services that significantly reduces the manual curation effort needed to periodically check for resource updates. Lastly, we discuss DOMEO, a semi-automated annotation tool that improves the discovery and curation of resources that are not necessarily website-based (e.g. reagents, software tools).
Although the ultimate goal of automation was to reduce the workload of the curators, it has resulted in valuable analytic by-products that address accessibility, use and citation of resources and that can now be shared with resource owners and the larger scientific community. Database URL: http://neuinfo.org
format Online
Article
Text
id pubmed-3308161
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-3308161 2012-03-20 A hybrid human and machine resource curation pipeline for the Neuroscience Information Framework Database (Oxford) Original Articles Oxford University Press 2012-02-13 /pmc/articles/PMC3308161/ /pubmed/22434839 http://dx.doi.org/10.1093/database/bas005 Text en © The Author(s) 2012. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/3.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
title A hybrid human and machine resource curation pipeline for the Neuroscience Information Framework
topic Original Articles
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3308161/
https://www.ncbi.nlm.nih.gov/pubmed/22434839
http://dx.doi.org/10.1093/database/bas005