
MantaID: a machine learning–based tool to automate the identification of biological database IDs

The number of biological databases is growing rapidly, but different databases use different identifiers (IDs) to refer to the same biological entity. The inconsistency in IDs impedes the integration of various types of biological data. To resolve the problem, we developed MantaID, a data-driven, ma...

Bibliographic Details
Main Authors: Zeng, Zhengpeng; Hu, Jiamin; Cao, Miyuan; Li, Bingbing; Wang, Xiting; Yu, Feng; Mao, Longfei
Format: Online Article Text
Language: English
Published: Oxford University Press 2023
Subjects: Database Tool
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10168000/
https://www.ncbi.nlm.nih.gov/pubmed/37159241
http://dx.doi.org/10.1093/database/baad028
author Zeng, Zhengpeng
Hu, Jiamin
Cao, Miyuan
Li, Bingbing
Wang, Xiting
Yu, Feng
Mao, Longfei
collection PubMed
description The number of biological databases is growing rapidly, but different databases use different identifiers (IDs) to refer to the same biological entity. The inconsistency in IDs impedes the integration of various types of biological data. To resolve the problem, we developed MantaID, a data-driven, machine learning–based approach that automates identifying IDs on a large scale. The MantaID model’s prediction accuracy was proven to be 99%, and it correctly and effectively predicted 100,000 ID entries within 2 min. MantaID supports the discovery and exploitation of ID from large quantities of databases (e.g. up to 542 biological databases). An easy-to-use freely available open-source software R package, a user-friendly web application and application programming interfaces were also developed for MantaID to improve applicability. To our knowledge, MantaID is the first tool that enables an automatic, quick, accurate and comprehensive identification of large quantities of IDs and can therefore be used as a starting point to facilitate the complex assimilation and aggregation of biological data across diverse databases.
format Online
Article
Text
id pubmed-10168000
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-10168000 2023-05-10 MantaID: a machine learning–based tool to automate the identification of biological database IDs Zeng, Zhengpeng; Hu, Jiamin; Cao, Miyuan; Li, Bingbing; Wang, Xiting; Yu, Feng; Mao, Longfei Database (Oxford) Database Tool The number of biological databases is growing rapidly, but different databases use different identifiers (IDs) to refer to the same biological entity. The inconsistency in IDs impedes the integration of various types of biological data. To resolve the problem, we developed MantaID, a data-driven, machine learning–based approach that automates identifying IDs on a large scale. The MantaID model’s prediction accuracy was proven to be 99%, and it correctly and effectively predicted 100,000 ID entries within 2 min. MantaID supports the discovery and exploitation of ID from large quantities of databases (e.g. up to 542 biological databases). An easy-to-use freely available open-source software R package, a user-friendly web application and application programming interfaces were also developed for MantaID to improve applicability. To our knowledge, MantaID is the first tool that enables an automatic, quick, accurate and comprehensive identification of large quantities of IDs and can therefore be used as a starting point to facilitate the complex assimilation and aggregation of biological data across diverse databases. Oxford University Press 2023-05-09 /pmc/articles/PMC10168000/ /pubmed/37159241 http://dx.doi.org/10.1093/database/baad028 Text en © The Author(s) 2023. Published by Oxford University Press. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title MantaID: a machine learning–based tool to automate the identification of biological database IDs
topic Database Tool
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10168000/
https://www.ncbi.nlm.nih.gov/pubmed/37159241
http://dx.doi.org/10.1093/database/baad028