MantaID: a machine learning–based tool to automate the identification of biological database IDs
The number of biological databases is growing rapidly, but different databases use different identifiers (IDs) to refer to the same biological entity. The inconsistency in IDs impedes the integration of various types of biological data. To resolve the problem, we developed MantaID, a data-driven, ma...
Main Authors: | Zeng, Zhengpeng; Hu, Jiamin; Cao, Miyuan; Li, Bingbing; Wang, Xiting; Yu, Feng; Mao, Longfei |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2023 |
Subjects: | Database Tool |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10168000/ https://www.ncbi.nlm.nih.gov/pubmed/37159241 http://dx.doi.org/10.1093/database/baad028 |
_version_ | 1785038777312346112 |
---|---|
author | Zeng, Zhengpeng Hu, Jiamin Cao, Miyuan Li, Bingbing Wang, Xiting Yu, Feng Mao, Longfei |
author_facet | Zeng, Zhengpeng Hu, Jiamin Cao, Miyuan Li, Bingbing Wang, Xiting Yu, Feng Mao, Longfei |
author_sort | Zeng, Zhengpeng |
collection | PubMed |
description | The number of biological databases is growing rapidly, but different databases use different identifiers (IDs) to refer to the same biological entity. The inconsistency in IDs impedes the integration of various types of biological data. To resolve the problem, we developed MantaID, a data-driven, machine learning–based approach that automates identifying IDs on a large scale. The MantaID model’s prediction accuracy was proven to be 99%, and it correctly and effectively predicted 100,000 ID entries within 2 min. MantaID supports the discovery and exploitation of ID from large quantities of databases (e.g. up to 542 biological databases). An easy-to-use freely available open-source software R package, a user-friendly web application and application programming interfaces were also developed for MantaID to improve applicability. To our knowledge, MantaID is the first tool that enables an automatic, quick, accurate and comprehensive identification of large quantities of IDs and can therefore be used as a starting point to facilitate the complex assimilation and aggregation of biological data across diverse databases. |
format | Online Article Text |
id | pubmed-10168000 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-10168000 2023-05-10 MantaID: a machine learning–based tool to automate the identification of biological database IDs Zeng, Zhengpeng Hu, Jiamin Cao, Miyuan Li, Bingbing Wang, Xiting Yu, Feng Mao, Longfei Database (Oxford) Database Tool The number of biological databases is growing rapidly, but different databases use different identifiers (IDs) to refer to the same biological entity. The inconsistency in IDs impedes the integration of various types of biological data. To resolve the problem, we developed MantaID, a data-driven, machine learning–based approach that automates identifying IDs on a large scale. The MantaID model’s prediction accuracy was proven to be 99%, and it correctly and effectively predicted 100,000 ID entries within 2 min. MantaID supports the discovery and exploitation of ID from large quantities of databases (e.g. up to 542 biological databases). An easy-to-use freely available open-source software R package, a user-friendly web application and application programming interfaces were also developed for MantaID to improve applicability. To our knowledge, MantaID is the first tool that enables an automatic, quick, accurate and comprehensive identification of large quantities of IDs and can therefore be used as a starting point to facilitate the complex assimilation and aggregation of biological data across diverse databases. Oxford University Press 2023-05-09 /pmc/articles/PMC10168000/ /pubmed/37159241 http://dx.doi.org/10.1093/database/baad028 Text en © The Author(s) 2023. Published by Oxford University Press. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Database Tool Zeng, Zhengpeng Hu, Jiamin Cao, Miyuan Li, Bingbing Wang, Xiting Yu, Feng Mao, Longfei MantaID: a machine learning–based tool to automate the identification of biological database IDs |
title | MantaID: a machine learning–based tool to automate the identification of biological database IDs |
title_full | MantaID: a machine learning–based tool to automate the identification of biological database IDs |
title_fullStr | MantaID: a machine learning–based tool to automate the identification of biological database IDs |
title_full_unstemmed | MantaID: a machine learning–based tool to automate the identification of biological database IDs |
title_short | MantaID: a machine learning–based tool to automate the identification of biological database IDs |
title_sort | mantaid: a machine learning–based tool to automate the identification of biological database ids |
topic | Database Tool |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10168000/ https://www.ncbi.nlm.nih.gov/pubmed/37159241 http://dx.doi.org/10.1093/database/baad028 |
work_keys_str_mv | AT zengzhengpeng mantaidamachinelearningbasedtooltoautomatetheidentificationofbiologicaldatabaseids AT hujiamin mantaidamachinelearningbasedtooltoautomatetheidentificationofbiologicaldatabaseids AT caomiyuan mantaidamachinelearningbasedtooltoautomatetheidentificationofbiologicaldatabaseids AT libingbing mantaidamachinelearningbasedtooltoautomatetheidentificationofbiologicaldatabaseids AT wangxiting mantaidamachinelearningbasedtooltoautomatetheidentificationofbiologicaldatabaseids AT yufeng mantaidamachinelearningbasedtooltoautomatetheidentificationofbiologicaldatabaseids AT maolongfei mantaidamachinelearningbasedtooltoautomatetheidentificationofbiologicaldatabaseids |