BioSeq-BLM: a platform for analyzing DNA, RNA and protein sequences based on biological language models

To uncover the meaning of the ‘book of life’, this study discusses 155 different biological language models (BLMs) for DNA, RNA and protein sequence analysis, which can extract the linguistic properties of the ‘book of life’. We also extend the BLMs into a system called BioSeq-BLM f...

Bibliographic Details
Main authors: Li, Hong-Liang; Pang, Yi-He; Liu, Bin
Format: Online Article Text
Language: English
Published: Oxford University Press 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8682797/
https://www.ncbi.nlm.nih.gov/pubmed/34581805
http://dx.doi.org/10.1093/nar/gkab829
author Li, Hong-Liang
Pang, Yi-He
Liu, Bin
author_facet Li, Hong-Liang
Pang, Yi-He
Liu, Bin
author_sort Li, Hong-Liang
collection PubMed
description To uncover the meaning of the ‘book of life’, this study discusses 155 different biological language models (BLMs) for DNA, RNA and protein sequence analysis, which can extract the linguistic properties of the ‘book of life’. We also extend the BLMs into a system called BioSeq-BLM for automatically representing and analyzing sequence data. Experimental results show that the predictors generated by BioSeq-BLM achieve comparable or even clearly better performance than the existing state-of-the-art predictors published in the literature, indicating that BioSeq-BLM will provide new approaches for biological sequence analysis based on natural language processing technologies and contribute to the development of this important field. To help readers use BioSeq-BLM for their own experiments, the corresponding web server and stand-alone package have been established and released; they can be freely accessed at http://bliulab.net/BioSeq-BLM/.
format Online
Article
Text
id pubmed-8682797
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-8682797 2021-12-20 BioSeq-BLM: a platform for analyzing DNA, RNA and protein sequences based on biological language models Li, Hong-Liang Pang, Yi-He Liu, Bin Nucleic Acids Res Methods Online To uncover the meaning of the ‘book of life’, this study discusses 155 different biological language models (BLMs) for DNA, RNA and protein sequence analysis, which can extract the linguistic properties of the ‘book of life’. We also extend the BLMs into a system called BioSeq-BLM for automatically representing and analyzing sequence data. Experimental results show that the predictors generated by BioSeq-BLM achieve comparable or even clearly better performance than the existing state-of-the-art predictors published in the literature, indicating that BioSeq-BLM will provide new approaches for biological sequence analysis based on natural language processing technologies and contribute to the development of this important field. To help readers use BioSeq-BLM for their own experiments, the corresponding web server and stand-alone package have been established and released; they can be freely accessed at http://bliulab.net/BioSeq-BLM/. Oxford University Press 2021-09-28 /pmc/articles/PMC8682797/ /pubmed/34581805 http://dx.doi.org/10.1093/nar/gkab829 Text en © The Author(s) 2021. Published by Oxford University Press on behalf of Nucleic Acids Research. https://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
spellingShingle Methods Online
Li, Hong-Liang
Pang, Yi-He
Liu, Bin
BioSeq-BLM: a platform for analyzing DNA, RNA and protein sequences based on biological language models
title BioSeq-BLM: a platform for analyzing DNA, RNA and protein sequences based on biological language models
topic Methods Online
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8682797/
https://www.ncbi.nlm.nih.gov/pubmed/34581805
http://dx.doi.org/10.1093/nar/gkab829