BioSeq-BLM: a platform for analyzing DNA, RNA and protein sequences based on biological language models
In order to uncover the meanings of ‘book of life’, 155 different biological language models (BLMs) for DNA, RNA and protein sequence analysis are discussed in this study, which are able to extract the linguistic properties of ‘book of life’. We also extend the BLMs into a system called BioSeq-BLM f...
Main Authors: | Li, Hong-Liang, Pang, Yi-He, Liu, Bin |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2021 |
Subjects: | Methods Online |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8682797/ https://www.ncbi.nlm.nih.gov/pubmed/34581805 http://dx.doi.org/10.1093/nar/gkab829 |
_version_ | 1784617299510034432 |
---|---|
author | Li, Hong-Liang Pang, Yi-He Liu, Bin |
author_facet | Li, Hong-Liang Pang, Yi-He Liu, Bin |
author_sort | Li, Hong-Liang |
collection | PubMed |
description | In order to uncover the meanings of ‘book of life’, 155 different biological language models (BLMs) for DNA, RNA and protein sequence analysis are discussed in this study, which are able to extract the linguistic properties of ‘book of life’. We also extend the BLMs into a system called BioSeq-BLM for automatically representing and analyzing the sequence data. Experimental results show that the predictors generated by BioSeq-BLM achieve comparable or even clearly better performance than the existing state-of-the-art predictors published in the literature, indicating that BioSeq-BLM will provide new approaches for biological sequence analysis based on natural language processing technologies, and contribute to the development of this very important field. To help readers use BioSeq-BLM for their own experiments, the corresponding web server and stand-alone package have been established and released, and can be freely accessed at http://bliulab.net/BioSeq-BLM/. |
format | Online Article Text |
id | pubmed-8682797 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-8682797 2021-12-20 BioSeq-BLM: a platform for analyzing DNA, RNA and protein sequences based on biological language models Li, Hong-Liang Pang, Yi-He Liu, Bin Nucleic Acids Res Methods Online In order to uncover the meanings of ‘book of life’, 155 different biological language models (BLMs) for DNA, RNA and protein sequence analysis are discussed in this study, which are able to extract the linguistic properties of ‘book of life’. We also extend the BLMs into a system called BioSeq-BLM for automatically representing and analyzing the sequence data. Experimental results show that the predictors generated by BioSeq-BLM achieve comparable or even clearly better performance than the existing state-of-the-art predictors published in the literature, indicating that BioSeq-BLM will provide new approaches for biological sequence analysis based on natural language processing technologies, and contribute to the development of this very important field. To help readers use BioSeq-BLM for their own experiments, the corresponding web server and stand-alone package have been established and released, and can be freely accessed at http://bliulab.net/BioSeq-BLM/. Oxford University Press 2021-09-28 /pmc/articles/PMC8682797/ /pubmed/34581805 http://dx.doi.org/10.1093/nar/gkab829 Text en © The Author(s) 2021. Published by Oxford University Press on behalf of Nucleic Acids Research. https://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
spellingShingle | Methods Online Li, Hong-Liang Pang, Yi-He Liu, Bin BioSeq-BLM: a platform for analyzing DNA, RNA and protein sequences based on biological language models |
title | BioSeq-BLM: a platform for analyzing DNA, RNA and protein sequences based on biological language models |
title_full | BioSeq-BLM: a platform for analyzing DNA, RNA and protein sequences based on biological language models |
title_fullStr | BioSeq-BLM: a platform for analyzing DNA, RNA and protein sequences based on biological language models |
title_full_unstemmed | BioSeq-BLM: a platform for analyzing DNA, RNA and protein sequences based on biological language models |
title_short | BioSeq-BLM: a platform for analyzing DNA, RNA and protein sequences based on biological language models |
title_sort | bioseq-blm: a platform for analyzing dna, rna and protein sequences based on biological language models |
topic | Methods Online |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8682797/ https://www.ncbi.nlm.nih.gov/pubmed/34581805 http://dx.doi.org/10.1093/nar/gkab829 |
work_keys_str_mv | AT lihongliang bioseqblmaplatformforanalyzingdnarnaandproteinsequencesbasedonbiologicallanguagemodels AT pangyihe bioseqblmaplatformforanalyzingdnarnaandproteinsequencesbasedonbiologicallanguagemodels AT liubin bioseqblmaplatformforanalyzingdnarnaandproteinsequencesbasedonbiologicallanguagemodels |