Representation learning applications in biological sequence analysis
Although remarkable advances have been reported in high-throughput sequencing, the ability to aptly analyze a substantial amount of rapidly generated biological (DNA/RNA/protein) sequencing data remains a critical hurdle. To tackle this issue, the application of natural language processing (NLP) to...
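Main authors: | Iuchi, Hitoshi; Matsutani, Taro; Yamada, Keisuke; Iwano, Natsuki; Sumi, Shunsuke; Hosoda, Shion; Zhao, Shitao; Fukunaga, Tsukasa; Hamada, Michiaki |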
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Research Network of Computational and Structural Biotechnology, 2021 |
Subjects: | Review Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8190442/ https://www.ncbi.nlm.nih.gov/pubmed/34141139 http://dx.doi.org/10.1016/j.csbj.2021.05.039 |
_version_ | 1783705685307949056 |
---|---|
author | Iuchi, Hitoshi Matsutani, Taro Yamada, Keisuke Iwano, Natsuki Sumi, Shunsuke Hosoda, Shion Zhao, Shitao Fukunaga, Tsukasa Hamada, Michiaki |
author_facet | Iuchi, Hitoshi Matsutani, Taro Yamada, Keisuke Iwano, Natsuki Sumi, Shunsuke Hosoda, Shion Zhao, Shitao Fukunaga, Tsukasa Hamada, Michiaki |
author_sort | Iuchi, Hitoshi |
collection | PubMed |
description | Although remarkable advances have been reported in high-throughput sequencing, the ability to aptly analyze a substantial amount of rapidly generated biological (DNA/RNA/protein) sequencing data remains a critical hurdle. To tackle this issue, the application of natural language processing (NLP) to biological sequence analysis has received increased attention. In this method, biological sequences are regarded as sentences while the single nucleic acids/amino acids or k-mers in these sequences represent the words. Embedding is an essential step in NLP, which performs the conversion of these words into vectors. Specifically, representation learning is an approach used for this transformation process, which can be applied to biological sequences. Vectorized biological sequences can then be applied for function and structure estimation, or as input for other probabilistic models. Considering the importance and growing trend for the application of representation learning to biological research, in the present study, we have reviewed the existing knowledge in representation learning for biological sequence analysis. |
format | Online Article Text |
id | pubmed-8190442 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Research Network of Computational and Structural Biotechnology |
record_format | MEDLINE/PubMed |
spelling | pubmed-8190442 2021-06-16 Representation learning applications in biological sequence analysis Iuchi, Hitoshi Matsutani, Taro Yamada, Keisuke Iwano, Natsuki Sumi, Shunsuke Hosoda, Shion Zhao, Shitao Fukunaga, Tsukasa Hamada, Michiaki Comput Struct Biotechnol J Review Article Although remarkable advances have been reported in high-throughput sequencing, the ability to aptly analyze a substantial amount of rapidly generated biological (DNA/RNA/protein) sequencing data remains a critical hurdle. To tackle this issue, the application of natural language processing (NLP) to biological sequence analysis has received increased attention. In this method, biological sequences are regarded as sentences while the single nucleic acids/amino acids or k-mers in these sequences represent the words. Embedding is an essential step in NLP, which performs the conversion of these words into vectors. Specifically, representation learning is an approach used for this transformation process, which can be applied to biological sequences. Vectorized biological sequences can then be applied for function and structure estimation, or as input for other probabilistic models. Considering the importance and growing trend for the application of representation learning to biological research, in the present study, we have reviewed the existing knowledge in representation learning for biological sequence analysis. Research Network of Computational and Structural Biotechnology 2021-05-23 /pmc/articles/PMC8190442/ /pubmed/34141139 http://dx.doi.org/10.1016/j.csbj.2021.05.039 Text en © 2021 The Author(s) https://creativecommons.org/licenses/by-nc-nd/4.0/ This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/). |
spellingShingle | Review Article Iuchi, Hitoshi Matsutani, Taro Yamada, Keisuke Iwano, Natsuki Sumi, Shunsuke Hosoda, Shion Zhao, Shitao Fukunaga, Tsukasa Hamada, Michiaki Representation learning applications in biological sequence analysis |
title | Representation learning applications in biological sequence analysis |
title_full | Representation learning applications in biological sequence analysis |
title_fullStr | Representation learning applications in biological sequence analysis |
title_full_unstemmed | Representation learning applications in biological sequence analysis |
title_short | Representation learning applications in biological sequence analysis |
title_sort | representation learning applications in biological sequence analysis |
topic | Review Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8190442/ https://www.ncbi.nlm.nih.gov/pubmed/34141139 http://dx.doi.org/10.1016/j.csbj.2021.05.039 |
work_keys_str_mv | AT iuchihitoshi representationlearningapplicationsinbiologicalsequenceanalysis AT matsutanitaro representationlearningapplicationsinbiologicalsequenceanalysis AT yamadakeisuke representationlearningapplicationsinbiologicalsequenceanalysis AT iwanonatsuki representationlearningapplicationsinbiologicalsequenceanalysis AT sumishunsuke representationlearningapplicationsinbiologicalsequenceanalysis AT hosodashion representationlearningapplicationsinbiologicalsequenceanalysis AT zhaoshitao representationlearningapplicationsinbiologicalsequenceanalysis AT fukunagatsukasa representationlearningapplicationsinbiologicalsequenceanalysis AT hamadamichiaki representationlearningapplicationsinbiologicalsequenceanalysis |
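
The description field treats a biological sequence as a sentence and its k-mers as words, which are then converted into vectors. As a rough illustration of that first step only, the Python sketch below splits a DNA string into overlapping k-mers and builds a plain k-mer count vector. This is not the learned embedding the review surveys; the function names `kmer_tokens` and `kmer_count_vector`, the default `k=3`, and the `ACGT` alphabet are all assumptions made for the example.

```python
from collections import Counter
from itertools import product


def kmer_tokens(sequence, k=3, stride=1):
    """Split a sequence into k-mers, the "words" of the sequence "sentence"."""
    seq = sequence.upper()
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]


def kmer_count_vector(sequence, k=3, alphabet="ACGT"):
    """Count each possible k-mer; a simple stand-in for a learned embedding."""
    # Fixed vocabulary of all len(alphabet)**k k-mers, in lexicographic order.
    vocab = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = Counter(kmer_tokens(sequence, k))
    # Counter returns 0 for k-mers that never occur in the sequence.
    return [counts[word] for word in vocab]


tokens = kmer_tokens("ATGCGA", k=3)
vector = kmer_count_vector("ATGCGA", k=3)
print(tokens)
print(len(vector), sum(vector))
```

In a representation learning setting, the token list would typically be fed to a model that learns a dense vector per k-mer, rather than being reduced to raw counts as done here.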