De novo profile generation based on sequence context specificity with the long short-term memory network

BACKGROUND: Long short-term memory (LSTM) is one of the most attractive deep learning methods to learn time series or contexts of input data. A growing number of studies, including biological sequence analyses in bioinformatics, use this architecture. Amino acid sequence profiles are widely used for bioi...

Bibliographic Details
Main Authors: Yamada, Kazunori D., Kinoshita, Kengo
Format: Online Article Text
Language: English
Published: BioMed Central 2018
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6052547/
https://www.ncbi.nlm.nih.gov/pubmed/30021530
http://dx.doi.org/10.1186/s12859-018-2284-1
_version_ 1783340676585357312
author Yamada, Kazunori D.
Kinoshita, Kengo
collection PubMed
description BACKGROUND: Long short-term memory (LSTM) is one of the most attractive deep learning methods to learn time series or contexts of input data. A growing number of studies, including biological sequence analyses in bioinformatics, use this architecture. Amino acid sequence profiles are widely used for bioinformatics studies, such as sequence similarity searches, multiple alignments, and evolutionary analyses. Currently, many biological sequences are becoming available, and the rapidly increasing amount of sequence data emphasizes the importance of scalable generators of amino acid sequence profiles. RESULTS: We employed the LSTM network and developed a novel profile generator to construct profiles without any assumptions, except for input sequence context. Our method could generate better profiles than existing de novo profile generators, including CSBuild and RPS-BLAST, on the basis of profile-sequence similarity search performance, at a computational cost that scales linearly with input sequence size. In addition, we analyzed the effects of the memory power of LSTM and found that LSTM has strong potential to detect long-range interactions between amino acids, as in the case of beta-strand formation, which has been a difficult problem in protein bioinformatics using sequence information. CONCLUSION: We demonstrated the importance of sequence context and the feasibility of LSTM for biological sequence analyses. Our results demonstrated the effectiveness of memories in LSTM and showed that our de novo profile generator, SPBuild, outperformed existing methods in profile prediction of beta-strands, where long-range interactions of amino acids are important and are known to be difficult for the existing window-based prediction methods. Our findings will be useful for the development of other machine learning-based prediction methods related to biological sequences.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-018-2284-1) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-6052547
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6052547 2018-07-20 De novo profile generation based on sequence context specificity with the long short-term memory network Yamada, Kazunori D. Kinoshita, Kengo BMC Bioinformatics Research Article BioMed Central 2018-07-18 /pmc/articles/PMC6052547/ /pubmed/30021530 http://dx.doi.org/10.1186/s12859-018-2284-1 Text en © The Author(s). 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title De novo profile generation based on sequence context specificity with the long short-term memory network
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6052547/
https://www.ncbi.nlm.nih.gov/pubmed/30021530
http://dx.doi.org/10.1186/s12859-018-2284-1