Review on the Application of Machine Learning Algorithms in the Sequence Data Mining of DNA

Deoxyribonucleic acid (DNA) is a biological macromolecule whose main function is information storage. Advances in sequencing technology have caused DNA sequence data to grow at an explosive rate, which has pushed the study of DNA sequences into the wave of big data. Moreover, mac...

Bibliographic Details
Main Authors: Yang, Aimin; Zhang, Wei; Wang, Jiahao; Yang, Ke; Han, Yang; Zhang, Limin
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7498545/
https://www.ncbi.nlm.nih.gov/pubmed/33015010
http://dx.doi.org/10.3389/fbioe.2020.01032
author Yang, Aimin
Zhang, Wei
Wang, Jiahao
Yang, Ke
Han, Yang
Zhang, Limin
collection PubMed
description Deoxyribonucleic acid (DNA) is a biological macromolecule whose main function is information storage. Advances in sequencing technology have caused DNA sequence data to grow at an explosive rate, which has pushed the study of DNA sequences into the wave of big data. Machine learning is a powerful technique for analyzing large-scale data that learns automatically to gain knowledge. It has been widely used in DNA sequence data analysis and has produced many research achievements. First, the review introduces the development of sequencing technology and expounds on the concepts of DNA sequence data structure and sequence similarity. Then we analyze the basic process of data mining, summarize several major machine learning algorithms, and put forward the challenges faced by machine learning algorithms in the mining of biological sequence data, along with possible future solutions. Next, we review four typical applications of machine learning in DNA sequence data: DNA sequence alignment, DNA sequence classification, DNA sequence clustering, and DNA pattern mining. We analyze their biological application background and significance, and systematically summarize the development and potential problems in the field of DNA sequence data mining in recent years. Finally, we summarize the content of the review and look ahead to some future research directions.
format Online
Article
Text
id pubmed-7498545
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-7498545 2020-10-02 Review on the Application of Machine Learning Algorithms in the Sequence Data Mining of DNA Yang, Aimin Zhang, Wei Wang, Jiahao Yang, Ke Han, Yang Zhang, Limin Front Bioeng Biotechnol Bioengineering and Biotechnology Deoxyribonucleic acid (DNA) is a biological macromolecule whose main function is information storage. Advances in sequencing technology have caused DNA sequence data to grow at an explosive rate, which has pushed the study of DNA sequences into the wave of big data. Machine learning is a powerful technique for analyzing large-scale data that learns automatically to gain knowledge. It has been widely used in DNA sequence data analysis and has produced many research achievements. First, the review introduces the development of sequencing technology and expounds on the concepts of DNA sequence data structure and sequence similarity. Then we analyze the basic process of data mining, summarize several major machine learning algorithms, and put forward the challenges faced by machine learning algorithms in the mining of biological sequence data, along with possible future solutions. Next, we review four typical applications of machine learning in DNA sequence data: DNA sequence alignment, DNA sequence classification, DNA sequence clustering, and DNA pattern mining. We analyze their biological application background and significance, and systematically summarize the development and potential problems in the field of DNA sequence data mining in recent years. Finally, we summarize the content of the review and look ahead to some future research directions. Frontiers Media S.A. 2020-09-04 /pmc/articles/PMC7498545/ /pubmed/33015010 http://dx.doi.org/10.3389/fbioe.2020.01032 Text en Copyright © 2020 Yang, Zhang, Wang, Yang, Han and Zhang.
http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Review on the Application of Machine Learning Algorithms in the Sequence Data Mining of DNA
topic Bioengineering and Biotechnology