Using multi-layer perceptron to identify origins of replication in eukaryotes via informative features

BACKGROUND: The origin is the starting site of DNA replication, a vital part of the inheritance of genetic information from parents to offspring. Accurately identifying the origin of replication has great application value in the diagnosis and treatment of diseases related to...

Bibliographic Details
Main authors: Fan, Yongxian, Wang, Wanru
Format: Online Article Text
Language: English
Published: BioMed Central 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8542328/
https://www.ncbi.nlm.nih.gov/pubmed/34688247
http://dx.doi.org/10.1186/s12859-021-04431-x
_version_ 1784589408212615168
author Fan, Yongxian
Wang, Wanru
author_facet Fan, Yongxian
Wang, Wanru
author_sort Fan, Yongxian
collection PubMed
description BACKGROUND: The origin is the starting site of DNA replication, a vital part of the inheritance of genetic information from parents to offspring. Accurately identifying the origin of replication has great application value in the diagnosis and treatment of diseases related to genetic information errors, but traditional biological experimental methods are time-consuming and laborious. RESULTS: We studied origins of replication in a variety of eukaryotes and proposed a unique prediction method for each species. We collected data from 7 species: Homo sapiens, Mus musculus, Drosophila melanogaster, Arabidopsis thaliana, Kluyveromyces lactis, Pichia pastoris and Schizosaccharomyces pombe. In addition to the commonly used sequence feature extraction methods PseKNC-II and Base-content, we designed a feature extraction method based on TF-IDF. A two-step method was then used for feature selection. After comparing a variety of traditional machine learning classification models, the multi-layer perceptron was employed as the classification algorithm. The data and code used in the experiments are available at https://github.com/Sarahyouzi/EukOriginPredict. CONCLUSIONS: The prediction accuracies on the training sets of the seven species above, after 100 repetitions of fivefold cross-validation, reach 92.60%, 90.80%, 91.22%, 96.15%, 96.72%, 99.86% and 96.72%, respectively. This indicates that, compared with other methods, the methods we designed achieve superior performance. In addition, our experiments reveal that the models of multiple species can predict each other with high accuracy, and the STREME results show that they share a certain common motif. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-021-04431-x.
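The description mentions a feature extraction method based on TF-IDF but does not say which variant was used. As a rough illustration only, the sketch below applies the textbook TF-IDF weighting to overlapping k-mers, treating each sequence as a document and each k-mer as a word. The k-mer framing, the value of k, the toy sequences and the function names `kmers` and `tfidf_features` are all assumptions made for this example and are not taken from the paper or its repository.

```python
import math
from collections import Counter


def kmers(seq, k):
    """Return the overlapping k-mers of a DNA sequence, uppercased."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def tfidf_features(seqs, k=3):
    """Map each sequence to a dict {k-mer: tf * idf} (textbook TF-IDF).

    tf  = count of the k-mer in the sequence / number of k-mers in it
    idf = log(N / df), where N is the number of sequences and df is the
          number of sequences that contain the k-mer at least once
    """
    docs = [Counter(kmers(s, k)) for s in seqs]
    n_docs = len(docs)

    # Document frequency: in how many sequences does each k-mer occur?
    df = Counter()
    for doc in docs:
        df.update(doc.keys())

    features = []
    for doc in docs:
        total = sum(doc.values())
        features.append({
            kmer: (count / total) * math.log(n_docs / df[kmer])
            for kmer, count in doc.items()
        })
    return features


# Hypothetical toy sequences, not data from the paper.
toy_seqs = ["ATATATGC", "GCGCGCAT", "ATATATAT"]
feats = tfidf_features(toy_seqs, k=2)
```

With this weighting, a k-mer that occurs in every sequence gets a weight of zero, so the resulting vectors emphasize k-mers that distinguish one sequence from the others. Such vectors could then be passed to a feature selection step and a classifier such as a multi-layer perceptron.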
format Online
Article
Text
id pubmed-8542328
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-8542328 2021-10-25 Using multi-layer perceptron to identify origins of replication in eukaryotes via informative features Fan, Yongxian Wang, Wanru BMC Bioinformatics Research BACKGROUND: The origin is the starting site of DNA replication, a vital part of the inheritance of genetic information from parents to offspring. Accurately identifying the origin of replication has great application value in the diagnosis and treatment of diseases related to genetic information errors, but traditional biological experimental methods are time-consuming and laborious. RESULTS: We studied origins of replication in a variety of eukaryotes and proposed a unique prediction method for each species. We collected data from 7 species: Homo sapiens, Mus musculus, Drosophila melanogaster, Arabidopsis thaliana, Kluyveromyces lactis, Pichia pastoris and Schizosaccharomyces pombe. In addition to the commonly used sequence feature extraction methods PseKNC-II and Base-content, we designed a feature extraction method based on TF-IDF. A two-step method was then used for feature selection. After comparing a variety of traditional machine learning classification models, the multi-layer perceptron was employed as the classification algorithm. The data and code used in the experiments are available at https://github.com/Sarahyouzi/EukOriginPredict. CONCLUSIONS: The prediction accuracies on the training sets of the seven species above, after 100 repetitions of fivefold cross-validation, reach 92.60%, 90.80%, 91.22%, 96.15%, 96.72%, 99.86% and 96.72%, respectively. This indicates that, compared with other methods, the methods we designed achieve superior performance. In addition, our experiments reveal that the models of multiple species can predict each other with high accuracy, and the STREME results show that they share a certain common motif.
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-021-04431-x. BioMed Central 2021-10-23 /pmc/articles/PMC8542328/ /pubmed/34688247 http://dx.doi.org/10.1186/s12859-021-04431-x Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
spellingShingle Research
Fan, Yongxian
Wang, Wanru
Using multi-layer perceptron to identify origins of replication in eukaryotes via informative features
title Using multi-layer perceptron to identify origins of replication in eukaryotes via informative features
title_full Using multi-layer perceptron to identify origins of replication in eukaryotes via informative features
title_fullStr Using multi-layer perceptron to identify origins of replication in eukaryotes via informative features
title_full_unstemmed Using multi-layer perceptron to identify origins of replication in eukaryotes via informative features
title_short Using multi-layer perceptron to identify origins of replication in eukaryotes via informative features
title_sort using multi-layer perceptron to identify origins of replication in eukaryotes via informative features
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8542328/
https://www.ncbi.nlm.nih.gov/pubmed/34688247
http://dx.doi.org/10.1186/s12859-021-04431-x
work_keys_str_mv AT fanyongxian usingmultilayerperceptrontoidentifyoriginsofreplicationineukaryotesviainformativefeatures
AT wangwanru usingmultilayerperceptrontoidentifyoriginsofreplicationineukaryotesviainformativefeatures