Integrating LASSO Feature Selection and Soft Voting Classifier to Identify Origins of Replication Sites

Bibliographic Details
Main authors: Yao, Yingying, Zhang, Shengli, Xue, Tian
Format: Online Article Text
Language: English
Published: Bentham Science Publishers 2022
Subjects: Genetics & Genomics
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9878833/
https://www.ncbi.nlm.nih.gov/pubmed/36778978
http://dx.doi.org/10.2174/1389202923666220214122506
_version_ 1784878572385599488
author Yao, Yingying
Zhang, Shengli
Xue, Tian
collection PubMed
description Background: DNA replication plays an indispensable role in the transmission of genetic information. It is considered the basis of biological inheritance and the most fundamental process in all biological life. Because DNA replication initiates at a specific location, the origin of replication, more accurate prediction of origins of replication sites (ORIs) is essential for understanding their relationship with gene expression. Objective: In this study, we developed an efficient predictor called iORI-LAVT for ORI identification. Methods: Features are extracted from three sources: mono-nucleotide encoding, k-mer composition, and ring-function-hydrogen-chemical properties. The least absolute shrinkage and selection operator (LASSO) is then applied as a feature selection method to choose the optimal features. After comparing the results of soft voting classifiers built from different combinations of base classifiers, the soft voting classifier based on GaussianNB and Logistic Regression is adopted as the final classifier. Results: Under 10-fold cross-validation, the prediction accuracies on two benchmark datasets are 90.39% and 95.96%, respectively. On the independent dataset, the method achieves an accuracy of 91.3%. Conclusion: iORI-LAVT outperforms previous predictors and is a promising alternative for further research on identifying ORIs.
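The k-mer encoding and soft voting steps named in the Methods can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the function names `kmer_frequencies` and `soft_vote`, the choice `k=2`, and the equal weighting of classifiers are assumptions made for illustration, since this record does not state those details.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(seq, k=2, alphabet="ACGT"):
    """Return normalized k-mer frequencies in fixed lexicographic order.

    Hypothetical helper: windows containing characters outside `alphabet`
    still count toward the total but have no slot in the output vector.
    """
    seq = seq.upper()
    n_windows = len(seq) - k + 1
    counts = Counter(seq[i:i + k] for i in range(n_windows))
    total = max(n_windows, 1)  # avoid division by zero for short inputs
    kmers = ("".join(p) for p in product(alphabet, repeat=k))
    return [counts[kmer] / total for kmer in kmers]


def soft_vote(prob_rows):
    """Average per-class probabilities from several classifiers.

    `prob_rows` holds one probability vector per classifier (for example,
    one from a naive Bayes model and one from logistic regression).
    Returns the index of the winning class and the averaged probabilities.
    Equal weights are an assumption of this sketch.
    """
    n_models = len(prob_rows)
    averaged = [sum(col) / n_models for col in zip(*prob_rows)]
    best = max(range(len(averaged)), key=averaged.__getitem__)
    return best, averaged


if __name__ == "__main__":
    features = kmer_frequencies("ATGCGATACG", k=2)
    print(len(features))  # 16 dinucleotide slots for a 4-letter alphabet
    label, probs = soft_vote([[0.6, 0.4], [0.2, 0.8]])
    print(label, probs)
```

In a full pipeline, vectors like these would be concatenated with the other feature groups, reduced by LASSO, and passed to the two base classifiers whose probability outputs are then averaged.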
format Online
Article
Text
id pubmed-9878833
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Bentham Science Publishers
record_format MEDLINE/PubMed
spelling pubmed-9878833 2023-02-09 Integrating LASSO Feature Selection and Soft Voting Classifier to Identify Origins of Replication Sites Yao, Yingying Zhang, Shengli Xue, Tian Curr Genomics Genetics & Genomics
Bentham Science Publishers 2022-06-10 2022-06-10 /pmc/articles/PMC9878833/ /pubmed/36778978 http://dx.doi.org/10.2174/1389202923666220214122506 Text en © 2022 Bentham Science Publishers https://creativecommons.org/licenses/by-nc/4.0/ This is an open access article licensed under the terms of the Creative Commons Attribution-Non-Commercial 4.0 International Public License (CC BY-NC 4.0) (https://creativecommons.org/licenses/by-nc/4.0/), which permits unrestricted, non-commercial use, distribution and reproduction in any medium, provided the work is properly cited.
title Integrating LASSO Feature Selection and Soft Voting Classifier to Identify Origins of Replication Sites
topic Genetics & Genomics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9878833/
https://www.ncbi.nlm.nih.gov/pubmed/36778978
http://dx.doi.org/10.2174/1389202923666220214122506