Integrating LASSO Feature Selection and Soft Voting Classifier to Identify Origins of Replication Sites
Background: DNA replication plays an indispensable role in the transmission of genetic information. It is considered to be the basis of biological inheritance and the most fundamental process in all biological life. Considering that DNA replication initiates with a special location, namely the origi...
Main authors: | Yao, Yingying; Zhang, Shengli; Xue, Tian |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Bentham Science Publishers, 2022 |
Subjects: | Genetics & Genomics |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9878833/ https://www.ncbi.nlm.nih.gov/pubmed/36778978 http://dx.doi.org/10.2174/1389202923666220214122506 |
_version_ | 1784878572385599488 |
---|---|
author | Yao, Yingying Zhang, Shengli Xue, Tian |
author_facet | Yao, Yingying Zhang, Shengli Xue, Tian |
author_sort | Yao, Yingying |
collection | PubMed |
description | Background: DNA replication plays an indispensable role in the transmission of genetic information. It is considered the basis of biological inheritance and the most fundamental process in all biological life. Because DNA replication initiates at a specific location, namely the origin of replication, more accurate prediction of origins of replication sites (ORIs) is essential to gain insight into their relationship with gene expression. Objective: In this study, we developed an efficient predictor called iORI-LAVT for ORI identification. Methods: This work extracts feature information from three aspects: mono-nucleotide encoding, k-mer, and ring-function-hydrogen-chemical properties. Subsequently, the least absolute shrinkage and selection operator (LASSO) is applied as a feature selection method to select the optimal features. After comparing the results of different soft voting classifier combinations, the soft voting classifier based on GaussianNB and Logistic Regression is employed as the final classifier. Results: Based on a 10-fold cross-validation test, the prediction accuracies on the two benchmark datasets are 90.39% and 95.96%, respectively. On the independent dataset, our method achieves a high accuracy of 91.3%. Conclusion: iORI-LAVT outperforms previous predictors. The iORI-LAVT predictor is believed to be a promising alternative for further research on identifying ORIs. |
format | Online Article Text |
id | pubmed-9878833 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Bentham Science Publishers |
record_format | MEDLINE/PubMed |
spelling | pubmed-98788332023-02-09 Integrating LASSO Feature Selection and Soft Voting Classifier to Identify Origins of Replication Sites Yao, Yingying Zhang, Shengli Xue, Tian Curr Genomics Genetics & Genomics Background: DNA replication plays an indispensable role in the transmission of genetic information. It is considered to be the basis of biological inheritance and the most fundamental process in all biological life. Considering that DNA replication initiates with a special location, namely the origin of replication, a better and accurate prediction of the origins of replication sites (ORIs) is essential to gain insight into the relationship with gene expression. Objective: In this study, we have developed an efficient predictor called iORI-LAVT for ORIs identification. Methods: This work focuses on extracting feature information from three aspects, including mono-nucleotide encoding, k-mer and ring-function-hydrogen-chemical properties. Subsequently, least absolute shrinkage and selection operator (LASSO) as a feature selection is applied to select the optimal features. Comparing the different combined soft voting classifiers results, the soft voting classifier based on GaussianNB and Logistic Regression is employed as the final classifier. Results: Based on 10-fold cross-validation test, the prediction accuracies of two benchmark datasets are 90.39% and 95.96%, respectively. As for the independent dataset, our method achieves high accuracy of 91.3%. Conclusion: Compared with previous predictors, iORI-LAVT outperforms the existing methods. It is believed that iORI-LAVT predictor is a promising alternative for further research on identifying ORIs. 
Bentham Science Publishers 2022-06-10 2022-06-10 /pmc/articles/PMC9878833/ /pubmed/36778978 http://dx.doi.org/10.2174/1389202923666220214122506 Text en © 2022 Bentham Science Publishers https://creativecommons.org/licenses/by-nc/4.0/ This is an open access article licensed under the terms of the Creative Commons Attribution-Non-Commercial 4.0 International Public License (CC BY-NC 4.0) (https://creativecommons.org/licenses/by-nc/4.0/), which permits unrestricted, non-commercial use, distribution and reproduction in any medium, provided the work is properly cited. |
spellingShingle | Genetics & Genomics Yao, Yingying Zhang, Shengli Xue, Tian Integrating LASSO Feature Selection and Soft Voting Classifier to Identify Origins of Replication Sites |
title | Integrating LASSO Feature Selection and Soft Voting Classifier to Identify Origins of Replication Sites |
title_full | Integrating LASSO Feature Selection and Soft Voting Classifier to Identify Origins of Replication Sites |
title_fullStr | Integrating LASSO Feature Selection and Soft Voting Classifier to Identify Origins of Replication Sites |
title_full_unstemmed | Integrating LASSO Feature Selection and Soft Voting Classifier to Identify Origins of Replication Sites |
title_short | Integrating LASSO Feature Selection and Soft Voting Classifier to Identify Origins of Replication Sites |
title_sort | integrating lasso feature selection and soft voting classifier to identify origins of replication sites |
topic | Genetics & Genomics |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9878833/ https://www.ncbi.nlm.nih.gov/pubmed/36778978 http://dx.doi.org/10.2174/1389202923666220214122506 |
work_keys_str_mv | AT yaoyingying integratinglassofeatureselectionandsoftvotingclassifiertoidentifyoriginsofreplicationsites AT zhangshengli integratinglassofeatureselectionandsoftvotingclassifiertoidentifyoriginsofreplicationsites AT xuetian integratinglassofeatureselectionandsoftvotingclassifiertoidentifyoriginsofreplicationsites |
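
The abstract in this record mentions k-mer features and a soft voting classifier without giving implementation details. The following is a minimal, standard-library sketch of those two ideas only, not the authors' iORI-LAVT code. The function names `kmer_frequencies`, `soft_vote`, and `predict_label` are hypothetical, the probability values in the usage note are made up, and the LASSO feature selection step is not shown.

```python
from itertools import product
from typing import Dict, List, Optional, Sequence


def kmer_frequencies(seq: str, k: int = 2) -> Dict[str, float]:
    """Return normalized k-mer frequencies over the alphabet ACGT.

    Windows containing any other symbol (for example N) are skipped.
    This is one common k-mer encoding; the record does not specify
    which variant the authors used.
    """
    seq = seq.upper()
    counts = {"".join(p): 0 for p in product("ACGT", repeat=k)}
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:
            counts[kmer] += 1
            total += 1
    if total == 0:
        return {kmer: 0.0 for kmer in counts}
    return {kmer: c / total for kmer, c in counts.items()}


def soft_vote(probabilities: Sequence[Sequence[float]],
              weights: Optional[Sequence[float]] = None) -> List[float]:
    """Combine per-classifier class probabilities by weighted averaging.

    Each inner sequence holds one classifier's probabilities for the
    same ordered list of classes. Equal weights are assumed by default.
    """
    if not probabilities:
        raise ValueError("need at least one probability vector")
    n_classes = len(probabilities[0])
    if any(len(p) != n_classes for p in probabilities):
        raise ValueError("all probability vectors must have the same length")
    if weights is None:
        weights = [1.0] * len(probabilities)
    if len(weights) != len(probabilities):
        raise ValueError("need exactly one weight per classifier")
    total_weight = sum(weights)
    return [
        sum(w * p[c] for w, p in zip(weights, probabilities)) / total_weight
        for c in range(n_classes)
    ]


def predict_label(averaged: Sequence[float]) -> int:
    """Return the index of the class with the highest averaged probability."""
    return max(range(len(averaged)), key=lambda c: averaged[c])
```

As a usage illustration, if one classifier outputs `[0.9, 0.1]` and another outputs `[0.4, 0.6]` for the classes (non-ORI, ORI), `soft_vote` averages them to roughly `[0.65, 0.35]`, and `predict_label` picks class 0. In the setting described by the abstract, the two probability vectors would come from the GaussianNB and Logistic Regression models.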