
Accurate prediction of B-form/A-form DNA conformation propensity from primary sequence: A machine learning and free energy handshake

Bibliographic Details
Main authors:	Gupta, Abhijit; Kulkarni, Mandar; Mukherjee, Arnab
Format:	Online Article Text
Language:	English
Published:	Elsevier 2021
Subjects:	Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8441556/ https://www.ncbi.nlm.nih.gov/pubmed/34553171 http://dx.doi.org/10.1016/j.patter.2021.100329

author	Gupta, Abhijit; Kulkarni, Mandar; Mukherjee, Arnab
collection	PubMed
description	DNA carries the genetic code of life, with different conformations associated with different biological functions. Predicting the conformation of DNA from its primary sequence, although desirable, is a challenging problem owing to the polymorphic nature of DNA. We have deployed a host of machine learning algorithms, including the popular state-of-the-art LightGBM (a gradient boosting model), for building prediction models. We used the nested cross-validation strategy to address the issues of “overfitting” and selection bias. This simultaneously provides an unbiased estimate of the generalization performance of a machine learning algorithm and allows us to tune the hyperparameters optimally. Furthermore, we built a secondary model based on SHAP (SHapley Additive exPlanations) that offers crucial insight into model interpretability. Our detailed model-building strategy and robust statistical validation protocols tackle the formidable challenge of working on small datasets, which is often the case in biological and medical data.
format	Online Article Text
id	pubmed-8441556
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	Elsevier
record_format	MEDLINE/PubMed
journal	Patterns (N Y)
published	Elsevier, 2021-08-12
rights	© 2021 The Authors. This is an open access article under the CC BY-NC-ND license (https://creativecommons.org/licenses/by-nc-nd/4.0/).
title	Accurate prediction of B-form/A-form DNA conformation propensity from primary sequence: A machine learning and free energy handshake
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8441556/ https://www.ncbi.nlm.nih.gov/pubmed/34553171 http://dx.doi.org/10.1016/j.patter.2021.100329
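The description field above says nested cross-validation both tunes hyperparameters and gives an unbiased estimate of generalization performance. The sketch below illustrates that idea only; it is not the authors' pipeline. It swaps in a toy k-nearest-neighbour classifier for LightGBM, and the helper names (`knn_predict`, `k_folds`, `nested_cv`), the hyperparameter grid, and the two-cluster data are all hypothetical assumptions chosen so the example runs with the Python standard library alone.

```python
import random
from statistics import mean


def knn_predict(train_X, train_y, x, k):
    """Predict the label of x by majority vote among its k nearest training points."""
    # Rank training points by squared Euclidean distance to x
    order = sorted(range(len(train_X)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)))
    votes = [train_y[i] for i in order[:k]]
    # Majority vote; ties go to the smallest label
    return max(sorted(set(votes)), key=votes.count)


def accuracy(train_X, train_y, test_X, test_y, k):
    hits = sum(knn_predict(train_X, train_y, x, k) == t for x, t in zip(test_X, test_y))
    return hits / len(test_y)


def k_folds(n, n_folds, seed):
    """Shuffle indices 0..n-1 and split them into n_folds disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::n_folds] for f in range(n_folds)]


def subset(seq, idx):
    return [seq[i] for i in idx]


def nested_cv(X, y, k_grid, outer_folds=5, inner_folds=3, seed=0):
    outer_scores, chosen_k = [], []
    for test_idx in k_folds(len(X), outer_folds, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        tr_X, tr_y = subset(X, train_idx), subset(y, train_idx)

        def inner_score(k):
            # Inner loop: cross-validate k using only the outer training fold
            scores = []
            for val_pos in k_folds(len(tr_X), inner_folds, seed + 1):
                val_set = set(val_pos)
                fit_pos = [i for i in range(len(tr_X)) if i not in val_set]
                scores.append(accuracy(subset(tr_X, fit_pos), subset(tr_y, fit_pos),
                                       subset(tr_X, val_pos), subset(tr_y, val_pos), k))
            return mean(scores)

        best_k = max(k_grid, key=inner_score)  # first k with the best inner score
        chosen_k.append(best_k)
        # Outer loop: score the tuned model on a fold never used during tuning
        outer_scores.append(accuracy(tr_X, tr_y, subset(X, test_idx),
                                     subset(y, test_idx), best_k))
    return mean(outer_scores), chosen_k


# Hypothetical toy data: two 2-D Gaussian clusters standing in for sequence features
rng = random.Random(42)
X = ([(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(30)]
     + [(rng.gauss(3, 1), rng.gauss(3, 1)) for _ in range(30)])
y = [0] * 30 + [1] * 30
score, ks = nested_cv(X, y, k_grid=[1, 3, 5, 7])
print(f"nested-CV accuracy estimate: {score:.2f}; k chosen per outer fold: {ks}")
```

The key design point is that each outer test fold never influences which k is selected, so the mean of the outer scores estimates generalization performance; reporting the best inner-loop score instead would be optimistically biased by the tuning itself.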