Machine Learning Guides Peptide Nucleic Acid Flow Synthesis and Sequence Design

Peptide nucleic acids (PNAs) are potential antisense therapies for genetic, acquired, and viral diseases. Efficiently selecting candidate PNA sequences for synthesis and evaluation from a genome containing hundreds to thousands of options can be challenging. To facilitate this process, this work lev...

Bibliographic Details
Main authors: Li, Chengxi, Zhang, Genwei, Mohapatra, Somesh, Callahan, Alex J., Loas, Andrei, Gómez‐Bombarelli, Rafael, Pentelute, Bradley L.
Format: Online Article Text
Language: English
Published: John Wiley and Sons Inc. 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9731686/
https://www.ncbi.nlm.nih.gov/pubmed/36270977
http://dx.doi.org/10.1002/advs.202201988
author Li, Chengxi
Zhang, Genwei
Mohapatra, Somesh
Callahan, Alex J.
Loas, Andrei
Gómez‐Bombarelli, Rafael
Pentelute, Bradley L.
author_sort Li, Chengxi
collection PubMed
description Peptide nucleic acids (PNAs) are potential antisense therapies for genetic, acquired, and viral diseases. Efficiently selecting candidate PNA sequences for synthesis and evaluation from a genome containing hundreds to thousands of options can be challenging. To facilitate this process, this work leverages machine learning (ML) algorithms and automated synthesis technology to predict PNA synthesis efficiency and guide rational PNA sequence design. The training data is collected from individual fluorenylmethyloxycarbonyl (Fmoc) deprotection reactions performed on a fully automated PNA synthesizer. The optimized ML model allows for 93% prediction accuracy and 0.97 Pearson's r. The predicted synthesis scores are validated to be correlated with the experimental high‐performance liquid chromatography (HPLC) crude purities (correlation coefficient R² = 0.95). Furthermore, a general applicability of ML is demonstrated through designing synthetically accessible antisense PNA sequences from 102 315 predicted candidates targeting exon 44 of the human dystrophin gene, SARS‐CoV‐2, HIV, as well as selected genes associated with cardiovascular diseases, type II diabetes, and various cancers. Collectively, ML provides an accurate prediction of PNA synthesis quality and serves as a useful computational tool for informing PNA sequence design.
format Online
Article
Text
id pubmed-9731686
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher John Wiley and Sons Inc.
record_format MEDLINE/PubMed
spelling pubmed-9731686 2022-12-12 Machine Learning Guides Peptide Nucleic Acid Flow Synthesis and Sequence Design Li, Chengxi Zhang, Genwei Mohapatra, Somesh Callahan, Alex J. Loas, Andrei Gómez‐Bombarelli, Rafael Pentelute, Bradley L. Adv Sci (Weinh) Research Articles Peptide nucleic acids (PNAs) are potential antisense therapies for genetic, acquired, and viral diseases. Efficiently selecting candidate PNA sequences for synthesis and evaluation from a genome containing hundreds to thousands of options can be challenging. To facilitate this process, this work leverages machine learning (ML) algorithms and automated synthesis technology to predict PNA synthesis efficiency and guide rational PNA sequence design. The training data is collected from individual fluorenylmethyloxycarbonyl (Fmoc) deprotection reactions performed on a fully automated PNA synthesizer. The optimized ML model allows for 93% prediction accuracy and 0.97 Pearson's r. The predicted synthesis scores are validated to be correlated with the experimental high‐performance liquid chromatography (HPLC) crude purities (correlation coefficient R² = 0.95). Furthermore, a general applicability of ML is demonstrated through designing synthetically accessible antisense PNA sequences from 102 315 predicted candidates targeting exon 44 of the human dystrophin gene, SARS‐CoV‐2, HIV, as well as selected genes associated with cardiovascular diseases, type II diabetes, and various cancers. Collectively, ML provides an accurate prediction of PNA synthesis quality and serves as a useful computational tool for informing PNA sequence design. John Wiley and Sons Inc. 2022-10-21 /pmc/articles/PMC9731686/ /pubmed/36270977 http://dx.doi.org/10.1002/advs.202201988 Text en © 2022 The Authors. Advanced Science published by Wiley‐VCH GmbH https://creativecommons.org/licenses/by/4.0/ This is an open access article under the terms of the http://creativecommons.org/licenses/by/4.0/ (https://creativecommons.org/licenses/by/4.0/) License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
title Machine Learning Guides Peptide Nucleic Acid Flow Synthesis and Sequence Design
topic Research Articles