Machine learning predicts translation initiation sites in neurologic diseases with nucleotide repeat expansions

A number of neurologic diseases associated with expanded nucleotide repeats, including an inherited form of amyotrophic lateral sclerosis, have an unconventional form of translation called repeat-associated non-AUG (RAN) translation. It has been speculated that the repeat regions in the RNA fold int...

Bibliographic Details
Main Authors: Gleason, Alec C., Ghadge, Ghanashyam, Chen, Jin, Sonobe, Yoshifumi, Roos, Raymond P.
Format: Online Article Text
Language: English
Published: Public Library of Science 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9159584/
https://www.ncbi.nlm.nih.gov/pubmed/35648796
http://dx.doi.org/10.1371/journal.pone.0256411
author Gleason, Alec C.
Ghadge, Ghanashyam
Chen, Jin
Sonobe, Yoshifumi
Roos, Raymond P.
collection PubMed
description A number of neurologic diseases associated with expanded nucleotide repeats, including an inherited form of amyotrophic lateral sclerosis, have an unconventional form of translation called repeat-associated non-AUG (RAN) translation. It has been speculated that the repeat regions in the RNA fold into secondary structures in a length-dependent manner, promoting RAN translation. Repeat protein products are translated, accumulate, and may contribute to disease pathogenesis. Nucleotides that flank the repeat region, especially ones closest to the initiation site, are believed to enhance translation initiation. A machine learning model has been published to help identify ATG and near-cognate translation initiation sites; however, this model has diminished predictive power due to its extensive feature selection and limited training data. Here, we overcome this limitation and increase prediction accuracy by the following: a) capture the effect of nucleotides most critical for translation initiation via feature reduction, b) implement an alternative machine learning algorithm better suited for limited data, c) build comprehensive and balanced training data (via sampling without replacement) that includes previously unavailable sequences, and d) split ATG and near-cognate translation initiation codon data to train two separate models. We also design a supplementary scoring system to provide an additional prognostic assessment of model predictions. The resultant models have high performance, with ~85–88% accuracy, exceeding that of the previously published model by >18%. The models presented here are used to identify translation initiation sites in genes associated with a number of neurologic repeat expansion disorders. The results confirm a number of sites of translation initiation upstream of the expanded repeats that have been found experimentally, and predict sites that are not yet established.
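Steps (c) and (d) of the description (balanced training data drawn by sampling without replacement, and separate data for ATG and near-cognate codons) can be pictured with a minimal Python sketch. This is not the authors' code: the function names and the `flank` width are hypothetical, and the record does not name the learning algorithm, so no model is shown. The only biological fact assumed is the standard definition of a near-cognate codon as one that differs from ATG at exactly one position.

```python
import random


def is_near_cognate(codon):
    """True if the codon differs from ATG at exactly one position."""
    return len(codon) == 3 and sum(a != b for a, b in zip(codon, "ATG")) == 1


def candidate_sites(seq, flank=6):
    """Return (position, codon, flanking window) for ATG and near-cognate codons.

    `flank` is a hypothetical window size chosen for illustration only.
    Candidates without a full window on both sides are skipped.
    """
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA notation
    sites = []
    for i in range(flank, len(seq) - 3 - flank + 1):
        codon = seq[i:i + 3]
        if codon == "ATG" or is_near_cognate(codon):
            # Keep only the nucleotides around the codon, not the codon itself.
            window = seq[i - flank:i] + seq[i + 3:i + 3 + flank]
            sites.append((i, codon, window))
    return sites


def split_by_codon(sites):
    """Separate ATG sites from near-cognate sites so each can feed its own model."""
    atg = [s for s in sites if s[1] == "ATG"]
    near = [s for s in sites if s[1] != "ATG"]
    return atg, near


def balance(positives, negatives, seed=0):
    """Downsample the larger class without replacement so both classes match in size."""
    rng = random.Random(seed)
    n = min(len(positives), len(negatives))
    return rng.sample(positives, n), rng.sample(negatives, n)
```

In use, `candidate_sites` would be run over each transcript, the result passed through `split_by_codon`, and `balance` applied separately to the labeled positive and negative examples of each codon class before training.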
format Online
Article
Text
id pubmed-9159584
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-9159584 2022-06-02 Machine learning predicts translation initiation sites in neurologic diseases with nucleotide repeat expansions. Gleason, Alec C.; Ghadge, Ghanashyam; Chen, Jin; Sonobe, Yoshifumi; Roos, Raymond P. PLoS One. Research Article. Public Library of Science 2022-06-01. /pmc/articles/PMC9159584/ /pubmed/35648796 http://dx.doi.org/10.1371/journal.pone.0256411 Text en. © 2022 Gleason et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Machine learning predicts translation initiation sites in neurologic diseases with nucleotide repeat expansions
topic Research Article