
Amyloidogenic motifs revealed by n-gram analysis

Amyloids are proteins associated with several clinical disorders, including Alzheimer’s and Creutzfeldt-Jakob diseases. Despite their diversity, all amyloid proteins can undergo aggregation initiated by short segments called hot spots. To find the patterns defining the hot spots, we trained predictors of...


Bibliographic Details
Main authors: Burdukiewicz, Michał; Sobczyk, Piotr; Rödiger, Stefan; Duda-Madej, Anna; Mackiewicz, Paweł; Kotulska, Małgorzata
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5636826/
https://www.ncbi.nlm.nih.gov/pubmed/29021608
http://dx.doi.org/10.1038/s41598-017-13210-9
author Burdukiewicz, Michał
Sobczyk, Piotr
Rödiger, Stefan
Duda-Madej, Anna
Mackiewicz, Paweł
Kotulska, Małgorzata
collection PubMed
description Amyloids are proteins associated with several clinical disorders, including Alzheimer’s and Creutzfeldt-Jakob diseases. Despite their diversity, all amyloid proteins can undergo aggregation initiated by short segments called hot spots. To find the patterns defining the hot spots, we trained predictors of amyloidogenicity using n-grams and random forest classifiers. Since amyloidogenicity may depend not on the exact sequence of amino acids but on their more general properties, we tested 524,284 reduced amino acid alphabets of different lengths (three to six letters) to find the alphabet providing the best performance in cross-validation. The predictor based on this alphabet, called AmyloGram, was benchmarked on an external data set against the most popular tools for the detection of amyloid peptides and obtained the highest values of the performance measures (AUC: 0.90, MCC: 0.63). Our results showed sequential patterns in the amyloids that are strongly correlated with hydrophobicity, a tendency to form β-sheets, and lower flexibility of amino acid residues. Among the most informative n-grams of AmyloGram, we identified 15 that were previously confirmed experimentally. AmyloGram is available as a web server (http://smorfland.uni.wroc.pl/shiny/AmyloGram/) and as the R package AmyloGram. R scripts and data used to produce the results of this manuscript are available at http://github.com/michbur/AmyloGramAnalysis.
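The encoding idea in the description above (map each amino acid to a letter of a reduced alphabet, then extract n-grams as features) can be illustrated with a minimal Python sketch. It rests on several assumptions: the four-group alphabet below is hypothetical and chosen only for illustration (it is not the alphabet selected for AmyloGram), only contiguous n-grams are counted, and the function names `encode`, `count_ngrams` and `ngram_features` are invented for this example. The authors' own implementation is the R package named in the description.

```python
from collections import Counter

# Hypothetical 4-letter reduced alphabet, for illustration only.
# It is NOT the alphabet selected for AmyloGram.
GROUPS = {
    "a": "AVLIMFWC",  # mostly hydrophobic residues
    "b": "STNQGPYH",  # polar or small residues
    "c": "DE",        # acidic residues
    "d": "KR",        # basic residues
}

# Invert the grouping: amino acid -> reduced-alphabet letter.
AA_TO_GROUP = {aa: letter for letter, members in GROUPS.items() for aa in members}


def encode(peptide):
    """Rewrite a peptide in the reduced alphabet (assumed helper)."""
    return "".join(AA_TO_GROUP[aa] for aa in peptide.upper())


def count_ngrams(sequence, n):
    """Count contiguous n-grams (substrings of length n) in a sequence."""
    return Counter(sequence[i:i + n] for i in range(len(sequence) - n + 1))


def ngram_features(peptide, n_values=(1, 2, 3)):
    """Collect n-gram counts of the encoded peptide for several n."""
    encoded = encode(peptide)
    features = Counter()
    for n in n_values:
        features.update(count_ngrams(encoded, n))
    return features


if __name__ == "__main__":
    example = "KLVFFA"  # an arbitrary example hexapeptide
    print(encode(example))
    print(ngram_features(example))
```

Feature vectors of this kind could then be passed to any classifier, such as a random forest; the reduced alphabet controls how many distinct n-grams exist and therefore how large the feature space is.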
format Online
Article
Text
id pubmed-5636826
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-5636826 2017-10-18 Amyloidogenic motifs revealed by n-gram analysis Burdukiewicz, Michał; Sobczyk, Piotr; Rödiger, Stefan; Duda-Madej, Anna; Mackiewicz, Paweł; Kotulska, Małgorzata Sci Rep Article
Nature Publishing Group UK 2017-10-11 /pmc/articles/PMC5636826/ /pubmed/29021608 http://dx.doi.org/10.1038/s41598-017-13210-9 Text en © The Author(s) 2017 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
title Amyloidogenic motifs revealed by n-gram analysis
topic Article