Amyloidogenic motifs revealed by n-gram analysis
Amyloids are proteins associated with several clinical disorders, including Alzheimer’s and Creutzfeldt-Jakob disease. Despite their diversity, all amyloid proteins can undergo aggregation initiated by short segments called hot spots. To find the patterns defining the hot spots, we trained predictors of...
Main authors: | Burdukiewicz, Michał; Sobczyk, Piotr; Rödiger, Stefan; Duda-Madej, Anna; Mackiewicz, Paweł; Kotulska, Małgorzata |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK, 2017 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5636826/ https://www.ncbi.nlm.nih.gov/pubmed/29021608 http://dx.doi.org/10.1038/s41598-017-13210-9 |
_version_ | 1783270516970225664 |
---|---|
author | Burdukiewicz, Michał; Sobczyk, Piotr; Rödiger, Stefan; Duda-Madej, Anna; Mackiewicz, Paweł; Kotulska, Małgorzata |
author_sort | Burdukiewicz, Michał |
collection | PubMed |
description | Amyloids are proteins associated with several clinical disorders, including Alzheimer’s and Creutzfeldt-Jakob disease. Despite their diversity, all amyloid proteins can undergo aggregation initiated by short segments called hot spots. To find the patterns defining the hot spots, we trained predictors of amyloidogenicity using n-grams and random forest classifiers. Since amyloidogenicity may depend not on the exact sequence of amino acids but on their more general properties, we tested 524,284 reduced amino acid alphabets of different lengths (three to six letters) to find the alphabet providing the best performance in cross-validation. The predictor based on this alphabet, called AmyloGram, was benchmarked against the most popular tools for the detection of amyloid peptides using an external data set and obtained the highest values of performance measures (AUC: 0.90, MCC: 0.63). Our results showed sequential patterns in the amyloids that are strongly correlated with hydrophobicity, a tendency to form β-sheets, and lower flexibility of amino acid residues. Among the most informative n-grams of AmyloGram, we identified 15 that were previously confirmed experimentally. AmyloGram is available as a web server at http://smorfland.uni.wroc.pl/shiny/AmyloGram/ and as the R package AmyloGram. R scripts and data used to produce the results of this manuscript are available at http://github.com/michbur/AmyloGramAnalysis. |
format | Online Article Text |
id | pubmed-5636826 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-5636826 2017-10-18 Amyloidogenic motifs revealed by n-gram analysis Burdukiewicz, Michał Sobczyk, Piotr Rödiger, Stefan Duda-Madej, Anna Mackiewicz, Paweł Kotulska, Małgorzata Sci Rep Article
Nature Publishing Group UK 2017-10-11 /pmc/articles/PMC5636826/ /pubmed/29021608 http://dx.doi.org/10.1038/s41598-017-13210-9 Text en © The Author(s) 2017 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. |
title | Amyloidogenic motifs revealed by n-gram analysis |
topic | Article |
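
The description above mentions encoding peptides as n-grams over a reduced amino acid alphabet. The following is a minimal Python sketch of that idea, not the AmyloGram implementation (which is distributed as an R package). The four-group alphabet `EXAMPLE_GROUPS`, the group labels, and the helper names `build_lookup`, `reduce_sequence`, and `count_ngrams` are all hypothetical and chosen only for illustration; the record does not state which alphabet AmyloGram selected, and only contiguous n-grams are counted here.

```python
from collections import Counter

# Hypothetical 4-letter reduced alphabet covering the 20 standard amino acids.
# Illustration only: this is NOT the alphabet selected by AmyloGram.
EXAMPLE_GROUPS = {
    "a": "AVLIMFWC",  # roughly hydrophobic residues
    "b": "STNQGPYH",  # roughly polar or small residues
    "c": "KR",        # positively charged residues
    "d": "DE",        # negatively charged residues
}


def build_lookup(groups):
    """Invert {label: residues} into a {residue: label} mapping."""
    lookup = {}
    for label, residues in groups.items():
        for residue in residues:
            lookup[residue] = label
    return lookup


def reduce_sequence(sequence, lookup):
    """Rewrite a peptide sequence in the reduced alphabet."""
    return "".join(lookup[residue] for residue in sequence.upper())


def count_ngrams(encoded, n):
    """Count contiguous n-grams in an already reduced sequence."""
    return Counter(encoded[i:i + n] for i in range(len(encoded) - n + 1))


if __name__ == "__main__":
    lookup = build_lookup(EXAMPLE_GROUPS)
    encoded = reduce_sequence("KLVFFA", lookup)  # sample input only
    print(encoded)
    print(count_ngrams(encoded, 2))
```

Counts like these could serve as features for a classifier such as a random forest. Reducing the alphabet shrinks the number of possible n-grams, so residues with similar properties contribute to the same feature.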