
FISH Amyloid – a new method for finding amyloidogenic segments in proteins based on site specific co-occurence of aminoacids

Bibliographic Details
Main authors: Gasior, Pawel; Kotulska, Malgorzata
Format: Online Article Text
Language: English
Published: BioMed Central 2014
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3941796/
https://www.ncbi.nlm.nih.gov/pubmed/24564523
http://dx.doi.org/10.1186/1471-2105-15-54
collection PubMed
description BACKGROUND: Amyloids are proteins capable of forming fibrils whose intramolecular contact sites assume a densely packed zipper pattern. Their oligomers can underlie serious diseases, e.g. Alzheimer’s and Parkinson’s diseases. Recent studies show that short segments of amino acids can be responsible for the amyloidogenic properties of a protein. A few hundred such peptides have been found experimentally, but experimental testing of all candidates is currently not feasible. Here we propose an original machine learning method for classifying amino acid sequences, based on discovering a segment with a discriminative pattern of site-specific co-occurrences between sequence elements. The pattern is based on the positions of residues whose occurrences are correlated within a sliding window of a specified length. The algorithm first recognizes the most relevant training segment in each positive training instance. Classification is then based on the maximal distances between the co-occurrence matrix of the relevant segments in positive training sequences and the matrix from negative training segments. The method was applied to study amino acid sequences with regard to their amyloidogenic properties. RESULTS: Our method was first trained on available datasets of hexapeptides with amyloidogenic classification, using 5- or 6-residue sliding windows. Depending on the choice of training and testing datasets, the area under the ROC curve reached values of up to 0.80 for experimental datasets and 0.95 for computationally generated datasets (produced with the 3D profile method). Importantly, the results with 5-residue segments were not significantly worse, although in that case the algorithm first had to recognize the most relevant training segments. Datasets of long sequences, such as the sup35 prion and a few other amyloid proteins, were used to test the method and gave encouraging results.
Our web tool FISH Amyloid, trained on all available experimental data 4-10 residues long, offers prediction of amyloidogenic segments in protein sequences. CONCLUSIONS: We proposed a new, original classification method that recognizes co-occurrence patterns in sequences. The method reveals the characteristic classification pattern of the data and finds the segments where its score is strongest, including in long training sequences. Applied to the recognition of amyloidogenic segments, it showed good potential for classification problems in bioinformatics.
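The co-occurrence scheme summarized in the abstract can be illustrated with a short Python sketch. It is not the authors' FISH Amyloid implementation: this record does not give the exact distance measure, feature selection, or decision threshold, so the function names (`windows`, `cooccurrence`, `discriminative_weights`, `score_window`, `best_window`, `train`), the signed frequency difference used as the "distance", the `top_k` cut-off, and the two-pass selection of relevant segments are all hypothetical choices made for illustration.

```python
# Hypothetical sketch of site-specific co-occurrence scoring; NOT the published FISH Amyloid code.
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, List, Tuple

# A site-specific co-occurrence: (position i, position j, residue at i, residue at j), i < j.
Key = Tuple[int, int, str, str]


def windows(seq: str, w: int) -> List[str]:
    """All contiguous w-residue segments of seq (sliding window, step 1)."""
    return [seq[s:s + w] for s in range(len(seq) - w + 1)]


def cooccurrence(segments: Iterable[str], w: int) -> Dict[Key, float]:
    """Relative frequency of each residue pair at each pair of window positions."""
    counts: Counter = Counter()
    n = 0
    for seg in segments:
        if len(seg) != w:
            raise ValueError(f"segment {seg!r} is not {w} residues long")
        n += 1
        for i, j in combinations(range(w), 2):
            counts[(i, j, seg[i], seg[j])] += 1
    return {k: c / n for k, c in counts.items()} if n else {}


def discriminative_weights(pos: Iterable[str], neg: Iterable[str], w: int,
                           top_k: int = 50) -> Dict[Key, float]:
    """Keep the top_k co-occurrences whose frequency differs most between classes.

    Assumption: the 'distance' is the signed difference (positive - negative).
    """
    p, q = cooccurrence(pos, w), cooccurrence(neg, w)
    diffs = {k: p.get(k, 0.0) - q.get(k, 0.0) for k in p.keys() | q.keys()}
    ranked = sorted(diffs.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return dict(ranked[:top_k])


def score_window(seg: str, weights: Dict[Key, float]) -> float:
    """Sum the weights of every site-specific co-occurrence present in seg."""
    return sum(weights.get((i, j, seg[i], seg[j]), 0.0)
               for i, j in combinations(range(len(seg)), 2))


def best_window(seq: str, weights: Dict[Key, float], w: int) -> Tuple[int, float]:
    """Return (start, score) of the highest-scoring w-residue window in seq."""
    if len(seq) < w:
        raise ValueError(f"sequence shorter than window length {w}")
    return max(((s, score_window(seg, weights)) for s, seg in enumerate(windows(seq, w))),
               key=lambda t: t[1])


def train(pos_seqs: List[str], neg_seqs: List[str], w: int,
          top_k: int = 50) -> Dict[Key, float]:
    """Two-pass training (assumed scheme).

    Pass 1 builds weights from every window of every instance; pass 2 refits on
    the single best-scoring window of each positive instance, against all
    negative windows.
    """
    neg = [seg for s in neg_seqs for seg in windows(s, w)]
    weights = discriminative_weights([seg for s in pos_seqs for seg in windows(s, w)],
                                     neg, w, top_k)
    relevant = []
    for s in pos_seqs:
        start, _ = best_window(s, weights, w)
        relevant.append(s[start:start + w])
    return discriminative_weights(relevant, neg, w, top_k)
```

As a hypothetical usage, `weights = train(pos, neg, 6)` on lists of labelled peptide strings, followed by `best_window(protein, weights, 6)`, returns the start and score of the strongest candidate segment in a longer protein; turning that score into a yes/no label would still need a threshold, which this sketch leaves open. Choosing a window shorter than the training peptides (e.g. 5 for hexapeptides) is what makes the relevant-segment step in `train` matter.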
id pubmed-3941796
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-3941796 2014-03-14 BMC Bioinformatics Research Article BioMed Central 2014-02-24
Copyright © 2014 Gasior and Kotulska; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.
topic Research Article