
Cluster analysis identifies aminoacid compositional features that indicate Toxoplasma gondii adhesin proteins

Toxoplasma gondii invades host cells using a multi-step process that depends on the regulated secretion of adhesins. To identify key primary sequence features of adhesins in this parasite, we analyze the relative frequency of individual amino acids, their dipeptide frequencies, and the polarity, pol...
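
The two compositional attributes named in the abstract, single amino acid relative frequency and dipeptide frequency, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the restriction to the 20 standard residues, and the toy sequence are assumptions made only for illustration.

```python
from collections import Counter
from itertools import product

# Assumption: only the 20 standard residues are counted; other symbols are skipped.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_frequencies(sequence):
    """Return the relative frequency of each standard residue (20 values)."""
    seq = sequence.upper()
    counts = Counter(res for res in seq if res in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return {aa: 0.0 for aa in AMINO_ACIDS}
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def dipeptide_frequencies(sequence):
    """Return the relative frequency of each overlapping residue pair (400 values)."""
    seq = sequence.upper()
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    # Keep only pairs in which both residues are standard amino acids.
    valid = [p for p in pairs if p[0] in AMINO_ACIDS and p[1] in AMINO_ACIDS]
    counts = Counter(valid)
    keys = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    if not valid:
        return {k: 0.0 for k in keys}
    return {k: counts[k] / len(valid) for k in keys}


# Hypothetical toy sequence, not taken from the study.
example = "MCCKACDC"
single = amino_acid_frequencies(example)
pairs = dipeptide_frequencies(example)
print(single["C"], pairs["CC"])
```

Each protein then yields a fixed-length numeric vector (20 or 400 values) that a clustering method could consume; which clustering method and distance measure the study used is not stated in this record.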


Bibliographic Details
Main authors: Arenas, Ailan F, Salcedo, Gladys E, Moncada, Diego M, Erazo, Diego A, Osorio, Juan F, Gomez-Marin, Jorge E
Format: Online Article Text
Language: English
Published: Biomedical Informatics 2012
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3488833/
https://www.ncbi.nlm.nih.gov/pubmed/23144551
http://dx.doi.org/10.6026/97320630008916
_version_ 1782248684965068800
author Arenas, Ailan F
Salcedo, Gladys E
Moncada, Diego M
Erazo, Diego A
Osorio, Juan F
Gomez-Marin, Jorge E
author_facet Arenas, Ailan F
Salcedo, Gladys E
Moncada, Diego M
Erazo, Diego A
Osorio, Juan F
Gomez-Marin, Jorge E
author_sort Arenas, Ailan F
collection PubMed
description Toxoplasma gondii invades host cells using a multi-step process that depends on the regulated secretion of adhesins. To identify key primary sequence features of adhesins in this parasite, we analyze the relative frequency of individual amino acids, their dipeptide frequencies, and the polarity, polarizability and van der Waals volume of the individual amino acids by using cluster analysis. This method identified cysteine as a key amino acid in the Toxoplasma adhesin group. The best vector algorithm of non-concatenated features was for 2 attributes: the single amino acid relative frequency and the dipeptide frequency. Polarity, polarizability and van der Waals volume were not good classificatory attributes. Single amino acid attributes unambiguously clustered 67 hypothetical apicomplexan adhesins. This algorithm was also useful for clustering hypothetical Toxoplasma target host receptors. All of the cluster performances had over 70% sensitivity and 80% specificity. Compositional amino acid data can be useful for improving machine learning-based prediction software when homology and structural data are not sufficient.
format Online
Article
Text
id pubmed-3488833
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher Biomedical Informatics
record_format MEDLINE/PubMed
spelling pubmed-34888332012-11-09 Cluster analysis identifies aminoacid compositional features that indicate Toxoplasma gondii adhesin proteins Arenas, Ailan F Salcedo, Gladys E Moncada, Diego M Erazo, Diego A Osorio, Juan F Gomez-Marin, Jorge E Bioinformation Hypothesis Toxoplasma gondii invade host cells using a multi-step process that depends on the regulated secretion of adhesions. To identify key primary sequence features of adhesins in this parasite, we analyze the relative frequency of individual amino acids, their dipeptide frequencies, and the polarity, polarizability and Van der Waals volume of the individual amino acids by using cluster analysis. This method identified cysteine as a key amino acid in the Toxoplasma adhesin group. The best vector algorithm of non-concatenated features was for 2 attributes: the single amino acid relative frequency and the dipeptide frequency. Polarity, polarizability and Van der Waals volume were not good classificatory attributes. Single amino acid attributes clustered unambiguously 67 apicomplexan hypothetical adhesins. This algorithm was also useful for clustering hypothetical Toxoplasma target host receptors. All of the cluster performances had over 70% sensitivity and 80% specificity. Compositional aminoacid data can be useful for improving machine learning-based prediction software when homology and structural data are not sufficient. Biomedical Informatics 2012-10-01 /pmc/articles/PMC3488833/ /pubmed/23144551 http://dx.doi.org/10.6026/97320630008916 Text en © 2012 Biomedical Informatics This is an open-access article, which permits unrestricted use, distribution, and reproduction in any medium, for non-commercial purposes, provided the original author and source are credited.
spellingShingle Hypothesis
Arenas, Ailan F
Salcedo, Gladys E
Moncada, Diego M
Erazo, Diego A
Osorio, Juan F
Gomez-Marin, Jorge E
Cluster analysis identifies aminoacid compositional features that indicate Toxoplasma gondii adhesin proteins
title Cluster analysis identifies aminoacid compositional features that indicate Toxoplasma gondii adhesin proteins
title_full Cluster analysis identifies aminoacid compositional features that indicate Toxoplasma gondii adhesin proteins
title_fullStr Cluster analysis identifies aminoacid compositional features that indicate Toxoplasma gondii adhesin proteins
title_full_unstemmed Cluster analysis identifies aminoacid compositional features that indicate Toxoplasma gondii adhesin proteins
title_short Cluster analysis identifies aminoacid compositional features that indicate Toxoplasma gondii adhesin proteins
title_sort cluster analysis identifies aminoacid compositional features that indicate toxoplasma gondii adhesin proteins
topic Hypothesis
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3488833/
https://www.ncbi.nlm.nih.gov/pubmed/23144551
http://dx.doi.org/10.6026/97320630008916
work_keys_str_mv AT arenasailanf clusteranalysisidentifiesaminoacidcompositionalfeaturesthatindicatetoxoplasmagondiiadhesinproteins
AT salcedogladyse clusteranalysisidentifiesaminoacidcompositionalfeaturesthatindicatetoxoplasmagondiiadhesinproteins
AT moncadadiegom clusteranalysisidentifiesaminoacidcompositionalfeaturesthatindicatetoxoplasmagondiiadhesinproteins
AT erazodiegoa clusteranalysisidentifiesaminoacidcompositionalfeaturesthatindicatetoxoplasmagondiiadhesinproteins
AT osoriojuanf clusteranalysisidentifiesaminoacidcompositionalfeaturesthatindicatetoxoplasmagondiiadhesinproteins
AT gomezmarinjorgee clusteranalysisidentifiesaminoacidcompositionalfeaturesthatindicatetoxoplasmagondiiadhesinproteins