Integration of phenotypic metadata and protein similarity in Archaea using a spectral bipartitioning approach

In order to simplify and meaningfully categorize large sets of protein sequence data, it is commonplace to cluster proteins based on the similarity of those sequences. However, it quickly becomes clear that the sequence flexibility allowed a given protein varies significantly among different protein...

Bibliographic Details
Main authors: Hooper, Sean D., Anderson, Iain J., Pati, Amrita, Dalevi, Daniel, Mavromatis, Konstantinos, Kyrpides, Nikos C.
Format: Text
Language: English
Published: Oxford University Press 2009
Subjects: Genomics
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2673424/
https://www.ncbi.nlm.nih.gov/pubmed/19223325
http://dx.doi.org/10.1093/nar/gkp075
author Hooper, Sean D.
Anderson, Iain J.
Pati, Amrita
Dalevi, Daniel
Mavromatis, Konstantinos
Kyrpides, Nikos C.
collection PubMed
description In order to simplify and meaningfully categorize large sets of protein sequence data, it is commonplace to cluster proteins based on the similarity of those sequences. However, it quickly becomes clear that the sequence flexibility allowed a given protein varies significantly among different protein families. The degree to which sequences are conserved not only differs for each protein family, but also is affected by the phylogenetic divergence of the source organisms. Clustering techniques that use similarity thresholds for protein families do not always allow for these variations and thus cannot be confidently used for applications such as automated annotation and phylogenetic profiling. In this work, we applied a spectral bipartitioning technique to all proteins from 53 archaeal genomes. Comparisons between different taxonomic levels allowed us to study the effects of phylogenetic distances on cluster structure. Likewise, by associating functional annotations and phenotypic metadata with each protein, we could compare our protein similarity clusters with both protein function and associated phenotype. Our clusters can be analyzed graphically and interactively online.
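As an illustration of the general idea behind spectral bipartitioning, the sketch below splits a small set of items into two groups using the sign of the Fiedler vector (the eigenvector of the graph Laplacian with the second-smallest eigenvalue). It is a minimal sketch under stated assumptions, not the procedure used in the article: the function name `spectral_bipartition`, the unnormalized Laplacian, and the toy similarity scores are all hypothetical and chosen only for demonstration.

```python
import numpy as np


def spectral_bipartition(similarity):
    """Split items into two groups by the sign of the Fiedler vector.

    `similarity` is a square matrix of non-negative pairwise similarity
    scores. Hypothetical helper for illustration only.
    """
    w = np.asarray(similarity, dtype=float)
    if w.ndim != 2 or w.shape[0] != w.shape[1]:
        raise ValueError("similarity must be a square matrix")
    w = (w + w.T) / 2.0           # enforce symmetry
    np.fill_diagonal(w, 0.0)      # ignore self-similarity
    degree = w.sum(axis=1)
    laplacian = np.diag(degree) - w   # unnormalized graph Laplacian
    # eigh returns eigenvalues of a symmetric matrix in ascending order,
    # so column 1 holds the eigenvector of the second-smallest eigenvalue.
    _, eigenvectors = np.linalg.eigh(laplacian)
    fiedler = eigenvectors[:, 1]
    return fiedler >= 0           # boolean group label per item


# Toy similarity scores, invented for illustration: items 0-2 resemble
# each other strongly, as do items 3-5, with only weak cross-links.
toy = np.array([
    [0.00, 0.90, 0.80, 0.10, 0.00, 0.00],
    [0.90, 0.00, 0.85, 0.00, 0.05, 0.00],
    [0.80, 0.85, 0.00, 0.00, 0.00, 0.10],
    [0.10, 0.00, 0.00, 0.00, 0.90, 0.80],
    [0.00, 0.05, 0.00, 0.90, 0.00, 0.85],
    [0.00, 0.00, 0.10, 0.80, 0.85, 0.00],
])
labels = spectral_bipartition(toy)
```

Because the sign of an eigenvector is arbitrary, which group is labeled `True` can differ between runs or platforms; only the grouping itself is meaningful. Applying the split recursively to each half would yield a hierarchy of clusters without a fixed similarity threshold.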
format Text
id pubmed-2673424
institution National Center for Biotechnology Information
language English
publishDate 2009
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-2673424 2009-05-15 Nucleic Acids Res Genomics Oxford University Press 2009-04 2009-02-17 Text en © 2009 The Author(s) This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Integration of phenotypic metadata and protein similarity in Archaea using a spectral bipartitioning approach
topic Genomics