Integration of phenotypic metadata and protein similarity in Archaea using a spectral bipartitioning approach
In order to simplify and meaningfully categorize large sets of protein sequence data, it is commonplace to cluster proteins based on the similarity of those sequences. However, it quickly becomes clear that the sequence flexibility allowed a given protein varies significantly among different protein...
Main authors: | Hooper, Sean D.; Anderson, Iain J.; Pati, Amrita; Dalevi, Daniel; Mavromatis, Konstantinos; Kyrpides, Nikos C. |
---|---|
Format: | Text |
Language: | English |
Published: | Oxford University Press 2009 |
Subjects: | Genomics |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2673424/ https://www.ncbi.nlm.nih.gov/pubmed/19223325 http://dx.doi.org/10.1093/nar/gkp075 |
_version_ | 1782166585555812352 |
---|---|
author | Hooper, Sean D.; Anderson, Iain J.; Pati, Amrita; Dalevi, Daniel; Mavromatis, Konstantinos; Kyrpides, Nikos C. |
author_sort | Hooper, Sean D. |
collection | PubMed |
description | In order to simplify and meaningfully categorize large sets of protein sequence data, it is commonplace to cluster proteins based on the similarity of those sequences. However, it quickly becomes clear that the sequence flexibility allowed a given protein varies significantly among different protein families. The degree to which sequences are conserved not only differs for each protein family, but also is affected by the phylogenetic divergence of the source organisms. Clustering techniques that use similarity thresholds for protein families do not always allow for these variations and thus cannot be confidently used for applications such as automated annotation and phylogenetic profiling. In this work, we applied a spectral bipartitioning technique to all proteins from 53 archaeal genomes. Comparisons between different taxonomic levels allowed us to study the effects of phylogenetic distances on cluster structure. Likewise, by associating functional annotations and phenotypic metadata with each protein, we could compare our protein similarity clusters with both protein function and associated phenotype. Our clusters can be analyzed graphically and interactively online. |
format | Text |
id | pubmed-2673424 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2009 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-2673424 2009-05-15 Integration of phenotypic metadata and protein similarity in Archaea using a spectral bipartitioning approach Hooper, Sean D. Anderson, Iain J. Pati, Amrita Dalevi, Daniel Mavromatis, Konstantinos Kyrpides, Nikos C. Nucleic Acids Res Genomics In order to simplify and meaningfully categorize large sets of protein sequence data, it is commonplace to cluster proteins based on the similarity of those sequences. However, it quickly becomes clear that the sequence flexibility allowed a given protein varies significantly among different protein families. The degree to which sequences are conserved not only differs for each protein family, but also is affected by the phylogenetic divergence of the source organisms. Clustering techniques that use similarity thresholds for protein families do not always allow for these variations and thus cannot be confidently used for applications such as automated annotation and phylogenetic profiling. In this work, we applied a spectral bipartitioning technique to all proteins from 53 archaeal genomes. Comparisons between different taxonomic levels allowed us to study the effects of phylogenetic distances on cluster structure. Likewise, by associating functional annotations and phenotypic metadata with each protein, we could compare our protein similarity clusters with both protein function and associated phenotype. Our clusters can be analyzed graphically and interactively online. Oxford University Press 2009-04 2009-02-17 /pmc/articles/PMC2673424/ /pubmed/19223325 http://dx.doi.org/10.1093/nar/gkp075 Text en © 2009 The Author(s) http://creativecommons.org/licenses/by-nc/2.0/uk/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | Integration of phenotypic metadata and protein similarity in Archaea using a spectral bipartitioning approach |
topic | Genomics |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2673424/ https://www.ncbi.nlm.nih.gov/pubmed/19223325 http://dx.doi.org/10.1093/nar/gkp075 |