GFam: a platform for automatic annotation of gene families
We have developed GFam, a platform for automatic annotation of gene/protein families. GFam provides a framework for genome initiatives and model organism resources to build domain-based families and derive meaningful functional labels, and offers a seamless approach to propagate functional annotation ac...
Main Authors: | Sasidharan, Rajkumar; Nepusz, Tamás; Swarbreck, David; Huala, Eva; Paccanaro, Alberto |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2012 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3479161/ https://www.ncbi.nlm.nih.gov/pubmed/22790981 http://dx.doi.org/10.1093/nar/gks631 |
_version_ | 1782247416148262912 |
---|---|
author | Sasidharan, Rajkumar; Nepusz, Tamás; Swarbreck, David; Huala, Eva; Paccanaro, Alberto |
author_facet | Sasidharan, Rajkumar; Nepusz, Tamás; Swarbreck, David; Huala, Eva; Paccanaro, Alberto |
author_sort | Sasidharan, Rajkumar |
collection | PubMed |
description | We have developed GFam, a platform for automatic annotation of gene/protein families. GFam provides a framework for genome initiatives and model organism resources to build domain-based families and derive meaningful functional labels, and offers a seamless approach to propagate functional annotation across periodic genome updates. GFam is a hybrid approach: it uses a greedy algorithm to chain component domains from the InterPro annotation provided by its 12 member resources, followed by a sequence-based connected component analysis of un-annotated sequence regions, to derive a consensus domain architecture for each sequence and subsequently generate families based on common architectures. For the proteome of Arabidopsis, our integrated approach increases sequence coverage by 7.2 percentage points and residue coverage by 14.6 percentage points relative to the best single constituent database within InterPro. The true power of GFam lies in maximizing annotation provided by the different InterPro data sources that offer resource-specific coverage for different regions of a sequence. GFam’s capability to capture higher sequence and residue coverage can be useful for genome annotation, comparative genomics and functional studies. GFam is general-purpose software and can be used for any collection of protein sequences. The software is open source and can be obtained from http://www.paccanarolab.org/software/gfam/. |
format | Online Article Text |
id | pubmed-3479161 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2012 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-3479161 2012-10-24 GFam: a platform for automatic annotation of gene families Sasidharan, Rajkumar; Nepusz, Tamás; Swarbreck, David; Huala, Eva; Paccanaro, Alberto Nucleic Acids Res Methods Online We have developed GFam, a platform for automatic annotation of gene/protein families. GFam provides a framework for genome initiatives and model organism resources to build domain-based families and derive meaningful functional labels, and offers a seamless approach to propagate functional annotation across periodic genome updates. GFam is a hybrid approach: it uses a greedy algorithm to chain component domains from the InterPro annotation provided by its 12 member resources, followed by a sequence-based connected component analysis of un-annotated sequence regions, to derive a consensus domain architecture for each sequence and subsequently generate families based on common architectures. For the proteome of Arabidopsis, our integrated approach increases sequence coverage by 7.2 percentage points and residue coverage by 14.6 percentage points relative to the best single constituent database within InterPro. The true power of GFam lies in maximizing annotation provided by the different InterPro data sources that offer resource-specific coverage for different regions of a sequence. GFam’s capability to capture higher sequence and residue coverage can be useful for genome annotation, comparative genomics and functional studies. GFam is general-purpose software and can be used for any collection of protein sequences. The software is open source and can be obtained from http://www.paccanarolab.org/software/gfam/. Oxford University Press 2012-10 2012-07-11 /pmc/articles/PMC3479161/ /pubmed/22790981 http://dx.doi.org/10.1093/nar/gks631 Text en © The Author(s) 2012. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/3.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Methods Online Sasidharan, Rajkumar Nepusz, Tamás Swarbreck, David Huala, Eva Paccanaro, Alberto GFam: a platform for automatic annotation of gene families |
title | GFam: a platform for automatic annotation of gene families |
title_full | GFam: a platform for automatic annotation of gene families |
title_fullStr | GFam: a platform for automatic annotation of gene families |
title_full_unstemmed | GFam: a platform for automatic annotation of gene families |
title_short | GFam: a platform for automatic annotation of gene families |
title_sort | gfam: a platform for automatic annotation of gene families |
topic | Methods Online |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3479161/ https://www.ncbi.nlm.nih.gov/pubmed/22790981 http://dx.doi.org/10.1093/nar/gks631 |
work_keys_str_mv | AT sasidharanrajkumar gfamaplatformforautomaticannotationofgenefamilies AT nepusztamas gfamaplatformforautomaticannotationofgenefamilies AT swarbreckdavid gfamaplatformforautomaticannotationofgenefamilies AT hualaeva gfamaplatformforautomaticannotationofgenefamilies AT paccanaroalberto gfamaplatformforautomaticannotationofgenefamilies |
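
The description field above mentions greedily chaining domains into an architecture, grouping sequences by common architecture, and measuring residue coverage. The following is a minimal Python sketch of those three ideas only, not GFam's actual implementation. The `Hit` type, the "longest hit first, skip overlaps" rule, the function names, and the toy data are all assumptions made for illustration. The connected component analysis of un-annotated regions is not shown.

```python
from collections import defaultdict
from typing import NamedTuple


class Hit(NamedTuple):
    """One domain assignment on a sequence (1-based, inclusive coordinates)."""
    domain: str
    start: int
    end: int


def greedy_architecture(hits):
    """Pick non-overlapping hits, longest first; return them in sequence order.

    Assumed simplification: a hit is rejected if it overlaps any chosen hit.
    """
    chosen = []
    for hit in sorted(hits, key=lambda h: (-(h.end - h.start + 1), h.start)):
        if all(hit.end < c.start or hit.start > c.end for c in chosen):
            chosen.append(hit)
    return sorted(chosen, key=lambda h: h.start)


def residue_coverage(chosen, seq_length):
    """Fraction of residues covered by the chosen (non-overlapping) hits."""
    covered = sum(h.end - h.start + 1 for h in chosen)
    return covered / seq_length


def build_families(hits_by_seq):
    """Group sequence IDs by their ordered tuple of domain names."""
    families = defaultdict(list)
    for seq_id, hits in hits_by_seq.items():
        arch = tuple(h.domain for h in greedy_architecture(hits))
        families[arch].append(seq_id)
    return dict(families)


# Hypothetical toy input: domain names and coordinates are invented.
toy_hits = {
    "seqA": [Hit("D1", 1, 100), Hit("D2", 90, 120), Hit("D3", 150, 200)],
    "seqB": [Hit("D1", 5, 80), Hit("D3", 100, 160)],
    "seqC": [Hit("D2", 10, 60)],
}
families = build_families(toy_hits)
```

In this toy input, `seqA` and `seqB` end up in the same family because both reduce to the architecture `("D1", "D3")` once the overlapping `D2` hit on `seqA` is skipped. Coverage figures such as those quoted in the description would be aggregated over a whole proteome rather than computed for a single sequence.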