PATtyFams: Protein Families for the Microbial Genomes in the PATRIC Database
The ability to build accurate protein families is a fundamental operation in bioinformatics that influences comparative analyses, genome annotation, and metabolic modeling. For several years we have been maintaining protein families for all microbial genomes in the PATRIC database (Pathosystems Reso...
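Main authors: | Davis, James J.; Gerdes, Svetlana; Olsen, Gary J.; Olson, Robert; Pusch, Gordon D.; Shukla, Maulik; Vonstein, Veronika; Wattam, Alice R.; Yoo, Hyunseung |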
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A. 2016 |
Subjects: | Microbiology |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4744870/ https://www.ncbi.nlm.nih.gov/pubmed/26903996 http://dx.doi.org/10.3389/fmicb.2016.00118 |
_version_ | 1782414541107232768 |
---|---|
author | Davis, James J. Gerdes, Svetlana Olsen, Gary J. Olson, Robert Pusch, Gordon D. Shukla, Maulik Vonstein, Veronika Wattam, Alice R. Yoo, Hyunseung |
author_facet | Davis, James J. Gerdes, Svetlana Olsen, Gary J. Olson, Robert Pusch, Gordon D. Shukla, Maulik Vonstein, Veronika Wattam, Alice R. Yoo, Hyunseung |
author_sort | Davis, James J. |
collection | PubMed |
description | The ability to build accurate protein families is a fundamental operation in bioinformatics that influences comparative analyses, genome annotation, and metabolic modeling. For several years we have been maintaining protein families for all microbial genomes in the PATRIC database (Pathosystems Resource Integration Center, patricbrc.org) in order to drive many of the comparative analysis tools that are available through the PATRIC website. However, due to the burgeoning number of genomes, traditional approaches for generating protein families are becoming prohibitive. In this report, we describe a new approach for generating protein families, which we call PATtyFams. This method uses the k-mer-based function assignments available through RAST (Rapid Annotation using Subsystem Technology) to rapidly guide family formation, and then differentiates the function-based groups into families using a Markov Cluster algorithm (MCL). This new approach for generating protein families is rapid, scalable and has properties that are consistent with alignment-based methods. |
format | Online Article Text |
id | pubmed-4744870 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-47448702016-02-22 PATtyFams: Protein Families for the Microbial Genomes in the PATRIC Database Davis, James J. Gerdes, Svetlana Olsen, Gary J. Olson, Robert Pusch, Gordon D. Shukla, Maulik Vonstein, Veronika Wattam, Alice R. Yoo, Hyunseung Front Microbiol Microbiology The ability to build accurate protein families is a fundamental operation in bioinformatics that influences comparative analyses, genome annotation, and metabolic modeling. For several years we have been maintaining protein families for all microbial genomes in the PATRIC database (Pathosystems Resource Integration Center, patricbrc.org) in order to drive many of the comparative analysis tools that are available through the PATRIC website. However, due to the burgeoning number of genomes, traditional approaches for generating protein families are becoming prohibitive. In this report, we describe a new approach for generating protein families, which we call PATtyFams. This method uses the k-mer-based function assignments available through RAST (Rapid Annotation using Subsystem Technology) to rapidly guide family formation, and then differentiates the function-based groups into families using a Markov Cluster algorithm (MCL). This new approach for generating protein families is rapid, scalable and has properties that are consistent with alignment-based methods. Frontiers Media S.A. 2016-02-08 /pmc/articles/PMC4744870/ /pubmed/26903996 http://dx.doi.org/10.3389/fmicb.2016.00118 Text en Copyright © 2016 Davis, Gerdes, Olsen, Olson, Pusch, Shukla, Vonstein, Wattam and Yoo. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. 
No use, distribution or reproduction is permitted which does not comply with these terms. |
spellingShingle | Microbiology Davis, James J. Gerdes, Svetlana Olsen, Gary J. Olson, Robert Pusch, Gordon D. Shukla, Maulik Vonstein, Veronika Wattam, Alice R. Yoo, Hyunseung PATtyFams: Protein Families for the Microbial Genomes in the PATRIC Database |
title | PATtyFams: Protein Families for the Microbial Genomes in the PATRIC Database |
title_full | PATtyFams: Protein Families for the Microbial Genomes in the PATRIC Database |
title_fullStr | PATtyFams: Protein Families for the Microbial Genomes in the PATRIC Database |
title_full_unstemmed | PATtyFams: Protein Families for the Microbial Genomes in the PATRIC Database |
title_short | PATtyFams: Protein Families for the Microbial Genomes in the PATRIC Database |
title_sort | pattyfams: protein families for the microbial genomes in the patric database |
topic | Microbiology |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4744870/ https://www.ncbi.nlm.nih.gov/pubmed/26903996 http://dx.doi.org/10.3389/fmicb.2016.00118 |
work_keys_str_mv | AT davisjamesj pattyfamsproteinfamiliesforthemicrobialgenomesinthepatricdatabase AT gerdessvetlana pattyfamsproteinfamiliesforthemicrobialgenomesinthepatricdatabase AT olsengaryj pattyfamsproteinfamiliesforthemicrobialgenomesinthepatricdatabase AT olsonrobert pattyfamsproteinfamiliesforthemicrobialgenomesinthepatricdatabase AT puschgordond pattyfamsproteinfamiliesforthemicrobialgenomesinthepatricdatabase AT shuklamaulik pattyfamsproteinfamiliesforthemicrobialgenomesinthepatricdatabase AT vonsteinveronika pattyfamsproteinfamiliesforthemicrobialgenomesinthepatricdatabase AT wattamalicer pattyfamsproteinfamiliesforthemicrobialgenomesinthepatricdatabase AT yoohyunseung pattyfamsproteinfamiliesforthemicrobialgenomesinthepatricdatabase |