PATtyFams: Protein Families for the Microbial Genomes in the PATRIC Database

Bibliographic Details
Main Authors: Davis, James J., Gerdes, Svetlana, Olsen, Gary J., Olson, Robert, Pusch, Gordon D., Shukla, Maulik, Vonstein, Veronika, Wattam, Alice R., Yoo, Hyunseung
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2016
Subjects: Microbiology
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4744870/
https://www.ncbi.nlm.nih.gov/pubmed/26903996
http://dx.doi.org/10.3389/fmicb.2016.00118
_version_ 1782414541107232768
author Davis, James J.
Gerdes, Svetlana
Olsen, Gary J.
Olson, Robert
Pusch, Gordon D.
Shukla, Maulik
Vonstein, Veronika
Wattam, Alice R.
Yoo, Hyunseung
author_sort Davis, James J.
collection PubMed
description The ability to build accurate protein families is a fundamental operation in bioinformatics that influences comparative analyses, genome annotation, and metabolic modeling. For several years we have been maintaining protein families for all microbial genomes in the PATRIC database (Pathosystems Resource Integration Center, patricbrc.org) in order to drive many of the comparative analysis tools that are available through the PATRIC website. However, due to the burgeoning number of genomes, traditional approaches for generating protein families are becoming prohibitive. In this report, we describe a new approach for generating protein families, which we call PATtyFams. This method uses the k-mer-based function assignments available through RAST (Rapid Annotation using Subsystem Technology) to rapidly guide family formation, and then differentiates the function-based groups into families using a Markov Cluster algorithm (MCL). This new approach for generating protein families is rapid, scalable and has properties that are consistent with alignment-based methods.
format Online
Article
Text
id pubmed-4744870
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-4744870 2016-02-22
PATtyFams: Protein Families for the Microbial Genomes in the PATRIC Database
Davis, James J.; Gerdes, Svetlana; Olsen, Gary J.; Olson, Robert; Pusch, Gordon D.; Shukla, Maulik; Vonstein, Veronika; Wattam, Alice R.; Yoo, Hyunseung
Front Microbiol
Microbiology
Frontiers Media S.A. 2016-02-08
/pmc/articles/PMC4744870/ /pubmed/26903996 http://dx.doi.org/10.3389/fmicb.2016.00118
Text en
Copyright © 2016 Davis, Gerdes, Olsen, Olson, Pusch, Shukla, Vonstein, Wattam and Yoo. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title PATtyFams: Protein Families for the Microbial Genomes in the PATRIC Database
topic Microbiology
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4744870/
https://www.ncbi.nlm.nih.gov/pubmed/26903996
http://dx.doi.org/10.3389/fmicb.2016.00118