HMMER Cut-off Threshold Tool (HMMERCTTER): Supervised classification of superfamily protein sequences with a reliable cut-off threshold

Bibliographic Details
Main Authors: Pagnuco, Inti Anabela; Revuelta, María Victoria; Bondino, Hernán Gabriel; Brun, Marcel; ten Have, Arjen
Format: Online Article Text
Language: English
Published: Public Library of Science 2018
Subjects: Research Article
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5868777/
https://www.ncbi.nlm.nih.gov/pubmed/29579071
http://dx.doi.org/10.1371/journal.pone.0193757
author Pagnuco, Inti Anabela
Revuelta, María Victoria
Bondino, Hernán Gabriel
Brun, Marcel
ten Have, Arjen
collection PubMed
description BACKGROUND: Protein superfamilies can be divided into subfamilies of proteins with different functional characteristics. Their sequences can be classified hierarchically, which is part of sequence function assignment. Typically, there are no clear subfamily hallmarks that would allow pattern-based function assignment, so this task is mostly achieved based on the similarity principle. This is hampered by the lack of a score cut-off that is both sensitive and specific. RESULTS: HMMER Cut-off Threshold Tool (HMMERCTTER) adds a reliable cut-off threshold to the popular HMMER. Using a high-quality superfamily phylogeny, it clusters a set of training sequences such that the cluster-specific HMMER profiles detect cluster or subfamily members with 100% precision and recall (P&R), thereby generating a specific threshold as the inclusion cut-off. Profiles and thresholds are then used as classifiers to screen a target dataset. Iterative inclusion of novel sequences in groups and the corresponding HMMER profiles results in high sensitivity, while specificity is maintained by imposing 100% P&R self-detection. In three presented case studies of protein superfamilies, classification of large datasets with 100% precision was achieved with over 95% recall. Limits and caveats are presented and explained. CONCLUSIONS: HMMERCTTER is a promising protein superfamily sequence classifier, provided high-quality training datasets are used. It provides a decision support system that aids in the difficult task of sequence function assignment in the twilight zone of sequence similarity. All relevant data and source code are available from the GitHub repository at the following URL: https://github.com/BBCMdP/HMMERCTTER.
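The 100% precision and recall condition described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the HMMERCTTER source code: the function names `perfect_cutoff` and `classify`, the input format (a mapping from sequence identifier to profile score), and the choice of the lowest member score as the cut-off are all assumptions made for illustration. The idea shown is only that a cluster admits a cut-off with 100% P&R when every member scores strictly above every non-member.

```python
from typing import Dict, Optional, Set


def perfect_cutoff(scores: Dict[str, float], members: Set[str]) -> Optional[float]:
    """Return a cut-off that separates members from non-members, or None.

    `scores` maps sequence IDs to the score each sequence obtains against
    one cluster profile (hypothetical input; parsing HMMER output is not shown).
    A cut-off yields 100% precision and recall only if every member scores
    strictly higher than every non-member.
    """
    member_scores = [s for seq, s in scores.items() if seq in members]
    other_scores = [s for seq, s in scores.items() if seq not in members]
    if not member_scores:
        return None  # an empty cluster has no meaningful cut-off
    lowest_member = min(member_scores)
    if other_scores and max(other_scores) >= lowest_member:
        return None  # score ranges overlap: no threshold gives 100% P&R
    # Assumption of this sketch: use the lowest member score as the cut-off.
    return lowest_member


def classify(target_scores: Dict[str, float], cutoff: float) -> Set[str]:
    """Return the target sequences whose score reaches the cut-off."""
    return {seq for seq, s in target_scores.items() if s >= cutoff}


if __name__ == "__main__":
    training = {"seqA": 120.0, "seqB": 95.5, "seqC": 40.0, "seqD": 12.3}
    cutoff = perfect_cutoff(training, {"seqA", "seqB"})
    print(cutoff)
    print(classify({"new1": 100.0, "new2": 50.0}, cutoff))
```

In an iterative scheme such as the one the abstract describes, accepted sequences would be added to the cluster, the profile rebuilt, and the separation check repeated; that loop is omitted here because it depends on running HMMER itself.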
format Online
Article
Text
id pubmed-5868777
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-5868777 2018-04-06 HMMER Cut-off Threshold Tool (HMMERCTTER): Supervised classification of superfamily protein sequences with a reliable cut-off threshold. Pagnuco, Inti Anabela; Revuelta, María Victoria; Bondino, Hernán Gabriel; Brun, Marcel; ten Have, Arjen. PLoS One. Research Article.
Public Library of Science 2018-03-26 /pmc/articles/PMC5868777/ /pubmed/29579071 http://dx.doi.org/10.1371/journal.pone.0193757 Text en © 2018 Pagnuco et al. http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title HMMER Cut-off Threshold Tool (HMMERCTTER): Supervised classification of superfamily protein sequences with a reliable cut-off threshold
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5868777/
https://www.ncbi.nlm.nih.gov/pubmed/29579071
http://dx.doi.org/10.1371/journal.pone.0193757