PhANNs, a fast and accurate tool and web server to classify phage structural proteins

For any given bacteriophage genome or phage-derived sequences in metagenomic data sets, we are unable to assign a function to 50–90% of genes, or more. Structural protein-encoding genes constitute a large fraction of the average phage genome and are among the most divergent and difficult-to-identify...

Bibliographic Details
Main authors: Cantu, Vito Adrian, Salamon, Peter, Seguritan, Victor, Redfield, Jackson, Salamon, David, Edwards, Robert A., Segall, Anca M.
Format: Online Article Text
Language: English
Published: Public Library of Science 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7660903/
https://www.ncbi.nlm.nih.gov/pubmed/33137102
http://dx.doi.org/10.1371/journal.pcbi.1007845
_version_ 1783609108957495296
author Cantu, Vito Adrian
Salamon, Peter
Seguritan, Victor
Redfield, Jackson
Salamon, David
Edwards, Robert A.
Segall, Anca M.
collection PubMed
description For any given bacteriophage genome or phage-derived sequences in metagenomic data sets, we are unable to assign a function to 50–90% of genes, or more. Structural protein-encoding genes constitute a large fraction of the average phage genome and are among the most divergent and difficult-to-identify genes using homology-based methods. To understand the functions encoded by phages, their contributions to their environments, and to help gauge their utility as potential phage therapy agents, we have developed a new approach to classify phage ORFs into ten major classes of structural proteins or into an “other” category. The resulting tool is named PhANNs (Phage Artificial Neural Networks). We built a database of 538,213 manually curated phage protein sequences that we split into eleven subsets (10 for cross-validation, one for testing) using a novel clustering method that ensures there are no homologous proteins between sets yet maintains the maximum sequence diversity for training. An Artificial Neural Network ensemble trained on features extracted from those sets reached a test F1-score of 0.875 and test accuracy of 86.2%. PhANNs can rapidly classify proteins into one of the ten structural classes or, if not predicted to fall in one of the ten classes, as “other,” providing a new approach for functional annotation of phage proteins. PhANNs is open source and can be run from our web server or installed locally.
format Online
Article
Text
id pubmed-7660903
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-7660903 2020-11-18 PhANNs, a fast and accurate tool and web server to classify phage structural proteins Cantu, Vito Adrian Salamon, Peter Seguritan, Victor Redfield, Jackson Salamon, David Edwards, Robert A. Segall, Anca M. PLoS Comput Biol Research Article
Public Library of Science 2020-11-02 /pmc/articles/PMC7660903/ /pubmed/33137102 http://dx.doi.org/10.1371/journal.pcbi.1007845 Text en © 2020 Cantu et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title PhANNs, a fast and accurate tool and web server to classify phage structural proteins
topic Research Article