PASS: Protein Annotation Surveillance Site for Protein Annotation Using Homologous Clusters, NLP, and Sequence Similarity Networks

Advances in genome sequencing have accelerated the growth of sequenced genomes but at a cost in the quality of genome annotation. At the same time, computational analysis is widely used for protein annotation, but a dearth of experimental verification has contributed to inaccurate annotation as well...


Bibliographic Details
Main authors: Tao, Jin, Brayton, Kelly A., Broschat, Shira L.
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9581018/
https://www.ncbi.nlm.nih.gov/pubmed/36303767
http://dx.doi.org/10.3389/fbinf.2021.749008
_version_ 1784812523371888640
author Tao, Jin
Brayton, Kelly A.
Broschat, Shira L.
author_facet Tao, Jin
Brayton, Kelly A.
Broschat, Shira L.
author_sort Tao, Jin
collection PubMed
description Advances in genome sequencing have accelerated the growth of sequenced genomes but at a cost in the quality of genome annotation. At the same time, computational analysis is widely used for protein annotation, but a dearth of experimental verification has contributed to inaccurate annotation as well as to annotation error propagation. Thus, a tool to help life scientists with accurate protein annotation would be useful. In this work we describe a website we have developed, the Protein Annotation Surveillance Site (PASS), which provides such a tool. This website consists of three major components: a database of homologous clusters of more than eight million protein sequences deduced from the representative genomes of bacteria, archaea, eukarya, and viruses, together with sequence information; a machine-learning software tool which periodically queries the UniprotKB database to determine whether protein function has been experimentally verified; and a query-able webpage where the FASTA headers of sequences from the cluster best matching an input sequence are returned. The user can choose from these sequences to create a sequence similarity network to assist in annotation or else use their expert knowledge to choose an annotation from the cluster sequences. Illustrations demonstrating use of this website are presented.
format Online
Article
Text
id pubmed-9581018
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-9581018 2022-10-26 PASS: Protein Annotation Surveillance Site for Protein Annotation Using Homologous Clusters, NLP, and Sequence Similarity Networks Tao, Jin Brayton, Kelly A. Broschat, Shira L. Front Bioinform Bioinformatics Advances in genome sequencing have accelerated the growth of sequenced genomes but at a cost in the quality of genome annotation. At the same time, computational analysis is widely used for protein annotation, but a dearth of experimental verification has contributed to inaccurate annotation as well as to annotation error propagation. Thus, a tool to help life scientists with accurate protein annotation would be useful. In this work we describe a website we have developed, the Protein Annotation Surveillance Site (PASS), which provides such a tool. This website consists of three major components: a database of homologous clusters of more than eight million protein sequences deduced from the representative genomes of bacteria, archaea, eukarya, and viruses, together with sequence information; a machine-learning software tool which periodically queries the UniprotKB database to determine whether protein function has been experimentally verified; and a query-able webpage where the FASTA headers of sequences from the cluster best matching an input sequence are returned. The user can choose from these sequences to create a sequence similarity network to assist in annotation or else use their expert knowledge to choose an annotation from the cluster sequences. Illustrations demonstrating use of this website are presented. Frontiers Media S.A. 2021-09-29 /pmc/articles/PMC9581018/ /pubmed/36303767 http://dx.doi.org/10.3389/fbinf.2021.749008 Text en Copyright © 2021 Tao, Brayton and Broschat. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY).
The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
spellingShingle Bioinformatics
Tao, Jin
Brayton, Kelly A.
Broschat, Shira L.
PASS: Protein Annotation Surveillance Site for Protein Annotation Using Homologous Clusters, NLP, and Sequence Similarity Networks
title PASS: Protein Annotation Surveillance Site for Protein Annotation Using Homologous Clusters, NLP, and Sequence Similarity Networks
title_sort pass: protein annotation surveillance site for protein annotation using homologous clusters, nlp, and sequence similarity networks
topic Bioinformatics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9581018/
https://www.ncbi.nlm.nih.gov/pubmed/36303767
http://dx.doi.org/10.3389/fbinf.2021.749008