Distinguishing Proteins From Arbitrary Amino Acid Sequences

What kinds of amino acid sequences could possibly be protein sequences? From all existing databases that we can find, known proteins are only a small fraction of all possible combinations of amino acids. Beginning with Sanger's first detailed determination of a protein sequence in 1952, previou...

Bibliographic Details
Main Authors: Yau, Stephen S.-T., Mao, Wei-Guang, Benson, Max, He, Rong Lucy
Format: Online Article Text
Language: English
Published: Nature Publishing Group 2015
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4302309/
https://www.ncbi.nlm.nih.gov/pubmed/25609314
http://dx.doi.org/10.1038/srep07972
_version_ 1782353774973550592
author Yau, Stephen S.-T.
Mao, Wei-Guang
Benson, Max
He, Rong Lucy
author_facet Yau, Stephen S.-T.
Mao, Wei-Guang
Benson, Max
He, Rong Lucy
author_sort Yau, Stephen S.-T.
collection PubMed
description What kinds of amino acid sequences could possibly be protein sequences? From all existing databases that we can find, known proteins are only a small fraction of all possible combinations of amino acids. Beginning with Sanger's first detailed determination of a protein sequence in 1952, previous studies have focused on describing the structure of existing protein sequences in order to construct the protein universe. No one, however, has developed a criterion for determining whether an arbitrary amino acid sequence can be a protein. Here we show that when the collection of arbitrary amino acid sequences is viewed in an appropriate geometric context, the protein sequences cluster together. This leads to a new computational test, described here, that has proved to be remarkably accurate at determining whether an arbitrary amino acid sequence can be a protein. Even more, if the results of this test indicate that the sequence can be a protein, and it is indeed a protein sequence, then its identity as a protein sequence is uniquely defined. We anticipate our computational test will be useful for those who are attempting to complete the job of discovering all proteins, or constructing the protein universe.
format Online
Article
Text
id pubmed-4302309
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher Nature Publishing Group
record_format MEDLINE/PubMed
spelling pubmed-4302309 2015-01-27 Distinguishing Proteins From Arbitrary Amino Acid Sequences Yau, Stephen S.-T. Mao, Wei-Guang Benson, Max He, Rong Lucy Sci Rep Article Nature Publishing Group 2015-01-22 /pmc/articles/PMC4302309/ /pubmed/25609314 http://dx.doi.org/10.1038/srep07972 Text en Copyright © 2015, Macmillan Publishers Limited. All rights reserved http://creativecommons.org/licenses/by/4.0/ This work is licensed under a Creative Commons Attribution 4.0 International License.
The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder in order to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/
title Distinguishing Proteins From Arbitrary Amino Acid Sequences
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4302309/
https://www.ncbi.nlm.nih.gov/pubmed/25609314
http://dx.doi.org/10.1038/srep07972