
Peptide Vocabulary Analysis Reveals Ultra-Conservation and Homonymity in Protein Sequences

A new algorithm is presented for vocabulary analysis (word detection) in texts of human origin. It achieves 60%–70% overall accuracy, greater than 80% accuracy for longer words, and approximately 85% sensitivity on Alice in Wonderland, a considerable improvement on previous methods. When applied to protein sequences, it detects short sequences analogous to words in human texts, i.e. sequences that are intolerant to changes in spelling (mutation) and relatively context-independent in their meaning (function). Some of these are homonyms of up to 7 amino acids, which can assume different structures in different proteins. Others are ultra-conserved stretches of up to 18 amino acids within proteins of less than 40% overall identity, reflecting extreme constraint or convergent evolution. Different species are found to have qualitatively different major peptide vocabularies: some are dominated by large gene families, while others are rich in simple repeats or dominated by internally repetitive proteins. This suggests the possibility of a peptide vocabulary signature, analogous to genome signatures in DNA. Homonyms may be useful in detecting convergent evolution and positive selection in protein evolution. Ultra-conserved words may be useful in identifying structures intolerant to substitution over long periods of evolutionary time.

Bibliographic Details
Main author:	Gatherer, Derek
Format:	Text
Language:	English
Published:	Libertas Academica, 2009-11-24
Journal:	Bioinform Biol Insights
Subjects:	Original Research
License:	Open access, Creative Commons Attribution 3.0 (http://creativecommons.org/licenses/by/3.0/)
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2789693/ https://www.ncbi.nlm.nih.gov/pubmed/20066129
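The record does not describe how the algorithm works, so the following is only a hypothetical illustration of the task the abstract names, not the author's method. It is a naive frequency baseline: given an unsegmented string (prose with its spaces removed, or a protein sequence), it proposes every substring of an assumed length of 3 to 8 characters that recurs at least twice, keeps only maximal candidates, and scores them against a reference word list. The function names `candidate_words` and `score`, the length and count thresholds, and the reading of "accuracy" as the share of detected strings that are true words and "sensitivity" as the share of reference words recovered are all assumptions.

```python
from collections import Counter


def candidate_words(text: str, min_len: int = 3, max_len: int = 8,
                    min_count: int = 2) -> set[str]:
    """Propose 'words' in unsegmented text as recurring substrings.

    Hypothetical frequency baseline, not the algorithm from the paper.
    Thresholds (min_len, max_len, min_count) are illustrative assumptions.
    """
    # Count every substring whose length lies in [min_len, max_len].
    counts = Counter(
        text[i:i + k]
        for k in range(min_len, max_len + 1)
        for i in range(len(text) - k + 1)
    )
    frequent = {s for s, c in counts.items() if c >= min_count}
    # Keep only maximal candidates: drop a substring when a longer frequent
    # substring contains it and occurs equally often, since the shorter one
    # is then just a fragment of the longer one.
    return {
        s for s in frequent
        if not any(len(t) > len(s) and s in t and counts[t] == counts[s]
                   for t in frequent)
    }


def score(detected: set[str], reference: set[str]) -> tuple[float, float]:
    """Return (accuracy, sensitivity) of detected words vs a reference list.

    Assumed definitions: accuracy = fraction of detected strings that are
    true words; sensitivity = fraction of reference words that were found.
    """
    hits = detected & reference
    accuracy = len(hits) / len(detected) if detected else 0.0
    sensitivity = len(hits) / len(reference) if reference else 0.0
    return accuracy, sensitivity
```

For example, `candidate_words("abcxabcyabc", min_len=2, max_len=4)` yields `{'abc'}`: `ab` and `bc` recur exactly as often as `abc`, so they are discarded as fragments of it. The same function applied to the toy sequence `"MKTAYMKTAY"` returns `{'MKTAY'}`, showing how a repeated peptide stretch would surface as a single candidate "word".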
