A Novel Bioinformatics Strategy for Function Prediction of Poorly-Characterized Protein Genes Obtained from Metagenome Analyses

As a result of remarkable progress in DNA sequencing technology, vast quantities of genomic sequences have been decoded. Homology search for amino acid sequences, such as BLAST, has become a basic tool for assigning functions of genes/proteins when genomic sequences are decoded. Although the homol...

Bibliographic Details
Main authors: Abe, Takashi, Kanaya, Shigehiko, Uehara, Hiroshi, Ikemura, Toshimichi
Format: Text
Language: English
Published: Oxford University Press 2009
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2762413/
https://www.ncbi.nlm.nih.gov/pubmed/19801558
http://dx.doi.org/10.1093/dnares/dsp018
_version_ 1782172919373234176
author Abe, Takashi
Kanaya, Shigehiko
Uehara, Hiroshi
Ikemura, Toshimichi
collection PubMed
description As a result of remarkable progress in DNA sequencing technology, vast quantities of genomic sequences have been decoded. Homology search for amino acid sequences, such as BLAST, has become a basic tool for assigning functions of genes/proteins when genomic sequences are decoded. Although the homology search has clearly been a powerful and irreplaceable method, the functions of only 50% or fewer of genes can be predicted when a novel genome is decoded. A prediction method independent of the homology search is urgently needed. By analyzing oligonucleotide compositions in genomic sequences, we previously developed a modified Self-Organizing Map ‘BLSOM’ that clustered genomic fragments according to phylotype with no advance knowledge of phylotype. Using BLSOM for di-, tri- and tetrapeptide compositions, we developed a system to enable separation (self-organization) of proteins by function. When oligopeptide frequencies were analyzed in proteins previously classified into COGs (clusters of orthologous groups of proteins), BLSOMs could faithfully reproduce the COG classifications. This indicated that proteins whose functions are unknown because they lack significant sequence similarity to proteins of known function can be related to proteins of known function based on similarity in oligopeptide composition. BLSOM was applied to predict functions of vast quantities of proteins derived from mixed genomes in environmental samples.
format Text
id pubmed-2762413
institution National Center for Biotechnology Information
language English
publishDate 2009
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-2762413 2009-10-15 DNA Res Full Papers Oxford University Press 2009-10 2009-10-03 Text en © The Author 2009. Published by Oxford University Press on behalf of Kazusa DNA Research Institute http://creativecommons.org/licenses/by-nc/2.0/uk/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
title A Novel Bioinformatics Strategy for Function Prediction of Poorly-Characterized Protein Genes Obtained from Metagenome Analyses
topic Full Papers
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2762413/
https://www.ncbi.nlm.nih.gov/pubmed/19801558
http://dx.doi.org/10.1093/dnares/dsp018