Accurate Prediction of Protein Functional Class From Sequence in the Mycobacterium Tuberculosis and Escherichia Coli Genomes Using Data Mining

The analysis of genomics data needs to become as automated as its generation. Here we present a novel data-mining approach to predicting protein functional class from sequence. This method is based on a combination of inductive logic programming clustering and rule learning. We demonstrate the effec...

Bibliographic Details
Main Authors: King, Ross D., Karwath, Andreas, Clare, Amanda, Dehaspe, Luc
Format: Text
Language: English
Published: Hindawi Publishing Corporation 2000
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2448385/
https://www.ncbi.nlm.nih.gov/pubmed/11119305
http://dx.doi.org/10.1002/1097-0061(200012)17:4<283::AID-YEA52>3.0.CO;2-F
author King, Ross D.
Karwath, Andreas
Clare, Amanda
Dehaspe, Luc
collection PubMed
description The analysis of genomics data needs to become as automated as its generation. Here we present a novel data-mining approach to predicting protein functional class from sequence. This method is based on a combination of inductive logic programming clustering and rule learning. We demonstrate the effectiveness of this approach on the M. tuberculosis and E. coli genomes, and identify biologically interpretable rules which predict protein functional class from information only available from the sequence. These rules predict 65% of the ORFs with no assigned function in M. tuberculosis and 24% of those in E. coli, with an estimated accuracy of 60–80% (depending on the level of functional assignment). The rules are founded on a combination of detection of remote homology, convergent evolution and horizontal gene transfer. We identify rules that predict protein functional class even in the absence of detectable sequence or structural homology. These rules give insight into the evolutionary history of M. tuberculosis and E. coli.
format Text
id pubmed-2448385
institution National Center for Biotechnology Information
language English
publishDate 2000
publisher Hindawi Publishing Corporation
record_format MEDLINE/PubMed
spelling pubmed-2448385 2008-07-14 Yeast Research Article Hindawi Publishing Corporation 2000 Text en Copyright © 2000 Hindawi Publishing Corporation. http://creativecommons.org/licenses/by/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Accurate Prediction of Protein Functional Class From Sequence in the Mycobacterium Tuberculosis and Escherichia Coli Genomes Using Data Mining
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2448385/
https://www.ncbi.nlm.nih.gov/pubmed/11119305
http://dx.doi.org/10.1002/1097-0061(200012)17:4<283::AID-YEA52>3.0.CO;2-F