
Prediction of protein long-range contacts using an ensemble of genetic algorithm classifiers with sequence profile centers

BACKGROUND: Prediction of long-range inter-residue contacts is an important topic in bioinformatics research. It helps in determining protein structures and understanding protein folding, and therefore in advancing the annotation of protein functions. RESULTS: In this paper, we propose a novel ense...


Bibliographic Details
Main authors: Chen, Peng; Li, Jinyan
Format: Text
Language: English
Published: BioMed Central 2010
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2873825/
https://www.ncbi.nlm.nih.gov/pubmed/20487509
http://dx.doi.org/10.1186/1472-6807-10-S1-S2
author Chen, Peng
Li, Jinyan
collection PubMed
description BACKGROUND: Prediction of long-range inter-residue contacts is an important topic in bioinformatics research. It helps in determining protein structures and understanding protein folding, and therefore in advancing the annotation of protein functions. RESULTS: In this paper, we propose a novel ensemble of genetic algorithm classifiers (GaCs) to address the long-range contact prediction problem. Our method is based on a key idea called sequence profile centers (SPCs). Each SPC is the average of the sequence profiles of residue pairs belonging to the same class (contact or non-contact). The GaCs are trained on multiple, different pairs of long-range contact data (positive data) and long-range non-contact data (negative data). The negative data sets, roughly the same size as the positive ones, are constructed by random sampling from the original imbalanced negative data. As a result, about 21.5% of long-range contacts are correctly predicted. We also found that the ensemble of GaCs improves accuracy by around 5.6% over a single GaC. CONCLUSIONS: Classifiers that use sequence profile centers may advance long-range contact prediction. With this approach, key structural features in proteins could be determined with high efficiency and accuracy.
format Text
id pubmed-2873825
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2873825 2010-05-21 Prediction of protein long-range contacts using an ensemble of genetic algorithm classifiers with sequence profile centers Chen, Peng Li, Jinyan BMC Struct Biol Research BioMed Central 2010-05-17 /pmc/articles/PMC2873825/ /pubmed/20487509 http://dx.doi.org/10.1186/1472-6807-10-S1-S2 Text en Copyright ©2010 Li and Chen; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Prediction of protein long-range contacts using an ensemble of genetic algorithm classifiers with sequence profile centers
topic Research
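The description above outlines three ideas: sequence profile centers (class-wise average profiles), balanced negative sets drawn by random sampling, and an ensemble of classifiers. The following is a minimal Python sketch of how those ideas could fit together, not the authors' implementation. It makes several assumptions that are not in the record: each residue pair is represented as a plain list of floats, every ensemble member sees the same positive set, and a simple nearest-center rule with majority voting stands in for the genetic algorithm classifier. All function names are hypothetical.

```python
import random


def profile_center(profiles):
    """Element-wise mean of equal-length profile vectors (one SPC)."""
    n = len(profiles)
    return [sum(column) / n for column in zip(*profiles)]


def balanced_negative_samples(negatives, size, n_members, rng):
    """Draw n_members random negative subsets, each holding `size` items."""
    return [rng.sample(negatives, size) for _ in range(n_members)]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_member(positives, negatives):
    """One ensemble member: a (contact center, non-contact center) pair."""
    return profile_center(positives), profile_center(negatives)


def predict_member(member, x):
    """Return 1 (contact) if x is closer to the contact center, else 0."""
    c_pos, c_neg = member
    return 1 if squared_distance(x, c_pos) < squared_distance(x, c_neg) else 0


def predict_ensemble(members, x):
    """Majority vote over all members (assumed combination rule)."""
    votes = sum(predict_member(m, x) for m in members)
    return 1 if votes * 2 > len(members) else 0


def build_ensemble(positives, negatives, n_members=5, seed=0):
    """Train each member on the positives plus its own balanced negative sample."""
    rng = random.Random(seed)
    samples = balanced_negative_samples(negatives, len(positives), n_members, rng)
    return [train_member(positives, sample) for sample in samples]
```

A call such as `build_ensemble(positives, negatives)` followed by `predict_ensemble(members, x)` classifies one residue-pair vector. Sampling the negatives down to the size of the positive set keeps each member's two centers from being estimated on heavily imbalanced data, which is the motivation the description gives for the random sampling step.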