Improvement in the prediction of the translation initiation site through balancing methods, inclusion of acquired knowledge and addition of features to sequences of mRNA

Bibliographic Details
Main authors: Silva, Lívia Márcia; de Souza Teixeira, Felipe Carvalho; Ortega, José Miguel; Zárate, Luis Enrique; Nobre, Cristiane Neri
Format: Online Article Text
Language: English
Published: BioMed Central 2011
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3287592/
https://www.ncbi.nlm.nih.gov/pubmed/22369295
http://dx.doi.org/10.1186/1471-2164-12-S4-S9
collection PubMed
description BACKGROUND: Accurate prediction of the translation initiation site (TIS) in mRNA sequences is an important step in genome annotation. Obtaining an accurate prediction is not always simple, and the task can be modeled as a classification problem between positive sequences (protein-coding) and negative sequences (non-coding). The problem is highly imbalanced because each mRNA molecule has a single translation initiation site and many other candidate sites that are not initiators. This study therefore approaches the problem from the perspective of class balancing and presents M-Clus, an undersampling balancing method based on clustering. The methodology also adds features to the sequences and improves the performance of the classifier through the inclusion of knowledge obtained by the model, called InAKnow.
RESULTS: With this methodology, the performance measures used (accuracy, sensitivity, specificity and adjusted accuracy) are greater than 93% for Mus musculus and Rattus norvegicus, and vary between 72.97% and 97.43% for the other organisms evaluated: Arabidopsis thaliana, Caenorhabditis elegans, Drosophila melanogaster, Homo sapiens and Nasonia vitripennis. When the knowledge obtained by the model is included, the precision increases by 39% for Mus musculus and by 22.9% for Rattus norvegicus. For the other organisms, the precision increases by between 37.10% and 59.49%. The inclusion of certain features during training, for example the presence of ATG in the region upstream of the translation initiation site, improves sensitivity by approximately 7%. Using the M-Clus balancing method increases sensitivity from 51.39% to 91.55% (Mus musculus) and from 47.45% to 88.09% (Rattus norvegicus).
CONCLUSIONS: The results indicate that the proposed methodology is adequate for TIS prediction, particularly when the concept of acquired knowledge is used, which increased the accuracy in all databases evaluated.
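The imbalance and the performance measures named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' M-Clus or InAKnow implementation: the function names `atg_candidates` and `rates`, the toy sequence, and the confusion-matrix counts are hypothetical, and "adjusted accuracy" is assumed here to mean the mean of sensitivity and specificity, which the record itself does not define.

```python
def atg_candidates(mrna, tis_index):
    """List every ATG in the sequence as (position, is_true_tis, has_upstream_atg).

    Only one candidate per mRNA is the true TIS, so negatives outnumber
    positives; that is the class imbalance described in the abstract.
    """
    seq = mrna.upper().replace("U", "T")
    first_atg = seq.find("ATG")
    rows = []
    pos = first_atg
    while pos != -1:
        # A candidate has an upstream ATG if it is not the first ATG found.
        rows.append((pos, pos == tis_index, pos > first_atg))
        pos = seq.find("ATG", pos + 1)
    return rows


def rates(tp, fn, tn, fp):
    """Compute the four measures from confusion-matrix counts.

    Assumption: adjusted accuracy = (sensitivity + specificity) / 2.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    adjusted = (sensitivity + specificity) / 2
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "adjusted_accuracy": adjusted,
    }


if __name__ == "__main__":
    # Toy sequence (made up); the true TIS is assumed to be the ATG at index 7.
    for row in atg_candidates("GGATGCCATGGCTTAAATGA", tis_index=7):
        print(row)
    # Made-up counts with more negatives than positives.
    print(rates(tp=9, fn=1, tn=60, fp=30))
```

With imbalanced counts, plain accuracy is dominated by the majority (negative) class, which is why a measure that weights sensitivity and specificity equally is reported alongside it.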
format Online
Article
Text
id pubmed-3287592
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3287592 2012-02-28
BMC Genomics
Proceedings
BioMed Central 2011-12-22
/pmc/articles/PMC3287592/ /pubmed/22369295 http://dx.doi.org/10.1186/1471-2164-12-S4-S9
Text en
Copyright ©2011 Silva et al; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Proceedings