Transductive learning as an alternative to translation initiation site identification

Bibliographic Details
Main authors: Nunes Pinto, Cristiano Lacerda; Nobre, Cristiane Neri; Zárate, Luis Enrique
Format: Online Article Text
Language: English
Published: BioMed Central 2017
Subjects: Methodology Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5290616/
https://www.ncbi.nlm.nih.gov/pubmed/28152994
http://dx.doi.org/10.1186/s12859-017-1502-6
author Nunes Pinto, Cristiano Lacerda
Nobre, Cristiane Neri
Zárate, Luis Enrique
collection PubMed
description BACKGROUND: Correctly identifying protein-coding regions is an important and persistent problem in molecular biology. It is challenging because biological systems are not yet deeply understood and the conserved features of messenger RNA (mRNA) are poorly characterized. Computational methods that help discover patterns for identifying Translation Initiation Sites (TIS) are therefore essential. In bioinformatics, machine learning methods based on inductive inference, such as the Inductive Support Vector Machine (ISVM), have been widely applied. By contrast, methods based on transductive inference, such as the Transductive Support Vector Machine (TSVM), have received little attention. Transductive inference performs well on problems in which unlabeled sequences considerably outnumber labeled ones. TIS prediction may benefit from transductive methods for the same reason: the number of new sequences grows rapidly as the Genome Project progresses and enables the study of new organisms. This work therefore investigates transductive learning for TIS identification and compares its results with those of the inductive method.
RESULTS: Transductive inference outperforms the inductive method in both F-measure and sensitivity when predicting the TIS. It also has the lower failure rate in identifying the TIS, producing fewer False Negatives (FN) than the ISVM. The ISVM and TSVM methods were validated on molecules from the most representative organisms in the RefSeq database: Rattus norvegicus, Mus musculus, Homo sapiens, Drosophila melanogaster and Arabidopsis thaliana. The transductive method achieved F-measure and sensitivity above 90%, exceeding the results obtained with ISVM. Both approaches are implemented in the TransduTIS tool, as TransduTIS-I (ISVM) and TransduTIS-T (TSVM), which is available through a web interface. They were compared with the TISHunter, TIS Miner, and NetStart tools, with satisfactory results.
CONCLUSIONS: Precision is similar for the ISVM and TSVM classifiers. However, the TSVM approach yields an improvement, especially in F-measure and sensitivity. TSVM also shows potential for organisms in the early stages of study, for which few identified sequences are available in the databases.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-017-1502-6) contains supplementary material, which is available to authorized users.
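The abstract compares the two classifiers on precision, sensitivity, and F-measure, and links the lower failure rate to fewer False Negatives. The following is a minimal sketch of how these measures relate, assuming their standard definitions. The function names and all counts are hypothetical and chosen only for illustration; they are not results from the article and not code from the TransduTIS tool.

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of predicted TIS that are true TIS."""
    total = tp + fp
    return tp / total if total else 0.0


def sensitivity(tp: int, fn: int) -> float:
    """Fraction of true TIS that the classifier recovers (recall)."""
    total = tp + fn
    return tp / total if total else 0.0


def f_measure(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and sensitivity."""
    p = precision(tp, fp)
    r = sensitivity(tp, fn)
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Hypothetical confusion counts (NOT taken from the article).
# Both classifiers see 120 true TIS; classifier B misses fewer of them.
classifier_a = {"tp": 90, "fp": 10, "fn": 30}
classifier_b = {"tp": 110, "fp": 12, "fn": 10}

for name, c in (("A", classifier_a), ("B", classifier_b)):
    print(
        name,
        f"precision={precision(c['tp'], c['fp']):.3f}",
        f"sensitivity={sensitivity(c['tp'], c['fn']):.3f}",
        f"F-measure={f_measure(c['tp'], c['fp'], c['fn']):.3f}",
    )
```

With these made-up counts, precision is nearly the same for both classifiers, while the one with fewer False Negatives has higher sensitivity and, as a result, a higher F-measure. This is the same pattern the abstract reports in qualitative terms.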
format Online
Article
Text
id pubmed-5290616
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5290616 2017-02-09 Transductive learning as an alternative to translation initiation site identification. Nunes Pinto, Cristiano Lacerda; Nobre, Cristiane Neri; Zárate, Luis Enrique. BMC Bioinformatics. Methodology Article. BioMed Central 2017-02-02 /pmc/articles/PMC5290616/ /pubmed/28152994 http://dx.doi.org/10.1186/s12859-017-1502-6 Text en © The Author(s) 2017. Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Transductive learning as an alternative to translation initiation site identification
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5290616/
https://www.ncbi.nlm.nih.gov/pubmed/28152994
http://dx.doi.org/10.1186/s12859-017-1502-6