Photosynthetic protein classification using genome neighborhood-based machine learning feature
Identification of novel photosynthetic proteins is important for understanding and improving photosynthetic efficiency. Synergistically, genome neighborhood can provide additional useful information to identify photosynthetic proteins. We, therefore, expected that applying a computational approach,...
Main authors: | Sangphukieo, Apiwat; Laomettachit, Teeraphan; Ruengjitchatchawalya, Marasri |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK 2020 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7189237/ https://www.ncbi.nlm.nih.gov/pubmed/32346070 http://dx.doi.org/10.1038/s41598-020-64053-w |
_version_ | 1783527462556139520 |
---|---|
author | Sangphukieo, Apiwat Laomettachit, Teeraphan Ruengjitchatchawalya, Marasri |
author_facet | Sangphukieo, Apiwat Laomettachit, Teeraphan Ruengjitchatchawalya, Marasri |
author_sort | Sangphukieo, Apiwat |
collection | PubMed |
description | Identification of novel photosynthetic proteins is important for understanding and improving photosynthetic efficiency. Synergistically, genome neighborhood can provide additional useful information to identify photosynthetic proteins. We, therefore, expected that applying a computational approach, particularly machine learning (ML) with the genome neighborhood-based feature should facilitate the photosynthetic function assignment. Our results revealed a functional relationship between photosynthetic genes and their conserved neighboring genes observed by ‘Phylo score’, indicating their functions could be inferred from the genome neighborhood profile. Therefore, we created a new method for extracting patterns based on the genome neighborhood network (GNN) and applied them for the photosynthetic protein classification using ML algorithms. Random forest (RF) classifier using genome neighborhood-based features achieved the highest accuracy up to 87% in the classification of photosynthetic proteins and also showed better performance (Matthews correlation coefficient = 0.718) than other available tools including the sequence similarity search (0.447) and ML-based method (0.361). Furthermore, we demonstrated the ability of our model to identify novel photosynthetic proteins compared to the other methods. Our classifier is available at http://bicep2.kmutt.ac.th/photomod_standalone, https://bit.ly/2S0I2Ox and DockerHub: https://hub.docker.com/r/asangphukieo/photomod. |
format | Online Article Text |
id | pubmed-7189237 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-7189237 2020-05-04 Photosynthetic protein classification using genome neighborhood-based machine learning feature Sangphukieo, Apiwat Laomettachit, Teeraphan Ruengjitchatchawalya, Marasri Sci Rep Article Identification of novel photosynthetic proteins is important for understanding and improving photosynthetic efficiency. Synergistically, genome neighborhood can provide additional useful information to identify photosynthetic proteins. We, therefore, expected that applying a computational approach, particularly machine learning (ML) with the genome neighborhood-based feature should facilitate the photosynthetic function assignment. Our results revealed a functional relationship between photosynthetic genes and their conserved neighboring genes observed by ‘Phylo score’, indicating their functions could be inferred from the genome neighborhood profile. Therefore, we created a new method for extracting patterns based on the genome neighborhood network (GNN) and applied them for the photosynthetic protein classification using ML algorithms. Random forest (RF) classifier using genome neighborhood-based features achieved the highest accuracy up to 87% in the classification of photosynthetic proteins and also showed better performance (Matthews correlation coefficient = 0.718) than other available tools including the sequence similarity search (0.447) and ML-based method (0.361). Furthermore, we demonstrated the ability of our model to identify novel photosynthetic proteins compared to the other methods. Our classifier is available at http://bicep2.kmutt.ac.th/photomod_standalone, https://bit.ly/2S0I2Ox and DockerHub: https://hub.docker.com/r/asangphukieo/photomod. 
Nature Publishing Group UK 2020-04-28 /pmc/articles/PMC7189237/ /pubmed/32346070 http://dx.doi.org/10.1038/s41598-020-64053-w Text en © The Author(s) 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. |
spellingShingle | Article Sangphukieo, Apiwat Laomettachit, Teeraphan Ruengjitchatchawalya, Marasri Photosynthetic protein classification using genome neighborhood-based machine learning feature |
title | Photosynthetic protein classification using genome neighborhood-based machine learning feature |
title_full | Photosynthetic protein classification using genome neighborhood-based machine learning feature |
title_fullStr | Photosynthetic protein classification using genome neighborhood-based machine learning feature |
title_full_unstemmed | Photosynthetic protein classification using genome neighborhood-based machine learning feature |
title_short | Photosynthetic protein classification using genome neighborhood-based machine learning feature |
title_sort | photosynthetic protein classification using genome neighborhood-based machine learning feature |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7189237/ https://www.ncbi.nlm.nih.gov/pubmed/32346070 http://dx.doi.org/10.1038/s41598-020-64053-w |
work_keys_str_mv | AT sangphukieoapiwat photosyntheticproteinclassificationusinggenomeneighborhoodbasedmachinelearningfeature AT laomettachitteeraphan photosyntheticproteinclassificationusinggenomeneighborhoodbasedmachinelearningfeature AT ruengjitchatchawalyamarasri photosyntheticproteinclassificationusinggenomeneighborhoodbasedmachinelearningfeature |