
Potential Arabidopsis thaliana glucosinolate genes identified from the co-expression modules using graph clustering approach

BACKGROUND: Glucosinolates (GSLs) are nitrogen-containing plant secondary metabolites. They are important in the plant defense system and are known to provide protection against cancer in humans. Currently, increasing the amount of data generated from various omics technologies se...


Bibliographic Details
Main authors: Harun, Sarahani, Afiqah-Aleng, Nor, Karim, Mohammad Bozlul, Altaf Ul Amin, Md, Kanaya, Shigehiko, Mohamed-Hussein, Zeti-Azura
Format: Online Article Text
Language: English
Published: PeerJ Inc. 2021
Subjects: Bioinformatics
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8349163/
https://www.ncbi.nlm.nih.gov/pubmed/34430080
http://dx.doi.org/10.7717/peerj.11876
author Harun, Sarahani
Afiqah-Aleng, Nor
Karim, Mohammad Bozlul
Altaf Ul Amin, Md
Kanaya, Shigehiko
Mohamed-Hussein, Zeti-Azura
collection PubMed
description BACKGROUND: Glucosinolates (GSLs) are nitrogen-containing plant secondary metabolites. They are important in the plant defense system and are known to provide protection against cancer in humans. The growing amount of data generated by various omics technologies is a rich resource for new gene discovery. However, sequence similarity searching is sometimes not sufficiently effective for finding these genes; hence, we adapted a network clustering approach to search for potential GSL genes in the Arabidopsis thaliana co-expression dataset. METHODS: We used known GSL genes to construct a comprehensive GSL co-expression network. This network was analyzed with the DPClusOST algorithm at density values of 0.5, 0.6, 0.7, 0.8, and 0.9. The generated clusters were evaluated using Fisher’s exact test to identify GSL gene co-expression clusters. A significance score (SScore) was calculated for each gene based on the p-value from Fisher’s exact test. The SScore was used in a receiver operating characteristic (ROC) analysis, performed with the ROCR package, to classify possible GSL genes. The area under the curve (AUC) computed with ROCR was used to determine the most suitable cluster density value for further analysis. Finally, pathway enrichment analysis was conducted using ClueGO to identify significant pathways associated with the GSL clusters. RESULTS: The density value of 0.8 gave the highest AUC, leading to the selection of thirteen potential GSL genes from the top six significant clusters: IMDH3, MVP1, T19K24.17, MRSA2, SIR, ASP4, MTO1, At1g21440, HMT3, At3g47420, PS1, SAL1, and At3g14220. Four potential genes (MTO1, SIR, SAL1, and IMDH3) were identified from the pathway enrichment analysis of the significant clusters. These genes are directly related to GSL-associated pathways such as sulfur metabolism and valine, leucine, and isoleucine biosynthesis.
This approach demonstrates the ability of network clustering to identify potential GSL genes that cannot be found by a standard similarity search.
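The scoring steps described in the abstract (Fisher's exact test per cluster, a significance score per gene, and an AUC to compare settings) can be illustrated with a small sketch. This is not the authors' code: they used DPClusOST, the R package ROCR, and ClueGO. The abstract says only that the SScore is "based on" the p-value, so the `-log10(p)` form used here is an assumption, as are the function names and the toy numbers.

```python
from math import comb, log10


def fisher_enrichment_p(k, n, K, N):
    """One-sided Fisher's exact test p-value for cluster enrichment.

    N: genes in the network, K: known GSL genes in the network,
    n: genes in the cluster, k: known GSL genes in the cluster.
    Returns P(X >= k) for X ~ Hypergeometric(N, K, n).
    """
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


def sscore(p):
    """ASSUMED score form: -log10 of the p-value (not stated in the source)."""
    return -log10(p)


def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy network: 20 genes, 5 of them known GSL genes.
N, K = 20, 5
clusters = {"A": (4, 3), "B": (6, 1)}  # name -> (cluster size, known GSL genes)
cluster_score = {
    name: sscore(fisher_enrichment_p(k, n, K, N))
    for name, (n, k) in clusters.items()
}

# Each gene inherits the score of its cluster; labels mark known GSL genes.
gene_scores = [cluster_score["A"]] * 4 + [cluster_score["B"]] * 6
gene_labels = [1, 1, 1, 0] + [1, 0, 0, 0, 0, 0]
print(cluster_score, auc(gene_scores, gene_labels))
```

Repeating this calculation for the clusters obtained at each density value and keeping the density with the highest AUC mirrors the selection step described in the abstract.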
format Online
Article
Text
id pubmed-8349163
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher PeerJ Inc.
record_format MEDLINE/PubMed
spelling pubmed-8349163 2021-08-23 PeerJ Bioinformatics PeerJ Inc. 2021-08-04 /pmc/articles/PMC8349163/ /pubmed/34430080 http://dx.doi.org/10.7717/peerj.11876 Text en © 2021 Harun et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.
title Potential Arabidopsis thaliana glucosinolate genes identified from the co-expression modules using graph clustering approach
topic Bioinformatics