
MADS-Box Gene Classification in Angiosperms by Clustering and Machine Learning Approaches

The MADS-box gene family is an important transcription factor family involved in floral organogenesis. The previously proposed ABCDE model suggests that different floral organ identities are controlled by various combinations of classes of MADS-box genes. The five-class ABCDE model cannot cover all...


Bibliographic Details
Main Authors: Chen, Yu-Ting, Chang, Chi-Chang, Chen, Chi-Wei, Chen, Kuan-Chun, Chu, Yen-Wei
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2019
Subjects: Genetics
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6333052/
https://www.ncbi.nlm.nih.gov/pubmed/30671085
http://dx.doi.org/10.3389/fgene.2018.00707
_version_ 1783387492224860160
author Chen, Yu-Ting
Chang, Chi-Chang
Chen, Chi-Wei
Chen, Kuan-Chun
Chu, Yen-Wei
author_facet Chen, Yu-Ting
Chang, Chi-Chang
Chen, Chi-Wei
Chen, Kuan-Chun
Chu, Yen-Wei
author_sort Chen, Yu-Ting
collection PubMed
description The MADS-box gene family is an important transcription factor family involved in floral organogenesis. The previously proposed ABCDE model suggests that different floral organ identities are controlled by various combinations of classes of MADS-box genes. The five-class ABCDE model cannot cover all angiosperm species, especially orchids. Thus, we developed a two-stage approach for MADS-box gene classification to advance the study of floral organogenesis in angiosperms. First, eight classes of reference datasets (A, AGL6, B12, B34, BPI, C, D, and E) were curated and clustered by phylogenetic analysis and unsupervised learning, and they were confirmed by the literature. Second, feature selection and multiple prediction models were curated according to sequence similarity and the characteristics of the MADS-box gene domain using support vector machines. Compared with the BindN and COILS features, the local BLAST model yielded the best accuracy. For performance evaluation, the accuracy of Phalaenopsis aphrodite MADS-box gene classification was 93.3%, which is higher than the 86.7% achieved by our previous classification prediction tool, iMADS. Phylogenetic tree construction, the most common method for gene classification, yields classification errors and is time-consuming for the analysis of massive, multi-species, or incomplete sequences. In this regard, our new system can also confirm the classification errors in all of the randomly selected cases that phylogenetic tree analysis classified incorrectly. Our model constitutes a reliable and efficient MADS-box gene classification system for angiosperms.
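The second stage described above, in which a query gene is assigned to one of the reference classes from sequence-similarity features, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the authors' implementation: a k-mer Jaccard score stands in for local BLAST scores, a highest-score rule stands in for the support vector machine, the function names are hypothetical, and the toy sequences are invented rather than real MADS-box proteins.

```python
def kmers(seq, k=3):
    """Return the set of overlapping k-mers in a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def similarity(a, b, k=3):
    """Jaccard similarity of k-mer sets (a crude stand-in for a BLAST score)."""
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


def feature_vector(query, references, k=3):
    """Build one feature per class: the best similarity to any reference."""
    return {
        label: max((similarity(query, ref, k) for ref in refs), default=0.0)
        for label, refs in references.items()
    }


def classify(query, references, k=3):
    """Assign the class with the highest feature value.

    This argmax rule is a simplification; the record describes SVM models.
    """
    features = feature_vector(query, references, k)
    return max(features, key=features.get)


# Invented toy sequences for two of the eight class labels (illustration only).
REFERENCES = {
    "A": ["ACDEFGHIKL"],
    "C": ["MNPQRSTVWY"],
}

if __name__ == "__main__":
    query = "ACDEFGHIKM"
    print(feature_vector(query, REFERENCES))
    print(classify(query, REFERENCES))
```

In a fuller version, the per-class feature vector would be passed to a trained classifier instead of the argmax rule, and the similarity function would be replaced by scores from a local BLAST search against the curated reference datasets.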
format Online
Article
Text
id pubmed-6333052
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-6333052 2019-01-22 MADS-Box Gene Classification in Angiosperms by Clustering and Machine Learning Approaches Chen, Yu-Ting Chang, Chi-Chang Chen, Chi-Wei Chen, Kuan-Chun Chu, Yen-Wei Front Genet Genetics Frontiers Media S.A. 2019-01-08 /pmc/articles/PMC6333052/ /pubmed/30671085 http://dx.doi.org/10.3389/fgene.2018.00707 Text en Copyright © 2019 Chen, Chang, Chen, Chen and Chu. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
spellingShingle Genetics
Chen, Yu-Ting
Chang, Chi-Chang
Chen, Chi-Wei
Chen, Kuan-Chun
Chu, Yen-Wei
MADS-Box Gene Classification in Angiosperms by Clustering and Machine Learning Approaches
title MADS-Box Gene Classification in Angiosperms by Clustering and Machine Learning Approaches
title_full MADS-Box Gene Classification in Angiosperms by Clustering and Machine Learning Approaches
title_fullStr MADS-Box Gene Classification in Angiosperms by Clustering and Machine Learning Approaches
title_full_unstemmed MADS-Box Gene Classification in Angiosperms by Clustering and Machine Learning Approaches
title_short MADS-Box Gene Classification in Angiosperms by Clustering and Machine Learning Approaches
title_sort mads-box gene classification in angiosperms by clustering and machine learning approaches
topic Genetics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6333052/
https://www.ncbi.nlm.nih.gov/pubmed/30671085
http://dx.doi.org/10.3389/fgene.2018.00707