
Predicting and clustering plant CLE genes with a new method developed specifically for short amino acid sequences


Bibliographic Details
Main authors: Zhang, Zhe, Liu, Lei, Kucukoglu, Melis, Tian, Dongdong, Larkin, Robert M., Shi, Xueping, Zheng, Bo
Format: Online Article Text
Language: English
Published: BioMed Central 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7552357/
https://www.ncbi.nlm.nih.gov/pubmed/33045986
http://dx.doi.org/10.1186/s12864-020-07114-8
author Zhang, Zhe
Liu, Lei
Kucukoglu, Melis
Tian, Dongdong
Larkin, Robert M.
Shi, Xueping
Zheng, Bo
collection PubMed
description BACKGROUND: The CLV3/ESR-RELATED (CLE) gene family encodes small secreted peptides (SSPs) and plays vital roles in plant growth and development by promoting cell-to-cell communication. The prediction and classification of CLE genes are challenging because of their low sequence similarity. RESULTS: We developed a machine learning-aided method for predicting CLE genes by using a CLE motif-specific residual score matrix and a novel clustering method based on the Euclidean distance of 12 amino acid residues from the CLE motif in a site-weight dependent manner. In total, 2156 CLE candidates, including 627 novel candidates, were predicted from 69 plant species. The results from our CLE motif-based clustering are consistent with previous reports using the entire pre-propeptide. Characterization of CLE candidates provided systematic statistics on protein lengths, signal peptides, relative motif positions, amino acid compositions of different parts of the CLE precursor proteins, and decisive factors of CLE prediction. The approach taken here provides information on the evolution of the CLE gene family and provides evidence that the CLE and IDA/IDL genes share a common ancestor. CONCLUSIONS: Our new approach is applicable to SSPs or other proteins with short conserved domains and hence provides a useful tool for gene prediction, classification and evolutionary analysis.
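The abstract describes clustering by a site-weighted Euclidean distance over the 12 residues of the CLE motif. The record gives no further detail, so the following is only a minimal sketch of that idea under stated assumptions: residues are one-hot encoded (an assumption, not the authors' residue score matrix), the per-site weights in `EXAMPLE_WEIGHTS` are hypothetical, and the function names and example motifs are purely illustrative.

```python
import math

MOTIF_LENGTH = 12  # the abstract states that 12 motif residues are compared

# Hypothetical per-site weights (assumption): larger values make a
# mismatch at that motif position count more toward the distance.
EXAMPLE_WEIGHTS = [1.0, 0.5, 1.0, 2.0, 0.5, 2.0, 1.0, 1.0, 2.0, 0.5, 1.0, 1.0]


def weighted_distance(motif_a, motif_b, weights):
    """Site-weighted Euclidean distance between two 12-residue motifs.

    Assumes one-hot encoding of residues: two different residues at a
    site have squared distance 2, identical residues have distance 0.
    """
    if not (len(motif_a) == len(motif_b) == len(weights) == MOTIF_LENGTH):
        raise ValueError("motifs and weights must all have length 12")
    total = 0.0
    for res_a, res_b, weight in zip(motif_a, motif_b, weights):
        if res_a != res_b:
            total += weight * 2.0
    return math.sqrt(total)


def distance_matrix(motifs, weights):
    """Pairwise distance matrix, usable as input to any clustering step."""
    return [[weighted_distance(a, b, weights) for b in motifs] for a in motifs]


if __name__ == "__main__":
    # Illustrative 12-residue motifs, not data from the article.
    motifs = ["RTVPSGPDPLHH", "RQVPTGSDPLHH", "HEVPSGPNPISN"]
    for row in distance_matrix(motifs, EXAMPLE_WEIGHTS):
        print(["%.3f" % d for d in row])
```

The resulting matrix could be passed to a standard clustering routine; which clustering algorithm and which weights the authors actually used is not stated in this record.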
format Online
Article
Text
id pubmed-7552357
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-7552357 2020-10-13 Predicting and clustering plant CLE genes with a new method developed specifically for short amino acid sequences Zhang, Zhe Liu, Lei Kucukoglu, Melis Tian, Dongdong Larkin, Robert M. Shi, Xueping Zheng, Bo BMC Genomics Methodology Article
BioMed Central 2020-10-12 /pmc/articles/PMC7552357/ /pubmed/33045986 http://dx.doi.org/10.1186/s12864-020-07114-8 Text en © The Author(s) 2020 Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Predicting and clustering plant CLE genes with a new method developed specifically for short amino acid sequences
topic Methodology Article