
RFPDR: a random forest approach for plant disease resistance protein prediction

BACKGROUND: Plant innate immunity relies on a broad repertoire of receptor proteins that can detect pathogens and trigger an effective defense response. Bioinformatic tools based on conserved domain and sequence similarity are among the most popular strategies for protein identification and charact...


Bibliographic Details
Main Authors: Simón, Diego, Borsani, Omar, Filippi, Carla Valeria
Format: Online Article Text
Language: English
Published: PeerJ Inc. 2022
Subjects: Agricultural Science
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9037127/
https://www.ncbi.nlm.nih.gov/pubmed/35480565
http://dx.doi.org/10.7717/peerj.11683
_version_ 1784693665496563712
author Simón, Diego
Borsani, Omar
Filippi, Carla Valeria
author_facet Simón, Diego
Borsani, Omar
Filippi, Carla Valeria
author_sort Simón, Diego
collection PubMed
description BACKGROUND: Plant innate immunity relies on a broad repertoire of receptor proteins that can detect pathogens and trigger an effective defense response. Bioinformatic tools based on conserved domain and sequence similarity are among the most popular strategies for protein identification and characterization. However, the multi-domain nature, high sequence diversity and complex evolutionary history of disease resistance (DR) proteins make their prediction a real challenge. Here we present RFPDR, which pioneers the application of Random Forest (RF) for Plant DR protein prediction. METHODS: A recently published collection of experimentally validated DR proteins was used as a positive dataset, while 10x10 nested datasets, ranging from 400 to 4,000 non-DR proteins, were used as negative datasets. A total of 9,631 features were extracted from each protein sequence and included in a full-dimension (FD) RFPDR model. Sequence selection was performed to generate a reduced-dimension (RD) RFPDR model. Model performances were evaluated using an 80/20 (training/testing) partition, with 10-fold cross-validation, and compared to baseline, sequence-based and state-of-the-art strategies. To gain insight into the underlying biology, the most discriminatory sequence-based features in the RF classifier were identified. RESULTS AND DISCUSSION: RD-RFPDR proved to be sensitive (86.4 ± 4.0%) and specific (96.9 ± 1.5%) for identifying DR proteins, while remaining robust to data imbalance. Its high performance and robustness, added to the fact that RD-RFPDR provides valuable information about the underlying properties of DR proteins, make RD-RFPDR an interesting approach for DR protein prediction, complementing state-of-the-art strategies.
format Online
Article
Text
id pubmed-9037127
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher PeerJ Inc.
record_format MEDLINE/PubMed
spelling pubmed-9037127 2022-04-26 RFPDR: a random forest approach for plant disease resistance protein prediction Simón, Diego Borsani, Omar Filippi, Carla Valeria PeerJ Agricultural Science BACKGROUND: Plant innate immunity relies on a broad repertoire of receptor proteins that can detect pathogens and trigger an effective defense response. Bioinformatic tools based on conserved domain and sequence similarity are among the most popular strategies for protein identification and characterization. However, the multi-domain nature, high sequence diversity and complex evolutionary history of disease resistance (DR) proteins make their prediction a real challenge. Here we present RFPDR, which pioneers the application of Random Forest (RF) for Plant DR protein prediction. METHODS: A recently published collection of experimentally validated DR proteins was used as a positive dataset, while 10x10 nested datasets, ranging from 400 to 4,000 non-DR proteins, were used as negative datasets. A total of 9,631 features were extracted from each protein sequence and included in a full-dimension (FD) RFPDR model. Sequence selection was performed to generate a reduced-dimension (RD) RFPDR model. Model performances were evaluated using an 80/20 (training/testing) partition, with 10-fold cross-validation, and compared to baseline, sequence-based and state-of-the-art strategies. To gain insight into the underlying biology, the most discriminatory sequence-based features in the RF classifier were identified. RESULTS AND DISCUSSION: RD-RFPDR proved to be sensitive (86.4 ± 4.0%) and specific (96.9 ± 1.5%) for identifying DR proteins, while remaining robust to data imbalance. Its high performance and robustness, added to the fact that RD-RFPDR provides valuable information about the underlying properties of DR proteins, make RD-RFPDR an interesting approach for DR protein prediction, complementing state-of-the-art strategies. PeerJ Inc.
2022-04-22 /pmc/articles/PMC9037127/ /pubmed/35480565 http://dx.doi.org/10.7717/peerj.11683 Text en ©2022 Simón et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.
spellingShingle Agricultural Science
Simón, Diego
Borsani, Omar
Filippi, Carla Valeria
RFPDR: a random forest approach for plant disease resistance protein prediction
title RFPDR: a random forest approach for plant disease resistance protein prediction
title_full RFPDR: a random forest approach for plant disease resistance protein prediction
title_fullStr RFPDR: a random forest approach for plant disease resistance protein prediction
title_full_unstemmed RFPDR: a random forest approach for plant disease resistance protein prediction
title_short RFPDR: a random forest approach for plant disease resistance protein prediction
title_sort rfpdr: a random forest approach for plant disease resistance protein prediction
topic Agricultural Science
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9037127/
https://www.ncbi.nlm.nih.gov/pubmed/35480565
http://dx.doi.org/10.7717/peerj.11683
work_keys_str_mv AT simondiego rfpdrarandomforestapproachforplantdiseaseresistanceproteinprediction
AT borsaniomar rfpdrarandomforestapproachforplantdiseaseresistanceproteinprediction
AT filippicarlavaleria rfpdrarandomforestapproachforplantdiseaseresistanceproteinprediction