Feature selection for RNA cleavage efficiency at specific sites using the LASSO regression model in Arabidopsis thaliana
BACKGROUND: RNA degradation is important for the regulation of gene expression. Despite the identification of proteins and sequences related to deadenylation-dependent RNA degradation in plants, endonucleolytic cleavage-dependent RNA degradation has not been studied in detail. Here, we developed tru...
Main authors: | Ueno, Daishin; Kawabe, Harunori; Yamasaki, Shotaro; Demura, Taku; Kato, Ko |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2021 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8299621/ https://www.ncbi.nlm.nih.gov/pubmed/34294042 http://dx.doi.org/10.1186/s12859-021-04291-5 |
_version_ | 1783726305316962304 |
---|---|
author | Ueno, Daishin Kawabe, Harunori Yamasaki, Shotaro Demura, Taku Kato, Ko |
author_facet | Ueno, Daishin Kawabe, Harunori Yamasaki, Shotaro Demura, Taku Kato, Ko |
author_sort | Ueno, Daishin |
collection | PubMed |
description | BACKGROUND: RNA degradation is important for the regulation of gene expression. Despite the identification of proteins and sequences related to deadenylation-dependent RNA degradation in plants, endonucleolytic cleavage-dependent RNA degradation has not been studied in detail. Here, we developed truncated RNA end sequencing in Arabidopsis thaliana to identify cleavage sites and evaluate the efficiency of cleavage at each site. Although several features are related to RNA cleavage efficiency, the effect of each feature on cleavage efficiency has not been evaluated by considering multiple putative determinants in A. thaliana. RESULTS: Cleavage site information was acquired from a previous study, and cleavage efficiency at the site level (CS(site) value), which indicates the number of reads at each cleavage site normalized to RNA abundance, was calculated. To identify features related to cleavage efficiency at the site level, multiple putative determinants (features) were used to perform feature selection using the Least Absolute Shrinkage and Selection Operator (LASSO) regression model. The results indicated that whole RNA features were important for the CS(site) value, in addition to features around cleavage sites. Whole RNA features related to the translation process and nucleotide frequency around cleavage sites were major determinants of cleavage efficiency. The results were verified in a model constructed using only sequence features, which showed that the prediction accuracy was similar to that determined using all features including the translation process, suggesting that cleavage efficiency can be predicted using only sequence information. The LASSO regression model was validated in exogenous genes, which showed that the model constructed using only sequence information can predict cleavage efficiency in both endogenous and exogenous genes. CONCLUSIONS: Feature selection using the LASSO regression model in A. thaliana identified 155 features. Correlation coefficients revealed that whole RNA features are important for determining cleavage efficiency in addition to features around the cleavage sites. The LASSO regression model can predict cleavage efficiency in endogenous and exogenous genes using only sequence information. The model revealed the significance of the effect of multiple determinants on cleavage efficiency, suggesting that sequence features are important for RNA degradation mechanisms in A. thaliana. |
format | Online Article Text |
id | pubmed-8299621 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-8299621 2021-07-28 Feature selection for RNA cleavage efficiency at specific sites using the LASSO regression model in Arabidopsis thaliana Ueno, Daishin Kawabe, Harunori Yamasaki, Shotaro Demura, Taku Kato, Ko BMC Bioinformatics Research BioMed Central 2021-07-22 /pmc/articles/PMC8299621/ /pubmed/34294042 http://dx.doi.org/10.1186/s12859-021-04291-5 Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/ Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ (https://creativecommons.org/licenses/by/4.0/). The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/ (https://creativecommons.org/publicdomain/zero/1.0/)) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
spellingShingle | Research Ueno, Daishin Kawabe, Harunori Yamasaki, Shotaro Demura, Taku Kato, Ko Feature selection for RNA cleavage efficiency at specific sites using the LASSO regression model in Arabidopsis thaliana |
title | Feature selection for RNA cleavage efficiency at specific sites using the LASSO regression model in Arabidopsis thaliana |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8299621/ https://www.ncbi.nlm.nih.gov/pubmed/34294042 http://dx.doi.org/10.1186/s12859-021-04291-5 |
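
The description above mentions feature selection with the LASSO regression model, but this record does not include the authors' implementation or data. The following is only a minimal sketch of the general technique, under stated assumptions: the data are synthetic, the six features and their true weights are made up, and the names `soft_threshold`, `lasso_coordinate_descent`, and `alpha` are hypothetical. It fits a LASSO model by cyclic coordinate descent and reports which features keep a non-zero coefficient, which is the sense in which LASSO "selects" features.

```python
import random


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return exactly 0.0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iter=200):
    """Minimise (1/(2n)) * ||y - Xw||^2 + alpha * ||w||_1 by cyclic coordinate descent.

    X is a list of rows (one row per site), y a list of responses.
    No intercept is fitted; the sketch assumes centred data.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # equals y - Xw while w is all zeros
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # a constant-zero column carries no information
            # Covariance of feature j with the residual that excludes feature j
            rho = sum(X[i][j] * (residual[i] + X[i][j] * w[j]) for i in range(n)) / n
            new_wj = soft_threshold(rho, alpha) / col_sq[j]
            delta = new_wj - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                w[j] = new_wj
    return w


# Synthetic stand-in for a feature matrix (assumption: not the paper's data).
random.seed(0)
n_sites, n_features = 200, 6
X = [[random.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_sites)]
true_w = [2.0, 0.0, -1.5, 0.0, 0.0, 0.0]  # only features 0 and 2 matter
y = [sum(w * x for w, x in zip(true_w, row)) + random.gauss(0.0, 0.1) for row in X]

weights = lasso_coordinate_descent(X, y, alpha=0.2)
selected = [j for j, w in enumerate(weights) if w != 0.0]
print("coefficients:", [round(w, 3) for w in weights])
print("selected feature indices:", selected)
```

The penalty strength `alpha` controls how many coefficients are driven to zero; in practice it would typically be chosen by cross-validation rather than fixed by hand as it is here.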