Prioritizing bona fide bacterial small RNAs with machine learning classifiers

Bacterial small RNAs (sRNAs) are involved in the control of several cellular processes. Hundreds of putative sRNAs have been identified in many bacterial species through RNA sequencing. The existence of putative sRNAs is usually validated by Northern blot analysis. However, the large number of novel puta...

Bibliographic Details
Main authors: Eppenhof, Erik J.J., Peña-Castillo, Lourdes
Format: Online Article Text
Language: English
Published: PeerJ Inc. 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6348098/
https://www.ncbi.nlm.nih.gov/pubmed/30697489
http://dx.doi.org/10.7717/peerj.6304
_version_ 1783390038703210496
author Eppenhof, Erik J.J.
Peña-Castillo, Lourdes
author_facet Eppenhof, Erik J.J.
Peña-Castillo, Lourdes
author_sort Eppenhof, Erik J.J.
collection PubMed
description Bacterial small RNAs (sRNAs) are involved in the control of several cellular processes. Hundreds of putative sRNAs have been identified in many bacterial species through RNA sequencing. The existence of putative sRNAs is usually validated by Northern blot analysis. However, the large number of novel putative sRNAs reported in the literature makes it impractical to validate each of them in the wet lab. In this work, we applied five machine learning approaches to construct twenty models to discriminate bona fide sRNAs from random genomic sequences in five bacterial species. Sequences were represented using seven features including free energy of their predicted secondary structure, their distances to the closest predicted promoter site and Rho-independent terminator, and their distance to the closest open reading frames (ORFs). To automatically calculate these features, we developed an sRNA Characterization Pipeline (sRNACharP). All seven features used in the classification task contributed positively to the performance of the predictive models. The best performing model obtained a median precision of 100% at 10% recall and of 64% at 40% recall across all five bacterial species, and it outperformed previously published approaches on two benchmark datasets in terms of precision and recall. Our results indicate that even though there is limited sRNA sequence conservation across different bacterial species, there are intrinsic features in the genomic context of sRNAs that are conserved across taxa. We show that these features are utilized by machine learning approaches to learn a species-independent model to prioritize bona fide bacterial sRNAs.
format Online
Article
Text
id pubmed-6348098
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher PeerJ Inc.
record_format MEDLINE/PubMed
spelling pubmed-63480982019-01-29 Prioritizing bona fide bacterial small RNAs with machine learning classifiers Eppenhof, Erik J.J. Peña-Castillo, Lourdes PeerJ Bioinformatics Bacterial small RNAs (sRNAs) are involved in the control of several cellular processes. Hundreds of putative sRNAs have been identified in many bacterial species through RNA sequencing. The existence of putative sRNAs is usually validated by Northern blot analysis. However, the large number of novel putative sRNAs reported in the literature makes it impractical to validate each of them in the wet lab. In this work, we applied five machine learning approaches to construct twenty models to discriminate bona fide sRNAs from random genomic sequences in five bacterial species. Sequences were represented using seven features including free energy of their predicted secondary structure, their distances to the closest predicted promoter site and Rho-independent terminator, and their distance to the closest open reading frames (ORFs). To automatically calculate these features, we developed an sRNA Characterization Pipeline (sRNACharP). All seven features used in the classification task contributed positively to the performance of the predictive models. The best performing model obtained a median precision of 100% at 10% recall and of 64% at 40% recall across all five bacterial species, and it outperformed previously published approaches on two benchmark datasets in terms of precision and recall. Our results indicate that even though there is limited sRNA sequence conservation across different bacterial species, there are intrinsic features in the genomic context of sRNAs that are conserved across taxa. We show that these features are utilized by machine learning approaches to learn a species-independent model to prioritize bona fide bacterial sRNAs. PeerJ Inc. 
2019-01-24 /pmc/articles/PMC6348098/ /pubmed/30697489 http://dx.doi.org/10.7717/peerj.6304 Text en ©2019 Eppenhof and Peña-Castillo http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.
spellingShingle Bioinformatics
Eppenhof, Erik J.J.
Peña-Castillo, Lourdes
Prioritizing bona fide bacterial small RNAs with machine learning classifiers
title Prioritizing bona fide bacterial small RNAs with machine learning classifiers
topic Bioinformatics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6348098/
https://www.ncbi.nlm.nih.gov/pubmed/30697489
http://dx.doi.org/10.7717/peerj.6304
work_keys_str_mv AT eppenhoferikjj prioritizingbonafidebacterialsmallrnaswithmachinelearningclassifiers
AT penacastillolourdes prioritizingbonafidebacterialsmallrnaswithmachinelearningclassifiers
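
The description above reports model performance as precision at fixed recall levels (for example, 100% precision at 10% recall). As a minimal sketch of how such a figure can be computed from ranked classifier scores, the following Python function may help. It is not the authors' evaluation code: the function name `precision_at_recall`, the tie handling (ties are broken by sort order), and the toy scores and labels are all assumptions made for illustration.

```python
from typing import Sequence


def precision_at_recall(
    scores: Sequence[float],
    labels: Sequence[int],
    target_recall: float,
) -> float:
    """Precision at the shortest ranked prefix whose recall reaches target_recall.

    Hypothetical helper for illustration only; labels are 1 for a bona fide
    sRNA and 0 for a random genomic sequence.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive example")
    if not 0 < target_recall <= 1:
        raise ValueError("target_recall must be in (0, 1]")

    # Rank candidates from highest to lowest classifier score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    true_pos = 0
    for k, (_, label) in enumerate(ranked, start=1):
        true_pos += label
        # Recall = fraction of all positives recovered in the top k.
        if true_pos / total_pos >= target_recall:
            # Precision = fraction of the top k that are positives.
            return true_pos / k

    # Only reached through floating-point edge cases; every positive is found.
    return total_pos / len(ranked)


# Toy data (invented): ten candidates, five of them true sRNAs.
toy_scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10]
toy_labels = [1, 1, 0, 1, 0, 1, 0, 0, 1, 0]

print(precision_at_recall(toy_scores, toy_labels, 0.2))
print(precision_at_recall(toy_scores, toy_labels, 0.6))
```

On the toy data, the top-ranked candidate alone already gives 20% recall, so precision at that level is 1.0, while reaching 60% recall requires the top four candidates, three of which are positives. A prioritization workflow of the kind the description outlines would typically send only the highest-ranked candidates to the wet lab, which is why precision at low recall is the figure of interest.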