Omni-PolyA: a method and tool for accurate recognition of Poly(A) signals in human genomic DNA
BACKGROUND: Polyadenylation is a critical stage of RNA processing during the formation of mature mRNA, and is present in most of the known eukaryote protein-coding transcripts and many long non-coding RNAs. The correct identification of poly(A) signals (PAS) not only helps to elucidate the 3′-end ge...
Main authors: | Magana-Mora, Arturo; Kalkatawi, Manal; Bajic, Vladimir B. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2017 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5558757/ https://www.ncbi.nlm.nih.gov/pubmed/28810905 http://dx.doi.org/10.1186/s12864-017-4033-7 |
_version_ | 1783257443585753088 |
---|---|
author | Magana-Mora, Arturo; Kalkatawi, Manal; Bajic, Vladimir B. |
collection | PubMed |
description | BACKGROUND: Polyadenylation is a critical stage of RNA processing during the formation of mature mRNA, and is present in most of the known eukaryotic protein-coding transcripts and many long non-coding RNAs. The correct identification of poly(A) signals (PAS) not only helps to elucidate the 3′-end genomic boundaries of a transcribed DNA region and gene regulatory mechanisms but also gives insight into the multiple transcript isoforms resulting from alternative PAS. Although progress has been made in the in-silico prediction of genomic signals, the recognition of PAS in DNA genomic sequences remains a challenge. RESULTS: In this study, we analyzed human genomic DNA sequences for the 12 most common PAS variants. Our analysis has identified a set of features that helps in the recognition of true PAS, which may be involved in the regulation of the polyadenylation process. The proposed features, in combination with a recognition model, resulted in a novel method and tool, Omni-PolyA. Omni-PolyA combines several machine learning techniques such as different classifiers in a tree-like decision structure and genetic algorithms for deriving a robust classification model. We performed a comparison between results obtained by state-of-the-art methods, deep neural networks, and Omni-PolyA. Results show that Omni-PolyA significantly reduced the average classification error rate by 35.37% in the prediction of the 12 considered PAS variants relative to the state-of-the-art results. CONCLUSIONS: The results of our study demonstrate that Omni-PolyA is currently the most accurate model for the prediction of PAS in humans and can serve as a useful complement to other PAS recognition methods. Omni-PolyA is publicly available as an online tool accessible at www.cbrc.kaust.edu.sa/omnipolya/. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12864-017-4033-7) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-5558757 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-5558757 2017-08-18 Omni-PolyA: a method and tool for accurate recognition of Poly(A) signals in human genomic DNA Magana-Mora, Arturo; Kalkatawi, Manal; Bajic, Vladimir B. BMC Genomics Research Article BACKGROUND: Polyadenylation is a critical stage of RNA processing during the formation of mature mRNA, and is present in most of the known eukaryotic protein-coding transcripts and many long non-coding RNAs. The correct identification of poly(A) signals (PAS) not only helps to elucidate the 3′-end genomic boundaries of a transcribed DNA region and gene regulatory mechanisms but also gives insight into the multiple transcript isoforms resulting from alternative PAS. Although progress has been made in the in-silico prediction of genomic signals, the recognition of PAS in DNA genomic sequences remains a challenge. RESULTS: In this study, we analyzed human genomic DNA sequences for the 12 most common PAS variants. Our analysis has identified a set of features that helps in the recognition of true PAS, which may be involved in the regulation of the polyadenylation process. The proposed features, in combination with a recognition model, resulted in a novel method and tool, Omni-PolyA. Omni-PolyA combines several machine learning techniques such as different classifiers in a tree-like decision structure and genetic algorithms for deriving a robust classification model. We performed a comparison between results obtained by state-of-the-art methods, deep neural networks, and Omni-PolyA. Results show that Omni-PolyA significantly reduced the average classification error rate by 35.37% in the prediction of the 12 considered PAS variants relative to the state-of-the-art results. CONCLUSIONS: The results of our study demonstrate that Omni-PolyA is currently the most accurate model for the prediction of PAS in humans and can serve as a useful complement to other PAS recognition methods. Omni-PolyA is publicly available as an online tool accessible at www.cbrc.kaust.edu.sa/omnipolya/. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12864-017-4033-7) contains supplementary material, which is available to authorized users. BioMed Central 2017-08-15 /pmc/articles/PMC5558757/ /pubmed/28810905 http://dx.doi.org/10.1186/s12864-017-4033-7 Text en © The Author(s). 2017 Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
title | Omni-PolyA: a method and tool for accurate recognition of Poly(A) signals in human genomic DNA |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5558757/ https://www.ncbi.nlm.nih.gov/pubmed/28810905 http://dx.doi.org/10.1186/s12864-017-4033-7 |
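The abstract describes recognizing PAS hexamers in human genomic DNA. As an illustration of the first step any such recognizer needs (locating candidate hexamer sites before a classifier decides which are true PAS), the minimal Python sketch below scans the forward strand for two motifs: AATAAA, the canonical signal, and ATTAAA, its most frequent variant. The motif subset, the function names `find_candidate_pas` and `relative_error_reduction`, and the reading of "reduced the average classification error rate by 35.37%" as a relative reduction are assumptions made for illustration; this is not the Omni-PolyA feature extraction or classification model.

```python
import re
from typing import List, Sequence, Tuple

# ASSUMPTION: illustrative subset only. AATAAA is the canonical poly(A) signal and
# ATTAAA its most frequent variant; the full set of 12 variants studied in the
# paper is not reproduced here.
EXAMPLE_PAS_HEXAMERS: Tuple[str, ...] = ("AATAAA", "ATTAAA")


def find_candidate_pas(
    seq: str, hexamers: Sequence[str] = EXAMPLE_PAS_HEXAMERS
) -> List[Tuple[int, str]]:
    """Return sorted (0-based start, hexamer) pairs for every motif occurrence.

    Only the given (forward) strand is scanned; overlapping hits are reported.
    """
    seq = seq.upper()
    hits: List[Tuple[int, str]] = []
    for hexamer in hexamers:
        # A zero-width lookahead lets finditer report overlapping matches,
        # e.g. two AATAAA sites inside "AATAAATAAA".
        for match in re.finditer(f"(?={re.escape(hexamer)})", seq):
            hits.append((match.start(), hexamer))
    return sorted(hits)


def relative_error_reduction(baseline_error: float, new_error: float) -> float:
    """Percentage by which new_error improves on baseline_error.

    ASSUMPTION: one plausible reading of "reduced the average classification
    error rate by X% relative to the state-of-the-art results".
    """
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return 100.0 * (baseline_error - new_error) / baseline_error


if __name__ == "__main__":
    print(find_candidate_pas("ccAATAAAgATTAAA"))
    # Hypothetical error rates, not values reported in the paper.
    print(relative_error_reduction(0.20, 0.13))
```

For example, `find_candidate_pas("ccAATAAAgATTAAA")` returns `[(2, 'AATAAA'), (9, 'ATTAAA')]`. Each hit is only a candidate: deciding whether it is a true PAS is the classification task the paper addresses.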