
Omni-PolyA: a method and tool for accurate recognition of Poly(A) signals in human genomic DNA



Bibliographic Details
Main authors: Magana-Mora, Arturo; Kalkatawi, Manal; Bajic, Vladimir B.
Format: Online Article Text
Language: English
Published: BioMed Central 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5558757/
https://www.ncbi.nlm.nih.gov/pubmed/28810905
http://dx.doi.org/10.1186/s12864-017-4033-7
author Magana-Mora, Arturo
Kalkatawi, Manal
Bajic, Vladimir B.
collection PubMed
description BACKGROUND: Polyadenylation is a critical stage of RNA processing during the formation of mature mRNA, and is present in most of the known eukaryote protein-coding transcripts and many long non-coding RNAs. The correct identification of poly(A) signals (PAS) not only helps to elucidate the 3′-end genomic boundaries of a transcribed DNA region and gene regulatory mechanisms but also gives insight into the multiple transcript isoforms resulting from alternative PAS. Although progress has been made in the in silico prediction of genomic signals, the recognition of PAS in DNA genomic sequences remains a challenge. RESULTS: In this study, we analyzed human genomic DNA sequences for the 12 most common PAS variants. Our analysis identified a set of features that help recognize true PAS and that may be involved in the regulation of the polyadenylation process. The proposed features, in combination with a recognition model, resulted in a novel method and tool, Omni-PolyA. Omni-PolyA combines several machine learning techniques, such as different classifiers arranged in a tree-like decision structure and genetic algorithms, to derive a robust classification model. We compared the results obtained by state-of-the-art methods, deep neural networks, and Omni-PolyA. The results show that Omni-PolyA significantly reduced the average classification error rate by 35.37%, relative to the state-of-the-art results, in the prediction of the 12 considered PAS variants. CONCLUSIONS: The results of our study demonstrate that Omni-PolyA is currently the most accurate model for the prediction of PAS in humans and can serve as a useful complement to other PAS recognition methods. Omni-PolyA is publicly available as an online tool accessible at www.cbrc.kaust.edu.sa/omnipolya/. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12864-017-4033-7) contains supplementary material, which is available to authorized users.
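The abstract describes scanning human genomic DNA for common PAS hexamer variants before any recognition features are computed. The sketch that follows illustrates only that first candidate-extraction step, under stated assumptions: the hexamer list is an illustrative two-variant subset (the canonical AATAAA and the frequent variant ATTAAA), not the authors' full set of 12 variants, and `find_pas_candidates` and its `flank` window size are hypothetical names chosen for this example. It does not reproduce Omni-PolyA's features, its tree-structured classifiers, or its genetic-algorithm training.

```python
import re

# Illustrative subset only (assumption): AATAAA is the canonical PAS hexamer and
# ATTAAA a frequent variant. The 12 variants analyzed by the authors are not listed here.
CANDIDATE_HEXAMERS = ("AATAAA", "ATTAAA")


def find_pas_candidates(seq, hexamers=CANDIDATE_HEXAMERS, flank=50):
    """Return sorted (position, hexamer, window) tuples for each candidate hexamer hit.

    `flank` (hypothetical parameter) is the number of bases kept on each side of
    the hexamer; windows are truncated at the ends of the sequence.
    """
    seq = seq.upper()
    hits = []
    for hexamer in hexamers:
        # A zero-width lookahead matches at every start position, so
        # overlapping occurrences of the same hexamer are all reported.
        for match in re.finditer(f"(?={hexamer})", seq):
            start = match.start()
            window = seq[max(0, start - flank): start + len(hexamer) + flank]
            hits.append((start, hexamer, window))
    return sorted(hits)


if __name__ == "__main__":
    for hit in find_pas_candidates("ccAATAAAttATTAAAgg", flank=2):
        print(hit)
```

Each returned window is the kind of fixed-length context from which a downstream classifier (not shown) could compute features and label the candidate as a true or false PAS.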
format Online
Article
Text
id pubmed-5558757
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5558757 2017-08-18 Omni-PolyA: a method and tool for accurate recognition of Poly(A) signals in human genomic DNA Magana-Mora, Arturo Kalkatawi, Manal Bajic, Vladimir B. BMC Genomics Research Article BioMed Central 2017-08-15 /pmc/articles/PMC5558757/ /pubmed/28810905 http://dx.doi.org/10.1186/s12864-017-4033-7 Text en © The Author(s). 2017 Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Omni-PolyA: a method and tool for accurate recognition of Poly(A) signals in human genomic DNA
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5558757/
https://www.ncbi.nlm.nih.gov/pubmed/28810905
http://dx.doi.org/10.1186/s12864-017-4033-7