
PIntron: a fast method for detecting the gene structure due to alternative splicing via maximal pairings of a pattern and a text

BACKGROUND: A challenging issue in designing computational methods that predict the exon-intron structure of a gene from a cluster of transcript (EST, mRNA) sequences is guaranteeing accuracy as well as efficiency in time and space, when large clusters of more than 20,000 ESTs and genes lon...


Bibliographic Details
Main Authors:	Pirola, Yuri, Rizzi, Raffaella, Picardi, Ernesto, Pesole, Graziano, Della Vedova, Gianluca, Bonizzoni, Paola
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2012
Subjects:	Research
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3358663/ https://www.ncbi.nlm.nih.gov/pubmed/22537006 http://dx.doi.org/10.1186/1471-2105-13-S5-S2

author	Pirola, Yuri Rizzi, Raffaella Picardi, Ernesto Pesole, Graziano Della Vedova, Gianluca Bonizzoni, Paola
author_facet	Pirola, Yuri Rizzi, Raffaella Picardi, Ernesto Pesole, Graziano Della Vedova, Gianluca Bonizzoni, Paola
author_sort	Pirola, Yuri
collection	PubMed
description	BACKGROUND: A challenging issue in designing computational methods that predict the exon-intron structure of a gene from a cluster of transcript (EST, mRNA) sequences is guaranteeing accuracy as well as time and space efficiency when processing large clusters of more than 20,000 ESTs and genes longer than 1 Mb. Traditionally, the problem has been tackled by combining different tools that were not specifically designed for this task. RESULTS: We propose a fast method based on ad hoc procedures for solving the problem. Our method combines two ideas: a novel algorithm with provably small time complexity for computing spliced alignments of a transcript against a genome, and an efficient algorithm that exploits the inherent redundancy of information in a cluster of transcripts to select, among all possible factorizations of EST sequences, those that allow inferring splice site junctions largely confirmed by the input data. The EST alignment procedure is based on the construction of maximal embeddings, which are sequences obtained from paths of a graph structure, called the embedding graph, whose vertices are the maximal pairings of a genomic sequence T and an EST P. The procedure runs in time linear in the length of P and T and in the size of the output. The method was implemented in the PIntron package. PIntron requires as input a genomic sequence or region and a set of EST and/or mRNA sequences. Besides predicting the full-length transcript isoforms potentially expressed by the gene, the PIntron package includes a module for the CDS annotation of the predicted transcripts. CONCLUSIONS: PIntron, the software tool implementing our methodology, is available at http://www.algolab.eu/PIntron under GNU AGPL. PIntron has been shown to outperform state-of-the-art methods and to quickly process some critical genes. At the same time, PIntron exhibits high accuracy (sensitivity and specificity) when benchmarked with ENCODE annotations.
format	Online Article Text
id	pubmed-3358663
institution	National Center for Biotechnology Information
language	English
publishDate	2012
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-3358663 2012-05-31 PIntron: a fast method for detecting the gene structure due to alternative splicing via maximal pairings of a pattern and a text Pirola, Yuri Rizzi, Raffaella Picardi, Ernesto Pesole, Graziano Della Vedova, Gianluca Bonizzoni, Paola BMC Bioinformatics Research BioMed Central 2012-04-12 /pmc/articles/PMC3358663/ /pubmed/22537006 http://dx.doi.org/10.1186/1471-2105-13-S5-S2 Text en Copyright ©2012 Pirola et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	PIntron: a fast method for detecting the gene structure due to alternative splicing via maximal pairings of a pattern and a text
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3358663/ https://www.ncbi.nlm.nih.gov/pubmed/22537006 http://dx.doi.org/10.1186/1471-2105-13-S5-S2
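
The abstract names the maximal pairings of a genomic sequence T and an EST P as the vertices of the embedding graph, but this record does not define them. As a rough illustration only, the sketch below assumes that a pairing is a triple (p, t, length) marking an exact common substring of P and T, and that it is maximal when it cannot be extended to the left or to the right. The function name `maximal_pairings` and the `min_len` parameter are hypothetical, and the brute-force scan is quadratic; it is not the linear-time procedure described in the abstract and is not taken from the PIntron package.

```python
def maximal_pairings(pattern, text, min_len=1):
    """Return (p, t, length) triples for exact matches between `pattern`
    and `text` that cannot be extended left or right.

    Assumed definition for illustration; brute force, not PIntron's algorithm.
    """
    pairings = []
    n, m = len(pattern), len(text)
    for p in range(n):
        for t in range(m):
            if pattern[p] != text[t]:
                continue
            # Skip starts that could be extended to the left: they belong
            # to a longer match that begins earlier on the same diagonal.
            if p > 0 and t > 0 and pattern[p - 1] == text[t - 1]:
                continue
            # Extend to the right as far as the characters agree.
            length = 0
            while (p + length < n and t + length < m
                   and pattern[p + length] == text[t + length]):
                length += 1
            if length >= min_len:
                pairings.append((p, t, length))
    return pairings


# Toy example: the "EST" ACGTTT against a "genome" ACGAATTT,
# where AA plays the role of an intron between two exon-like blocks.
print(maximal_pairings("ACGTTT", "ACGAATTT", min_len=3))
```

On this toy input the two reported pairings cover the whole pattern with a gap of two genomic characters between them, which is the kind of chain a spliced alignment would link together.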
