
EasyCluster2: an improved tool for clustering and assembling long transcriptome reads

BACKGROUND: Expressed sequences (e.g. ESTs) are a strong source of evidence to improve gene structures and predict reliable alternative splicing events. When a genome assembly is available, ESTs are suitable to generate gene-oriented clusters through the well-established EasyCluster software. Nowada...


Bibliographic Details
Main Authors:	Bevilacqua, Vitoantonio, Pietroleonardo, Nicola, Giannino, Ely Ignazio, Stroppa, Fabio, Simone, Domenico, Pesole, Graziano, Picardi, Ernesto
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2014
Subjects:	Proceedings
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4271567/ https://www.ncbi.nlm.nih.gov/pubmed/25474441 http://dx.doi.org/10.1186/1471-2105-15-S15-S7

_version_	1782349629125296128
author	Bevilacqua, Vitoantonio Pietroleonardo, Nicola Giannino, Ely Ignazio Stroppa, Fabio Simone, Domenico Pesole, Graziano Picardi, Ernesto
author_sort	Bevilacqua, Vitoantonio
collection	PubMed
description	BACKGROUND: Expressed sequences (e.g. ESTs) are a strong source of evidence to improve gene structures and predict reliable alternative splicing events. When a genome assembly is available, ESTs are suitable to generate gene-oriented clusters through the well-established EasyCluster software. Nowadays, EST-like sequences can be massively produced using Next Generation Sequencing (NGS) technologies. In order to handle genome-scale transcriptome data, we present here EasyCluster2, a reimplementation of EasyCluster able to speed up the creation of gene-oriented clusters and facilitate downstream analyses as the assembly of full-length transcripts and the detection of splicing isoforms. RESULTS: EasyCluster2 has been developed to facilitate the genome-based clustering of EST-like sequences generated through the NGS 454 technology. Reads mapped onto the reference genome can be uploaded using the standard GFF3 file format. Alignment parsing is initially performed to produce a first collection of pseudo-clusters by grouping reads according to the overlap of their genomic coordinates on the same strand. EasyCluster2 then refines read grouping by including in each cluster only reads sharing at least one splice site and optionally performs a Smith-Waterman alignment in the region surrounding splice sites in order to correct for potential alignment errors. In addition, EasyCluster2 can include unspliced reads, which generally account for >50% of 454 datasets, and collapses overlapping clusters. Finally, EasyCluster2 can assemble full-length transcripts using a Directed-Acyclic-Graph-based strategy, simplifying the identification of alternative splicing isoforms, thanks also to the implementation of the widespread AStalavista methodology. Accuracy and performances have been tested on real as well as simulated datasets. 
CONCLUSIONS: EasyCluster2 represents a unique tool to cluster and assemble transcriptome reads produced with 454 technology, as well as ESTs and full-length transcripts. The clustering procedure is enhanced with the employment of genome annotations and unspliced reads. Overall, EasyCluster2 is able to perform an effective detection of splicing isoforms, since it can refine exon-exon junctions and explore alternative splicing without known reference transcripts. Results in GFF3 format can be browsed in the UCSC Genome Browser. Therefore, EasyCluster2 is a powerful tool to generate reliable clusters for gene expression studies, facilitating the analysis also to researchers not skilled in bioinformatics.
format	Online Article Text
id	pubmed-4271567
institution	National Center for Biotechnology Information
language	English
publishDate	2014
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-4271567 2015-01-02 EasyCluster2: an improved tool for clustering and assembling long transcriptome reads Bevilacqua, Vitoantonio Pietroleonardo, Nicola Giannino, Ely Ignazio Stroppa, Fabio Simone, Domenico Pesole, Graziano Picardi, Ernesto BMC Bioinformatics Proceedings BioMed Central 2014-12-03 /pmc/articles/PMC4271567/ /pubmed/25474441 http://dx.doi.org/10.1186/1471-2105-15-S15-S7 Text en Copyright © 2014 Bevilacqua et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/4.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle	Proceedings Bevilacqua, Vitoantonio Pietroleonardo, Nicola Giannino, Ely Ignazio Stroppa, Fabio Simone, Domenico Pesole, Graziano Picardi, Ernesto EasyCluster2: an improved tool for clustering and assembling long transcriptome reads
title	EasyCluster2: an improved tool for clustering and assembling long transcriptome reads
topic	Proceedings
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4271567/ https://www.ncbi.nlm.nih.gov/pubmed/25474441 http://dx.doi.org/10.1186/1471-2105-15-S15-S7
work_keys_str_mv	AT bevilacquavitoantonio easycluster2animprovedtoolforclusteringandassemblinglongtranscriptomereads AT pietroleonardonicola easycluster2animprovedtoolforclusteringandassemblinglongtranscriptomereads AT gianninoelyignazio easycluster2animprovedtoolforclusteringandassemblinglongtranscriptomereads AT stroppafabio easycluster2animprovedtoolforclusteringandassemblinglongtranscriptomereads AT simonedomenico easycluster2animprovedtoolforclusteringandassemblinglongtranscriptomereads AT pesolegraziano easycluster2animprovedtoolforclusteringandassemblinglongtranscriptomereads AT picardiernesto easycluster2animprovedtoolforclusteringandassemblinglongtranscriptomereads
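
The first clustering step summarized in the abstract (grouping reads whose genomic coordinates overlap on the same strand into pseudo-clusters) can be illustrated with a minimal Python sketch. This is not the EasyCluster2 implementation: the `Read` record, the `pseudo_clusters` function, and the example coordinates are all hypothetical, and the later refinement steps (shared splice sites, Smith-Waterman realignment, handling of unspliced reads) are not modeled.

```python
from collections import defaultdict
from typing import NamedTuple


class Read(NamedTuple):
    """Hypothetical record for one mapped read (coordinates 1-based, inclusive, as in GFF3)."""
    name: str
    chrom: str
    strand: str
    start: int
    end: int


def pseudo_clusters(reads):
    """Group reads whose intervals overlap on the same chromosome and strand."""
    # Reads on different chromosomes or strands never share a cluster.
    by_locus = defaultdict(list)
    for read in reads:
        by_locus[(read.chrom, read.strand)].append(read)

    clusters = []
    for key in sorted(by_locus):
        # Sweep left to right, extending the current cluster while reads overlap it.
        group = sorted(by_locus[key], key=lambda r: (r.start, r.end))
        current, current_end = [], None
        for read in group:
            if current and read.start > current_end:
                # Gap found: close the current cluster and start a new one.
                clusters.append(current)
                current, current_end = [], None
            current.append(read)
            current_end = read.end if current_end is None else max(current_end, read.end)
        if current:
            clusters.append(current)
    return clusters


# Made-up example reads, for illustration only.
reads = [
    Read("r1", "chr1", "+", 100, 500),
    Read("r2", "chr1", "+", 450, 900),
    Read("r3", "chr1", "-", 460, 800),
    Read("r4", "chr1", "+", 2000, 2400),
]
cluster_names = [[r.name for r in c] for c in pseudo_clusters(reads)]
print(cluster_names)
```

In this sketch, `r1` and `r2` fall into one pseudo-cluster because they overlap on the plus strand, while `r3` is kept apart because it maps to the minus strand and `r4` because it lies in a separate region.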
