Semantic Assembly and Annotation of Draft RNAseq Transcripts without a Reference Genome


Bibliographic Details
Main Authors:	Ptitsyn, Andrey; Temanni, Ramzi; Bouchard, Christelle; Anderson, Peter A. V.
Format:	Online Article Text
Language:	English
Published:	Public Library of Science 2015
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4578894/ https://www.ncbi.nlm.nih.gov/pubmed/26393794 http://dx.doi.org/10.1371/journal.pone.0138006

author	Ptitsyn, Andrey; Temanni, Ramzi; Bouchard, Christelle; Anderson, Peter A. V.
collection	PubMed
description	Transcriptomes are one of the first sources of high-throughput genomic data that have benefitted from the introduction of Next-Gen Sequencing. As sequencing technology becomes more accessible, transcriptome sequencing is applicable to multiple organisms for which genome sequences are unavailable. Currently all methods for de novo assembly are based on the concept of matching the nucleotide context overlapping between short fragments (reads). However, even short reads may still contain biologically relevant information that can be used as hints in guiding the assembly process. We propose a computational workflow for the reconstruction and functional annotation of expressed gene transcripts that does not require a reference genome sequence and can be tolerant to low coverage, high error rates and other issues that often lead to poor results of de novo assembly in studies of non-model organisms. We start with either raw sequences or the output of a context-based de novo transcriptome assembly. Instead of mapping reads to a reference genome or creating a completely unsupervised clustering of reads, we assemble the unknown transcriptome using nearest homologs from a public database as seeds. We consider even distant relations, indirectly linking protein-coding fragments to entire gene families in multiple distantly related genomes. The intended application of the proposed method is an additional step of semantic (based on relations between protein-coding fragments) scaffolding following traditional (i.e. based on sequence overlap) de novo assembly. The method we developed was effective in the analysis of the jellyfish Cyanea capillata transcriptome and may be applicable in other studies of gene expression in species lacking a high quality reference genome sequence. Our algorithms are implemented in C and designed for parallel computation using a high-performance computer. The software is available free of charge via an open source license.
format	Online Article Text
id	pubmed-4578894
institution	National Center for Biotechnology Information
language	English
publishDate	2015
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-4578894 2015-10-01 Semantic Assembly and Annotation of Draft RNAseq Transcripts without a Reference Genome Ptitsyn, Andrey Temanni, Ramzi Bouchard, Christelle Anderson, Peter A. V. PLoS One Research Article Public Library of Science 2015-09-22 /pmc/articles/PMC4578894/ /pubmed/26393794 http://dx.doi.org/10.1371/journal.pone.0138006 Text en © 2015 Ptitsyn et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title	Semantic Assembly and Annotation of Draft RNAseq Transcripts without a Reference Genome
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4578894/ https://www.ncbi.nlm.nih.gov/pubmed/26393794 http://dx.doi.org/10.1371/journal.pone.0138006
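The description above says the transcriptome is assembled using nearest homologs from a public database as seeds, with fragments linked indirectly through entire gene families. The minimal Python sketch below illustrates only that grouping idea. It is not the authors' C implementation. It assumes that homology hits have already been computed by some similarity search and that a homolog-to-gene-family table is available. The names `group_by_homology`, `hits` and `family_of` are hypothetical. Fragments that share a homolog or a gene family end up in the same group, and the grouping is transitive, so one fragment that hits two families merges them.

```python
from collections import defaultdict


class UnionFind:
    """Minimal disjoint-set structure with path halving."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        # Unknown nodes start as their own root.
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def group_by_homology(hits, family_of):
    """Group draft fragments that share a homolog or a gene family.

    hits: iterable of (fragment_id, homolog_id) pairs (hypothetical input,
          e.g. parsed from a similarity search against a protein database).
    family_of: dict mapping homolog_id -> gene_family_id; a homolog with no
               known family acts as its own seed.
    Returns a sorted list of sorted fragment-id lists, one per group.
    """
    uf = UnionFind()
    fragments = set()
    for frag, homolog in hits:
        # Tag node types so fragment ids never collide with seed ids.
        seed = ("seed", family_of.get(homolog, homolog))
        uf.union(seed, ("frag", frag))
        fragments.add(frag)
    groups = defaultdict(list)
    for frag in fragments:
        groups[uf.find(("frag", frag))].append(frag)
    return sorted(sorted(g) for g in groups.values())
```

For example, with hits `[("c1", "P1"), ("c2", "P2"), ("c3", "P3"), ("c4", "P9")]` and `family_of = {"P1": "F1", "P2": "F1", "P3": "F2"}`, fragments `c1` and `c2` share family `F1` and are grouped together, while `c3` and `c4` remain separate. A disjoint-set structure is used here because it makes the indirect, transitive links cheap to maintain as hits are streamed in.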
