
ChopStitch: exon annotation and splice graph construction using transcriptome assembly and whole genome sequencing data

MOTIVATION: Sequencing studies on non-model organisms often interrogate both genomes and transcriptomes with massive amounts of short sequences. Such studies require de novo analysis tools and techniques, when the species and closely related species lack high quality reference resources. For certain...


Bibliographic Details
Main Authors: Khan, Hamza, Mohamadi, Hamid, Vandervalk, Benjamin P, Warren, Rene L, Chu, Justin, Birol, Inanc
Format: Online Article Text
Language: English
Published: Oxford University Press 2018
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5946899/
https://www.ncbi.nlm.nih.gov/pubmed/29300846
http://dx.doi.org/10.1093/bioinformatics/btx839
_version_ 1783322270483087360
author Khan, Hamza
Mohamadi, Hamid
Vandervalk, Benjamin P
Warren, Rene L
Chu, Justin
Birol, Inanc
author_facet Khan, Hamza
Mohamadi, Hamid
Vandervalk, Benjamin P
Warren, Rene L
Chu, Justin
Birol, Inanc
author_sort Khan, Hamza
collection PubMed
description MOTIVATION: Sequencing studies on non-model organisms often interrogate both genomes and transcriptomes with massive amounts of short sequences. Such studies require de novo analysis tools and techniques, when the species and closely related species lack high quality reference resources. For certain applications such as de novo annotation, information on putative exons and alternative splicing may be desirable. RESULTS: Here we present ChopStitch, a new method for finding putative exons de novo and constructing splice graphs using an assembled transcriptome and whole genome shotgun sequencing (WGSS) data. ChopStitch identifies exon-exon boundaries in de novo assembled RNA-Seq data with the help of a Bloom filter that represents the k-mer spectrum of WGSS reads. The algorithm also accounts for base substitutions in transcript sequences that may be derived from sequencing or assembly errors, haplotype variations, or putative RNA editing events. The primary output of our tool is a FASTA file containing putative exons. Further, exon edges are interrogated for alternative exon-exon boundaries to detect transcript isoforms, which are represented as splice graphs in DOT output format. AVAILABILITY AND IMPLEMENTATION: ChopStitch is written in Python and C++ and is released under the GPL license. It is freely available at https://github.com/bcgsc/ChopStitch. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-5946899
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-59468992018-05-16 ChopStitch: exon annotation and splice graph construction using transcriptome assembly and whole genome sequencing data Khan, Hamza Mohamadi, Hamid Vandervalk, Benjamin P Warren, Rene L Chu, Justin Birol, Inanc Bioinformatics Original Papers MOTIVATION: Sequencing studies on non-model organisms often interrogate both genomes and transcriptomes with massive amounts of short sequences. Such studies require de novo analysis tools and techniques, when the species and closely related species lack high quality reference resources. For certain applications such as de novo annotation, information on putative exons and alternative splicing may be desirable. RESULTS: Here we present ChopStitch, a new method for finding putative exons de novo and constructing splice graphs using an assembled transcriptome and whole genome shotgun sequencing (WGSS) data. ChopStitch identifies exon-exon boundaries in de novo assembled RNA-Seq data with the help of a Bloom filter that represents the k-mer spectrum of WGSS reads. The algorithm also accounts for base substitutions in transcript sequences that may be derived from sequencing or assembly errors, haplotype variations, or putative RNA editing events. The primary output of our tool is a FASTA file containing putative exons. Further, exon edges are interrogated for alternative exon-exon boundaries to detect transcript isoforms, which are represented as splice graphs in DOT output format. AVAILABILITY AND IMPLEMENTATION: ChopStitch is written in Python and C++ and is released under the GPL license. It is freely available at https://github.com/bcgsc/ChopStitch. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. Oxford University Press 2018-05-15 2017-12-29 /pmc/articles/PMC5946899/ /pubmed/29300846 http://dx.doi.org/10.1093/bioinformatics/btx839 Text en © The Author(s) 2017. Published by Oxford University Press. 
http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
spellingShingle Original Papers
Khan, Hamza
Mohamadi, Hamid
Vandervalk, Benjamin P
Warren, Rene L
Chu, Justin
Birol, Inanc
ChopStitch: exon annotation and splice graph construction using transcriptome assembly and whole genome sequencing data
title ChopStitch: exon annotation and splice graph construction using transcriptome assembly and whole genome sequencing data
title_full ChopStitch: exon annotation and splice graph construction using transcriptome assembly and whole genome sequencing data
title_fullStr ChopStitch: exon annotation and splice graph construction using transcriptome assembly and whole genome sequencing data
title_full_unstemmed ChopStitch: exon annotation and splice graph construction using transcriptome assembly and whole genome sequencing data
title_short ChopStitch: exon annotation and splice graph construction using transcriptome assembly and whole genome sequencing data
title_sort chopstitch: exon annotation and splice graph construction using transcriptome assembly and whole genome sequencing data
topic Original Papers
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5946899/
https://www.ncbi.nlm.nih.gov/pubmed/29300846
http://dx.doi.org/10.1093/bioinformatics/btx839
work_keys_str_mv AT khanhamza chopstitchexonannotationandsplicegraphconstructionusingtranscriptomeassemblyandwholegenomesequencingdata
AT mohamadihamid chopstitchexonannotationandsplicegraphconstructionusingtranscriptomeassemblyandwholegenomesequencingdata
AT vandervalkbenjaminp chopstitchexonannotationandsplicegraphconstructionusingtranscriptomeassemblyandwholegenomesequencingdata
AT warrenrenel chopstitchexonannotationandsplicegraphconstructionusingtranscriptomeassemblyandwholegenomesequencingdata
AT chujustin chopstitchexonannotationandsplicegraphconstructionusingtranscriptomeassemblyandwholegenomesequencingdata
AT birolinanc chopstitchexonannotationandsplicegraphconstructionusingtranscriptomeassemblyandwholegenomesequencingdata
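
Illustrative sketch (not part of the catalog record): the description field above says that ChopStitch finds exon-exon boundaries in assembled transcripts with a Bloom filter holding the k-mer spectrum of WGSS reads, writes putative exons as FASTA, and represents isoforms as splice graphs in DOT format. The Python below is a minimal toy of that idea and is not the authors' implementation. Everything in it is an assumption made for illustration: the class and function names, the toy sequences, the value k = 8, the filter size, and the simple rule that a run of exactly k - 1 consecutive transcript k-mers missing from the filter marks a junction. It ignores base substitutions, reverse-complement strands, and alternative boundaries, all of which the abstract says the real tool considers.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter over strings (illustrative, not ChopStitch's own)."""

    def __init__(self, num_bits=1 << 20, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item):
        # Derive num_hashes bit positions by prefixing a counter before SHA-256.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def kmers(seq, k):
    """Yield every k-mer of seq, left to right."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def build_filter(reads, k):
    """Load the k-mer spectrum of the genomic reads into a Bloom filter."""
    bloom = BloomFilter()
    for read in reads:
        for kmer in kmers(read, k):
            bloom.add(kmer)
    return bloom


def find_boundaries(transcript, bloom, k):
    """Return transcript offsets where a new putative exon starts.

    A k-mer that spans an exon-exon junction has no counterpart in the
    genome, so a junction shows up as k - 1 consecutive absent k-mers
    flanked by present ones.
    """
    present = [kmer in bloom for kmer in kmers(transcript, k)]
    boundaries = []
    i = 0
    while i < len(present):
        if present[i]:
            i += 1
            continue
        j = i
        while j < len(present) and not present[j]:
            j += 1
        # Absent k-mers start at offsets i .. j-1; the last one starts on
        # the final base of the upstream exon, so the next exon starts at j.
        if j - i == k - 1 and i > 0 and j < len(present):
            boundaries.append(j)
        i = j
    return boundaries


def split_exons(transcript, boundaries):
    """Cut the transcript at the detected boundaries."""
    cuts = [0] + boundaries + [len(transcript)]
    return [transcript[a:b] for a, b in zip(cuts, cuts[1:])]


def to_dot(exons, transcript_id):
    """Chain consecutive exons of one transcript as a DOT digraph."""
    names = [f"{transcript_id}_exon{i + 1}" for i in range(len(exons))]
    lines = ["digraph splice_graph {"]
    lines += [f'  "{a}" -> "{b}";' for a, b in zip(names, names[1:])]
    lines.append("}")
    return "\n".join(lines)


# Toy data (hypothetical): two exons separated by an intron in the genome.
K = 8
EXON1 = "ATGGCTGCAGGTCTGAAGCT"
INTRON = "GTGAGTCCTTGCTTTCCCAG"
EXON2 = "ACCGTGGAGCTGTTCAAGGC"
genome = EXON1 + INTRON + EXON2
reads = [genome[i:i + 30] for i in range(0, len(genome), 10)]  # overlapping reads
transcript = EXON1 + EXON2

bloom = build_filter(reads, K)
exons = split_exons(transcript, find_boundaries(transcript, bloom, K))
for n, exon in enumerate(exons, start=1):
    print(f">t1_exon{n}\n{exon}")  # FASTA-style putative exons
print(to_dot(exons, "t1"))
```

In this toy the junction dinucleotide never occurs in the genome, so all seven junction-spanning 8-mers are absent and the cut falls exactly between the two exons. Real data is messier: a junction k-mer can match the genome by chance, and a single substitution also produces a run of absent k-mers, which is why the exact-length rule here should be read only as a starting point.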