Evaluating approaches to find exon chains based on long reads

Transcript prediction can be modeled as a graph problem where exons are modeled as nodes and reads spanning two or more exons are modeled as exon chains. Pacific Biosciences third-generation sequencing technology produces significantly longer reads than earlier second-generation sequencing technolog...

Bibliographic Details
Main authors:	Kuosmanen, Anna, Norri, Tuukka, Mäkinen, Veli
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2017
Subjects:	Papers
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5952954/ https://www.ncbi.nlm.nih.gov/pubmed/28069635 http://dx.doi.org/10.1093/bib/bbw137

author	Kuosmanen, Anna Norri, Tuukka Mäkinen, Veli
collection	PubMed
description	Transcript prediction can be modeled as a graph problem where exons are modeled as nodes and reads spanning two or more exons are modeled as exon chains. Pacific Biosciences third-generation sequencing technology produces significantly longer reads than earlier second-generation sequencing technologies, which gives valuable information about longer exon chains in a graph. However, with the high error rates of third-generation sequencing, aligning long reads correctly around the splice sites is a challenging task. Incorrect alignments lead to spurious nodes and arcs in the graph, which in turn lead to incorrect transcript predictions. We survey several approaches to find the exon chains corresponding to long reads in a splicing graph, and experimentally study the performance of these methods using simulated data to allow for sensitivity/precision analysis. Our experiments show that short reads from second-generation sequencing can be used to significantly improve exon chain correctness either by error-correcting the long reads before splicing graph creation, or by using them to create a splicing graph on which the long-read alignments are then projected. We also study the memory and time consumption of various modules, and show that accurate exon chains lead to significantly increased transcript prediction accuracy. Availability: The simulated data and in-house scripts used for this article are available at http://www.cs.helsinki.fi/group/gsa/exon-chains/exon-chains-bib.tar.bz2.
format	Online Article Text
id	pubmed-5952954
institution	National Center for Biotechnology Information
language	English
publishDate	2017
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-5952954 2018-05-18 Evaluating approaches to find exon chains based on long reads. Kuosmanen, Anna; Norri, Tuukka; Mäkinen, Veli. Brief Bioinform. Papers. Oxford University Press 2017-01-09 /pmc/articles/PMC5952954/ /pubmed/28069635 http://dx.doi.org/10.1093/bib/bbw137 Text en © The Author 2017. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title	Evaluating approaches to find exon chains based on long reads
topic	Papers
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5952954/ https://www.ncbi.nlm.nih.gov/pubmed/28069635 http://dx.doi.org/10.1093/bib/bbw137