TranscriptClean: variant-aware correction of indels, mismatches and splice junctions in long-read transcripts

MOTIVATION: Long-read, single-molecule sequencing platforms hold great potential for isoform discovery and characterization of multi-exon transcripts. However, their high error rates are an obstacle to distinguishing novel transcript isoforms from sequencing artifacts. Therefore, we developed the pa...

Bibliographic Details
Main authors: Wyman, Dana; Mortazavi, Ali
Format: Online Article Text
Language: English
Published: Oxford University Press 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6329999/
https://www.ncbi.nlm.nih.gov/pubmed/29912287
http://dx.doi.org/10.1093/bioinformatics/bty483
author Wyman, Dana
Mortazavi, Ali
author_facet Wyman, Dana
Mortazavi, Ali
author_sort Wyman, Dana
collection PubMed
description MOTIVATION: Long-read, single-molecule sequencing platforms hold great potential for isoform discovery and characterization of multi-exon transcripts. However, their high error rates are an obstacle to distinguishing novel transcript isoforms from sequencing artifacts. Therefore, we developed the package TranscriptClean to correct mismatches, microindels and noncanonical splice junctions in mapped transcripts using the reference genome while preserving known variants. RESULTS: Our method corrects nearly all mismatches and indels present in a publicly available human PacBio Iso-seq dataset, and rescues 39% of noncanonical splice junctions. AVAILABILITY AND IMPLEMENTATION: All Python and R scripts used in this paper are available at https://github.com/dewyman/TranscriptClean.
format Online
Article
Text
id pubmed-6329999
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-6329999 2019-01-15 TranscriptClean: variant-aware correction of indels, mismatches and splice junctions in long-read transcripts Wyman, Dana Mortazavi, Ali Bioinformatics Applications Notes MOTIVATION: Long-read, single-molecule sequencing platforms hold great potential for isoform discovery and characterization of multi-exon transcripts. However, their high error rates are an obstacle to distinguishing novel transcript isoforms from sequencing artifacts. Therefore, we developed the package TranscriptClean to correct mismatches, microindels and noncanonical splice junctions in mapped transcripts using the reference genome while preserving known variants. RESULTS: Our method corrects nearly all mismatches and indels present in a publicly available human PacBio Iso-seq dataset, and rescues 39% of noncanonical splice junctions. AVAILABILITY AND IMPLEMENTATION: All Python and R scripts used in this paper are available at https://github.com/dewyman/TranscriptClean. Oxford University Press 2019-01-15 2018-06-15 /pmc/articles/PMC6329999/ /pubmed/29912287 http://dx.doi.org/10.1093/bioinformatics/bty483 Text en © The Author(s) 2018. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Applications Notes
Wyman, Dana
Mortazavi, Ali
TranscriptClean: variant-aware correction of indels, mismatches and splice junctions in long-read transcripts
title TranscriptClean: variant-aware correction of indels, mismatches and splice junctions in long-read transcripts
topic Applications Notes
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6329999/
https://www.ncbi.nlm.nih.gov/pubmed/29912287
http://dx.doi.org/10.1093/bioinformatics/bty483