TranscriptClean: variant-aware correction of indels, mismatches and splice junctions in long-read transcripts
MOTIVATION: Long-read, single-molecule sequencing platforms hold great potential for isoform discovery and characterization of multi-exon transcripts. However, their high error rates are an obstacle to distinguishing novel transcript isoforms from sequencing artifacts. Therefore, we developed the pa...
Main Authors: | Wyman, Dana; Mortazavi, Ali |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2019 |
Subjects: | Applications Notes |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6329999/ https://www.ncbi.nlm.nih.gov/pubmed/29912287 http://dx.doi.org/10.1093/bioinformatics/bty483 |
_version_ | 1783386908772007936 |
---|---|
author | Wyman, Dana Mortazavi, Ali |
author_facet | Wyman, Dana Mortazavi, Ali |
author_sort | Wyman, Dana |
collection | PubMed |
description | MOTIVATION: Long-read, single-molecule sequencing platforms hold great potential for isoform discovery and characterization of multi-exon transcripts. However, their high error rates are an obstacle to distinguishing novel transcript isoforms from sequencing artifacts. Therefore, we developed the package TranscriptClean to correct mismatches, microindels and noncanonical splice junctions in mapped transcripts using the reference genome while preserving known variants. RESULTS: Our method corrects nearly all mismatches and indels present in a publicly available human PacBio Iso-Seq dataset, and rescues 39% of noncanonical splice junctions. AVAILABILITY AND IMPLEMENTATION: All Python and R scripts used in this paper are available at https://github.com/dewyman/TranscriptClean. |
format | Online Article Text |
id | pubmed-6329999 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-6329999 2019-01-15 TranscriptClean: variant-aware correction of indels, mismatches and splice junctions in long-read transcripts Wyman, Dana Mortazavi, Ali Bioinformatics Applications Notes MOTIVATION: Long-read, single-molecule sequencing platforms hold great potential for isoform discovery and characterization of multi-exon transcripts. However, their high error rates are an obstacle to distinguishing novel transcript isoforms from sequencing artifacts. Therefore, we developed the package TranscriptClean to correct mismatches, microindels and noncanonical splice junctions in mapped transcripts using the reference genome while preserving known variants. RESULTS: Our method corrects nearly all mismatches and indels present in a publicly available human PacBio Iso-Seq dataset, and rescues 39% of noncanonical splice junctions. AVAILABILITY AND IMPLEMENTATION: All Python and R scripts used in this paper are available at https://github.com/dewyman/TranscriptClean. Oxford University Press 2019-01-15 2018-06-15 /pmc/articles/PMC6329999/ /pubmed/29912287 http://dx.doi.org/10.1093/bioinformatics/bty483 Text en © The Author(s) 2018. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Applications Notes Wyman, Dana Mortazavi, Ali TranscriptClean: variant-aware correction of indels, mismatches and splice junctions in long-read transcripts |
title | TranscriptClean: variant-aware correction of indels, mismatches and splice junctions in long-read transcripts |
title_full | TranscriptClean: variant-aware correction of indels, mismatches and splice junctions in long-read transcripts |
title_fullStr | TranscriptClean: variant-aware correction of indels, mismatches and splice junctions in long-read transcripts |
title_full_unstemmed | TranscriptClean: variant-aware correction of indels, mismatches and splice junctions in long-read transcripts |
title_short | TranscriptClean: variant-aware correction of indels, mismatches and splice junctions in long-read transcripts |
title_sort | transcriptclean: variant-aware correction of indels, mismatches and splice junctions in long-read transcripts |
topic | Applications Notes |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6329999/ https://www.ncbi.nlm.nih.gov/pubmed/29912287 http://dx.doi.org/10.1093/bioinformatics/bty483 |
work_keys_str_mv | AT wymandana transcriptcleanvariantawarecorrectionofindelsmismatchesandsplicejunctionsinlongreadtranscripts AT mortazaviali transcriptcleanvariantawarecorrectionofindelsmismatchesandsplicejunctionsinlongreadtranscripts |