Validation of Splicing Events in Transcriptome Sequencing Data

Genomic alignments of sequenced cellular messenger RNA contain gapped alignments, which are interpreted as a consequence of intron removal. The resulting gap-sites, the genomic locations of alignment gaps, are landmarks representing potential splice-sites. As alignment algorithms report gap-sites with a co...

Bibliographic Details
Main authors: Kaisers, Wolfgang; Ptok, Johannes; Schwender, Holger; Schaal, Heiner
Format: Online Article Text
Language: English
Published: MDPI 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5485934/
https://www.ncbi.nlm.nih.gov/pubmed/28545234
http://dx.doi.org/10.3390/ijms18061110
author Kaisers, Wolfgang
Ptok, Johannes
Schwender, Holger
Schaal, Heiner
collection PubMed
description Genomic alignments of sequenced cellular messenger RNA contain gapped alignments, which are interpreted as a consequence of intron removal. The resulting gap-sites, the genomic locations of alignment gaps, are landmarks representing potential splice-sites. As alignment algorithms report gap-sites with a considerable false discovery rate, validation is required. We describe two quality scores, the gap quality score (gqs) and the weighted gap information score (wgis), developed for validation of putative splicing events: while gqs relies solely on alignment data, wgis additionally considers information from the genomic sequence. FASTQ files obtained from 54 human dermal fibroblast samples were aligned against the human genome (GRCh38) using the TopHat and STAR aligners. Statistical properties of gap-sites validated by gqs and wgis were evaluated by their sequence similarity to known exon-intron borders. Within the 54 samples, TopHat identifies 1,000,380 and STAR reports 6,487,577 gap-sites. Due to the lack of strand information, however, the percentage of identified GT-AG gap-sites is rather low. While gap-sites from TopHat contain ≈89% GT-AG, gap-sites from STAR contain only ≈42% GT-AG dinucleotide pairs in merged data from the 54 fibroblast samples. Validation with gqs yields 156,251 gap-sites from TopHat alignments and 166,294 from STAR alignments. Validation with wgis yields 770,327 gap-sites from TopHat alignments and 1,065,596 from STAR alignments. Both alignment algorithms, TopHat and STAR, report gap-sites with a considerable false discovery rate, which can be drastically reduced by validation with gqs and wgis.
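To put the counts in the description into proportion, the following minimal sketch computes the share of reported gap-sites that remain after each validation step. It uses only the figures quoted in the abstract; the names `reported`, `retained_fraction`, and `summarize` are illustrative assumptions and are not part of the authors' software. The ratio is plain arithmetic on the published totals and should not be read as a false discovery rate estimate.

```python
# Gap-site counts as quoted in the abstract (merged data, 54 samples).
# The dictionary layout and all names here are illustrative assumptions.
reported = {
    "TopHat": {"total": 1_000_380, "gqs": 156_251, "wgis": 770_327},
    "STAR": {"total": 6_487_577, "gqs": 166_294, "wgis": 1_065_596},
}


def retained_fraction(validated: int, total: int) -> float:
    """Share of reported gap-sites that pass a validation score."""
    if total <= 0:
        raise ValueError("total must be positive")
    return validated / total


def summarize(counts: dict) -> dict:
    """Return, per aligner, the retained share for gqs and wgis."""
    return {
        aligner: {
            score: retained_fraction(c[score], c["total"])
            for score in ("gqs", "wgis")
        }
        for aligner, c in counts.items()
    }


if __name__ == "__main__":
    for aligner, shares in summarize(reported).items():
        for score, share in shares.items():
            print(f"{aligner:7s} {score:5s} {share:6.1%}")
```

Under these figures, roughly 16% (gqs) and 77% (wgis) of the TopHat gap-sites are retained, compared with about 2.6% (gqs) and 16% (wgis) of the STAR gap-sites.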
format Online
Article
Text
id pubmed-5485934
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-5485934 2017-06-29 Validation of Splicing Events in Transcriptome Sequencing Data. Kaisers, Wolfgang; Ptok, Johannes; Schwender, Holger; Schaal, Heiner. Int J Mol Sci. Article. MDPI 2017-05-23 /pmc/articles/PMC5485934/ /pubmed/28545234 http://dx.doi.org/10.3390/ijms18061110 Text en © 2017 by the authors. Licensee MDPI, Basel, Switzerland.
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
title Validation of Splicing Events in Transcriptome Sequencing Data
topic Article