The PARA-suite: PAR-CLIP specific sequence read simulation and processing

BACKGROUND: Next-generation sequencing technologies have profoundly impacted biology over recent years. Experimental protocols, such as photoactivatable ribonucleoside-enhanced cross-linking and immunoprecipitation (PAR-CLIP), which identifies protein–RNA interactions on a genome-wide scale, commonl...

Bibliographic Details
Main authors: Kloetgen, Andreas; Borkhardt, Arndt; Hoell, Jessica I.; McHardy, Alice C.
Format: Online Article Text
Language: English
Published: PeerJ Inc. 2016
Subjects: Bioinformatics
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5088580/
https://www.ncbi.nlm.nih.gov/pubmed/27812418
http://dx.doi.org/10.7717/peerj.2619
author Kloetgen, Andreas
Borkhardt, Arndt
Hoell, Jessica I.
McHardy, Alice C.
collection PubMed
description BACKGROUND: Next-generation sequencing technologies have profoundly impacted biology in recent years. Experimental protocols such as photoactivatable ribonucleoside-enhanced cross-linking and immunoprecipitation (PAR-CLIP), which identifies protein–RNA interactions on a genome-wide scale, commonly employ deep sequencing. With PAR-CLIP, the incorporation of photoactivatable nucleosides into nascent transcripts leads to high rates of specific nucleotide conversions during reverse transcription. So far, the specific properties of PAR-CLIP-derived sequencing reads have not been assessed in depth.
METHODS: Here, we compared PAR-CLIP sequencing reads to regular transcriptome sequencing reads (RNA-Seq) to identify distinctive properties that are relevant for reference-based read alignment of PAR-CLIP datasets. We developed a set of freely available tools for PAR-CLIP data analysis, called the PAR-CLIP analyzer suite (PARA-suite). The PARA-suite includes error model inference, PAR-CLIP read simulation based on PAR-CLIP specific properties, a full read alignment pipeline with a modified Burrows–Wheeler Aligner algorithm, and CLIP read clustering for binding site detection.
RESULTS: We show that differences in the error profiles of PAR-CLIP reads relative to RNA-Seq reads make distinct processing advantageous. We examined the alignment accuracy of commonly applied read aligners on 10 simulated PAR-CLIP datasets using different parameter settings and identified the most accurate setup among those read aligners. We demonstrate the performance of the PARA-suite in conjunction with different binding site detection algorithms on several real PAR-CLIP and HITS-CLIP datasets. Our processing pipeline improved both alignment and binding site detection accuracy.
AVAILABILITY: The PARA-suite toolkit and the PARA-suite aligner are available at https://github.com/akloetgen/PARA-suite and https://github.com/akloetgen/PARA-suite_aligner, respectively, under the GNU GPLv3 license.
format Online
Article
Text
id pubmed-5088580
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher PeerJ Inc.
record_format MEDLINE/PubMed
spelling pubmed-5088580 2016-11-03 The PARA-suite: PAR-CLIP specific sequence read simulation and processing Kloetgen, Andreas Borkhardt, Arndt Hoell, Jessica I. McHardy, Alice C. PeerJ Bioinformatics PeerJ Inc. 2016-10-27 /pmc/articles/PMC5088580/ /pubmed/27812418 http://dx.doi.org/10.7717/peerj.2619 Text en ©2016 Kloetgen et al. http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.
title The PARA-suite: PAR-CLIP specific sequence read simulation and processing
topic Bioinformatics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5088580/
https://www.ncbi.nlm.nih.gov/pubmed/27812418
http://dx.doi.org/10.7717/peerj.2619