Segmentor3IsBack: an R package for the fast and exact segmentation of Seq-data

BACKGROUND: Change point problems arise in many genomic analyses, such as the detection of copy number variations or the detection of transcribed regions. The expanding Next Generation Sequencing technologies now make it possible to locate change points at nucleotide resolution. RESULTS: Because of its comp...

Bibliographic Details
Main authors: Cleynen, Alice; Koskas, Michel; Lebarbier, Emilie; Rigaill, Guillem; Robin, Stéphane
Format: Online Article Text
Language: English
Published: BioMed Central 2014
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3977952/
https://www.ncbi.nlm.nih.gov/pubmed/24612691
http://dx.doi.org/10.1186/1748-7188-9-6
_version_ 1782310484021608448
author Cleynen, Alice
Koskas, Michel
Lebarbier, Emilie
Rigaill, Guillem
Robin, Stéphane
author_sort Cleynen, Alice
collection PubMed
description BACKGROUND: Change point problems arise in many genomic analyses, such as the detection of copy number variations or the detection of transcribed regions. The expanding Next Generation Sequencing technologies now make it possible to locate change points at nucleotide resolution. RESULTS: Because its complexity is almost linear in the sequence length when the maximal number of segments is constant, and because its performance has been acknowledged for microarrays, we propose to use the Pruned Dynamic Programming (PDP) algorithm for Seq-experiment outputs. This requires adapting the algorithm to the negative binomial distribution with which we model the data. We show that the PDP algorithm can be used if the dispersion in the signal is known, and we provide an estimator for this dispersion. We describe a compression framework that reduces the time complexity without modifying the accuracy of the segmentation. We propose to estimate the number of segments via a penalized likelihood criterion. We illustrate the performance of the proposed methodology on RNA-Seq data. CONCLUSIONS: We illustrate the results of our approach on a real dataset and show its good performance. Our algorithm is available as an R package on the CRAN repository.
format Online
Article
Text
id pubmed-3977952
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-39779522014-04-21 Segmentor3IsBack: an R package for the fast and exact segmentation of Seq-data Cleynen, Alice Koskas, Michel Lebarbier, Emilie Rigaill, Guillem Robin, Stéphane Algorithms Mol Biol Software Article BACKGROUND: Change point problems arise in many genomic analyses such as the detection of copy number variations or the detection of transcribed regions. The expanding Next Generation Sequencing technologies now allow to locate change points at the nucleotide resolution. RESULTS: Because of its complexity which is almost linear in the sequence length when the maximal number of segments is constant, and as its performance had been acknowledged for microarrays, we propose to use the Pruned Dynamic Programming algorithm for Seq-experiment outputs. This requires the adaptation of the algorithm to the negative binomial distribution with which we model the data. We show that if the dispersion in the signal is known, the PDP algorithm can be used, and we provide an estimator for this dispersion. We describe a compression framework which reduces the time complexity without modifying the accuracy of the segmentation. We propose to estimate the number of segments via a penalized likelihood criterion. We illustrate the performance of the proposed methodology on RNA-Seq data. CONCLUSIONS: We illustrate the results of our approach on a real dataset and show its good performance. Our algorithm is available as an R package on the CRAN repository. BioMed Central 2014-03-10 /pmc/articles/PMC3977952/ /pubmed/24612691 http://dx.doi.org/10.1186/1748-7188-9-6 Text en Copyright © 2014 Cleynen et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Segmentor3IsBack: an R package for the fast and exact segmentation of Seq-data
topic Software Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3977952/
https://www.ncbi.nlm.nih.gov/pubmed/24612691
http://dx.doi.org/10.1186/1748-7188-9-6