Detecting Statistically Significant Common Insertion Sites in Retroviral Insertional Mutagenesis Screens

Retroviral insertional mutagenesis screens, which identify genes involved in tumor development in mice, have yielded a substantial number of retroviral integration sites, and this number is expected to grow substantially due to the introduction of high-throughput screening techniques. The data of va...

Bibliographic Details
Main authors: de Ridder, Jeroen; Uren, Anthony; Kool, Jaap; Reinders, Marcel; Wessels, Lodewyk
Format: Text
Language: English
Published: Public Library of Science 2006
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1676030/
https://www.ncbi.nlm.nih.gov/pubmed/17154714
http://dx.doi.org/10.1371/journal.pcbi.0020166
_version_ 1782131140074668032
author de Ridder, Jeroen
Uren, Anthony
Kool, Jaap
Reinders, Marcel
Wessels, Lodewyk
author_sort de Ridder, Jeroen
collection PubMed
description Retroviral insertional mutagenesis screens, which identify genes involved in tumor development in mice, have yielded a substantial number of retroviral integration sites, and this number is expected to grow substantially due to the introduction of high-throughput screening techniques. The data of various retroviral insertional mutagenesis screens are compiled in the publicly available Retroviral Tagged Cancer Gene Database (RTCGD). Integrally analyzing these screens for the presence of common insertion sites (CISs, i.e., regions in the genome that have been hit by viral insertions in multiple independent tumors significantly more than expected by chance) requires an approach that corrects for the increased probability of finding false CISs as the amount of available data increases. Moreover, significance estimates of CISs should be established taking into account both the noise, arising from the random nature of the insertion process, as well as the bias, stemming from preferential insertion sites present in the genome and the data retrieval methodology. We introduce a framework, the kernel convolution (KC) framework, to find CISs in a noisy and biased environment using a predefined significance level while controlling the family-wise error (FWE) (the probability of detecting false CISs). Where previous methods use one, two, or three predetermined fixed scales, our method is capable of operating at any biologically relevant scale. This creates the possibility to analyze the CISs in a scale space by varying the width of the CISs, providing new insights in the behavior of CISs across multiple scales. Our method also features the possibility of including models for background bias. Using simulated data, we evaluate the KC framework using three kernel functions, the Gaussian, triangular, and rectangular kernel function. 
We applied the Gaussian KC to the data from the combined set of screens in the RTCGD and found that 53% of the CISs do not reach the significance threshold in this combined setting. Still, with the FWE under control, application of our method resulted in the discovery of eight novel CISs, which each have a probability less than 5% of being false detections.
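The idea described above (smooth the insertion positions with a kernel at a chosen scale, then keep only peaks that exceed a threshold chosen to control the family-wise error) can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes a uniform background (no bias model), a single linear genome coordinate, and a max-statistic permutation threshold as one common way to control the FWE; the function names gaussian_kc, fwe_threshold, and find_cis are hypothetical.

```python
import numpy as np


def gaussian_kc(positions, grid, scale):
    """Sum of Gaussian kernels (std = scale, in bp), one per insertion, evaluated on grid."""
    positions = np.asarray(positions, dtype=float)
    diffs = (grid[:, None] - positions[None, :]) / scale
    return np.exp(-0.5 * diffs ** 2).sum(axis=1)


def fwe_threshold(n_insertions, genome_length, grid, scale,
                  alpha=0.05, n_perm=200, seed=0):
    """Estimate a peak-height threshold from a null model (assumption: uniform insertions).

    For each random placement we record the highest value of the smoothed
    curve; the (1 - alpha) quantile of these maxima is exceeded by at least
    one null peak in roughly alpha of the random data sets.
    """
    rng = np.random.default_rng(seed)
    max_peaks = np.empty(n_perm)
    for i in range(n_perm):
        null_positions = rng.uniform(0, genome_length, size=n_insertions)
        max_peaks[i] = gaussian_kc(null_positions, grid, scale).max()
    return np.quantile(max_peaks, 1 - alpha)


def find_cis(positions, genome_length, scale,
             alpha=0.05, n_perm=200, step=None, seed=0):
    """Return grid positions of local maxima above the threshold, and the threshold."""
    step = step or scale / 2
    grid = np.arange(0, genome_length + step, step)
    density = gaussian_kc(positions, grid, scale)
    threshold = fwe_threshold(len(positions), genome_length, grid, scale,
                              alpha, n_perm, seed)
    interior = density[1:-1]
    is_peak = ((interior > density[:-2])
               & (interior >= density[2:])
               & (interior > threshold))
    return grid[1:-1][is_peak], threshold
```

Calling find_cis repeatedly with different values of scale gives a crude version of the scale-space view mentioned in the abstract: wide kernels merge neighbouring insertions into one candidate CIS, while narrow kernels separate them.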
format Text
id pubmed-1676030
institution National Center for Biotechnology Information
language English
publishDate 2006
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-1676030 2006-12-08 Detecting Statistically Significant Common Insertion Sites in Retroviral Insertional Mutagenesis Screens de Ridder, Jeroen Uren, Anthony Kool, Jaap Reinders, Marcel Wessels, Lodewyk PLoS Comput Biol Research Article Public Library of Science 2006-12 2006-12-08 /pmc/articles/PMC1676030/ /pubmed/17154714 http://dx.doi.org/10.1371/journal.pcbi.0020166 Text en © 2006 de Ridder et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title Detecting Statistically Significant Common Insertion Sites in Retroviral Insertional Mutagenesis Screens
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1676030/
https://www.ncbi.nlm.nih.gov/pubmed/17154714
http://dx.doi.org/10.1371/journal.pcbi.0020166