
The rainfall plot: its motivation, characteristics and pitfalls

BACKGROUND: A visualization referred to as rainfall plot has recently gained popularity in genome data analysis. The plot is mostly used for illustrating the distribution of somatic cancer mutations along a reference genome, typically aiming to identify mutation hotspots. In general terms, the rainf...


Bibliographic Details
Main authors: Domanska, Diana, Vodák, Daniel, Lund-Andersen, Christin, Salvatore, Stefania, Hovig, Eivind, Sandve, Geir Kjetil
Format: Online Article Text
Language: English
Published: BioMed Central 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5437519/
https://www.ncbi.nlm.nih.gov/pubmed/28521741
http://dx.doi.org/10.1186/s12859-017-1679-8
author Domanska, Diana
Vodák, Daniel
Lund-Andersen, Christin
Salvatore, Stefania
Hovig, Eivind
Sandve, Geir Kjetil
collection PubMed
description BACKGROUND: A visualization referred to as the rainfall plot has recently gained popularity in genome data analysis. The plot is mostly used to illustrate the distribution of somatic cancer mutations along a reference genome, typically with the aim of identifying mutation hotspots. In general terms, the rainfall plot can be seen as a scatter plot showing the location of events on the x-axis versus the distance between consecutive events on the y-axis. Despite its frequent use, the motivation for applying this particular visualization and the appropriateness of its usage have never been critically addressed in detail. RESULTS: We show that the rainfall plot allows visual detection of events even when they occur at high frequency over very short distances. In addition, event clustering at multiple scales may be detected as distinct horizontal bands in rainfall plots. At the same time, due to the limited size of standard figures, rainfall plots may fail to distinguish overlapping events, especially when multiple datasets are plotted in the same figure. We demonstrate the consequences of plot congestion, which obscures visual interpretation of the data. CONCLUSIONS: This work provides the first comprehensive survey of the characteristics and proper usage of rainfall plots. We find that the rainfall plot is able to convey a large amount of information without any need for parameterization or tuning. However, we also demonstrate how plot congestion and the use of a logarithmic y-axis may obscure visual interpretation of the data. To aid the productive use of rainfall plots, we demonstrate their characteristics and potential pitfalls using both simulated and real data, and provide a set of practical guidelines for their proper interpretation and usage. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-017-1679-8) contains supplementary material, which is available to authorized users.
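The construction described in the abstract (event location on the x-axis, distance between consecutive events on the y-axis) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names `rainfall_points` and `log10_distances` and the example positions are hypothetical, and the sketch assumes a single chromosome and that each distance is measured to the preceding event, so the first event gets no point. The abstract does not state how these details are handled in the paper.

```python
import math


def rainfall_points(positions):
    """Return (position, distance_to_previous_event) pairs.

    Assumption: distance is taken to the preceding event on the same
    sequence, so the first event has no defined distance and is skipped.
    """
    ordered = sorted(positions)
    return [(cur, cur - prev) for prev, cur in zip(ordered, ordered[1:])]


def log10_distances(points):
    """Log-transform the y-values for a logarithmic y-axis.

    Assumption: events at identical positions (distance 0) are dropped,
    because log10(0) is undefined.
    """
    return [(x, math.log10(d)) for x, d in points if d > 0]


# Hypothetical event positions: a tight cluster near 1000 plus scattered events.
events = [100, 1000, 1002, 1005, 1009, 50000, 120000]
points = rainfall_points(events)
print(points)
# [(1000, 900), (1002, 2), (1005, 3), (1009, 4), (50000, 48991), (120000, 70000)]
```

In this toy input the clustered events produce small y-values (2, 3, 4) that sit far below the scattered ones, which is the kind of low band a hotspot would form in the plot. Any scatter-plot routine could draw `points`; the plotting step is omitted here to keep the sketch dependency-free.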
format Online
Article
Text
id pubmed-5437519
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5437519 2017-05-19 The rainfall plot: its motivation, characteristics and pitfalls Domanska, Diana Vodák, Daniel Lund-Andersen, Christin Salvatore, Stefania Hovig, Eivind Sandve, Geir Kjetil BMC Bioinformatics Research Article BioMed Central 2017-05-18 /pmc/articles/PMC5437519/ /pubmed/28521741 http://dx.doi.org/10.1186/s12859-017-1679-8 Text en © The Author(s) 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title The rainfall plot: its motivation, characteristics and pitfalls
topic Research Article