
RSAT matrix-clustering: dynamic exploration and redundancy reduction of transcription factor binding motif collections

Transcription factor (TF) databases contain multitudes of binding motifs (TFBMs) from various sources, from which non-redundant collections are derived by manual curation. The advent of high-throughput methods stimulated the production of novel collections with increasing numbers of motifs. Meta-dat...


Bibliographic Details
Main authors: Castro-Mondragon, Jaime Abraham; Jaeger, Sébastien; Thieffry, Denis; Thomas-Chollier, Morgane; van Helden, Jacques
Format: Online Article Text
Language: English
Published: Oxford University Press 2017
Subjects: Methods Online
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5737723/
https://www.ncbi.nlm.nih.gov/pubmed/28591841
http://dx.doi.org/10.1093/nar/gkx314
author Castro-Mondragon, Jaime Abraham
Jaeger, Sébastien
Thieffry, Denis
Thomas-Chollier, Morgane
van Helden, Jacques
collection PubMed
description Transcription factor (TF) databases contain multitudes of binding motifs (TFBMs) from various sources, from which non-redundant collections are derived by manual curation. The advent of high-throughput methods stimulated the production of novel collections with increasing numbers of motifs. Meta-databases, built by merging these collections, contain redundant versions, because available tools are not suited to automatically identify and explore biologically relevant clusters among thousands of motifs. Motif discovery from genome-scale data sets (e.g. ChIP-seq) also produces redundant motifs, hampering the interpretation of results. We present matrix-clustering, a versatile tool that clusters similar TFBMs into multiple trees, and automatically creates non-redundant TFBM collections. A feature unique to matrix-clustering is its dynamic visualisation of aligned TFBMs, and its capability to simultaneously treat multiple collections from various sources. We demonstrate that matrix-clustering considerably simplifies the interpretation of combined results from multiple motif discovery tools, and highlights biologically relevant variations of similar motifs. We also ran a large-scale application to cluster ∼11 000 motifs from 24 entire databases, showing that matrix-clustering correctly groups motifs belonging to the same TF families, and drastically reduced motif redundancy. matrix-clustering is integrated within the RSAT suite (http://rsat.eu/), accessible through a user-friendly web interface or command-line for its integration in pipelines.
format Online
Article
Text
id pubmed-5737723
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-5737723 2018-01-04 Nucleic Acids Res Methods Online
Oxford University Press 2017-07-27 2017-06-07 /pmc/articles/PMC5737723/ /pubmed/28591841 http://dx.doi.org/10.1093/nar/gkx314 Text en © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title RSAT matrix-clustering: dynamic exploration and redundancy reduction of transcription factor binding motif collections
topic Methods Online
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5737723/
https://www.ncbi.nlm.nih.gov/pubmed/28591841
http://dx.doi.org/10.1093/nar/gkx314