
GRAFIMO: Variant and haplotype aware motif scanning on pangenome graphs

Bibliographic details
Main authors: Tognon, Manuel; Bonnici, Vincenzo; Garrison, Erik; Giugno, Rosalba; Pinello, Luca
Format: Online article (text)
Language: English
Published: Public Library of Science, 2021
Subjects: Research Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8519448/
https://www.ncbi.nlm.nih.gov/pubmed/34570769
http://dx.doi.org/10.1371/journal.pcbi.1009444

Abstract
Transcription factors (TFs) are proteins that promote or reduce the expression of genes by binding short genomic DNA sequences known as transcription factor binding sites (TFBS). While several tools have been developed to scan for potential occurrences of TFBS in linear DNA sequences or reference genomes, no tool exists to find them in pangenome variation graphs (VGs). VGs are sequence-labelled graphs that can efficiently encode collections of genomes and their variants in a single, compact data structure. Because VGs can losslessly compress large pangenomes, TFBS scanning in VGs can efficiently capture how genomic variation affects the potential binding landscape of TFs in a population of individuals. Here we present GRAFIMO (GRAph-based Finding of Individual Motif Occurrences), a command-line tool for the scanning of known TF DNA motifs represented as Position Weight Matrices (PWMs) in VGs. GRAFIMO extends the standard PWM scanning procedure by considering variations and alternative haplotypes encoded in a VG. Using GRAFIMO on a VG based on individuals from the 1000 Genomes project we recover several potential binding sites that are enhanced, weakened or missed when scanning only the reference genome, and which could constitute individual-specific binding events. GRAFIMO is available as an open-source tool, under the MIT license, at https://github.com/pinellolab/GRAFIMO and https://github.com/InfOmics/GRAFIMO.
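The abstract describes extending standard PWM scanning to the haplotypes encoded in a variation graph. The Python sketch below illustrates that general idea on a toy example and is not GRAFIMO's implementation or API. Everything in it is a hypothetical assumption for illustration: the 4-bp count matrix `TOY_PFM`, the uniform background, the 0.1 pseudocount, and a hand-built graph (`NODES`, `PATHS`) with one SNP bubble. Each haplotype path is spelled into a linear sequence and every window is scored on both strands with log2-odds weights.

```python
import math

BASES = "ACGT"

# Hypothetical toy count matrix (PFM) for a 4-bp motif with consensus ACGT.
# Each row is one motif position; columns hold counts for A, C, G, T.
TOY_PFM = [
    [8, 0, 1, 1],
    [0, 9, 0, 1],
    [1, 0, 9, 0],
    [0, 1, 0, 9],
]

# Hypothetical toy variation graph: node id -> sequence, plus one path per
# haplotype. Nodes 2 and 3 form a SNP "bubble" (G on ref, A on alt).
NODES = {1: "TTAC", 2: "G", 3: "A", 4: "TTT"}
PATHS = {"ref": [1, 2, 4], "alt": [1, 3, 4]}


def pfm_to_pwm(pfm, background=None, pseudocount=0.1):
    """Convert counts to a log2-odds PWM against a background distribution."""
    if background is None:
        background = {b: 0.25 for b in BASES}  # assumed uniform background
    pwm = []
    for row in pfm:
        total = sum(row) + pseudocount * len(BASES)
        pwm.append({
            b: math.log2(((count + pseudocount) / total) / background[b])
            for b, count in zip(BASES, row)
        })
    return pwm


def revcomp(seq):
    """Reverse complement of an ACGT string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def spell_path(path, nodes):
    """Concatenate node sequences along a haplotype path (forward orientation only)."""
    return "".join(nodes[n] for n in path)


def scan(seq, pwm):
    """Score every motif-width window on both strands, skipping non-ACGT windows."""
    width = len(pwm)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(b not in BASES for b in window):
            continue
        for strand, site in (("+", window), ("-", revcomp(window))):
            score = sum(col[b] for col, b in zip(pwm, site))
            hits.append((start, strand, site, score))
    return hits


def best_hit(seq, pwm):
    """Return the highest-scoring (start, strand, site, score) tuple."""
    return max(scan(seq, pwm), key=lambda hit: hit[3])


if __name__ == "__main__":
    pwm = pfm_to_pwm(TOY_PFM)
    for name, path in PATHS.items():
        seq = spell_path(path, NODES)
        start, strand, site, score = best_hit(seq, pwm)
        print(f"{name}\t{seq}\tstart={start}\t{strand}\t{site}\t{score:.2f}")
```

Spelling each haplotype path before scanning is the simplest way to let motif windows span node boundaries. In this toy graph the G>A substitution on the `alt` path removes the ACGT consensus site, so that haplotype's best score drops, a small analogue of the "weakened" sites mentioned in the abstract.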

Journal: PLoS Comput Biol (Research Article), published 2021-09-27 by the Public Library of Science
Record source: PubMed, National Center for Biotechnology Information (MEDLINE/PubMed format)
Copyright: © 2021 Tognon et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.