
Personalized and graph genomes reveal missing signal in epigenomic data

Bibliographic details
Main authors: Groza, Cristian; Kwan, Tony; Soranzo, Nicole; Pastinen, Tomi; Bourque, Guillaume
Format: Online Article Text
Language: English
Published: BioMed Central 2020
Subjects: Research
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7249353/
https://www.ncbi.nlm.nih.gov/pubmed/32450900
http://dx.doi.org/10.1186/s13059-020-02038-8
author Groza, Cristian
Kwan, Tony
Soranzo, Nicole
Pastinen, Tomi
Bourque, Guillaume
collection PubMed
description BACKGROUND: Epigenomic studies that use next generation sequencing experiments typically rely on the alignment of reads to a reference sequence. However, because of genetic diversity and the diploid nature of the human genome, we hypothesize that using a generic reference could lead to incorrectly mapped reads and bias downstream results. RESULTS: We show that accounting for genetic variation using a modified reference genome or a de novo assembled genome can alter histone H3K4me1 and H3K27ac ChIP-seq peak calls either by creating new personal peaks or by the loss of reference peaks. Using permissive cutoffs, modified reference genomes are found to alter approximately 1% of peak calls while de novo assembled genomes alter up to 5% of peaks. We also show statistically significant differences in the amount of reads observed in regions associated with the new, altered, and unchanged peaks. We report that short insertions and deletions (indels), followed by single nucleotide variants (SNVs), have the highest probability of modifying peak calls. We show that using a graph personalized genome represents a reasonable compromise between modified reference genomes and de novo assembled genomes. We demonstrate that altered peaks have a genomic distribution typical of other peaks. CONCLUSIONS: Analyzing epigenomic datasets with personalized and graph genomes allows the recovery of new peaks enriched for indels and SNVs. These altered peaks are more likely to differ between individuals and, as such, could be relevant in the study of various human phenotypes.
format Online
Article
Text
id pubmed-7249353
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-7249353 2020-06-04 Personalized and graph genomes reveal missing signal in epigenomic data. Groza, Cristian; Kwan, Tony; Soranzo, Nicole; Pastinen, Tomi; Bourque, Guillaume. Genome Biol, Research.
BioMed Central 2020-05-25 /pmc/articles/PMC7249353/ /pubmed/32450900 http://dx.doi.org/10.1186/s13059-020-02038-8 Text en
© The Author(s) 2020. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Personalized and graph genomes reveal missing signal in epigenomic data
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7249353/
https://www.ncbi.nlm.nih.gov/pubmed/32450900
http://dx.doi.org/10.1186/s13059-020-02038-8