Personalized and graph genomes reveal missing signal in epigenomic data
BACKGROUND: Epigenomic studies that use next generation sequencing experiments typically rely on the alignment of reads to a reference sequence. However, because of genetic diversity and the diploid nature of the human genome, we hypothesize that using a generic reference could lead to incorrectly m...
Main authors: | Groza, Cristian; Kwan, Tony; Soranzo, Nicole; Pastinen, Tomi; Bourque, Guillaume |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2020 |
Subjects: | Research |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7249353/ https://www.ncbi.nlm.nih.gov/pubmed/32450900 http://dx.doi.org/10.1186/s13059-020-02038-8 |
_version_ | 1783538573928038400 |
---|---|
author | Groza, Cristian Kwan, Tony Soranzo, Nicole Pastinen, Tomi Bourque, Guillaume |
author_sort | Groza, Cristian |
collection | PubMed |
description | BACKGROUND: Epigenomic studies that use next generation sequencing experiments typically rely on the alignment of reads to a reference sequence. However, because of genetic diversity and the diploid nature of the human genome, we hypothesize that using a generic reference could lead to incorrectly mapped reads and bias downstream results. RESULTS: We show that accounting for genetic variation using a modified reference genome or a de novo assembled genome can alter histone H3K4me1 and H3K27ac ChIP-seq peak calls either by creating new personal peaks or by the loss of reference peaks. Using permissive cutoffs, modified reference genomes are found to alter approximately 1% of peak calls while de novo assembled genomes alter up to 5% of peaks. We also show statistically significant differences in the amount of reads observed in regions associated with the new, altered, and unchanged peaks. We report that short insertions and deletions (indels), followed by single nucleotide variants (SNVs), have the highest probability of modifying peak calls. We show that using a graph personalized genome represents a reasonable compromise between modified reference genomes and de novo assembled genomes. We demonstrate that altered peaks have a genomic distribution typical of other peaks. CONCLUSIONS: Analyzing epigenomic datasets with personalized and graph genomes allows the recovery of new peaks enriched for indels and SNVs. These altered peaks are more likely to differ between individuals and, as such, could be relevant in the study of various human phenotypes. |
format | Online Article Text |
id | pubmed-7249353 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-7249353 2020-06-04 Personalized and graph genomes reveal missing signal in epigenomic data Groza, Cristian Kwan, Tony Soranzo, Nicole Pastinen, Tomi Bourque, Guillaume Genome Biol Research
BioMed Central 2020-05-25 /pmc/articles/PMC7249353/ /pubmed/32450900 http://dx.doi.org/10.1186/s13059-020-02038-8 Text en © The Author(s) 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
title | Personalized and graph genomes reveal missing signal in epigenomic data |
topic | Research |