
ARPEGGIO: Automated Reproducible Polyploid EpiGenetic GuIdance workflOw

BACKGROUND: Whole genome duplication (WGD) events are common in the evolutionary history of many living organisms. For decades, researchers have been trying to understand the genetic and epigenetic impact of WGD and its underlying molecular mechanisms. Particular attention was given to allopolyploid...


Bibliographic Details
Main authors: Milosavljevic, Stefan; Kuo, Tony; Decarli, Samuele; Mohn, Lucas; Sese, Jun; Shimizu, Kentaro K.; Shimizu-Inatsugi, Rie; Robinson, Mark D.
Format: Online Article Text
Language: English
Published: BioMed Central 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8285871/
https://www.ncbi.nlm.nih.gov/pubmed/34273949
http://dx.doi.org/10.1186/s12864-021-07845-2
author Milosavljevic, Stefan
Kuo, Tony
Decarli, Samuele
Mohn, Lucas
Sese, Jun
Shimizu, Kentaro K.
Shimizu-Inatsugi, Rie
Robinson, Mark D.
author_sort Milosavljevic, Stefan
collection PubMed
description BACKGROUND: Whole genome duplication (WGD) events are common in the evolutionary history of many living organisms. For decades, researchers have been trying to understand the genetic and epigenetic impact of WGD and its underlying molecular mechanisms. Particular attention has been given to allopolyploid study systems: species resulting from a hybridization event accompanied by WGD. Investigating the mechanisms behind the survival of a newly formed allopolyploid highlighted the key role of DNA methylation. The improvement of high-throughput methods such as whole genome bisulfite sequencing (WGBS) opened an opportunity to understand the role of DNA methylation at a larger scale and higher resolution. However, only a few studies have applied WGBS to allopolyploids, which might be due to a lack of genomic resources combined with a burdensome data analysis process. To overcome these problems, we developed the Automated Reproducible Polyploid EpiGenetic GuIdance workflOw (ARPEGGIO): the first workflow for the analysis of epigenetic data in polyploids. This workflow analyzes WGBS data from allopolyploid species via the genome assemblies of the allopolyploid’s parent species. ARPEGGIO uses an updated read classification algorithm (EAGLE-RC) to tackle the challenge of sequence similarity among parental genomes. ARPEGGIO offers automation and, more importantly, a complete set of analyses, including spot checks, starting from raw WGBS data: quality checks, trimming, alignment, methylation extraction, statistical analyses and downstream analyses. A full run of ARPEGGIO outputs a list of genes showing differential methylation. ARPEGGIO was made simple to set up, run and interpret, and its implementation ensures reproducibility by including both package management and containerization.
RESULTS: We evaluated ARPEGGIO in two ways. First, we tested EAGLE-RC’s performance with publicly available datasets given a ground truth, and we show that EAGLE-RC decreases the error rate by 3 to 4 times compared to standard approaches. Second, using the same initial dataset, we show agreement between ARPEGGIO’s output and published results. Compared to other similar workflows, ARPEGGIO is the only one supporting polyploid data.
CONCLUSIONS: The goal of ARPEGGIO is to promote, support and improve polyploid research with a reproducible and automated set of analyses in a convenient implementation. ARPEGGIO is available at https://github.com/supermaxiste/ARPEGGIO.
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12864-021-07845-2.
format Online
Article
Text
id pubmed-8285871
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-8285871 2021-07-19 ARPEGGIO: Automated Reproducible Polyploid EpiGenetic GuIdance workflOw Milosavljevic, Stefan Kuo, Tony Decarli, Samuele Mohn, Lucas Sese, Jun Shimizu, Kentaro K. Shimizu-Inatsugi, Rie Robinson, Mark D. BMC Genomics Software BioMed Central 2021-07-17 /pmc/articles/PMC8285871/ /pubmed/34273949 http://dx.doi.org/10.1186/s12864-021-07845-2 Text en
© The Author(s) 2021. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title ARPEGGIO: Automated Reproducible Polyploid EpiGenetic GuIdance workflOw
topic Software