SuperDCA for genome-wide epistasis analysis

The potential for genome-wide modelling of epistasis has recently surfaced given the possibility of sequencing densely sampled populations and the emerging families of statistical interaction models. Direct coupling analysis (DCA) has previously been shown to yield valuable predictions for single pr...

Bibliographic Details
Main authors: Puranen, Santeri, Pesonen, Maiju, Pensar, Johan, Xu, Ying Ying, Lees, John A., Bentley, Stephen D., Croucher, Nicholas J., Corander, Jukka
Format: Online Article Text
Language: English
Published: Microbiology Society 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6096938/
https://www.ncbi.nlm.nih.gov/pubmed/29813016
http://dx.doi.org/10.1099/mgen.0.000184
_version_ 1783348201402662912
author Puranen, Santeri
Pesonen, Maiju
Pensar, Johan
Xu, Ying Ying
Lees, John A.
Bentley, Stephen D.
Croucher, Nicholas J.
Corander, Jukka
author_sort Puranen, Santeri
collection PubMed
description The potential for genome-wide modelling of epistasis has recently surfaced given the possibility of sequencing densely sampled populations and the emerging families of statistical interaction models. Direct coupling analysis (DCA) has previously been shown to yield valuable predictions for single protein structures, and has recently been extended to genome-wide analysis of bacteria, identifying novel interactions in the co-evolution between resistance, virulence and core genome elements. However, earlier computational DCA methods have not been scalable to enable model fitting simultaneously to 10^4–10^5 polymorphisms, representing the amount of core genomic variation observed in analyses of many bacterial species. Here, we introduce a novel inference method (SuperDCA) that employs a new scoring principle, efficient parallelization, optimization and filtering on phylogenetic information to achieve scalability for up to 10^5 polymorphisms. Using two large population samples of Streptococcus pneumoniae, we demonstrate the ability of SuperDCA to make additional significant biological findings about this major human pathogen. We also show that our method can uncover signals of selection that are not detectable by genome-wide association analysis, even though our analysis does not require phenotypic measurements. SuperDCA, thus, holds considerable potential in building understanding about numerous organisms at a systems biological level.
format Online
Article
Text
id pubmed-6096938
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Microbiology Society
record_format MEDLINE/PubMed
spelling pubmed-6096938 2018-08-20 SuperDCA for genome-wide epistasis analysis Puranen, Santeri Pesonen, Maiju Pensar, Johan Xu, Ying Ying Lees, John A. Bentley, Stephen D. Croucher, Nicholas J. Corander, Jukka Microb Genom Research Article The potential for genome-wide modelling of epistasis has recently surfaced given the possibility of sequencing densely sampled populations and the emerging families of statistical interaction models. Direct coupling analysis (DCA) has previously been shown to yield valuable predictions for single protein structures, and has recently been extended to genome-wide analysis of bacteria, identifying novel interactions in the co-evolution between resistance, virulence and core genome elements. However, earlier computational DCA methods have not been scalable to enable model fitting simultaneously to 10^4–10^5 polymorphisms, representing the amount of core genomic variation observed in analyses of many bacterial species. Here, we introduce a novel inference method (SuperDCA) that employs a new scoring principle, efficient parallelization, optimization and filtering on phylogenetic information to achieve scalability for up to 10^5 polymorphisms. Using two large population samples of Streptococcus pneumoniae, we demonstrate the ability of SuperDCA to make additional significant biological findings about this major human pathogen. We also show that our method can uncover signals of selection that are not detectable by genome-wide association analysis, even though our analysis does not require phenotypic measurements. SuperDCA, thus, holds considerable potential in building understanding about numerous organisms at a systems biological level.
Microbiology Society 2018-05-29 /pmc/articles/PMC6096938/ /pubmed/29813016 http://dx.doi.org/10.1099/mgen.0.000184 Text en © 2018 The Authors http://creativecommons.org/licenses/by/4.0/ This is an open access article under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution and reproduction in any medium, provided the original author and source are credited.
spellingShingle Research Article
Puranen, Santeri
Pesonen, Maiju
Pensar, Johan
Xu, Ying Ying
Lees, John A.
Bentley, Stephen D.
Croucher, Nicholas J.
Corander, Jukka
SuperDCA for genome-wide epistasis analysis
title SuperDCA for genome-wide epistasis analysis
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6096938/
https://www.ncbi.nlm.nih.gov/pubmed/29813016
http://dx.doi.org/10.1099/mgen.0.000184