HypercubeME: two hundred million combinatorially complete datasets from a single experiment

MOTIVATION: Epistasis, the context-dependence of the contribution of an amino acid substitution to fitness, is common in evolution. To detect epistasis, fitness must be measured for at least four genotypes: the reference genotype, two different single mutants and a double mutant with both of the sin...


Bibliographic Details
Main authors: Esteban, Laura A, Lonishin, Lyubov R, Bobrovskiy, Daniil M, Leleytner, Gregory, Bogatyreva, Natalya S, Kondrashov, Fyodor A, Ivankov, Dmitry N
Format: Online Article Text
Language: English
Published: Oxford University Press 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7703787/
https://www.ncbi.nlm.nih.gov/pubmed/31742320
http://dx.doi.org/10.1093/bioinformatics/btz841
author Esteban, Laura A
Lonishin, Lyubov R
Bobrovskiy, Daniil M
Leleytner, Gregory
Bogatyreva, Natalya S
Kondrashov, Fyodor A
Ivankov, Dmitry N
collection PubMed
description MOTIVATION: Epistasis, the context-dependence of the contribution of an amino acid substitution to fitness, is common in evolution. To detect epistasis, fitness must be measured for at least four genotypes: the reference genotype, two different single mutants and a double mutant with both of the single mutations. For higher-order epistasis of the order n, fitness has to be measured for all 2^n genotypes of an n-dimensional hypercube in genotype space forming a ‘combinatorially complete dataset’. So far, only a handful of such datasets have been produced by manual curation. Concurrently, random mutagenesis experiments have produced measurements of fitness and other phenotypes in a high-throughput manner, potentially containing a number of combinatorially complete datasets. RESULTS: We present an effective recursive algorithm for finding all hypercube structures in random mutagenesis experimental data. To test the algorithm, we applied it to the data from a recent HIS3 protein dataset and found all 199 847 053 unique combinatorially complete genotype combinations of dimensionality ranging from 2 to 12. The algorithm may be useful for researchers looking for higher-order epistasis in their high-throughput experimental data. AVAILABILITY AND IMPLEMENTATION: https://github.com/ivankovlab/HypercubeME.git. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-7703787
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-7703787 2020-12-07 Bioinformatics Applications Note Oxford University Press 2020-03-15 2019-11-19 Text en © The Author(s) 2019. Published by Oxford University Press.
http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title HypercubeME: two hundred million combinatorially complete datasets from a single experiment
topic Applications Note
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7703787/
https://www.ncbi.nlm.nih.gov/pubmed/31742320
http://dx.doi.org/10.1093/bioinformatics/btz841
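
The abstract's definition of a combinatorially complete dataset (all 2^n genotypes of an n-dimensional hypercube) can be illustrated with a small Python sketch. This is not the recursive HypercubeME algorithm from the paper; it is a brute-force check under stated assumptions: genotypes are modeled as sets of mutation labels relative to a reference, the labels ("A1G", "L2P", "K3R") and fitness values are invented toy data, and epistasis is computed on an additive scale, which is one common convention rather than something the record specifies.

```python
from itertools import combinations


def hypercube_genotypes(base, mutations):
    """Return all 2^n genotypes formed by adding each subset of `mutations` to `base`.

    Assumption: `mutations` affect distinct sites and none is already in `base`.
    """
    base = frozenset(base)
    muts = list(mutations)
    return [
        base | frozenset(subset)
        for k in range(len(muts) + 1)
        for subset in combinations(muts, k)
    ]


def is_combinatorially_complete(measured, base, mutations):
    """True if every vertex of the hypercube has a measurement in `measured`."""
    return all(g in measured for g in hypercube_genotypes(base, mutations))


def pairwise_epistasis(fitness, base, mut_a, mut_b):
    """Additive-scale epistasis for the smallest case (four genotypes)."""
    base = frozenset(base)
    return (
        fitness[base | {mut_a, mut_b}]
        - fitness[base | {mut_a}]
        - fitness[base | {mut_b}]
        + fitness[base]
    )


# Hypothetical toy measurements: reference, two single mutants, one double mutant.
fitness = {
    frozenset(): 1.0,
    frozenset({"A1G"}): 0.5,
    frozenset({"L2P"}): 0.75,
    frozenset({"A1G", "L2P"}): 0.125,
}

complete_2d = is_combinatorially_complete(fitness, frozenset(), ["A1G", "L2P"])
complete_3d = is_combinatorially_complete(fitness, frozenset(), ["A1G", "L2P", "K3R"])
epistasis = pairwise_epistasis(fitness, frozenset(), "A1G", "L2P")
```

With these toy values the two-mutation square is complete, while the three-mutation cube is not because no genotype carrying "K3R" was measured. Checking every candidate cube this way scales poorly, which is presumably why the authors describe a dedicated recursive algorithm.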