Reconstruction of evolving gene variants and fitness from short sequencing reads

Directed evolution can generate proteins with tailor-made activities. However, full-length genotypes, their frequencies, and fitnesses are difficult to measure for evolving gene-length biomolecules using most high-throughput DNA sequencing methods as short read lengths can lose mutation linkages in...

Bibliographic Details
Main Authors: Shen, Max W., Zhao, Kevin T., Liu, David R.
Format: Online Article Text
Language: English
Published: 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8551035/
https://www.ncbi.nlm.nih.gov/pubmed/34635842
http://dx.doi.org/10.1038/s41589-021-00876-6
_version_ 1784591074394636288
author Shen, Max W.
Zhao, Kevin T.
Liu, David R.
collection PubMed
description Directed evolution can generate proteins with tailor-made activities. However, full-length genotypes, their frequencies, and fitnesses are difficult to measure for evolving gene-length biomolecules using most high-throughput DNA sequencing methods as short read lengths can lose mutation linkages in haplotypes. We present Evoracle, a machine learning method that accurately reconstructs full-length genotypes (R² = 0.94) and fitness using short-read data from directed evolution experiments, with substantial improvements over related methods. We validate Evoracle on phage-assisted continuous evolution (PACE), phage-assisted non-continuous evolution (PANCE) of adenine base editors, and OrthoRep evolution of drug-resistant enzymes. Evoracle retains strong performance (R² = 0.86) on data with complete linkage loss between neighboring nucleotides and large measurement noise such as pooled Sanger sequencing data (~$10/timepoint), and broadens the accessibility of training machine learning models on gene variant fitnesses. Evoracle can also identify high-fitness variants, including low-frequency ‘rising stars’, well before they are identifiable from consensus mutations.
format Online
Article
Text
id pubmed-8551035
institution National Center for Biotechnology Information
language English
publishDate 2021
record_format MEDLINE/PubMed
spelling pubmed-8551035 2022-04-11 Reconstruction of evolving gene variants and fitness from short sequencing reads Shen, Max W. Zhao, Kevin T. Liu, David R. Nat Chem Biol Article Directed evolution can generate proteins with tailor-made activities. However, full-length genotypes, their frequencies, and fitnesses are difficult to measure for evolving gene-length biomolecules using most high-throughput DNA sequencing methods as short read lengths can lose mutation linkages in haplotypes. We present Evoracle, a machine learning method that accurately reconstructs full-length genotypes (R² = 0.94) and fitness using short-read data from directed evolution experiments, with substantial improvements over related methods. We validate Evoracle on phage-assisted continuous evolution (PACE), phage-assisted non-continuous evolution (PANCE) of adenine base editors, and OrthoRep evolution of drug-resistant enzymes. Evoracle retains strong performance (R² = 0.86) on data with complete linkage loss between neighboring nucleotides and large measurement noise such as pooled Sanger sequencing data (~$10/timepoint), and broadens the accessibility of training machine learning models on gene variant fitnesses. Evoracle can also identify high-fitness variants, including low-frequency ‘rising stars’, well before they are identifiable from consensus mutations. 2021-10-11 2021-11 /pmc/articles/PMC8551035/ /pubmed/34635842 http://dx.doi.org/10.1038/s41589-021-00876-6 Text en Users may view, print, copy, and download text and data-mine the content in such documents, for the purposes of academic research, subject always to the full Conditions of use: https://www.springernature.com/gp/open-research/policies/accepted-manuscript-terms
title Reconstruction of evolving gene variants and fitness from short sequencing reads
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8551035/
https://www.ncbi.nlm.nih.gov/pubmed/34635842
http://dx.doi.org/10.1038/s41589-021-00876-6