Reconstruction of evolving gene variants and fitness from short sequencing reads
Directed evolution can generate proteins with tailor-made activities. However, full-length genotypes, their frequencies, and fitnesses are difficult to measure for evolving gene-length biomolecules using most high-throughput DNA sequencing methods as short read lengths can lose mutation linkages in...
Main authors: | Shen, Max W., Zhao, Kevin T., Liu, David R. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | 2021 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8551035/ https://www.ncbi.nlm.nih.gov/pubmed/34635842 http://dx.doi.org/10.1038/s41589-021-00876-6 |
_version_ | 1784591074394636288 |
---|---|
author | Shen, Max W. Zhao, Kevin T. Liu, David R. |
author_facet | Shen, Max W. Zhao, Kevin T. Liu, David R. |
author_sort | Shen, Max W. |
collection | PubMed |
description | Directed evolution can generate proteins with tailor-made activities. However, full-length genotypes, their frequencies, and fitnesses are difficult to measure for evolving gene-length biomolecules using most high-throughput DNA sequencing methods as short read lengths can lose mutation linkages in haplotypes. We present Evoracle, a machine learning method that accurately reconstructs full-length genotypes (R² = 0.94) and fitness using short-read data from directed evolution experiments, with substantial improvements over related methods. We validate Evoracle on phage-assisted continuous evolution (PACE), phage-assisted non-continuous evolution (PANCE) of adenine base editors, and OrthoRep evolution of drug-resistant enzymes. Evoracle retains strong performance (R² = 0.86) on data with complete linkage loss between neighboring nucleotides and large measurement noise such as pooled Sanger sequencing data (~$10/timepoint), and broadens the accessibility of training machine learning models on gene variant fitnesses. Evoracle can also identify high-fitness variants, including low-frequency ‘rising stars’, well before they are identifiable from consensus mutations. |
format | Online Article Text |
id | pubmed-8551035 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
record_format | MEDLINE/PubMed |
spelling | pubmed-8551035 2022-04-11 Reconstruction of evolving gene variants and fitness from short sequencing reads Shen, Max W. Zhao, Kevin T. Liu, David R. Nat Chem Biol Article Directed evolution can generate proteins with tailor-made activities. However, full-length genotypes, their frequencies, and fitnesses are difficult to measure for evolving gene-length biomolecules using most high-throughput DNA sequencing methods as short read lengths can lose mutation linkages in haplotypes. We present Evoracle, a machine learning method that accurately reconstructs full-length genotypes (R² = 0.94) and fitness using short-read data from directed evolution experiments, with substantial improvements over related methods. We validate Evoracle on phage-assisted continuous evolution (PACE), phage-assisted non-continuous evolution (PANCE) of adenine base editors, and OrthoRep evolution of drug-resistant enzymes. Evoracle retains strong performance (R² = 0.86) on data with complete linkage loss between neighboring nucleotides and large measurement noise such as pooled Sanger sequencing data (~$10/timepoint), and broadens the accessibility of training machine learning models on gene variant fitnesses. Evoracle can also identify high-fitness variants, including low-frequency ‘rising stars’, well before they are identifiable from consensus mutations. 2021-10-11 2021-11 /pmc/articles/PMC8551035/ /pubmed/34635842 http://dx.doi.org/10.1038/s41589-021-00876-6 Text en Users may view, print, copy, and download text and data-mine the content in such documents, for the purposes of academic research, subject always to the full Conditions of use: https://www.springernature.com/gp/open-research/policies/accepted-manuscript-terms |
spellingShingle | Article Shen, Max W. Zhao, Kevin T. Liu, David R. Reconstruction of evolving gene variants and fitness from short sequencing reads |
title | Reconstruction of evolving gene variants and fitness from short sequencing reads |
title_full | Reconstruction of evolving gene variants and fitness from short sequencing reads |
title_fullStr | Reconstruction of evolving gene variants and fitness from short sequencing reads |
title_full_unstemmed | Reconstruction of evolving gene variants and fitness from short sequencing reads |
title_short | Reconstruction of evolving gene variants and fitness from short sequencing reads |
title_sort | reconstruction of evolving gene variants and fitness from short sequencing reads |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8551035/ https://www.ncbi.nlm.nih.gov/pubmed/34635842 http://dx.doi.org/10.1038/s41589-021-00876-6 |
work_keys_str_mv | AT shenmaxw reconstructionofevolvinggenevariantsandfitnessfromshortsequencingreads AT zhaokevint reconstructionofevolvinggenevariantsandfitnessfromshortsequencingreads AT liudavidr reconstructionofevolvinggenevariantsandfitnessfromshortsequencingreads |