Identifying widespread and recurrent variants of genetic parts to improve annotation of engineered DNA sequences
Engineered plasmids have been workhorses of recombinant DNA technology for nearly half a century. Plasmids are used to clone DNA sequences encoding new genetic parts and to reprogram cells by combining these parts in new ways. Historically, many genetic parts on plasmids were copied and reused witho...
Main authors: | McGuffie, Matthew J.; Barrick, Jeffrey E. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Cold Spring Harbor Laboratory 2023 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10120640/ https://www.ncbi.nlm.nih.gov/pubmed/37090600 http://dx.doi.org/10.1101/2023.04.10.536277 |
_version_ | 1785029216743456768 |
---|---|
author | McGuffie, Matthew J. Barrick, Jeffrey E. |
author_facet | McGuffie, Matthew J. Barrick, Jeffrey E. |
author_sort | McGuffie, Matthew J. |
collection | PubMed |
description | Engineered plasmids have been workhorses of recombinant DNA technology for nearly half a century. Plasmids are used to clone DNA sequences encoding new genetic parts and to reprogram cells by combining these parts in new ways. Historically, many genetic parts on plasmids were copied and reused without routinely checking their DNA sequences. With the widespread use of high-throughput DNA sequencing technologies, we now know that plasmids often contain variants of common genetic parts that differ slightly from their canonical sequences. Because the exact provenance of a genetic part on a particular plasmid is usually unknown, it is difficult to determine whether these differences arose due to mutations during plasmid construction and propagation or due to intentional editing by researchers. In either case, it is important to understand how the sequence changes alter the properties of the genetic part. We analyzed the sequences of over 50,000 engineered plasmids using depositor metadata and a metric inspired by the natural language processing field. We detected 217 uncatalogued genetic part variants that were especially widespread or were likely the result of convergent evolution or engineering. Several of these uncatalogued variants are known mutants of plasmid origins of replication or antibiotic resistance genes that are missing from current annotation databases. However, most are uncharacterized, and 3/5 of the plasmids we analyzed contained at least one of the uncatalogued variants. Our results include a list of genetic parts to prioritize for refining engineered plasmid annotation pipelines, highlight widespread variants of parts that warrant further investigation to see whether they have altered characteristics, and suggest cases where unintentional evolution of plasmid parts may be affecting the reliability and reproducibility of science. |
format | Online Article Text |
id | pubmed-10120640 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Cold Spring Harbor Laboratory |
record_format | MEDLINE/PubMed |
spelling | pubmed-10120640 2023-04-22 Identifying widespread and recurrent variants of genetic parts to improve annotation of engineered DNA sequences McGuffie, Matthew J. Barrick, Jeffrey E. bioRxiv Article Engineered plasmids have been workhorses of recombinant DNA technology for nearly half a century. Plasmids are used to clone DNA sequences encoding new genetic parts and to reprogram cells by combining these parts in new ways. Historically, many genetic parts on plasmids were copied and reused without routinely checking their DNA sequences. With the widespread use of high-throughput DNA sequencing technologies, we now know that plasmids often contain variants of common genetic parts that differ slightly from their canonical sequences. Because the exact provenance of a genetic part on a particular plasmid is usually unknown, it is difficult to determine whether these differences arose due to mutations during plasmid construction and propagation or due to intentional editing by researchers. In either case, it is important to understand how the sequence changes alter the properties of the genetic part. We analyzed the sequences of over 50,000 engineered plasmids using depositor metadata and a metric inspired by the natural language processing field. We detected 217 uncatalogued genetic part variants that were especially widespread or were likely the result of convergent evolution or engineering. Several of these uncatalogued variants are known mutants of plasmid origins of replication or antibiotic resistance genes that are missing from current annotation databases. However, most are uncharacterized, and 3/5 of the plasmids we analyzed contained at least one of the uncatalogued variants. Our results include a list of genetic parts to prioritize for refining engineered plasmid annotation pipelines, highlight widespread variants of parts that warrant further investigation to see whether they have altered characteristics, and suggest cases where unintentional evolution of plasmid parts may be affecting the reliability and reproducibility of science. Cold Spring Harbor Laboratory 2023-04-10 /pmc/articles/PMC10120640/ /pubmed/37090600 http://dx.doi.org/10.1101/2023.04.10.536277 Text en https://creativecommons.org/licenses/by/4.0/ This work is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which allows reusers to distribute, remix, adapt, and build upon the material in any medium or format, so long as attribution is given to the creator. The license allows for commercial use. |
spellingShingle | Article McGuffie, Matthew J. Barrick, Jeffrey E. Identifying widespread and recurrent variants of genetic parts to improve annotation of engineered DNA sequences |
title | Identifying widespread and recurrent variants of genetic parts to improve annotation of engineered DNA sequences |
title_full | Identifying widespread and recurrent variants of genetic parts to improve annotation of engineered DNA sequences |
title_fullStr | Identifying widespread and recurrent variants of genetic parts to improve annotation of engineered DNA sequences |
title_full_unstemmed | Identifying widespread and recurrent variants of genetic parts to improve annotation of engineered DNA sequences |
title_short | Identifying widespread and recurrent variants of genetic parts to improve annotation of engineered DNA sequences |
title_sort | identifying widespread and recurrent variants of genetic parts to improve annotation of engineered dna sequences |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10120640/ https://www.ncbi.nlm.nih.gov/pubmed/37090600 http://dx.doi.org/10.1101/2023.04.10.536277 |
work_keys_str_mv | AT mcguffiematthewj identifyingwidespreadandrecurrentvariantsofgeneticpartstoimproveannotationofengineereddnasequences AT barrickjeffreye identifyingwidespreadandrecurrentvariantsofgeneticpartstoimproveannotationofengineereddnasequences |