
Pacybara: Accurate long-read sequencing for barcoded mutagenized allelic libraries

Long read sequencing technologies, an attractive solution for many applications, usually suffer from higher error rates. Alignment of multiple reads can improve base-calling accuracy, but some applications, e.g. the sequencing of mutagenized libraries where multiple distinct clones differ by one or...


Bibliographic Details
Main authors: Weile, Jochen; Cote, Atina G.; Kishore, Nishka; Tabet, Daniel; van Loggerenberg, Warren; Rayhan, Ashyad; Roth, Frederick P
Format: Online Article Text
Language: English
Published: Cold Spring Harbor Laboratory 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9980134/
https://www.ncbi.nlm.nih.gov/pubmed/36865234
http://dx.doi.org/10.1101/2023.02.22.529427
_version_ 1784899855511977984
author Weile, Jochen
Cote, Atina G.
Kishore, Nishka
Tabet, Daniel
van Loggerenberg, Warren
Rayhan, Ashyad
Roth, Frederick P
author_sort Weile, Jochen
collection PubMed
description Long-read sequencing technologies, an attractive solution for many applications, usually suffer from higher error rates. Alignment of multiple reads can improve base-calling accuracy, but some applications, e.g. the sequencing of mutagenized libraries where multiple distinct clones differ by one or a few variants, require the use of barcodes or unique molecular identifiers. Unfortunately, not only can sequencing errors interfere with correct barcode identification, but a given barcode sequence may also be linked to multiple independent clones within a given library. Here we focus on the target application of sequencing mutagenized libraries in the context of multiplexed assays of variant effects (MAVEs). MAVEs are increasingly used to create comprehensive genotype-phenotype maps that can aid clinical variant interpretation. Many MAVE methods use barcoded mutant libraries and thus require the accurate association of barcode with genotype, e.g. using long-read sequencing. Existing pipelines do not account for inaccurate sequencing or non-unique barcodes. Here, we describe Pacybara, which handles these issues by clustering long reads based on the similarities of (error-prone) barcodes while detecting the association of a single barcode with multiple genotypes. Pacybara also detects recombinant (chimeric) clones and reduces false positive indel calls. In an example application, we show that Pacybara increases the sensitivity of a MAVE-derived missense variant effect map.
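The general idea in the description (cluster reads by similar, error-prone barcodes, then check whether one barcode cluster carries more than one genotype) can be illustrated with a toy sketch. This is not Pacybara's implementation: the greedy Hamming-distance clustering, the function names, the thresholds `max_dist`, `min_reads` and `min_frac`, and the example barcodes and variant labels are all assumptions made up for illustration.

```python
from collections import Counter, defaultdict


def hamming(a, b):
    """Count mismatching positions; barcodes of unequal length never match."""
    if len(a) != len(b):
        return float("inf")
    return sum(x != y for x, y in zip(a, b))


def cluster_barcodes(reads, max_dist=1):
    """Greedy clustering (illustrative only, not Pacybara's algorithm).

    reads: list of (barcode, genotype) tuples.
    The most abundant barcodes become cluster seeds; rarer barcodes
    within max_dist mismatches of a seed are absorbed into its cluster.
    Returns a dict mapping seed barcode -> list of observed genotypes.
    """
    counts = Counter(bc for bc, _ in reads)
    seeds = []
    assignment = {}
    for bc, _ in counts.most_common():
        match = next((s for s in seeds if hamming(bc, s) <= max_dist), None)
        if match is None:
            seeds.append(bc)
            match = bc
        assignment[bc] = match
    clusters = defaultdict(list)
    for bc, genotype in reads:
        clusters[assignment[bc]].append(genotype)
    return dict(clusters)


def call_genotypes(genotypes, min_reads=2, min_frac=0.3):
    """Return the well-supported genotypes in one cluster.

    More than one returned genotype suggests that the same barcode
    is linked to several independent clones.
    """
    counts = Counter(genotypes)
    total = len(genotypes)
    return sorted(
        g for g, n in counts.items() if n >= min_reads and n / total >= min_frac
    )


# Hypothetical reads: (barcode, genotype label)
reads = [
    ("ACGTACGT", "A12V"), ("ACGTACGT", "A12V"),
    ("ACGTACGA", "A12V"),  # same clone, one sequencing error in the barcode
    ("TTTTCCCC", "G5D"), ("TTTTCCCC", "G5D"),
    ("TTTTCCCC", "L30P"), ("TTTTCCCC", "L30P"),  # second clone, same barcode
]
clusters = cluster_barcodes(reads)
calls = {seed: call_genotypes(gts) for seed, gts in clusters.items()}
```

In this toy input, the read with the erroneous barcode `ACGTACGA` is merged into the `ACGTACGT` cluster, while the `TTTTCCCC` cluster yields two supported genotypes and would be flagged as a non-unique barcode. A real pipeline would presumably use alignment-based distances and consensus calling rather than exact genotype labels.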
format Online
Article
Text
id pubmed-9980134
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Cold Spring Harbor Laboratory
record_format MEDLINE/PubMed
spelling pubmed-9980134 2023-03-03 Pacybara: Accurate long-read sequencing for barcoded mutagenized allelic libraries Weile, Jochen Cote, Atina G. Kishore, Nishka Tabet, Daniel van Loggerenberg, Warren Rayhan, Ashyad Roth, Frederick P bioRxiv Article
Cold Spring Harbor Laboratory 2023-02-23 /pmc/articles/PMC9980134/ /pubmed/36865234 http://dx.doi.org/10.1101/2023.02.22.529427 Text en https://creativecommons.org/licenses/by/4.0/ This work is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which allows reusers to distribute, remix, adapt, and build upon the material in any medium or format, so long as attribution is given to the creator. The license allows for commercial use.
title Pacybara: Accurate long-read sequencing for barcoded mutagenized allelic libraries
topic Article