
Identification and quantification of chimeric sequencing reads in a highly multiplexed RAD‐seq protocol

Highly multiplexed approaches have become common in genomic studies. They have improved the cost‐effectiveness of genotyping hundreds of individuals using combinatorially barcoded adapters. These strategies, however, can potentially misassign reads to incorrect samples. Here, we used a modified qu...


Bibliographic Details
Main authors: Martin Cerezo, Maria Luisa, Raval, Rohan, de Haro Reyes, Bernardo, Kucka, Marek, Chan, Frank Yingguang, Bryk, Jarosław
Format: Online Article Text
Language: English
Published: John Wiley and Sons Inc. 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9796921/
https://www.ncbi.nlm.nih.gov/pubmed/35668693
http://dx.doi.org/10.1111/1755-0998.13661
author Martin Cerezo, Maria Luisa
Raval, Rohan
de Haro Reyes, Bernardo
Kucka, Marek
Chan, Frank Yingguang
Bryk, Jarosław
collection PubMed
description Highly multiplexed approaches have become common in genomic studies. They have improved the cost‐effectiveness of genotyping hundreds of individuals using combinatorially barcoded adapters. These strategies, however, can potentially misassign reads to incorrect samples. Here, we used a modified quaddRAD protocol to analyse the occurrence of index hopping and PCR chimeras in a series of experiments with up to 100 multiplexed samples per sequencing lane (639 samples in total). We created two types of sequencing libraries: four libraries of type A, where PCRs were run on individual samples before multiplexing, and three libraries of type B, where PCRs were run on pooled samples. We used fixed pairs of inner barcodes to identify chimeric reads. Type B libraries show a higher percentage of misassigned reads (1.15%) than type A libraries (0.65%). We also quantify the commonly undetectable chimeric sequences that occur whenever multiplexed groups of samples with different outer barcodes are sequenced together on a single flow cell. Our results suggest that these types of chimeric sequences represent up to 1.56% and 1.29% of reads in type A and B libraries, respectively. We also show that increasing the number of mismatches allowed for barcode rescue to above 2 dramatically increases the number of recovered chimeric reads. We provide recommendations for developing highly multiplexed RAD‐seq protocols and analysing the resulting data to minimize the generation of chimeric sequences, allowing their quantification and finer control over the number of PCR cycles necessary to generate enough input DNA for library preparation.
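The idea of using fixed pairs of inner barcodes to flag chimeric reads, combined with mismatch-tolerant barcode rescue, can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names (`hamming`, `rescue`, `classify`), the 4-base toy barcodes, and the rule of rejecting ambiguous rescues are all assumptions made for illustration only.

```python
from collections import Counter


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length barcodes."""
    if len(a) != len(b):
        raise ValueError("barcodes must have equal length")
    return sum(x != y for x, y in zip(a, b))


def rescue(observed: str, known: set, max_mismatches: int):
    """Return the single known barcode within max_mismatches, else None.

    Ambiguous matches (more than one candidate) are rejected; this is an
    assumption of the sketch, not a documented rule of any specific tool.
    """
    hits = [k for k in known
            if len(k) == len(observed) and hamming(observed, k) <= max_mismatches]
    return hits[0] if len(hits) == 1 else None


def classify(read_barcodes, fixed_pairs: set, max_mismatches: int = 1) -> Counter:
    """Count reads as assigned, chimeric, or unassigned.

    read_barcodes: iterable of (inner_barcode_1, inner_barcode_2) per read.
    fixed_pairs:   set of the only (barcode_1, barcode_2) pairs used in the lab.
    A read whose two rescued barcodes are both valid but do not form one of
    the fixed pairs is counted as chimeric.
    """
    first = {p[0] for p in fixed_pairs}
    second = {p[1] for p in fixed_pairs}
    counts = Counter(assigned=0, chimeric=0, unassigned=0)
    for b1, b2 in read_barcodes:
        r1 = rescue(b1, first, max_mismatches)
        r2 = rescue(b2, second, max_mismatches)
        if r1 is None or r2 is None:
            counts["unassigned"] += 1
        elif (r1, r2) in fixed_pairs:
            counts["assigned"] += 1
        else:
            counts["chimeric"] += 1
    return counts


# Toy data (hypothetical barcodes, not those of the quaddRAD protocol).
PAIRS = {("AAAA", "CCCC"), ("GGGG", "TTTT")}
READS = [("AAAA", "CCCC"), ("AAAA", "TTTT"), ("AAAG", "TTTT"), ("ACGT", "CCCC")]

strict = classify(READS, PAIRS, max_mismatches=0)
relaxed = classify(READS, PAIRS, max_mismatches=1)
```

In this toy data, allowing one mismatch rescues the read `("AAAG", "TTTT")`, which then counts as chimeric rather than unassigned. That mirrors, in miniature only, the abstract's point that looser barcode rescue recovers more chimeric reads.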
format Online
Article
Text
id pubmed-9796921
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher John Wiley and Sons Inc.
record_format MEDLINE/PubMed
spelling pubmed-9796921 2023-01-04 Identification and quantification of chimeric sequencing reads in a highly multiplexed RAD‐seq protocol Martin Cerezo, Maria Luisa Raval, Rohan de Haro Reyes, Bernardo Kucka, Marek Chan, Frank Yingguang Bryk, Jarosław Mol Ecol Resour RESOURCE ARTICLES John Wiley and Sons Inc.
2022-06-27 2022-11 /pmc/articles/PMC9796921/ /pubmed/35668693 http://dx.doi.org/10.1111/1755-0998.13661 Text en © 2022 The Authors. Molecular Ecology Resources published by John Wiley & Sons Ltd. This is an open access article under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
title Identification and quantification of chimeric sequencing reads in a highly multiplexed RAD‐seq protocol
topic RESOURCE ARTICLES