MPRAdecoder: Processing of the Raw MPRA Data With a priori Unknown Sequences of the Region of Interest and Associated Barcodes

Massively parallel reporter assays (MPRAs) enable high-throughput functional evaluation of numerous DNA regulatory elements and/or their mutant variants. The assays are based on the construction of reporter plasmid libraries containing two variable parts, a region of interest (ROI) and a barcode (BC...

Bibliographic Details
Main authors: Letiagina, Anna E., Omelina, Evgeniya S., Ivankin, Anton V., Pindyurin, Alexey V.
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8148044/
https://www.ncbi.nlm.nih.gov/pubmed/34046055
http://dx.doi.org/10.3389/fgene.2021.618189
_version_ 1783697765439635456
author Letiagina, Anna E.
Omelina, Evgeniya S.
Ivankin, Anton V.
Pindyurin, Alexey V.
author_facet Letiagina, Anna E.
Omelina, Evgeniya S.
Ivankin, Anton V.
Pindyurin, Alexey V.
author_sort Letiagina, Anna E.
collection PubMed
description Massively parallel reporter assays (MPRAs) enable high-throughput functional evaluation of numerous DNA regulatory elements and/or their mutant variants. The assays are based on the construction of reporter plasmid libraries containing two variable parts, a region of interest (ROI) and a barcode (BC), located outside and within the transcription unit, respectively. Importantly, each plasmid molecule in such a highly diverse library is characterized by a unique BC–ROI association. The reporter constructs are delivered to target cells and expression of BCs at the transcript level is assayed by RT-PCR followed by next-generation sequencing (NGS). The obtained values are normalized to the abundance of BCs in the plasmid DNA sample. Altogether, this allows evaluating the regulatory potential of the associated ROI sequences. However, depending on the MPRA library construction design, the BC and ROI sequences as well as their associations can be a priori unknown. In such a case, the BC and ROI sequences, their possible mutant variants, and unambiguous BC–ROI associations have to be identified, whereas all uncertain cases have to be excluded from the analysis. Besides the preparation of additional “mapping” samples for NGS, this also requires specific bioinformatics tools. Here, we present a pipeline for processing raw MPRA data obtained by NGS for reporter construct libraries with a priori unknown sequences of BCs and ROIs. The pipeline robustly identifies unambiguous (so-called genuine) BCs and ROIs associated with them, calculates the normalized expression level for each BC and the averaged values for each ROI, and provides a graphical visualization of the processed data.
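The processing steps named in the description (keeping only unambiguous BC–ROI associations, normalizing BC expression to plasmid abundance, averaging per ROI) can be illustrated with a minimal Python sketch. It is not the MPRAdecoder implementation: every function name, the input layout (barcode/ROI pairs and read-count dictionaries) and the choice of a read-fraction ratio as the normalization are assumptions made for illustration, and the sketch ignores sequencing errors and mutant variants.

```python
from collections import defaultdict
from statistics import mean


def unambiguous_associations(mapping_pairs):
    """Keep barcodes seen with exactly one ROI in the mapping reads (hypothetical helper)."""
    rois_by_bc = defaultdict(set)
    for bc, roi in mapping_pairs:
        rois_by_bc[bc].add(roi)
    # Barcodes paired with more than one ROI are uncertain and are dropped.
    return {bc: next(iter(rois)) for bc, rois in rois_by_bc.items() if len(rois) == 1}


def normalized_expression(cdna_counts, plasmid_counts, bc_to_roi):
    """Assumed normalization: cDNA read fraction divided by plasmid read fraction."""
    cdna_total = sum(cdna_counts.values())
    plasmid_total = sum(plasmid_counts.values())
    if cdna_total == 0 or plasmid_total == 0:
        raise ValueError("both samples must contain at least one read")
    expression = {}
    for bc in bc_to_roi:
        plasmid = plasmid_counts.get(bc, 0)
        if plasmid == 0:
            continue  # no plasmid signal, so the barcode cannot be normalized
        cdna_fraction = cdna_counts.get(bc, 0) / cdna_total
        expression[bc] = cdna_fraction / (plasmid / plasmid_total)
    return expression


def mean_by_roi(bc_expression, bc_to_roi):
    """Average the normalized values of all barcodes attached to the same ROI."""
    values = defaultdict(list)
    for bc, value in bc_expression.items():
        values[bc_to_roi[bc]].append(value)
    return {roi: mean(vals) for roi, vals in values.items()}


# Toy data (invented for illustration): barcode TTTG maps to two ROIs and is excluded.
pairs = [("AAAC", "roi1"), ("AAAC", "roi1"), ("GGTT", "roi1"),
         ("CCCA", "roi2"), ("TTTG", "roi1"), ("TTTG", "roi2")]
bc_to_roi = unambiguous_associations(pairs)
cdna = {"AAAC": 20, "GGTT": 40, "CCCA": 40}
plasmid = {"AAAC": 25, "GGTT": 25, "CCCA": 50}
per_bc = normalized_expression(cdna, plasmid, bc_to_roi)
per_roi = mean_by_roi(per_bc, bc_to_roi)
```

With the toy counts above, the two roi1 barcodes are averaged into a single ROI value, while the ambiguous barcode never enters the calculation.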
format Online
Article
Text
id pubmed-8148044
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-81480442021-05-26 MPRAdecoder: Processing of the Raw MPRA Data With a priori Unknown Sequences of the Region of Interest and Associated Barcodes Letiagina, Anna E. Omelina, Evgeniya S. Ivankin, Anton V. Pindyurin, Alexey V. Front Genet Genetics Massively parallel reporter assays (MPRAs) enable high-throughput functional evaluation of numerous DNA regulatory elements and/or their mutant variants. The assays are based on the construction of reporter plasmid libraries containing two variable parts, a region of interest (ROI) and a barcode (BC), located outside and within the transcription unit, respectively. Importantly, each plasmid molecule in such a highly diverse library is characterized by a unique BC–ROI association. The reporter constructs are delivered to target cells and expression of BCs at the transcript level is assayed by RT-PCR followed by next-generation sequencing (NGS). The obtained values are normalized to the abundance of BCs in the plasmid DNA sample. Altogether, this allows evaluating the regulatory potential of the associated ROI sequences. However, depending on the MPRA library construction design, the BC and ROI sequences as well as their associations can be a priori unknown. In such a case, the BC and ROI sequences, their possible mutant variants, and unambiguous BC–ROI associations have to be identified, whereas all uncertain cases have to be excluded from the analysis. Besides the preparation of additional “mapping” samples for NGS, this also requires specific bioinformatics tools. Here, we present a pipeline for processing raw MPRA data obtained by NGS for reporter construct libraries with a priori unknown sequences of BCs and ROIs. The pipeline robustly identifies unambiguous (so-called genuine) BCs and ROIs associated with them, calculates the normalized expression level for each BC and the averaged values for each ROI, and provides a graphical visualization of the processed data. Frontiers Media S.A. 
2021-05-11 /pmc/articles/PMC8148044/ /pubmed/34046055 http://dx.doi.org/10.3389/fgene.2021.618189 Text en Copyright © 2021 Letiagina, Omelina, Ivankin and Pindyurin. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
spellingShingle Genetics
Letiagina, Anna E.
Omelina, Evgeniya S.
Ivankin, Anton V.
Pindyurin, Alexey V.
MPRAdecoder: Processing of the Raw MPRA Data With a priori Unknown Sequences of the Region of Interest and Associated Barcodes
title MPRAdecoder: Processing of the Raw MPRA Data With a priori Unknown Sequences of the Region of Interest and Associated Barcodes
title_full MPRAdecoder: Processing of the Raw MPRA Data With a priori Unknown Sequences of the Region of Interest and Associated Barcodes
title_fullStr MPRAdecoder: Processing of the Raw MPRA Data With a priori Unknown Sequences of the Region of Interest and Associated Barcodes
title_full_unstemmed MPRAdecoder: Processing of the Raw MPRA Data With a priori Unknown Sequences of the Region of Interest and Associated Barcodes
title_short MPRAdecoder: Processing of the Raw MPRA Data With a priori Unknown Sequences of the Region of Interest and Associated Barcodes
title_sort mpradecoder: processing of the raw mpra data with a priori unknown sequences of the region of interest and associated barcodes
topic Genetics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8148044/
https://www.ncbi.nlm.nih.gov/pubmed/34046055
http://dx.doi.org/10.3389/fgene.2021.618189
work_keys_str_mv AT letiaginaannae mpradecoderprocessingoftherawmpradatawithaprioriunknownsequencesoftheregionofinterestandassociatedbarcodes
AT omelinaevgeniyas mpradecoderprocessingoftherawmpradatawithaprioriunknownsequencesoftheregionofinterestandassociatedbarcodes
AT ivankinantonv mpradecoderprocessingoftherawmpradatawithaprioriunknownsequencesoftheregionofinterestandassociatedbarcodes
AT pindyurinalexeyv mpradecoderprocessingoftherawmpradatawithaprioriunknownsequencesoftheregionofinterestandassociatedbarcodes