Underlying causes for prevalent false positives and false negatives in STARR-seq data

Bibliographic Details
Main Authors: Ni, Pengyu, Wu, Siwen, Su, Zhengchang
Format: Online Article Text
Language: English
Published: Oxford University Press 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10516709/
https://www.ncbi.nlm.nih.gov/pubmed/37745976
http://dx.doi.org/10.1093/nargab/lqad085
author Ni, Pengyu
Wu, Siwen
Su, Zhengchang
collection PubMed
description Self-transcribing active regulatory region sequencing (STARR-seq) and its variants have been widely used to characterize enhancers. However, it has been reported that up to 87% of STARR-seq peaks are located in repressive chromatin and are not functional in the tested cells. While some of the STARR-seq peaks in repressive chromatin might be active in other cell/tissue types, some others might be false positives. Meanwhile, many active enhancers may not be identified by the current STARR-seq methods. Although methods have been proposed to mitigate systematic errors caused by the use of plasmid vectors, the artifacts due to the intrinsic limitations of current STARR-seq methods are still prevalent and the underlying causes are not fully understood. Based on predicted cis-regulatory modules (CRMs) and non-CRMs in the human genome as well as predicted active CRMs and non-active CRMs in a few human cell lines/tissues with STARR-seq data available, we reveal prevalent false positives and false negatives in STARR-seq peaks generated by major variants of STARR-seq methods and possible underlying causes. Our results will help design strategies to improve STARR-seq methods and interpret the results.
format Online
Article
Text
id pubmed-10516709
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-10516709 2023-09-23 Underlying causes for prevalent false positives and false negatives in STARR-seq data Ni, Pengyu Wu, Siwen Su, Zhengchang NAR Genom Bioinform Standard Article Oxford University Press 2023-09-22 /pmc/articles/PMC10516709/ /pubmed/37745976 http://dx.doi.org/10.1093/nargab/lqad085 Text en © The Author(s) 2023. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Underlying causes for prevalent false positives and false negatives in STARR-seq data
topic Standard Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10516709/
https://www.ncbi.nlm.nih.gov/pubmed/37745976
http://dx.doi.org/10.1093/nargab/lqad085