
excluderanges: exclusion sets for T2T-CHM13, GRCm39, and other genome assemblies

SUMMARY: Exclusion regions are sections of reference genomes with abnormal pileups of short sequencing reads. Removing reads overlapping them improves biological signal, and these benefits are most pronounced in differential analysis settings. Several labs created exclusion region sets, available pr...


Bibliographic Details
Main Authors: Ogata, Jonathan D; Mu, Wancen; Davis, Eric S; Xue, Bingjie; Harrell, J Chuck; Sheffield, Nathan C; Phanstiel, Douglas H; Love, Michael I; Dozmorov, Mikhail G
Format: Online Article Text
Language: English
Published: Oxford University Press 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10126321/
https://www.ncbi.nlm.nih.gov/pubmed/37067481
http://dx.doi.org/10.1093/bioinformatics/btad198
author Ogata, Jonathan D
Mu, Wancen
Davis, Eric S
Xue, Bingjie
Harrell, J Chuck
Sheffield, Nathan C
Phanstiel, Douglas H
Love, Michael I
Dozmorov, Mikhail G
collection PubMed
description SUMMARY: Exclusion regions are sections of reference genomes with abnormal pileups of short sequencing reads. Removing reads overlapping them improves biological signal, and these benefits are most pronounced in differential analysis settings. Several labs created exclusion region sets, available primarily through ENCODE and GitHub. However, the variety of exclusion sets creates uncertainty about which sets to use. Furthermore, gap regions (e.g. centromeres, telomeres, short arms) create additional considerations in generating exclusion sets. We generated exclusion sets for the latest human T2T-CHM13 and mouse GRCm39 genomes and systematically assembled and annotated these and other sets in the excluderanges R/Bioconductor data package, also accessible via the BEDbase.org API. The package provides unified access to 82 GenomicRanges objects covering six organisms, multiple genome assemblies, and types of exclusion regions. For the human hg38 genome assembly, we recommend hg38.Kundaje.GRCh38_unified_blacklist as the most well-curated and annotated, and sets generated by the Blacklist tool for other organisms. AVAILABILITY AND IMPLEMENTATION: https://bioconductor.org/packages/excluderanges/. Package website: https://dozmorovlab.github.io/excluderanges/.
format Online
Article
Text
id pubmed-10126321
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-10126321 2023-04-26 excluderanges: exclusion sets for T2T-CHM13, GRCm39, and other genome assemblies. Bioinformatics. Applications Note. Oxford University Press 2023-04-17. /pmc/articles/PMC10126321/ /pubmed/37067481 http://dx.doi.org/10.1093/bioinformatics/btad198 Text en © The Author(s) 2023. Published by Oxford University Press.
https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title excluderanges: exclusion sets for T2T-CHM13, GRCm39, and other genome assemblies
topic Applications Note
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10126321/
https://www.ncbi.nlm.nih.gov/pubmed/37067481
http://dx.doi.org/10.1093/bioinformatics/btad198