excluderanges: exclusion sets for T2T-CHM13, GRCm39, and other genome assemblies
SUMMARY: Exclusion regions are sections of reference genomes with abnormal pileups of short sequencing reads. Removing reads overlapping them improves biological signal, and these benefits are most pronounced in differential analysis settings. Several labs created exclusion region sets, available pr...
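Main Authors: | Ogata, Jonathan D; Mu, Wancen; Davis, Eric S; Xue, Bingjie; Harrell, J Chuck; Sheffield, Nathan C; Phanstiel, Douglas H; Love, Michael I; Dozmorov, Mikhail G |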
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2023 |
Subjects: | Applications Note |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10126321/ https://www.ncbi.nlm.nih.gov/pubmed/37067481 http://dx.doi.org/10.1093/bioinformatics/btad198 |
_version_ | 1785030216535506944 |
---|---|
author | Ogata, Jonathan D Mu, Wancen Davis, Eric S Xue, Bingjie Harrell, J Chuck Sheffield, Nathan C Phanstiel, Douglas H Love, Michael I Dozmorov, Mikhail G |
author_facet | Ogata, Jonathan D Mu, Wancen Davis, Eric S Xue, Bingjie Harrell, J Chuck Sheffield, Nathan C Phanstiel, Douglas H Love, Michael I Dozmorov, Mikhail G |
author_sort | Ogata, Jonathan D |
collection | PubMed |
description | SUMMARY: Exclusion regions are sections of reference genomes with abnormal pileups of short sequencing reads. Removing reads overlapping them improves biological signal, and these benefits are most pronounced in differential analysis settings. Several labs created exclusion region sets, available primarily through ENCODE and Github. However, the variety of exclusion sets creates uncertainty which sets to use. Furthermore, gap regions (e.g. centromeres, telomeres, short arms) create additional considerations in generating exclusion sets. We generated exclusion sets for the latest human T2T-CHM13 and mouse GRCm39 genomes and systematically assembled and annotated these and other sets in the excluderanges R/Bioconductor data package, also accessible via the BEDbase.org API. The package provides unified access to 82 GenomicRanges objects covering six organisms, multiple genome assemblies, and types of exclusion regions. For human hg38 genome assembly, we recommend hg38.Kundaje.GRCh38_unified_blacklist as the most well-curated and annotated, and sets generated by the Blacklist tool for other organisms. AVAILABILITY AND IMPLEMENTATION: https://bioconductor.org/packages/excluderanges/. Package website: https://dozmorovlab.github.io/excluderanges/. |
format | Online Article Text |
id | pubmed-10126321 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-101263212023-04-26 excluderanges: exclusion sets for T2T-CHM13, GRCm39, and other genome assemblies Ogata, Jonathan D Mu, Wancen Davis, Eric S Xue, Bingjie Harrell, J Chuck Sheffield, Nathan C Phanstiel, Douglas H Love, Michael I Dozmorov, Mikhail G Bioinformatics Applications Note SUMMARY: Exclusion regions are sections of reference genomes with abnormal pileups of short sequencing reads. Removing reads overlapping them improves biological signal, and these benefits are most pronounced in differential analysis settings. Several labs created exclusion region sets, available primarily through ENCODE and Github. However, the variety of exclusion sets creates uncertainty which sets to use. Furthermore, gap regions (e.g. centromeres, telomeres, short arms) create additional considerations in generating exclusion sets. We generated exclusion sets for the latest human T2T-CHM13 and mouse GRCm39 genomes and systematically assembled and annotated these and other sets in the excluderanges R/Bioconductor data package, also accessible via the BEDbase.org API. The package provides unified access to 82 GenomicRanges objects covering six organisms, multiple genome assemblies, and types of exclusion regions. For human hg38 genome assembly, we recommend hg38.Kundaje.GRCh38_unified_blacklist as the most well-curated and annotated, and sets generated by the Blacklist tool for other organisms. AVAILABILITY AND IMPLEMENTATION: https://bioconductor.org/packages/excluderanges/. Package website: https://dozmorovlab.github.io/excluderanges/. Oxford University Press 2023-04-17 /pmc/articles/PMC10126321/ /pubmed/37067481 http://dx.doi.org/10.1093/bioinformatics/btad198 Text en © The Author(s) 2023. Published by Oxford University Press. 
https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Applications Note Ogata, Jonathan D Mu, Wancen Davis, Eric S Xue, Bingjie Harrell, J Chuck Sheffield, Nathan C Phanstiel, Douglas H Love, Michael I Dozmorov, Mikhail G excluderanges: exclusion sets for T2T-CHM13, GRCm39, and other genome assemblies |
title | excluderanges: exclusion sets for T2T-CHM13, GRCm39, and other genome assemblies |
title_sort | excluderanges: exclusion sets for t2t-chm13, grcm39, and other genome assemblies |
topic | Applications Note |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10126321/ https://www.ncbi.nlm.nih.gov/pubmed/37067481 http://dx.doi.org/10.1093/bioinformatics/btad198 |
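
The summary describes removing reads that overlap exclusion regions. The following is a minimal sketch of that filtering step in plain Python. It is not the excluderanges package API (which is an R/Bioconductor data package); the function names `build_index`, `overlaps_exclusion`, and `filter_reads` are hypothetical, the coordinates are invented for illustration, and intervals are assumed to be 0-based and half-open, as in BED files.

```python
from bisect import bisect_left
from typing import Dict, List, Tuple

# (chromosome, start, end); assumed 0-based, half-open, as in BED files.
Interval = Tuple[str, int, int]
# Per-chromosome index: sorted region starts and the matching region ends.
Index = Dict[str, Tuple[List[int], List[int]]]


def build_index(exclusions: List[Interval]) -> Index:
    """Group exclusion regions by chromosome and merge overlapping ones."""
    by_chrom: Dict[str, List[Tuple[int, int]]] = {}
    for chrom, start, end in exclusions:
        by_chrom.setdefault(chrom, []).append((start, end))

    index: Index = {}
    for chrom, regions in by_chrom.items():
        regions.sort()
        merged: List[Tuple[int, int]] = []
        for start, end in regions:
            if merged and start <= merged[-1][1]:
                # Overlapping or touching the previous region: extend it.
                merged[-1] = (merged[-1][0], max(merged[-1][1], end))
            else:
                merged.append((start, end))
        index[chrom] = ([s for s, _ in merged], [e for _, e in merged])
    return index


def overlaps_exclusion(index: Index, read: Interval) -> bool:
    """Return True if the read shares at least one base with an exclusion region."""
    chrom, start, end = read
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    # Last merged region that begins before the read ends.
    i = bisect_left(starts, end) - 1
    # Merged regions are disjoint, so only this one can reach the read.
    return i >= 0 and ends[i] > start


def filter_reads(reads: List[Interval], exclusions: List[Interval]) -> List[Interval]:
    """Keep only the reads that do not overlap any exclusion region."""
    index = build_index(exclusions)
    return [read for read in reads if not overlaps_exclusion(index, read)]


# Illustrative, made-up coordinates.
exclusions = [("chr1", 100, 200), ("chr1", 150, 300), ("chr2", 0, 50)]
reads = [
    ("chr1", 50, 100),
    ("chr1", 90, 101),
    ("chr1", 299, 350),
    ("chr1", 300, 400),
    ("chr2", 10, 20),
    ("chr3", 0, 10),
]
kept = filter_reads(reads, exclusions)
```

Merging the regions first means each lookup is a single binary search per read. In practice this step would more likely be done with a genomic-interval library, and the exclusion set itself would come from a curated source such as those catalogued in the record above.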