
FILER: a framework for harmonizing and querying large-scale functional genomics knowledge

Querying massive functional genomic and annotation data collections, linking and summarizing the query results across data sources/data types are important steps in high-throughput genomic and genetic analytical workflows. However, these steps are made difficult by the heterogeneity and breadth of d...


Bibliographic Details
Main authors: Kuksa, Pavel P, Leung, Yuk Yee, Gangadharan, Prabhakaran, Katanic, Zivadin, Kleidermacher, Lauren, Amlie-Wolf, Alexandre, Lee, Chien-Yueh, Qu, Liming, Greenfest-Allen, Emily, Valladares, Otto, Wang, Li-San
Format: Online Article Text
Language: English
Published: Oxford University Press 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8759563/
https://www.ncbi.nlm.nih.gov/pubmed/35047815
http://dx.doi.org/10.1093/nargab/lqab123
author Kuksa, Pavel P
Leung, Yuk Yee
Gangadharan, Prabhakaran
Katanic, Zivadin
Kleidermacher, Lauren
Amlie-Wolf, Alexandre
Lee, Chien-Yueh
Qu, Liming
Greenfest-Allen, Emily
Valladares, Otto
Wang, Li-San
collection PubMed
description Querying massive functional genomic and annotation data collections, linking and summarizing the query results across data sources/data types are important steps in high-throughput genomic and genetic analytical workflows. However, these steps are made difficult by the heterogeneity and breadth of data sources, experimental assays, biological conditions/tissues/cell types and file formats. FILER (FunctIonaL gEnomics Repository) is a framework for querying large-scale genomics knowledge with a large, curated integrated catalog of harmonized functional genomic and annotation data coupled with a scalable genomic search and querying interface. FILER uniquely provides: (i) streamlined access to >50 000 harmonized, annotated genomic datasets across >20 integrated data sources, >1100 tissues/cell types and >20 experimental assays; (ii) a scalable genomic querying interface; and (iii) ability to analyze and annotate user’s experimental data. This rich resource spans >17 billion GRCh37/hg19 and GRCh38/hg38 genomic records. Our benchmark querying 7 × 10^9 hg19 FILER records shows FILER is highly scalable, with a sub-linear 32-fold increase in querying time when increasing the number of queries 1000-fold from 1000 to 1 000 000 intervals. Together, these features facilitate reproducible research and streamline integrating/querying large-scale genomic data within analyses/workflows. FILER can be deployed on cloud or local servers (https://bitbucket.org/wanglab-upenn/FILER) for integration with custom pipelines and is freely available (https://lisanwanglab.org/FILER).
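The "sub-linear" scaling claim in the description can be made concrete with a little arithmetic. The sketch below is an illustration only, not part of FILER: it assumes query time follows a power law in the number of query intervals (an assumption not stated in the abstract), and the function name `scaling_exponent` is hypothetical. Only the two ratios (1000-fold more queries, 32-fold more time) come from the record.

```python
import math


def scaling_exponent(query_ratio: float, time_ratio: float) -> float:
    """Return k such that time_ratio == query_ratio ** k.

    Assumes a power-law relationship between query count and query time;
    that model is an assumption made here, not a claim from the abstract.
    """
    return math.log(time_ratio) / math.log(query_ratio)


# Ratios reported in the abstract: 1000 -> 1 000 000 intervals, 32-fold more time.
query_ratio = 1_000_000 / 1_000
time_ratio = 32.0

k = scaling_exponent(query_ratio, time_ratio)

# Relative per-query cost at the larger query size (1.0 would mean linear scaling).
per_query_cost_ratio = time_ratio / query_ratio

print(f"scaling exponent k = {k:.2f}")
print(f"per-query cost ratio = {per_query_cost_ratio:.3f}")
```

Under that power-law assumption the exponent comes out near 0.5, well below the value of 1 that linear scaling would give, and the per-query cost at one million intervals is about 3% of the cost at one thousand.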
format Online
Article
Text
id pubmed-8759563
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-8759563 2022-01-18 FILER: a framework for harmonizing and querying large-scale functional genomics knowledge Kuksa, Pavel P Leung, Yuk Yee Gangadharan, Prabhakaran Katanic, Zivadin Kleidermacher, Lauren Amlie-Wolf, Alexandre Lee, Chien-Yueh Qu, Liming Greenfest-Allen, Emily Valladares, Otto Wang, Li-San NAR Genom Bioinform Standard Article
Oxford University Press 2022-01-14 /pmc/articles/PMC8759563/ /pubmed/35047815 http://dx.doi.org/10.1093/nargab/lqab123 Text en © The Author(s) 2022. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics. https://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title FILER: a framework for harmonizing and querying large-scale functional genomics knowledge
topic Standard Article