Transforming L1000 profiles to RNA-seq-like profiles with deep learning
The L1000 technology, a cost-effective high-throughput transcriptomics technology, has been applied to profile a collection of human cell lines for their gene expression response to > 30,000 chemical and genetic perturbations. In total, there are currently over 3 million available L1000 profiles....
Main Authors: | Jeon, Minji; Xie, Zhuorui; Evangelista, John E.; Wojciechowicz, Megan L.; Clarke, Daniel J. B.; Ma’ayan, Avi |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2022 |
Subjects: | Research |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9472394/ https://www.ncbi.nlm.nih.gov/pubmed/36100892 http://dx.doi.org/10.1186/s12859-022-04895-5 |
_version_ | 1784789294657830912 |
---|---|
author | Jeon, Minji; Xie, Zhuorui; Evangelista, John E.; Wojciechowicz, Megan L.; Clarke, Daniel J. B.; Ma’ayan, Avi |
author_sort | Jeon, Minji |
collection | PubMed |
description | The L1000 technology, a cost-effective high-throughput transcriptomics technology, has been applied to profile a collection of human cell lines for their gene expression response to more than 30,000 chemical and genetic perturbations. In total, over 3 million L1000 profiles are currently available. Such a dataset is invaluable for the discovery of drug and target candidates and for inferring mechanisms of action for small molecules. The L1000 assay measures the mRNA expression of only 978 landmark genes, while 11,350 additional genes are reliably inferred computationally. The lack of full genome coverage limits knowledge discovery for half of the human protein-coding genes and limits the potential for integration with other transcriptomics profiling data. Here we present a two-step deep learning model that transforms L1000 profiles into RNA-seq-like profiles. The input to the model is the measured expression of the 978 landmark genes, and the output is a vector of RNA-seq-like expression values for 23,614 genes. The model first transforms the landmark gene measurements into RNA-seq-like profiles of the same 978 genes using a modified CycleGAN model applied to unpaired data. The transformed 978 RNA-seq-like landmark genes are then extrapolated into the full genome space with a fully connected neural network model. The two-step model achieves a Pearson’s correlation coefficient of 0.914 and a root mean square error of 1.167 when tested on a published paired L1000/RNA-seq dataset produced by the LINCS and GTEx programs. The processed RNA-seq-like profiles are made available for download, signature search, and gene-centric reverse search with unique case studies. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-022-04895-5. |
format | Online Article Text |
id | pubmed-9472394 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-9472394 2022-09-15 Transforming L1000 profiles to RNA-seq-like profiles with deep learning Jeon, Minji; Xie, Zhuorui; Evangelista, John E.; Wojciechowicz, Megan L.; Clarke, Daniel J. B.; Ma’ayan, Avi BMC Bioinformatics Research BioMed Central 2022-09-13 /pmc/articles/PMC9472394/ /pubmed/36100892 http://dx.doi.org/10.1186/s12859-022-04895-5 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
title | Transforming L1000 profiles to RNA-seq-like profiles with deep learning |
topic | Research |
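
The description above reports the model's accuracy as a Pearson's correlation coefficient and a root mean square error between predicted and measured profiles. As a minimal sketch of how these two metrics are computed for a single pair of profiles, the following uses only the Python standard library. The function names `pearson` and `rmse` and the toy values are illustrative assumptions; they are not taken from the article or its code, and the article's exact evaluation procedure (for example, how values are averaged across samples or genes) is not stated in this record.

```python
import math


def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def rmse(x, y):
    """Root mean square error between two equal-length sequences."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


# Toy values only (hypothetical, not data from the article): a predicted
# RNA-seq-like profile and a measured profile over the same four genes.
predicted = [1.0, 2.0, 3.0, 4.0]
measured = [1.5, 2.5, 3.5, 4.5]

r = pearson(predicted, measured)
error = rmse(predicted, measured)
print(f"Pearson r = {r:.3f}, RMSE = {error:.3f}")
```

The two metrics are complementary: in the toy example the predicted profile is shifted by a constant, so the correlation is perfect while the RMSE is nonzero. Reporting both, as the description does, captures agreement in pattern as well as agreement in magnitude.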