
Bias-invariant RNA-sequencing metadata annotation

BACKGROUND: Recent technological advances have resulted in an unprecedented increase in publicly available biomedical data, yet the reuse of the data is often precluded by experimental bias and a lack of annotation depth and consistency. Missing annotations make it impossible for researchers to fin...


Bibliographic Details
Main Authors: Wartmann, Hannes; Heins, Sven; Kloiber, Karin; Bonn, Stefan
Format: Online Article Text
Language: English
Published: Oxford University Press 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8559615/
https://www.ncbi.nlm.nih.gov/pubmed/34553213
http://dx.doi.org/10.1093/gigascience/giab064
_version_ 1784592792727584768
author Wartmann, Hannes
Heins, Sven
Kloiber, Karin
Bonn, Stefan
author_facet Wartmann, Hannes
Heins, Sven
Kloiber, Karin
Bonn, Stefan
author_sort Wartmann, Hannes
collection PubMed
description BACKGROUND: Recent technological advances have resulted in an unprecedented increase in publicly available biomedical data, yet the reuse of the data is often precluded by experimental bias and a lack of annotation depth and consistency. Missing annotations make it impossible for researchers to find datasets specific to their needs. FINDINGS: Here, we investigate RNA-sequencing metadata prediction based on gene expression values. We present a deep-learning–based domain adaptation algorithm for the automatic annotation of RNA-sequencing metadata. We show, in multiple experiments, that our model is better at integrating heterogeneous training data compared with existing linear regression–based approaches, resulting in improved tissue type classification. By using a model architecture similar to Siamese networks, the algorithm can learn biases from datasets with few samples. CONCLUSION: Using our novel domain adaptation approach, we achieved metadata annotation accuracies up to 15.7% better than a previously published method. Using the best model, we provide a list of >10,000 novel tissue and sex label annotations for 8,495 unique SRA samples. Our approach has the potential to revive idle datasets by automated annotation, making them more searchable.
format Online
Article
Text
id pubmed-8559615
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-8559615 2021-11-02 Bias-invariant RNA-sequencing metadata annotation Wartmann, Hannes Heins, Sven Kloiber, Karin Bonn, Stefan Gigascience Technical Note BACKGROUND: Recent technological advances have resulted in an unprecedented increase in publicly available biomedical data, yet the reuse of the data is often precluded by experimental bias and a lack of annotation depth and consistency. Missing annotations make it impossible for researchers to find datasets specific to their needs. FINDINGS: Here, we investigate RNA-sequencing metadata prediction based on gene expression values. We present a deep-learning–based domain adaptation algorithm for the automatic annotation of RNA-sequencing metadata. We show, in multiple experiments, that our model is better at integrating heterogeneous training data compared with existing linear regression–based approaches, resulting in improved tissue type classification. By using a model architecture similar to Siamese networks, the algorithm can learn biases from datasets with few samples. CONCLUSION: Using our novel domain adaptation approach, we achieved metadata annotation accuracies up to 15.7% better than a previously published method. Using the best model, we provide a list of >10,000 novel tissue and sex label annotations for 8,495 unique SRA samples. Our approach has the potential to revive idle datasets by automated annotation, making them more searchable. Oxford University Press 2021-09-22 /pmc/articles/PMC8559615/ /pubmed/34553213 http://dx.doi.org/10.1093/gigascience/giab064 Text en © The Author(s) 2021. Published by Oxford University Press GigaScience. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Technical Note
Wartmann, Hannes
Heins, Sven
Kloiber, Karin
Bonn, Stefan
Bias-invariant RNA-sequencing metadata annotation
title Bias-invariant RNA-sequencing metadata annotation
title_full Bias-invariant RNA-sequencing metadata annotation
title_fullStr Bias-invariant RNA-sequencing metadata annotation
title_full_unstemmed Bias-invariant RNA-sequencing metadata annotation
title_short Bias-invariant RNA-sequencing metadata annotation
title_sort bias-invariant rna-sequencing metadata annotation
topic Technical Note
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8559615/
https://www.ncbi.nlm.nih.gov/pubmed/34553213
http://dx.doi.org/10.1093/gigascience/giab064