A multi-view genomic data simulator
Main authors: | Fratello, Michele; Serra, Angela; Fortino, Vittorio; Raiconi, Giancarlo; Tagliaferri, Roberto; Greco, Dario |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2015 |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4448275/ https://www.ncbi.nlm.nih.gov/pubmed/25962835 http://dx.doi.org/10.1186/s12859-015-0577-1 |
Field | Value |
---|---|
author | Fratello, Michele; Serra, Angela; Fortino, Vittorio; Raiconi, Giancarlo; Tagliaferri, Roberto; Greco, Dario |
collection | PubMed |
description | BACKGROUND: OMICs technologies make it possible to assay the state of a large number of different features (e.g., mRNA expression, miRNA expression, copy number variation, DNA methylation) from the same samples. The objective of these experiments is usually to find a reduced set of significant features that can be used to differentiate the conditions assayed. For the development of novel computational feature selection methods, this task is challenging because of the lack of fully annotated biological datasets to be used for benchmarking. A possible way to tackle this problem is to generate appropriate synthetic datasets whose composition and behaviour are fully controlled and known a priori. RESULTS: Here we propose a novel method centred on the generation of networks of interactions among different biological molecules, especially those involved in regulating gene expression. Synthetic datasets are obtained from models based on ordinary differential equations with known parameters. Our results show that the generated datasets closely mimic the behaviour of real data, since popular data analysis methods are able to selectively identify the existing interactions. CONCLUSIONS: The proposed method can be used in conjunction with real biological datasets in the assessment of data mining techniques. The main strength of this method is the full control over the simulated data while retaining coherence with real biological processes. The R package MVBioDataSim is freely available to the scientific community at http://neuronelab.unisa.it/?p=1722. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-015-0577-1) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-4448275 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-4448275, 2015-05-30. A multi-view genomic data simulator. Fratello, Michele; Serra, Angela; Fortino, Vittorio; Raiconi, Giancarlo; Tagliaferri, Roberto; Greco, Dario. BMC Bioinformatics, Research Article. BioMed Central, 2015-05-12. © Fratello et al.; licensee BioMed Central. 2015. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
title | A multi-view genomic data simulator |
topic | Research Article |
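The abstract states only that the synthetic datasets come from ordinary differential equation models with known parameters. It does not give the equations or the R interface of MVBioDataSim. The Python sketch below is therefore a hypothetical toy illustration of the general idea, not the package's method. It assumes a single target gene G that is activated by a transcription factor T (Hill kinetics) and repressed by a miRNA M, integrates the ODE to steady state with forward Euler, and samples two conditions. Every name and value in it (`dG_dt`, `simulate_dataset`, the rate constants, the "control"/"treated" labels, the noise model) is an assumption made for illustration.

```python
import math
import random

# Hypothetical toy network (NOT the MVBioDataSim model): T activates G,
# M represses G. The ground-truth interactions are known a priori.
TRUE_EDGES = {("T", "G"): "activation", ("M", "G"): "repression"}


def dG_dt(g, t_level, m_level, k_syn=10.0, k_act=1.0, n=2, k_rep=1.0, decay=1.0):
    """Rate of change of target mRNA G (all parameters are illustrative)."""
    activation = t_level**n / (k_act**n + t_level**n)  # Hill activation by T
    repression = 1.0 / (1.0 + m_level / k_rep)        # repression by miRNA M
    return k_syn * activation * repression - decay * g


def simulate_sample(t_level, m_level, dt=0.01, steps=2000):
    """Integrate dG/dt from G=0 with forward Euler; 2000 steps of 0.01
    reach the steady state closely for decay=1."""
    g = 0.0
    for _ in range(steps):
        g += dt * dG_dt(g, t_level, m_level)
    return g


def simulate_dataset(n_per_condition=50, noise_sd=0.05, seed=0):
    """Return one row per simulated sample across two hypothetical conditions."""
    rng = random.Random(seed)
    # Assumed condition effect: treatment raises the (log-)level of miRNA M.
    mean_log_m = {"control": 0.0, "treated": 1.0}
    rows = []
    for condition, mu_m in mean_log_m.items():
        for _ in range(n_per_condition):
            t_level = math.exp(rng.gauss(0.0, 0.3))
            m_level = math.exp(rng.gauss(mu_m, 0.3))
            g = simulate_sample(t_level, m_level)
            g *= math.exp(rng.gauss(0.0, noise_sd))  # multiplicative noise
            rows.append({"condition": condition, "T": t_level, "M": m_level, "G": g})
    return rows


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


if __name__ == "__main__":
    data = simulate_dataset()
    g = [r["G"] for r in data]
    print("corr(T, G) =", round(pearson([r["T"] for r in data], g), 3))
    print("corr(M, G) =", round(pearson([r["M"] for r in data], g), 3))
```

Because the edges in `TRUE_EDGES` are fixed before any data are generated, the output of a feature selection or network inference method run on such data can be scored directly against them. This is the kind of benchmarking with known ground truth that the abstract motivates.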