
A multi-view genomic data simulator

BACKGROUND: OMICs technologies make it possible to assay the state of a large number of different features (e.g., mRNA expression, miRNA expression, copy number variation, DNA methylation) from the same samples. The objective of these experiments is usually to find a reduced set of significant features,...


Bibliographic Details
Main authors: Fratello, Michele; Serra, Angela; Fortino, Vittorio; Raiconi, Giancarlo; Tagliaferri, Roberto; Greco, Dario
Format: Online Article Text
Language: English
Published: BioMed Central 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4448275/
https://www.ncbi.nlm.nih.gov/pubmed/25962835
http://dx.doi.org/10.1186/s12859-015-0577-1
_version_ 1782373685293744128
author Fratello, Michele
Serra, Angela
Fortino, Vittorio
Raiconi, Giancarlo
Tagliaferri, Roberto
Greco, Dario
author_facet Fratello, Michele
Serra, Angela
Fortino, Vittorio
Raiconi, Giancarlo
Tagliaferri, Roberto
Greco, Dario
author_sort Fratello, Michele
collection PubMed
description BACKGROUND: OMICs technologies make it possible to assay the state of a large number of different features (e.g., mRNA expression, miRNA expression, copy number variation, DNA methylation) from the same samples. The objective of these experiments is usually to find a reduced set of significant features that can be used to differentiate the conditions assayed. For the development of novel computational feature selection methods, this task is challenging because of the lack of fully annotated biological datasets to use for benchmarking. One way to tackle this problem is to generate appropriate synthetic datasets whose composition and behaviour are fully controlled and known a priori. RESULTS: Here we propose a novel method centred on the generation of networks of interactions among different biological molecules, especially those involved in regulating gene expression. Synthetic datasets are obtained from models based on ordinary differential equations with known parameters. Our results show that the generated datasets mimic the behaviour of real data well, since popular data analysis methods are able to selectively identify existing interactions. CONCLUSIONS: The proposed method can be used in conjunction with real biological datasets in the assessment of data mining techniques. The main strength of this method is the full control over the simulated data while retaining coherence with real biological processes. The R package MVBioDataSim is freely available to the scientific community at http://neuronelab.unisa.it/?p=1722. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-015-0577-1) contains supplementary material, which is available to authorized users.
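The idea summarised in the abstract, obtaining synthetic samples from an ODE model of a regulatory network whose parameters are known in advance, can be illustrated with a minimal Python sketch. This is not the MVBioDataSim API and not the authors' model: the three-node network (one miRNA, one transcription-factor mRNA, one target mRNA), the rate equation, every parameter value, and the names simulate_sample and make_dataset are assumptions chosen only for illustration.

```python
import random

# Hypothetical toy network (an assumption, not taken from the article):
# node 0 = miRNA, node 1 = transcription-factor mRNA, node 2 = target mRNA.
# WEIGHTS[i][j] is the effect of node j on the production rate of node i.
WEIGHTS = [
    [0.0, 0.0, 0.0],   # the miRNA is not regulated in this toy model
    [0.0, 0.0, 0.0],   # the transcription factor is not regulated either
    [-0.8, 1.5, 0.0],  # the target is repressed by the miRNA, activated by the TF
]
DECAY = [1.0, 1.0, 1.0]


def simulate_sample(weights, decay, basal, steps=2000, dt=0.01):
    """Integrate dx_i/dt = basal_i + sum_j W_ij * x_j / (1 + x_j) - decay_i * x_i
    with the explicit Euler method and return the final (near steady) state."""
    n = len(basal)
    x = [0.0] * n
    for _ in range(steps):
        # Saturating response of each regulator, bounded in [0, 1).
        act = [v / (1.0 + v) for v in x]
        dx = [
            basal[i] + sum(weights[i][j] * act[j] for j in range(n)) - decay[i] * x[i]
            for i in range(n)
        ]
        # Abundances cannot become negative.
        x = [max(0.0, x[i] + dt * dx[i]) for i in range(n)]
    return x


def make_dataset(n_per_condition, seed=0):
    """Return a list of (label, abundances) pairs for two conditions.

    Condition "B" differs from "A" only by a higher basal miRNA production,
    so the ground-truth discriminating features are known a priori."""
    rng = random.Random(seed)
    basal_by_condition = {"A": [0.2, 1.0, 0.5], "B": [2.0, 1.0, 0.5]}
    data = []
    for label, basal in basal_by_condition.items():
        for _ in range(n_per_condition):
            # Sample-to-sample variability: +/-10% jitter on the basal rates.
            noisy = [b * rng.uniform(0.9, 1.1) for b in basal]
            data.append((label, simulate_sample(WEIGHTS, DECAY, noisy)))
    return data


if __name__ == "__main__":
    for label, values in make_dataset(3):
        print(label, [round(v, 3) for v in values])
```

Because the network and its parameters are fixed by construction, a feature selection method run on such data can be scored against the known answer: here only the miRNA and its target differ between conditions, while the transcription factor does not.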
format Online
Article
Text
id pubmed-4448275
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4448275 2015-05-30 A multi-view genomic data simulator Fratello, Michele Serra, Angela Fortino, Vittorio Raiconi, Giancarlo Tagliaferri, Roberto Greco, Dario BMC Bioinformatics Research Article
BioMed Central 2015-05-12 /pmc/articles/PMC4448275/ /pubmed/25962835 http://dx.doi.org/10.1186/s12859-015-0577-1 Text en © Fratello et al.; licensee BioMed Central. 2015 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Research Article
Fratello, Michele
Serra, Angela
Fortino, Vittorio
Raiconi, Giancarlo
Tagliaferri, Roberto
Greco, Dario
A multi-view genomic data simulator
title A multi-view genomic data simulator
title_full A multi-view genomic data simulator
title_fullStr A multi-view genomic data simulator
title_full_unstemmed A multi-view genomic data simulator
title_short A multi-view genomic data simulator
title_sort multi-view genomic data simulator
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4448275/
https://www.ncbi.nlm.nih.gov/pubmed/25962835
http://dx.doi.org/10.1186/s12859-015-0577-1