Machine learning patterns for neuroimaging-genetic studies in the cloud

Bibliographic Details
Main Authors: Da Mota, Benoit, Tudoran, Radu, Costan, Alexandru, Varoquaux, Gaël, Brasche, Goetz, Conrod, Patricia, Lemaitre, Herve, Paus, Tomas, Rietschel, Marcella, Frouin, Vincent, Poline, Jean-Baptiste, Antoniu, Gabriel, Thirion, Bertrand
Format: Online Article Text
Language: English
Published: Frontiers Media S.A., 2014
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3986524/
https://www.ncbi.nlm.nih.gov/pubmed/24782753
http://dx.doi.org/10.3389/fninf.2014.00031
Description
Summary: Brain imaging is a natural intermediate phenotype for understanding the link between genetic information and behavior or risk factors for brain pathologies. Massive efforts have been made in the last few years to acquire high-dimensional neuroimaging and genetic data on large cohorts of subjects. The statistical analysis of such data is carried out with increasingly sophisticated techniques and represents a great computational challenge. Fortunately, the increasing computational power of distributed architectures can be harnessed if new neuroinformatics infrastructures are designed and training in these new tools is provided. Combining a MapReduce framework (TomusBLOB) with machine learning algorithms (Scikit-learn library), we designed a scalable analysis tool that can handle non-parametric statistics on high-dimensional data. End-users describe the statistical procedure to perform and can then test the model on their own computers before running the very same code in the cloud at a larger scale. We illustrate the potential of our approach on real data with an experiment showing how the functional signal in subcortical brain regions can be significantly fit with genome-wide genotypes. This experiment demonstrates the scalability and the reliability of our framework in the cloud with a two-week deployment on hundreds of virtual machines.
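To illustrate the general pattern the summary describes (a non-parametric statistic whose permutations are split into independent map tasks and then combined in a reduce step), here is a minimal sketch under stated assumptions. It is not the authors' TomusBLOB/Scikit-learn pipeline: it swaps in a simpler univariate max-statistic permutation test (largest squared correlation across SNPs, which yields a family-wise corrected p-value), uses simulated genotypes coded 0/1/2, and substitutes Python's built-in `map` and `functools.reduce` for a cloud MapReduce runtime. All function names (`map_task`, `reduce_task`, `permutation_pvalue`) are hypothetical.

```python
import random
from functools import reduce
from statistics import fmean


def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant column: no linear association
    return sxy * sxy / (sxx * syy)


def max_stat(genotypes, phenotype):
    """Largest r^2 over all SNP columns (max-statistic for FWER control)."""
    return max(r_squared(snp, phenotype) for snp in genotypes)


def map_task(args):
    """Hypothetical map step: a block of permutations with its own RNG seed."""
    genotypes, phenotype, n_perm, seed = args
    rng = random.Random(seed)
    y = list(phenotype)
    null = []
    for _ in range(n_perm):
        rng.shuffle(y)  # permute phenotype labels across subjects
        null.append(max_stat(genotypes, y))
    return null


def reduce_task(acc, block):
    """Hypothetical reduce step: concatenate null distributions."""
    return acc + block


def permutation_pvalue(genotypes, phenotype, n_tasks=4, perms_per_task=50):
    """Observed max r^2 and its permutation p-value (with +1 correction)."""
    observed = max_stat(genotypes, phenotype)
    jobs = [(genotypes, phenotype, perms_per_task, seed) for seed in range(n_tasks)]
    # In a cloud deployment each job would run on a separate worker;
    # here the built-in map runs them sequentially on one machine.
    null = reduce(reduce_task, map(map_task, jobs), [])
    exceed = sum(1 for s in null if s >= observed)
    return observed, (exceed + 1) / (len(null) + 1)


if __name__ == "__main__":
    rng = random.Random(0)
    n_subjects, n_snps = 100, 20
    # Simulated data: genotypes stored as one list per SNP.
    genotypes = [[rng.choice((0, 1, 2)) for _ in range(n_subjects)]
                 for _ in range(n_snps)]
    # Phenotype driven by the first SNP plus Gaussian noise.
    phenotype = [0.8 * g + rng.gauss(0, 1) for g in genotypes[0]]
    obs, p = permutation_pvalue(genotypes, phenotype)
    print(f"max r^2 = {obs:.3f}, FWER-corrected p = {p:.4f}")
```

Because each map task only needs the data and a seed, the same functions can be tested on a laptop with a few permutations and then dispatched to many workers unchanged, which mirrors the "test locally, run at scale" workflow the summary describes.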