A new tool called DISSECT for analysing large genomic data sets using a Big Data approach

Large-scale genetic and genomic data are increasingly available and the major bottleneck in their analysis is a lack of sufficiently scalable computational tools. To address this problem in the context of complex traits analysis, we present DISSECT. DISSECT is a new and freely available software tha...

Bibliographic Details
Main authors: Canela-Xandri, Oriol; Law, Andy; Gray, Alan; Woolliams, John A.; Tenesa, Albert
Format: Online Article Text
Language: English
Published: Nature Publishing Group 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4682108/
https://www.ncbi.nlm.nih.gov/pubmed/26657010
http://dx.doi.org/10.1038/ncomms10162
author Canela-Xandri, Oriol
Law, Andy
Gray, Alan
Woolliams, John A.
Tenesa, Albert
collection PubMed
description Large-scale genetic and genomic data are increasingly available and the major bottleneck in their analysis is a lack of sufficiently scalable computational tools. To address this problem in the context of complex traits analysis, we present DISSECT. DISSECT is a new and freely available software that is able to exploit the distributed-memory parallel computational architectures of compute clusters, to perform a wide range of genomic and epidemiologic analyses, which currently can only be carried out on reduced sample sizes or under restricted conditions. We demonstrate the usefulness of our new tool by addressing the challenge of predicting phenotypes from genotype data in human populations using mixed-linear model analysis. We analyse simulated traits from 470,000 individuals genotyped for 590,004 SNPs in ∼4 h using the combined computational power of 8,400 processor cores. We find that prediction accuracies in excess of 80% of the theoretical maximum could be achieved with large sample sizes.
format Online
Article
Text
id pubmed-4682108
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher Nature Publishing Group
record_format MEDLINE/PubMed
spelling pubmed-4682108 2015-12-29
Nat Commun
Article
Nature Publishing Group 2015-12-11
/pmc/articles/PMC4682108/ /pubmed/26657010 http://dx.doi.org/10.1038/ncomms10162
Text en
Copyright © 2015, Nature Publishing Group, a division of Macmillan Publishers Limited. All Rights Reserved.
This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/
title A new tool called DISSECT for analysing large genomic data sets using a Big Data approach
topic Article