
Stepwise Distributed Open Innovation Contests for Software Development: Acceleration of Genome-Wide Association Analysis


Bibliographic Details
Main authors:	Hill, Andrew, Loh, Po-Ru, Bharadwaj, Ragu B., Pons, Pascal, Shang, Jingbo, Guinan, Eva, Lakhani, Karim, Kilty, Iain, Jelinsky, Scott A.
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2017
Subjects:	Technical Note
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5467032/ https://www.ncbi.nlm.nih.gov/pubmed/28327993 http://dx.doi.org/10.1093/gigascience/gix009

collection	PubMed
description	Background: The association of differing genotypes with disease-related phenotypic traits offers great potential to both help identify new therapeutic targets and support stratification of patients who would gain the greatest benefit from specific drug classes. Development of low-cost genotyping and sequencing has made collecting large-scale genotyping data routine in population and therapeutic intervention studies. In addition, a range of new technologies is being used to capture numerous new and complex phenotypic descriptors. As a result, genotype and phenotype datasets have grown exponentially. Genome-wide association studies associate genotypes and phenotypes using methods such as logistic regression. As existing tools for association analysis limit the efficiency by which value can be extracted from increasing volumes of data, there is a pressing need for new software tools that can accelerate association analyses on large genotype-phenotype datasets. Results: Using open innovation (OI) and contest-based crowdsourcing, the logistic regression analysis in a leading, community-standard genetics software package (PLINK 1.07) was substantially accelerated. OI allowed us to do this in <6 months by providing rapid access to highly skilled programmers with specialized, difficult-to-find skill sets. Through a crowd-based contest, a combination of computational, numeric, and algorithmic approaches was identified that accelerated the logistic regression in PLINK 1.07 by 18- to 45-fold. Combining contest-derived logistic regression code with coarse-grained parallelization, multithreading, and associated changes to data initialization code further developed through distributed innovation, we achieved an end-to-end speedup of 591-fold for a dataset of 6678 subjects by 645 863 variants, compared to PLINK 1.07's logistic regression. This represents a reduction in run time from 4.8 hours to 29 seconds.
Accelerated logistic regression code developed in this project has been incorporated into the PLINK2 project. Conclusions: Using iterative competition-based OI, we have developed a new, faster implementation of logistic regression for genome-wide association studies analysis. We present lessons learned and recommendations on running a successful OI process for bioinformatics.
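The description notes that genome-wide association studies test each variant against a phenotype with logistic regression. As a rough illustration only (this is not PLINK 1.07's code, PLINK2's code, or any contest submission), the Python sketch below fits one model per variant by Newton-Raphson and reports a Wald test. It assumes additive 0/1/2 genotype coding, a binary 0/1 phenotype, and no covariates; the function names `logistic_wald` and `gwas_scan` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def _sigmoid(x: float) -> float:
    # Numerically stable logistic function (avoids overflow in math.exp).
    if x >= 0.0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def _score_and_information(g, y, b0, b1):
    """Score vector and 2x2 Fisher information for logit P(y=1) = b0 + b1*g."""
    s0 = s1 = i00 = i01 = i11 = 0.0
    for gi, yi in zip(g, y):
        p = _sigmoid(b0 + b1 * gi)
        w = p * (1.0 - p)  # Bernoulli variance for this subject
        s0 += yi - p
        s1 += (yi - p) * gi
        i00 += w
        i01 += w * gi
        i11 += w * gi * gi
    return s0, s1, i00, i01, i11


def logistic_wald(g: Sequence[float], y: Sequence[int],
                  max_iter: int = 50, tol: float = 1e-10) -> Tuple[float, float, float]:
    """Hypothetical per-variant test: returns (beta, standard error, two-sided Wald p)."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        s0, s1, i00, i01, i11 = _score_and_information(g, y, b0, b1)
        det = i00 * i11 - i01 * i01
        if det <= 1e-12 * i00 * i11:
            raise ValueError("singular information matrix (e.g. monomorphic variant)")
        # Newton-Raphson step: delta = I^-1 * score, via the closed-form 2x2 inverse.
        d0 = (i11 * s0 - i01 * s1) / det
        d1 = (i00 * s1 - i01 * s0) / det
        b0 += d0
        b1 += d1
        if max(abs(d0), abs(d1)) < tol:
            break
    else:
        raise RuntimeError("Newton-Raphson did not converge (possible separation)")
    # Standard error of b1 from the inverse information at the final estimates.
    _, _, i00, i01, i11 = _score_and_information(g, y, b0, b1)
    se = math.sqrt(i00 / (i00 * i11 - i01 * i01))
    z = b1 / se
    return b1, se, math.erfc(abs(z) / math.sqrt(2.0))


def gwas_scan(genotype_rows: Sequence[Sequence[float]],
              y: Sequence[int]) -> List[Tuple[float, float, float]]:
    # Each variant (row) is fitted independently of the others, which is what
    # allows coarse-grained parallelization across chunks of variants.
    return [logistic_wald(row, y) for row in genotype_rows]
```

For example, `logistic_wald([0, 0, 0, 0, 1, 1, 1, 1], [1, 0, 0, 0, 1, 1, 1, 0])` recovers the log odds ratio log(9) for a binary genotype. Because every variant is an independent fit, the outer loop in `gwas_scan` is the natural place to split work across processes or threads; the numeric and algorithmic optimizations the contest identified are not reproduced here.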
id	pubmed-5467032
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
journal	Gigascience, Technical Note. Oxford University Press, 2017-02-28
rights	© The Author 2017. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

Stepwise Distributed Open Innovation Contests for Software Development: Acceleration of Genome-Wide Association Analysis

Similar items