
Odyssey: a semi-automated pipeline for phasing, imputation, and analysis of genome-wide genetic data


Bibliographic Details
Main authors:	Eller, Ryan J.; Janga, Sarath C.; Walsh, Susan
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2019
Subjects:	Methodology Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6599316/ https://www.ncbi.nlm.nih.gov/pubmed/31253090 http://dx.doi.org/10.1186/s12859-019-2964-5

collection	PubMed
description	BACKGROUND: Genome imputation, admixture resolution and genome-wide association analyses are time-consuming and computationally intensive processes with many composite and requisite steps. Analysis time increases further when the programs required for these analyses must be built and installed. For scientists who are not well versed in programming but want to perform these operations hands on, there is a lengthy learning curve to using the vast number of programs available for these analyses.
RESULTS: To streamline the entire process with easy-to-use steps for scientists working with big data, the Odyssey pipeline was developed. Odyssey is a simplified, efficient, semi-automated genome-wide imputation and analysis pipeline that prepares raw genetic data, performs pre-imputation quality control, phasing, imputation, post-imputation quality control, population stratification analysis, and genome-wide association with statistical data analysis, including result visualization. Odyssey integrates programs such as PLINK, SHAPEIT, Eagle, IMPUTE, Minimac, and several R packages into a seamless, easy-to-use, and modular workflow controlled via a single user-friendly configuration file. Odyssey was built with compatibility in mind and therefore uses the Singularity container solution, which can be run on Linux, MacOS, and Windows platforms. It also scales easily from a simple desktop to a High-Performance System (HPS).
CONCLUSION: Odyssey automates genome-wide association analysis efficiently and can go from raw genetic data to genome–phenome association visualization and analysis results in 3–8 h on average, depending on the input data, the choice of programs within the pipeline, and the available computer resources. Odyssey was built to be flexible, portable, compatible, scalable, and easy to set up. Biologists less familiar with programming can now work hands on with their own big data using this easy-to-use pipeline.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-019-2964-5) contains supplementary material, which is available to authorized users.
id	pubmed-6599316
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
journal	BMC Bioinformatics (Methodology Article), published by BioMed Central, 2019-06-28
license	© The Author(s) 2019. Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
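The abstract describes a modular workflow in which a single configuration file determines which stages run and, where alternatives exist, which program performs them. The following is a minimal Python sketch of that idea, assuming a plain dictionary stands in for the configuration file. The `run_<stage>` and `<stage>_program` keys, the `build_plan` function, and the stage identifiers are hypothetical and do not reproduce Odyssey's actual configuration format. Only the stage order and the program alternatives (SHAPEIT or Eagle for phasing, IMPUTE or Minimac for imputation) come from the abstract. The sketch only builds an ordered execution plan and does not invoke any program.

```python
"""Hypothetical sketch of a config-driven, modular pipeline plan.

Stage order and program alternatives follow the Odyssey abstract; the
config keys and plan format are illustrative assumptions only.
"""
from typing import Dict, List, Tuple

# Stages in the order the abstract lists them (identifiers are hypothetical).
STAGES: List[str] = [
    "pre_imputation_qc",
    "phasing",
    "imputation",
    "post_imputation_qc",
    "population_stratification",
    "gwas",
    "visualization",
]

# Program alternatives for stages where the abstract names more than one.
PROGRAM_CHOICES: Dict[str, Tuple[str, ...]] = {
    "phasing": ("SHAPEIT", "Eagle"),
    "imputation": ("IMPUTE", "Minimac"),
}


def build_plan(config: Dict[str, object]) -> List[Tuple[str, str]]:
    """Return the ordered (stage, program) steps enabled in `config`.

    Hypothetical keys: "run_<stage>" (bool, default True) toggles a stage;
    "<stage>_program" (str) picks a program where alternatives exist,
    defaulting to the first listed choice.
    """
    plan: List[Tuple[str, str]] = []
    for stage in STAGES:
        if not config.get(f"run_{stage}", True):
            continue  # stage switched off in the single config
        choices = PROGRAM_CHOICES.get(stage)
        if choices is None:
            plan.append((stage, "default"))  # no alternative to choose from
            continue
        program = str(config.get(f"{stage}_program", choices[0]))
        if program not in choices:
            raise ValueError(f"{stage}: expected one of {choices}, got {program!r}")
        plan.append((stage, program))
    return plan


if __name__ == "__main__":
    user_config = {
        "phasing_program": "Eagle",
        "imputation_program": "Minimac",
        "run_visualization": False,
    }
    for step, program in build_plan(user_config):
        print(f"{step}: {program}")
```

Run as a script, it prints the six enabled stages in order, with Eagle selected for phasing and Minimac for imputation. Visualization is skipped because its toggle is off. Keeping every choice in one mapping mirrors the single-file control the abstract describes, and an invalid program name fails before any stage would run.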
