
From Sequencer to Supercomputer: An Automatic Pipeline for Managing and Processing Next Generation Sequencing Data

Next Generation Sequencing is highly resource intensive. NGS tasks related to data processing, management, and analysis require high-end computing servers or even clusters. Additionally, processing NGS experiments requires suitable storage space and significant manual interaction. At The Ohio State U...


Bibliographic Details
Main authors: Camerlengo, Terry; Ozer, Hatice Gulcin; Onti-Srinivasan, Raghuram; Yan, Pearlly; Huang, Tim; Parvin, Jeffrey; Huang, Kun
Format: Online Article Text
Language: English
Published: American Medical Informatics Association 2012
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3392054/
https://www.ncbi.nlm.nih.gov/pubmed/22779037
_version_ 1782237587306446848
author Camerlengo, Terry
Ozer, Hatice Gulcin
Onti-Srinivasan, Raghuram
Yan, Pearlly
Huang, Tim
Parvin, Jeffrey
Huang, Kun
author_sort Camerlengo, Terry
collection PubMed
description Next Generation Sequencing is highly resource intensive. NGS tasks related to data processing, management, and analysis require high-end computing servers or even clusters. Additionally, processing NGS experiments requires suitable storage space and significant manual interaction. At The Ohio State University's Biomedical Informatics Shared Resource, we designed and implemented a scalable architecture to address the challenges associated with the resource-intensive nature of NGS secondary analysis, built around Illumina Genome Analyzer II sequencers and Illumina’s Gerald data processing pipeline. The software infrastructure includes a distributed computing platform consisting of a LIMS called QUEST (http://bisr.osumc.edu), an Automation Server, a computer cluster for processing NGS pipelines, and a network-attached storage device expandable up to 40 TB. The system has been architected to scale to multiple sequencers without requiring additional computing or labor resources. This platform demonstrates how to manage and automate NGS experiments in an institutional or core facility setting.
format Online
Article
Text
id pubmed-3392054
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher American Medical Informatics Association
record_format MEDLINE/PubMed
spelling pubmed-3392054 2012-07-09 From Sequencer to Supercomputer: An Automatic Pipeline for Managing and Processing Next Generation Sequencing Data Camerlengo, Terry Ozer, Hatice Gulcin Onti-Srinivasan, Raghuram Yan, Pearlly Huang, Tim Parvin, Jeffrey Huang, Kun AMIA Jt Summits Transl Sci Proc Articles Next Generation Sequencing is highly resource intensive. NGS tasks related to data processing, management, and analysis require high-end computing servers or even clusters. Additionally, processing NGS experiments requires suitable storage space and significant manual interaction. At The Ohio State University's Biomedical Informatics Shared Resource, we designed and implemented a scalable architecture to address the challenges associated with the resource-intensive nature of NGS secondary analysis, built around Illumina Genome Analyzer II sequencers and Illumina’s Gerald data processing pipeline. The software infrastructure includes a distributed computing platform consisting of a LIMS called QUEST (http://bisr.osumc.edu), an Automation Server, a computer cluster for processing NGS pipelines, and a network-attached storage device expandable up to 40 TB. The system has been architected to scale to multiple sequencers without requiring additional computing or labor resources. This platform demonstrates how to manage and automate NGS experiments in an institutional or core facility setting. American Medical Informatics Association 2012-03-19 /pmc/articles/PMC3392054/ /pubmed/22779037 Text en ©2012 AMIA - All rights reserved. This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media for any purpose
title From Sequencer to Supercomputer: An Automatic Pipeline for Managing and Processing Next Generation Sequencing Data
topic Articles
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3392054/
https://www.ncbi.nlm.nih.gov/pubmed/22779037