Nanopore sequencing data analysis using Microsoft Azure cloud computing service

Bibliographic Details
Main authors: Truong, Linh; Ayora, Felipe; D’Orsogna, Lloyd; Martinez, Patricia; De Santis, Dianne
Format: Online Article Text
Language: English
Published: Public Library of Science, 2022
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9718390/
https://www.ncbi.nlm.nih.gov/pubmed/36459531
http://dx.doi.org/10.1371/journal.pone.0278609
collection PubMed
description Genetic information provides insights into the exome, genome, epigenetics and structural organisation of the organism. Given the enormous amount of genetic information, scientists are able to perform mammoth tasks to improve the standard of health care, such as determining genetic influences on the outcome of allogeneic transplantation. Cloud-based computing has increasingly become a key choice for many scientists, engineers and institutions, as it offers on-demand network access and lets users conveniently rent rather than buy all required computing resources. With the positive advancements of cloud computing and nanopore sequencing data output, we were motivated to develop an automated and scalable analysis pipeline utilizing cloud infrastructure in Microsoft Azure to accelerate the HLA genotyping service and improve the efficiency of the workflow at lower cost. In this study, we describe (i) the selection process for suitable virtual machine sizes for computing resources to balance performance against cost-effectiveness; (ii) the building of Docker containers to include all tools in the cloud computational environment; and (iii) the comparison of HLA genotype concordance between the in-house manual method and the automated cloud-based pipeline to assess data accuracy. In conclusion, the Microsoft Azure cloud-based data analysis pipeline was shown to meet all the key imperatives for performance, cost, usability, simplicity and accuracy. Importantly, the pipeline allows for the ongoing maintenance and testing of version changes before implementation. This pipeline is suitable for the analysis of data from the MinION sequencing platform and could be adopted for other data analysis applications.
id pubmed-9718390
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-9718390 (2022-12-03). PLoS One, Lab Protocol. Public Library of Science, 2022-12-02. © 2022 Truong et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
topic Lab Protocol