Practical guide for managing large-scale human genome data in research

Studies in human genetics deal with a plethora of human genome sequencing data that are generated from specimens as well as available on public domains. With the development of various bioinformatics applications, maintaining the productivity of research, managing human genome data, and analyzing do...

Bibliographic Details
Main authors:	Tanjo, Tomoya; Kawai, Yosuke; Tokunaga, Katsushi; Ogasawara, Osamu; Nagasaki, Masao
Format:	Online Article Text
Language:	English
Published:	Springer Singapore 2020
Subjects:	Review Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7728600/ https://www.ncbi.nlm.nih.gov/pubmed/33097812 http://dx.doi.org/10.1038/s10038-020-00862-1

author	Tanjo, Tomoya Kawai, Yosuke Tokunaga, Katsushi Ogasawara, Osamu Nagasaki, Masao
author_sort	Tanjo, Tomoya
collection	PubMed
description	Studies in human genetics deal with a plethora of human genome sequencing data that are generated from specimens as well as available on public domains. With the development of various bioinformatics applications, maintaining the productivity of research, managing human genome data, and analyzing downstream data are essential. This review aims to guide researchers who struggle to process and analyze these large-scale genomic data, so that they can extract relevant information for improved downstream analyses. Here, we discuss worldwide human genome projects whose data could be integrated with any dataset for improved analysis. Obtaining human whole-genome sequencing data is costly in terms of both data storage and processing; therefore, we focus on the development of data formats and software that manipulate whole-genome sequencing data. Once the sequencing is complete and the data format and processing tools are selected, a computational platform is required. For the platform, we describe a multi-cloud strategy that balances cost, performance, and customizability. Good-quality published research relies on reproducibility to ensure quality results, reusability for application to other datasets, and scalability for the future growth of datasets. To address these needs, we describe several key technologies developed in computer science, including workflow engines. We also discuss the ethical guidelines that are essential for human genomic data analysis and that differ from those for model organisms. Finally, the ideal future perspective of data processing and analysis is summarized.
format	Online Article Text
id	pubmed-7728600
institution	National Center for Biotechnology Information
language	English
publishDate	2020
publisher	Springer Singapore
record_format	MEDLINE/PubMed
spelling	pubmed-7728600 2020-12-17 Practical guide for managing large-scale human genome data in research Tanjo, Tomoya Kawai, Yosuke Tokunaga, Katsushi Ogasawara, Osamu Nagasaki, Masao J Hum Genet Review Article
Springer Singapore 2020-10-23 2021 /pmc/articles/PMC7728600/ /pubmed/33097812 http://dx.doi.org/10.1038/s10038-020-00862-1 Text en © The Author(s) 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
title	Practical guide for managing large-scale human genome data in research
topic	Review Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7728600/ https://www.ncbi.nlm.nih.gov/pubmed/33097812 http://dx.doi.org/10.1038/s10038-020-00862-1
