
aws-s3-integrity-check: an open-source bash tool to verify the integrity of a dataset stored on Amazon S3

Amazon Simple Storage Service (Amazon S3) is a widely used platform for storing large biomedical datasets. Unintended data alterations can occur during data writing and transmission, altering the original content and generating unexpected results. However, no open-source and easy-to-use tool exists...


Bibliographic Details
Main authors: García-Ruiz, Sonia, Reynolds, Regina Hertfelder, Grant-Peters, Melissa, Gustavsson, Emil Karl, Fairbrother-Browne, Aine, Chen, Zhongbo, Brenton, Jonathan William, Ryten, Mina
Format: Online Article Text
Language: English
Published: GigaScience Press 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10448181/
https://www.ncbi.nlm.nih.gov/pubmed/37637773
http://dx.doi.org/10.46471/gigabyte.87
author García-Ruiz, Sonia
Reynolds, Regina Hertfelder
Grant-Peters, Melissa
Gustavsson, Emil Karl
Fairbrother-Browne, Aine
Chen, Zhongbo
Brenton, Jonathan William
Ryten, Mina
collection PubMed
description Amazon Simple Storage Service (Amazon S3) is a widely used platform for storing large biomedical datasets. Unintended data alterations can occur during data writing and transmission, altering the original content and generating unexpected results. However, no open-source and easy-to-use tool exists to verify end-to-end data integrity. Here, we present aws-s3-integrity-check, a user-friendly, lightweight, and reliable bash tool to verify the integrity of a dataset stored in an Amazon S3 bucket. Using this tool, we only needed ∼114 min to verify the integrity of 1,045 records ranging between 5 bytes and 10 gigabytes and occupying ∼935 gigabytes of the Amazon S3 cloud. Our aws-s3-integrity-check tool also provides file-by-file on-screen and log-file-based information about the status of each integrity check. To our knowledge, this tool is the only open-source one that allows verifying the integrity of a dataset uploaded to the Amazon S3 Storage quickly, reliably, and efficiently. The tool is freely available for download and use at https://github.com/SoniaRuiz/aws-s3-integrity-check and https://hub.docker.com/r/soniaruiz/aws-s3-integrity-check.
format Online
Article
Text
id pubmed-10448181
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher GigaScience Press
record_format MEDLINE/PubMed
spelling pubmed-10448181 2023-08-25 aws-s3-integrity-check: an open-source bash tool to verify the integrity of a dataset stored on Amazon S3 García-Ruiz, Sonia Reynolds, Regina Hertfelder Grant-Peters, Melissa Gustavsson, Emil Karl Fairbrother-Browne, Aine Chen, Zhongbo Brenton, Jonathan William Ryten, Mina GigaByte Technical Release Amazon Simple Storage Service (Amazon S3) is a widely used platform for storing large biomedical datasets. Unintended data alterations can occur during data writing and transmission, altering the original content and generating unexpected results. However, no open-source and easy-to-use tool exists to verify end-to-end data integrity. Here, we present aws-s3-integrity-check, a user-friendly, lightweight, and reliable bash tool to verify the integrity of a dataset stored in an Amazon S3 bucket. Using this tool, we only needed ∼114 min to verify the integrity of 1,045 records ranging between 5 bytes and 10 gigabytes and occupying ∼935 gigabytes of the Amazon S3 cloud. Our aws-s3-integrity-check tool also provides file-by-file on-screen and log-file-based information about the status of each integrity check. To our knowledge, this tool is the only open-source one that allows verifying the integrity of a dataset uploaded to the Amazon S3 Storage quickly, reliably, and efficiently. The tool is freely available for download and use at https://github.com/SoniaRuiz/aws-s3-integrity-check and https://hub.docker.com/r/soniaruiz/aws-s3-integrity-check. GigaScience Press 2023-08-23 /pmc/articles/PMC10448181/ /pubmed/37637773 http://dx.doi.org/10.46471/gigabyte.87 Text en © The Author(s) 2023. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title aws-s3-integrity-check: an open-source bash tool to verify the integrity of a dataset stored on Amazon S3
topic Technical Release