Persistent storage
Last updated on 2024-10-21
Overview
Questions
- How do you create a Google Cloud Storage bucket?
- What are the basic operations?
- What are the costs of storage and download?
Objectives
- Learn to create a Google Cloud Storage bucket
- Learn to list the contents of a bucket
- Understand the persistent storage costs
Storage for the output files
The processing workflow writes the output files to a storage location from which they can be downloaded afterwards.
For this tutorial, we use a Google Cloud Storage (GCS) bucket.
Callout
The storage is created separately from the cluster resources. You can therefore delete the cluster right after the processing to avoid unnecessary costs, and still keep the output files.
Prerequisites
GCP account and project
Make sure that you are in the GCP account and project that you intend to use for this work. In your Linux terminal, display the active gcloud configuration.
The output shows your account and project.
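A minimal sketch of this check, assuming the gcloud CLI from the Google Cloud SDK is installed and you have already logged in with it. The fallback message and the GCLOUD_AVAILABLE variable are illustrative additions, there only so that the snippet fails gracefully on a machine without the CLI.

```shell
# Show the active gcloud configuration (account, project, and other properties).
if command -v gcloud >/dev/null 2>&1; then
  gcloud config list
  GCLOUD_AVAILABLE=yes
else
  echo "gcloud CLI not found: install the Google Cloud SDK first." >&2
  GCLOUD_AVAILABLE=no
fi
```

If the project shown is not the one you want, `gcloud config set project <project-id>` switches the active project.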
Billing account?
If this is your first project or you created it from the Google Cloud Console Web UI, it will have a billing account linked to it, and you are ready to go.
If you created the project from the command line without specifying the billing account, you must link it to an existing billing account.
First, list the billing accounts available to you.
Take the account ID from the output and check whether your project is linked to it.
If it is not, link your project to this account.
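The three billing steps can be sketched with the `gcloud billing` command group. The project ID and billing account ID below are hypothetical placeholders, and the small `run` helper is an illustrative addition: by default it only prints each command so that you can review it, and it executes the command when DRY_RUN=0 is set.

```shell
# Hypothetical placeholders: replace them with your own values.
PROJECT_ID="my-project-id"
BILLING_ACCOUNT_ID="XXXXXX-XXXXXX-XXXXXX"

# Print each command; execute it only when DRY_RUN=0.
DRY_RUN="${DRY_RUN:-1}"
run() {
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

# 1. List the billing accounts you have access to.
run gcloud billing accounts list

# 2. Check whether the project is linked to a billing account.
run gcloud billing projects describe "$PROJECT_ID"

# 3. Link the project to the billing account if it is not linked yet.
run gcloud billing projects link "$PROJECT_ID" --billing-account="$BILLING_ACCOUNT_ID"
```

Once the printed commands look right, run the snippet again with `DRY_RUN=0`, or copy the individual commands to your terminal.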
Create the bucket
Create a storage bucket in your project.
You can test it by copying a file to it and listing the contents of the bucket.
You can then remove the file.
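The bucket steps above can be sketched with the `gcloud storage` command group. The bucket name and test file name are hypothetical, and the europe-west4 location is an assumption taken from the region used in the cost examples of this page. As a precaution, the illustrative `run` helper only prints the commands unless DRY_RUN=0 is set, since creating a bucket is a billable action.

```shell
# Hypothetical names: bucket names must be globally unique.
BUCKET_NAME="my-output-bucket"
LOCATION="europe-west4"          # assumed region
BUCKET_URL="gs://${BUCKET_NAME}"
TEST_FILE="test.txt"             # any small local file

# Print each command; execute it only when DRY_RUN=0.
DRY_RUN="${DRY_RUN:-1}"
run() {
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

# Create the bucket in the chosen location.
run gcloud storage buckets create "$BUCKET_URL" --location="$LOCATION"

# Copy a local file to the bucket.
run gcloud storage cp "$TEST_FILE" "$BUCKET_URL/"

# List the contents of the bucket.
run gcloud storage ls "$BUCKET_URL"

# Remove the test file from the bucket.
run gcloud storage rm "$BUCKET_URL/$TEST_FILE"
```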
Note that the bucket is tied to your GCP project and you can only access it when authenticated.
Costs
Storage
The storage cost of a GCS bucket depends on the data volume. The costs may vary from one region to another. For europe-west4, the current (October 2024) monthly cost is $0.020 / GB.
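To get a feeling for the numbers, the monthly storage cost can be estimated with a one-line calculation. This is a rough sketch: the 330 GB volume is the output size used in the download example of this page, and the price is the October 2024 europe-west4 figure quoted above, so check the current pricing before relying on it.

```shell
STORED_GB=330                 # example output volume in GB
PRICE_PER_GB_MONTH=0.020      # USD per GB per month, from the text above

# Multiply the volume by the monthly price and round to cents.
MONTHLY_COST="$(LC_ALL=C awk -v gb="$STORED_GB" -v price="$PRICE_PER_GB_MONTH" \
  'BEGIN { printf "%.2f", gb * price }')"
echo "Storing ${STORED_GB} GB costs about \$${MONTHLY_COST} per month"
```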
Operations
An operation is an action that makes changes to or retrieves information from a bucket, and it has a tiny cost: $0.005 per 1000 operations. This applies, for example, to listing the contents of the bucket from your terminal. The traffic between GCP computing resources and storage within the same region (e.g. in europe-west4) is free.
In the context of this tutorial, the operation costs are insignificant. However, keep this in mind if you plan to write scripts that list the bucket contents from your terminal.
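As an illustration of why scripted listings rarely matter, consider a hypothetical script that lists the bucket once a minute for 30 days. The sketch below assumes that each listing is billed as a single operation at the price quoted above; large buckets may need several requests per listing, so treat the result as a lower bound.

```shell
# Assumptions for illustration: one listing per minute for 30 days,
# each listing billed as a single operation.
LISTINGS_PER_DAY=1440
DAYS=30
PRICE_PER_1000_OPS=0.005      # USD, from the text above

# Total operations divided by 1000, times the price per 1000 operations.
OPS_COST="$(LC_ALL=C awk -v n="$LISTINGS_PER_DAY" -v d="$DAYS" -v p="$PRICE_PER_1000_OPS" \
  'BEGIN { printf "%.3f", n * d / 1000 * p }')"
echo "$((LISTINGS_PER_DAY * DAYS)) operations cost about \$${OPS_COST}"
```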
Networking and download
Downloading data from the GCS bucket to your computer has a significant cost: the current (October 2024) cost to internet locations (excluding China and Australia) is $0.12 / GB.
In the example of this tutorial, the resulting output files are approximately 30% of the original MiniAOD dataset volume. For example, downloading the 330 GB output from the processing of a 1.1 TB MiniAOD dataset will cost about $40.
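The arithmetic behind this estimate can be reproduced as follows. This sketch takes 1 TB as 1000 GB and uses the 30% output fraction and the October 2024 download price quoted above; the variable names are illustrative.

```shell
INPUT_GB=1100                 # 1.1 TB MiniAOD dataset, taking 1 TB = 1000 GB
OUTPUT_FRACTION=0.30          # output is roughly 30% of the input volume
EGRESS_PRICE_PER_GB=0.12      # USD per GB downloaded, from the text above

# Output volume in GB, rounded to the nearest integer.
OUTPUT_GB="$(LC_ALL=C awk -v gb="$INPUT_GB" -v f="$OUTPUT_FRACTION" \
  'BEGIN { printf "%.0f", gb * f }')"

# Download cost in USD, rounded to cents.
DOWNLOAD_COST="$(LC_ALL=C awk -v gb="$OUTPUT_GB" -v p="$EGRESS_PRICE_PER_GB" \
  'BEGIN { printf "%.2f", gb * p }')"
echo "Downloading ${OUTPUT_GB} GB costs about \$${DOWNLOAD_COST}"
```

Changing INPUT_GB to the size of your own dataset gives a quick estimate before you decide whether to download the full output.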
Key Points
- A Google Cloud Storage bucket can be used to store the output files.
- The storage cost depends on the volume stored and, for this type of processing, is very small.
- The download of big output files from the bucket can be costly.