Cloud Storage Overview
Every application needs to store data, and that data can be structured, unstructured, relational, or transactional. GCP offers various storage options for these data types.
How to Choose the Right Storage Option
GCP provides a flow chart that any enterprise can refer to when selecting the right storage option for each of its data storage requirements.
- First, we need to understand whether our data is structured or not; different services are recommended based on the answer.
- If the data is structured, check whether the workload is for analytics.
- If the data is not used for analytics, check whether it is relational.
- If the data is relational, determine whether horizontal scaling is required.
- If yes, the enterprise should select Cloud Spanner; if not, it should select Cloud SQL.
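The branch of the flow chart described above can be sketched as a small Python function. This is only an illustration: the function name `recommend_storage` and its boolean parameters are hypothetical, and branches that the text does not walk through return `None` rather than guessing a service.

```python
from typing import Optional


def recommend_storage(
    structured: bool,
    analytics: bool,
    relational: bool,
    horizontal_scaling: bool,
) -> Optional[str]:
    """Follow only the flow-chart branch described in the text.

    Returns None for branches the text does not cover.
    """
    if not structured:
        return None  # unstructured branch: not covered in this walkthrough
    if analytics:
        return None  # analytics branch: not covered in this walkthrough
    if not relational:
        return None  # non-relational branch: not covered in this walkthrough
    # Structured, non-analytics, relational data:
    return "Cloud Spanner" if horizontal_scaling else "Cloud SQL"


# Example: a relational workload that must scale horizontally.
print(recommend_storage(True, False, True, True))   # Cloud Spanner
print(recommend_storage(True, False, True, False))  # Cloud SQL
```

Returning `None` for the uncovered branches keeps the sketch honest about what the walkthrough actually states; the full flow chart covers those cases as well.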
Below is a snapshot of the GCP storage services, giving a brief, high-level view of each product.
Data consistency refers to how the storage service handles transactions and how data is written to a database. When discussing data consistency, the term ACID is very useful. Let's understand what ACID is.
ACID has the following attributes:
- Atomicity: When a transaction involves two or more pieces of information, either all pieces are committed or none are. Example: when we perform a banking transaction, the debit and credit of the funds are always treated as a single transaction. If either the debit or the credit fails, both fail.
- Consistency: A transaction moves the data from one valid state to another. If the transaction above fails, or any other failure occurs, all data is returned to the state it was in before the transaction began.
- Isolation: Every transaction is isolated from the others. In the banking example, the debit would not be visible as a change to the account until the transfer is complete.
- Durability: Once a transaction is committed, the data remains available in its correct state, even after a failure.
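To make atomicity concrete, here is a minimal sketch of the banking example using Python's built-in `sqlite3` module rather than a GCP service. The `accounts` table, the account names, and the `transfer` and `balance` helpers are assumptions made purely for illustration.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Debit src and credit dst as one transaction; undo both on failure."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            cur = conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            if cur.rowcount != 1:
                raise ValueError("unknown destination account")
        return True
    except (sqlite3.Error, ValueError):
        return False


def balance(conn, name):
    """Return the current balance of one account."""
    row = conn.execute(
        "SELECT balance FROM accounts WHERE name = ?", (name,)
    ).fetchone()
    return row[0]


# Hypothetical demo data: an in-memory database with two accounts.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

# The credit fails (no such account), so the debit is rolled back too.
print(transfer(conn, "alice", "carol", 30))  # False
print(balance(conn, "alice"))                # 100
```

The `CHECK (balance >= 0)` constraint plays the role of a consistency rule: a transfer that would overdraw the source account is rejected, and the data stays in its previous valid state.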
GCP Cloud Storage is a storage service used to store objects in Google Cloud. The service is fully managed and can scale dynamically.
Typical use cases include video transcoding, video streaming, static web pages, and backups. The service provides secure and durable storage, and its different storage classes let you choose the pricing and performance that best fit your requirements.
Cloud Storage uses buckets. A bucket is the basic container in which your data resides, and it is attached to a GCP project. Each bucket name is globally unique and cannot be changed once the bucket is created. A bucket has no minimum or maximum storage size, and we pay only for what we use.
Access to a bucket can be controlled in several ways, as described below:
- Cloud Identity & Access Management (IAM): IAM grants access to buckets and the objects inside them. IAM policies are used throughout GCP, and permissions granted on a bucket apply to all objects in it.
- Access Control Lists (ACLs): ACLs are used by Cloud Storage to grant read and write access to individual objects. This method is not recommended, but there may be occasions when it is required.
- Signed URLs: This feature gives time-limited read or write access to an object inside your bucket through a dedicated URL. Anyone using this URL can access the object for the period that was specified when the URL was generated.
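The idea behind a time-limited URL can be illustrated with a simplified sketch. This is not Cloud Storage's actual signing process; it uses a plain HMAC from the Python standard library, and the secret, the URL layout, and the function names `sign_url` and `is_valid` are all hypothetical.

```python
import hashlib
import hmac
from urllib.parse import parse_qs, urlsplit

SECRET = b"demo-secret"  # hypothetical signing key, for illustration only


def _signature(path, expires_at, secret):
    """HMAC over the object path and its expiry time."""
    message = f"{path}\n{expires_at}".encode()
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def sign_url(path, expires_at, secret=SECRET):
    """Return a URL that embeds its expiry time and a signature."""
    return f"{path}?expires={expires_at}&sig={_signature(path, expires_at, secret)}"


def is_valid(url, now, secret=SECRET):
    """Accept the URL only if it is untampered and not yet expired."""
    parts = urlsplit(url)
    params = parse_qs(parts.query)
    try:
        expires_at = int(params["expires"][0])
        sig = params["sig"][0]
    except (KeyError, ValueError):
        return False
    expected = _signature(parts.path, expires_at, secret)
    return hmac.compare_digest(expected, sig) and now <= expires_at


# `now` and `expires_at` are plain Unix timestamps in this sketch.
url = sign_url("/my-bucket/report.pdf", expires_at=1000)
print(is_valid(url, now=999))   # True: still within the allowed window
print(is_valid(url, now=1001))  # False: the link has expired
```

Because the expiry time is part of the signed message, a holder of the URL cannot extend access by editing the `expires` parameter; doing so invalidates the signature.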
By default, Cloud Storage always encrypts your data on the server side before it is written to disk. Three options are available for server-side encryption:
- Google-Managed Encryption Keys: Cloud Storage manages the encryption keys on behalf of the customer, with no need for further setup.
- Customer-Supplied Encryption Keys (CSEKs): The customer creates and manages their own encryption keys and supplies them to Cloud Storage.
- Customer-Managed Encryption Keys (CMEKs): The customer generates and manages their encryption keys using GCP's Cloud Key Management Service (KMS).
There is also a client-side encryption option, where data is encrypted before it is sent to Cloud Storage; additional encryption still takes place on the server side.
A bucket's location specifies where its data is stored. The following location types are available:
- Region: A specific geographic location, such as Mumbai or Sydney.
- Dual-region: A pair of regions. Dual-regions are geo-redundant, which means data is stored in at least two separate geographic places that are at least 100 miles apart, ensuring maximum availability for our data.
- Multi-region: A large geographic area, such as the EU, that contains two or more geographic places. Multi-regions are also geo-redundant. Storing data in multiple locations generally costs more.
GCP provides a variety of storage classes, which are summarized in the table below.
We can also edit a Cloud Storage bucket to move it between these storage classes. There are some other storage classes that cannot be set using the Cloud Console:
- Multi-Regional Storage: Equivalent to Standard Storage, but can only be used for objects stored in multi-regions or dual-regions.
- Regional Storage: Equivalent to Standard Storage, but can only be used for objects stored in regions.
- Durable Reduced Availability (DRA) Storage: Comparable to Standard Storage, but DRA has higher pricing for operations and lower performance, particularly in its availability SLA, which is 99%.
Cloud Storage FUSE
Cloud Storage cannot be mounted natively on a GCP Compute Engine instance. However, GCP currently offers third-party integration using an open-source FUSE adapter that allows you to mount a storage bucket as a file system on Linux. The adapter is available free of charge and is not officially supported.
Versioning is used to retrieve objects that we have deleted or overwritten. Note that each time we overwrite an object, an archived version of it is created, which can incur additional cost. The archived version retains the original name, with a generation number appended to identify it.
To enable versioning, use the following gsutil command:
gsutil versioning set on gs://<bucket name>
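To see why versioning can add cost, the following toy model mimics a versioned bucket in memory. It is a sketch, not the Cloud Storage API: the `VersionedBucket` class is hypothetical, generation numbers are a simple counter here (the real service assigns its own), and the `name#generation` formatting is used only for illustration.

```python
class VersionedBucket:
    """In-memory toy model of a bucket with versioning enabled."""

    def __init__(self):
        self._live = {}            # name -> (generation, data)
        self._archived = []        # list of (name, generation, data)
        self._next_generation = 1  # stand-in for service-assigned generations

    def upload(self, name, data):
        """Store data; an overwritten live object becomes an archived version."""
        if name in self._live:
            old_generation, old_data = self._live[name]
            self._archived.append((name, old_generation, old_data))
        generation = self._next_generation
        self._next_generation += 1
        self._live[name] = (generation, data)
        return generation

    def delete(self, name):
        """Deleting the live object archives it instead of discarding it."""
        generation, data = self._live.pop(name)
        self._archived.append((name, generation, data))

    def get(self, name):
        """Return the data of the live version."""
        return self._live[name][1]

    def restore(self, name, generation):
        """Copy an archived version back as the new live version."""
        for n, g, data in self._archived:
            if n == name and g == generation:
                return self.upload(name, data)
        raise KeyError(f"{name}#{generation}")

    def list_all_versions(self):
        """List live and archived versions as name#generation."""
        versions = [f"{n}#{g}" for n, g, _ in self._archived]
        versions += [f"{n}#{g}" for n, (g, _) in self._live.items()]
        return sorted(versions)

    def stored_bytes(self):
        """Total bytes kept, including archived versions."""
        archived = sum(len(d) for _, _, d in self._archived)
        live = sum(len(d) for _, d in self._live.values())
        return archived + live


bucket = VersionedBucket()
bucket.upload("a.txt", b"v1")
bucket.upload("a.txt", b"v2!")     # overwrite: generation 1 is archived
print(bucket.list_all_versions())  # ['a.txt#1', 'a.txt#2']
print(bucket.stored_bytes())       # 5
```

Both versions count toward stored bytes, which is the source of the additional cost mentioned above. Calling `bucket.restore("a.txt", 1)` would bring the first version back as a new live generation.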