Understanding Cloud Spanner & Bigtable
There may be situations where we require horizontal scaling and Cloud SQL will not fit these requirements. Enter Cloud Spanner. Cloud Spanner is a cloud-native, fully managed offering that is designed specifically to combine relational database features, such as support for ACID transactions and SQL queries, with the horizontal scaling of a non-relational database. We should look to use Cloud Spanner when we need to sustain a high number of queries per second or serve users across multiple regions. Unlike most databases, Cloud Spanner is globally distributed and provides a strongly consistent database service with high performance.
It also offers an availability SLA of >=99.999% when we use a multi-regional instance, and each node can provide up to 10,000 queries per second of reads or 2,000 queries per second of writes. To use Cloud Spanner, we must first create a Cloud Spanner instance inside our GCP project.
When we create a new instance, we must select one of two instance configurations:
- Multi-regional: This configuration gives us the higher SLA of 99.999% (a downtime of approximately 5 minutes per year), but it is more costly. Multi-regional allows the database's data to be replicated in multiple zones across multiple regions, allowing us to read data with low latency from multiple locations. However, because replicas are spread across more than one region, our applications will see a small increase in write latency.
- Regional: This configuration gives us a 99.99% SLA, which is still very high and equivalent to approximately 52 minutes of downtime per year. Regional instances should be selected if users and services are within the same region, as this offers the lowest latency. Regional configurations contain three read/write replicas, all within the chosen region, which helps us meet any governance requirements regarding where our data is located.
As you can see, the requirements will dictate the best option. We cannot change the instance's configuration after creation.
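The downtime figures quoted for the two SLAs follow from simple arithmetic. The following is a minimal Python sketch of that calculation; the function name is our own and is not part of any Google Cloud library.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def max_downtime_minutes_per_year(sla_percent: float) -> float:
    """Return the yearly downtime that an availability SLA still permits."""
    if not 0 <= sla_percent <= 100:
        raise ValueError("sla_percent must be between 0 and 100")
    return MINUTES_PER_YEAR * (100 - sla_percent) / 100


for name, sla in [("multi-regional", 99.999), ("regional", 99.99)]:
    minutes = max_downtime_minutes_per_year(sla)
    print(f"{name}: about {minutes:.2f} minutes per year")
```

This gives roughly 5.26 minutes for 99.999% and roughly 52.56 minutes for 99.99%, which is where the "approximately 5 minutes" and "approximately 52 minutes" figures come from.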
We are also required to select the number of nodes to allocate to our instance. This will determine the amount of CPU/RAM and storage resources that are available to our instance. Each node will provide up to 2 TB of storage, and it is recommended that a minimum of three nodes be used for production environments.
Cloud Spanner now also supports sizing an instance in processing units. 1,000 processing units are equivalent to 1 node, and each increment of 100 units behaves like 1/10th of a node.
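To make the relationship between processing units, nodes, and storage concrete, here is a small Python sketch. It uses only the figures quoted in this section (1,000 processing units per node and 2 TB of storage per node). Two things are assumptions made for illustration: that storage scales proportionally for fractions of a node, and that sizes are given in multiples of 100 units. The function names are hypothetical.

```python
PROCESSING_UNITS_PER_NODE = 1000
STORAGE_TB_PER_NODE = 2  # per-node storage figure quoted in this section


def processing_units_to_nodes(processing_units: int) -> float:
    """Convert processing units to the equivalent number of nodes."""
    # Assumption: sizes are expressed in increments of 100 processing units.
    if processing_units <= 0 or processing_units % 100 != 0:
        raise ValueError("processing_units must be a positive multiple of 100")
    return processing_units / PROCESSING_UNITS_PER_NODE


def storage_limit_tb(processing_units: int) -> float:
    """Estimate storage capacity, assuming it scales linearly with node count."""
    return processing_units_to_nodes(processing_units) * STORAGE_TB_PER_NODE


for units in (100, 1000, 3000):
    nodes = processing_units_to_nodes(units)
    print(f"{units} processing units = {nodes} node(s), "
          f"up to {storage_limit_tb(units):.1f} TB")
```

Under these assumptions, the recommended production minimum of three nodes corresponds to 3,000 processing units and up to 6 TB of storage.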
The underlying distributed filesystem that Cloud Spanner is built on automatically replicates data at the byte level. However, to provide additional data availability and geographic locality, Cloud Spanner also replicates data by creating copies (replicas) of the rows that it organizes data into. These copies are then stored in a different geographic area. One of these replicas is elected to act as the leader and is responsible for handling writes. Cloud Spanner has three types of replicas:
- Read/Write: This type of replica maintains a full copy of our data and is eligible to become a leader. This is the only type of replica that's available to single-region instances.
- Read-Only: This type of replica only supports reads and cannot become a leader. It maintains a full copy of our data, which is replicated from our read/write replicas. Read-only replicas are only available in multi-regional configurations.
- Witness: This type of replica doesn't support reads, nor does it maintain a full copy of our data. Witness replicas make it easier for us to achieve quorums for writes without the compute resources that are required by a read/write replica. Witness replicas are only available in multi-regional configurations.
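The properties of the three replica types can be summarized as a small lookup table. The following Python sketch encodes only what the list above states; the class and function names are our own and do not correspond to any Cloud Spanner API.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ReplicaType:
    name: str
    full_copy: bool          # keeps a full copy of the data
    serves_reads: bool       # can answer read requests
    can_be_leader: bool      # eligible to be elected leader
    multi_region_only: bool  # only offered in multi-regional configurations


REPLICA_TYPES = [
    ReplicaType("Read/Write", True, True, True, False),
    ReplicaType("Read-Only", True, True, False, True),
    ReplicaType("Witness", False, False, False, True),
]


def available_replica_types(multi_regional: bool) -> List[str]:
    """List the replica types an instance configuration can contain."""
    return [r.name for r in REPLICA_TYPES
            if multi_regional or not r.multi_region_only]


def leader_candidates() -> List[str]:
    """List the replica types that are eligible to become the leader."""
    return [r.name for r in REPLICA_TYPES if r.can_be_leader]


print("regional:", available_replica_types(multi_regional=False))
print("multi-regional:", available_replica_types(multi_regional=True))
print("leader candidates:", leader_candidates())
```

Laid out like this, it is easy to see that a regional instance contains read/write replicas only, and that the leader is always a read/write replica.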
By default, all data in Cloud Spanner will use Google-managed default encryption. When Customer-Managed Encryption Keys (CMEK) are enabled, Cloud Spanner will use our Cloud Key Management Service (KMS) keys to protect data at rest. To use CMEK, we must specify the Cloud KMS key when the database is created. As of June 2021, Cloud Spanner supports Cloud External Key Manager (Cloud EKM) when using CMEK.
Cloud Spanner offers protection against accidental writes or deletions. This is known as Point-in-Time Recovery (PITR). If, for example, an application rollout corrupts a database, then PITR can recover our data from a point in time within the past 7 days. PITR works by allowing us to configure the database's version_retention_period to retain all the versions of the data and its schema. Retention periods can range from 1 hour to 7 days. We can restore either the entire database or a portion of it, but we should also consider performance when we set longer retention periods, as this will use more system resources, particularly on databases that frequently overwrite data.
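As an illustration of the retention range, the following Python sketch validates a retention period against the 1 hour to 7 day limits and builds the DDL statement that would set it. The helper names and the database ID are hypothetical. The statement shape follows our understanding of the documented ALTER DATABASE ... SET OPTIONS syntax, with the period written as a number plus a unit suffix (s, m, h, or d), and should be checked against the current Cloud Spanner documentation before use.

```python
import re

_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}
MIN_RETENTION_SECONDS = 3600       # 1 hour
MAX_RETENTION_SECONDS = 7 * 86400  # 7 days


def retention_seconds(period: str) -> int:
    """Parse a period such as '3d' or '12h' and check it is within range."""
    match = re.fullmatch(r"(\d+)([smhd])", period)
    if match is None:
        raise ValueError(f"unrecognized retention period: {period!r}")
    seconds = int(match.group(1)) * _UNIT_SECONDS[match.group(2)]
    if not MIN_RETENTION_SECONDS <= seconds <= MAX_RETENTION_SECONDS:
        raise ValueError("retention period must be between 1 hour and 7 days")
    return seconds


def retention_ddl(database_id: str, period: str) -> str:
    """Build the DDL statement that sets version_retention_period."""
    retention_seconds(period)  # raises ValueError if the period is invalid
    return (f"ALTER DATABASE `{database_id}` "
            f"SET OPTIONS (version_retention_period = '{period}')")


# 'example-db' is a placeholder database ID.
print(retention_ddl("example-db", "7d"))
```

A statement like this would then be applied as a schema update, for example through the console or the gcloud CLI.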
IAM for Cloud Spanner
Access to Google Cloud Spanner is secured with IAM. The following is a list of predefined roles, along with a short description for each:
- Spanner Admin: This role grants complete rights to all Cloud Spanner resources in a project.
- Spanner Database Admin: This role grants the right to list all Cloud Spanner instances in the project and to create, list, and drop databases in those instances. It can also grant and revoke access to a database in the project, and read from and write to all Cloud Spanner databases in the project.
- Spanner Database Reader: This role grants the right to read from the Cloud Spanner database, execute queries on the database, and view the schema for the database.
- Spanner Database User: This role grants the right to read from and write to a Cloud Spanner database, execute SQL queries on the database, and view and update the schema for the database.
- Spanner Viewer: This role grants the right to view all Cloud Spanner instances and view all Cloud Spanner databases.
- Restore Admin: This role grants the right to restore databases from backups.
- Backup Admin: This role grants the right to create, view, update, and delete backups.
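When granting these roles programmatically, we refer to them by their role IDs rather than their display names. The following Python sketch maps the names above to the role IDs as we understand them and builds an IAM policy binding as a plain dictionary. It does not call any Google Cloud API, the helper name is our own, and the member addresses are placeholders.

```python
from typing import Dict, Iterable, List, Union

# Display name -> role ID (verify against the current IAM documentation).
SPANNER_ROLES = {
    "Spanner Admin": "roles/spanner.admin",
    "Spanner Database Admin": "roles/spanner.databaseAdmin",
    "Spanner Database Reader": "roles/spanner.databaseReader",
    "Spanner Database User": "roles/spanner.databaseUser",
    "Spanner Viewer": "roles/spanner.viewer",
    "Restore Admin": "roles/spanner.restoreAdmin",
    "Backup Admin": "roles/spanner.backupAdmin",
}


def make_binding(role_name: str,
                 members: Iterable[str]) -> Dict[str, Union[str, List[str]]]:
    """Build one IAM policy binding in the usual role/members shape."""
    if role_name not in SPANNER_ROLES:
        raise KeyError(f"unknown Cloud Spanner role: {role_name}")
    return {"role": SPANNER_ROLES[role_name], "members": sorted(members)}


# Placeholder members for illustration only.
binding = make_binding("Spanner Database Reader",
                       ["user:analyst@example.com", "group:bi-team@example.com"])
print(binding)
```

Following the principle of least privilege, a reporting team like the one in this example would receive Spanner Database Reader rather than Spanner Database User.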
Quotas & Limits:
- limit of 2 to 64 characters on the instance ID's length.
- limit of 100 databases per instance.
- limit of 2 to 30 characters on the database ID's length.
- limit of 2 TB storage per node.
- limit of a 10 MB schema size.
- limit of a 10 MB schema change size.
- limit of 5,000 tables per database.
- limit of 1 to 128 characters for the table name's length.
- limit of 1,024 columns per table.
- limit of 1 to 128 characters for the column name's length.
- limit of 10 MB of data per column.
- limit of 16 columns in a table key.
- limit of 10,000 indexes per database.
- limit of 32 indexes per table.
- limit of 16 columns in an index key.
- limit of 1,000 function calls per query.
- limit of 25 nodes per project, per instance configuration.
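Several of these limits are easy to check before we attempt to create a resource. The following Python sketch checks a handful of the limits listed above. It only looks at lengths and counts; it does not check which characters are allowed in an ID, and the function names are our own.

```python
from typing import List

# (minimum, maximum) lengths taken from the list above.
LENGTH_LIMITS = {
    "instance_id": (2, 64),
    "database_id": (2, 30),
    "table_name": (1, 128),
    "column_name": (1, 128),
}
MAX_COLUMNS_PER_TABLE = 1024
MAX_KEY_COLUMNS = 16


def length_ok(kind: str, value: str) -> bool:
    """Check a name or ID against its length limit."""
    low, high = LENGTH_LIMITS[kind]
    return low <= len(value) <= high


def table_violations(table_name: str, columns: List[str],
                     key_columns: List[str]) -> List[str]:
    """Return a list of limit violations for a planned table (empty if none)."""
    problems = []
    if not length_ok("table_name", table_name):
        problems.append("table name length")
    if len(columns) > MAX_COLUMNS_PER_TABLE:
        problems.append("too many columns")
    if len(key_columns) > MAX_KEY_COLUMNS:
        problems.append("too many key columns")
    problems.extend(f"column name length: {c!r}"
                    for c in columns if not length_ok("column_name", c))
    return problems


print(length_ok("database_id", "orders"))
print(table_violations("Singers", ["SingerId", "FirstName"], ["SingerId"]))
```

A check like this can catch a database ID that is too long or a key that is too wide before the request reaches Cloud Spanner.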
Bigtable is GCP's big data NoSQL database service. Bigtable offers low latency and can scale to billions of rows and thousands of columns. It's also the database that powers many of Google's core services, such as Search, Analytics, Maps, and Gmail. This makes Bigtable a great choice for analytics and real-time workloads, as it's designed to handle massive workloads at low latency and high throughput.
When we discuss Bigtable, we will make references to HBase. HBase is effectively an open source implementation of the Bigtable architecture and follows the same design philosophies. Bigtable stores its data in tables, each of which is a sorted key/value map. Each table is composed of rows, each of which typically describes a single entity.
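To picture the key/value model, the following Python sketch treats a table as a map from a row key to that row's columns, with the row keys kept in sorted order. This is a deliberately simplified, in-memory illustration. The class and method names are our own, it is not the Bigtable client API, and real Bigtable rows also group columns into families and keep timestamped versions of each cell.

```python
from bisect import insort
from typing import Dict, List


class SketchTable:
    """A toy table: row key -> {column name: value}, with sorted row keys."""

    def __init__(self) -> None:
        self._rows: Dict[str, Dict[str, str]] = {}
        self._keys: List[str] = []  # row keys kept in sorted order

    def put(self, row_key: str, column: str, value: str) -> None:
        """Write one value into the row identified by row_key."""
        if row_key not in self._rows:
            self._rows[row_key] = {}
            insort(self._keys, row_key)
        self._rows[row_key][column] = value

    def get(self, row_key: str) -> Dict[str, str]:
        """Return a copy of one row (an empty dict if it does not exist)."""
        return dict(self._rows.get(row_key, {}))

    def scan(self, prefix: str) -> List[str]:
        """Return the row keys that start with prefix, in sorted order."""
        return [key for key in self._keys if key.startswith(prefix)]


table = SketchTable()
table.put("user#200", "name", "Grace")
table.put("user#100", "name", "Ada")
table.put("user#100", "email", "ada@example.com")
table.put("device#7", "model", "sensor-x")

print(table.get("user#100"))
print(table.scan("user#"))
```

Each row here describes a single entity, such as one user, and because the row keys are kept sorted, related rows that share a key prefix sit next to each other and can be read with a single scan.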