CockroachDB: Row Level Replication Between Regions
By Morgan Winslow at
Using advanced multi-region features in order to control where data lives on a per row basis
Overview
Today I'll be looking at some of the more advanced multi-region features available in CockroachDB. The goal will be to have precise control over where a row lives, as well as where the leaseholder and replicas live.
Once completed, there will be 3 possible scenarios for a row in the same table:
- A row never leaves us-east4. This region (us-east4) will contain the leaseholder and all replicas. Reads and writes to this row are contained to the region.
- A row never leaves us-central1. This region (us-central1) will contain the leaseholder and all replicas. Reads and writes to this row are contained to the region.
- A row has a presence in both regions. The leaseholder and 1 replica will be in us-east4, and 1 replica will exist in us-central1. Follower reads on us-central1 will not impact the nodes in us-east4.
Considerations
- In this example we'll just be using two regions as a proof of concept. Two regions is not a recommended setup for CockroachDB.
- Multi-region abstractions are sufficient for most use cases. ie
REGIONAL BY TABLE
,REGIONAL BY ROW
, etc. - An enterprise license is required to take advantage of these features.
Setup
Init bank DB
For this example we'll be using the bank
database from cockroach workload
. This database only has 1 table, also named bank
, but it's enough to cover the concepts for this article. To initialize the database and table we can run the following:
cockroach workload init bank 'postgresql://your_username@ip_address:26257?sslmode=disable'
ALTER TABLE
At first, I thought a Regional by Row table would be sufficient for this use case, but this restricts data just to the region specified. There is no way to say that you want data to live in both regions. It would not be able to handle item #3 in our list of requirements.
Because of this, we'll need to use the more advanced options. I'm going to add a column called region
to the bank
table. This column will end up consisting of 3 different values: 'us-east4'
, 'us-central1'
, or 'both'
. We'll set the default to 'us-east4'
.
Add region
Column
ALTER TABLE bank ADD COLUMN region STRING NOT NULL DEFAULT 'us-east4';
We can then partition the table by these values by making the region
column as part of the primary key.
ALTER PRIMARY KEY
ALTER TABLE bank ALTER PRIMARY KEY USING COLUMNS (region, id);
Partition
Partition Table
Now we can partition the table based on the values found in the region column
. There's two different partitioning methods, but we will use PARTITION BY LIST. You can read more about partitioning here.
ALTER TABLE bank PARTITION BY LIST (region)
(
PARTITION us_east4 VALUES IN ('us-east4'),
PARTITION us_central1 VALUES IN ('us-central1'),
PARTITION both VALUES IN ('both')
);
With this setup, the value we have in the region
column will end up mapping to the corresponding partition. This doesn't mean too much without also setting up the zone configurations on a per partition basis.
Set Zone Configurations
With replication zone configurations we can have fine grained control of where these partitions live. For cases 1 & 2, it will be pretty straightforward since we want all ranges (leaseholders and replicas) to stay in one spot. This will look like the following:
ALTER PARTITION us_east4 OF TABLE bank CONFIGURE ZONE USING
constraints = '[+region=us-east4]';
ALTER PARTITION us_central1 OF TABLE bank CONFIGURE ZONE USING
constraints = '[+region=us-central1]';
The configuration on the both
partition is a bit more complex. Here we are going to specify that 2 of the 3 replicas will live in us-east4, and 1 replica will live in us-central1. We'll also specify that the leaseholder must be in us-east4. This setup will allow us to get fast reads and write quorum in us-east4, but also do follower reads in us-central1 with no impact to the us-east4 region.
ALTER PARTITION both OF TABLE bank CONFIGURE ZONE USING
constraints = '{"+region=us-east4": 2, "+region=us-central1": 1}',
lease_preferences='[[+region=us-east4]]';
You can read more about configuring replication zones here.
Confirm Configurations
Now that we have everything configured, let's see some different ways to confirm our setup and that data is where we want it to be.
SHOW ZONE CONFIGURATION
The first thing we can do is just check the zone configurations that we just set for the partitions. These SELECTS
will return a few different items (that are all configurable), but we are mostly just curious about the constraints
and lease_preferences
variables.
us_east4 Zone Config
SHOW ZONE CONFIGURATION FROM PARTITION us_east4 OF INDEX bank@bank_pkey;
both Zone Config
SHOW ZONE CONFIGURATION FROM PARTITION both OF INDEX bank@bank_pkey;
SHOW RANGES
This command can show the actual ranges for a database, table, or index and will show where these ranges actually are.
SHOW RANGES FROM TABLE bank.bank WITH DETAILS;
You could whittle down the information returned a bit with the following:
SELECT start_key, end_key, range_size_mb, lease_holder_locality, replica_localities
FROM [SHOW RANGES FROM TABLE bank.bank WITH DETAILS];
You can read more about the SHOW RANGES
syntax here. FYI...this syntax changed slightly in 23.1, make sure you are looking at the correct version number for the syntax.
SHOW RANGE FOR ROW
Next, we can use the SHOW RANGE FOR ROW
syntax. There's a couple caveats with this feature:
- The feature is technically still in preview and is subject to change.
- This syntax will show where a row would live based on the info you provide. It does not actively go and check the existing row. You could feed this syntax a non-existent row, and it will tell you where it will end up but not that it doesn't exist.
All that is needed for this feature is to build up the primary key. Since the bank table PK is made up of a region and an id, that is all we need to provide.
You can read more about this feature here.
us-east4: leaseholder
SELECT lease_holder_locality from [SHOW RANGE FROM TABLE bank FOR ROW ('us-east4', 0)];
us-east4: replicas
SELECT replica_localities from [SHOW RANGE FROM TABLE bank FOR ROW ('us-east4', 0)];
both: leaseholder
SELECT lease_holder_locality from [SHOW RANGE FROM TABLE bank FOR ROW ('both', 0)];
both: replicas
SELECT replica_localities from [SHOW RANGE FROM TABLE bank FOR ROW ('both', 0)];
In the both
example, you can see that there would be 2 replicas in us-east4, and 1 in us-central1.
Replication Reports
Finally, we can also utilize Replication Reports. These reports can be run to see if you have any constraint violations across your system. The following statement should return 0
results.
SELECT * FROM system.replication_constraint_stats WHERE violating_ranges > 0;
You can read more about replication reports here
Conclusion
This is just starting to scratch the surface of configuration options available, but should provide a decent starting point for the fundamentals. The docs pages provided in each section will go into much more detail about the additional manipulation you can do.