CockroachDB: Row Level Replication Between Regions

#CockroachDB#multi-region#row level replication#leaseholders#replicas#regional by row

By Morgan Winslow at

Using advanced multi-region features in order to control where data lives on a per row basis

Overview

Today I'll be looking at some of the more advanced multi-region features available in CockroachDB. The goal will be to have precise control over where a row lives, as well as where the leaseholder and replicas live.

Once completed, there will be 3 possible scenarios for a row in the same table:

  1. A row never leaves us-east4. This region (us-east4) will contain the leaseholder and all replicas. Reads and writes to this row are contained to the region.
  2. A row never leaves us-central1. This region (us-central1) will contain the leaseholder and all replicas. Reads and writes to this row are contained to the region.
  3. A row has a presence in both regions. The leaseholder and 1 replica will be in us-east4, and 1 replica will exist in us-central1. Follower reads on us-central1 will not impact the nodes in us-east4.

Considerations

  1. In this example we'll just be using two regions as a proof of concept. Two regions is not a recommended setup for CockroachDB.
  2. Multi-region abstractions are sufficient for most use cases. ie REGIONAL BY TABLE, REGIONAL BY ROW, etc.
  3. An enterprise license is required to take advantage of these features.

Setup

Init bank DB

For this example we'll be using the bank database from cockroach workload. This database only has 1 table, also named bank, but it's enough to cover the concepts for this article. To initialize the database and table we can run the following:

cockroach workload init bank 'postgresql://your_username@ip_address:26257?sslmode=disable'

ALTER TABLE

At first, I thought a Regional by Row table would be sufficient for this use case, but this restricts data just to the region specified. There is no way to say that you want data to live in both regions. It would not be able to handle item #3 in our list of requirements.

Because of this, we'll need to use the more advanced options. I'm going to add a column called region to the bank table. This column will end up consisting of 3 different values: 'us-east4', 'us-central1', or 'both'. We'll set the default to 'us-east4'.

Add region Column

ALTER TABLE bank ADD COLUMN region STRING NOT NULL DEFAULT 'us-east4';

We can then partition the table by these values by making the region column as part of the primary key.

ALTER PRIMARY KEY

ALTER TABLE bank ALTER PRIMARY KEY USING COLUMNS (region, id);

Partition

Partition Table

Now we can partition the table based on the values found in the region column. There's two different partitioning methods, but we will use PARTITION BY LIST. You can read more about partitioning here.

ALTER TABLE bank PARTITION BY LIST (region) 
(
    PARTITION us_east4 VALUES IN ('us-east4'), 
    PARTITION us_central1 VALUES IN ('us-central1'), 
    PARTITION both VALUES IN ('both')
);

With this setup, the value we have in the region column will end up mapping to the corresponding partition. This doesn't mean too much without also setting up the zone configurations on a per partition basis.

Set Zone Configurations

With replication zone configurations we can have fine grained control of where these partitions live. For cases 1 & 2, it will be pretty straightforward since we want all ranges (leaseholders and replicas) to stay in one spot. This will look like the following:

ALTER PARTITION us_east4 OF TABLE bank CONFIGURE ZONE USING 
constraints = '[+region=us-east4]';

ALTER PARTITION us_central1 OF TABLE bank CONFIGURE ZONE USING 
constraints = '[+region=us-central1]';

The configuration on the both partition is a bit more complex. Here we are going to specify that 2 of the 3 replicas will live in us-east4, and 1 replica will live in us-central1. We'll also specify that the leaseholder must be in us-east4. This setup will allow us to get fast reads and write quorum in us-east4, but also do follower reads in us-central1 with no impact to the us-east4 region.

ALTER PARTITION both OF TABLE bank CONFIGURE ZONE USING 
constraints = '{"+region=us-east4": 2, "+region=us-central1": 1}',
lease_preferences='[[+region=us-east4]]';

You can read more about configuring replication zones here.

Confirm Configurations

Now that we have everything configured, let's see some different ways to confirm our setup and that data is where we want it to be.

SHOW ZONE CONFIGURATION

The first thing we can do is just check the zone configurations that we just set for the partitions. These SELECTS will return a few different items (that are all configurable), but we are mostly just curious about the constraints and lease_preferences variables.

us_east4 Zone Config

SHOW ZONE CONFIGURATION FROM PARTITION us_east4 OF INDEX bank@bank_pkey;

us_east4 zone config

both Zone Config

SHOW ZONE CONFIGURATION FROM PARTITION both OF INDEX bank@bank_pkey;

both zone config

SHOW RANGES

This command can show the actual ranges for a database, table, or index and will show where these ranges actually are.

SHOW RANGES FROM TABLE bank.bank WITH DETAILS;

You could whittle down the information returned a bit with the following:

SELECT start_key, end_key, range_size_mb, lease_holder_locality, replica_localities 
FROM [SHOW RANGES FROM TABLE bank.bank WITH DETAILS];

You can read more about the SHOW RANGES syntax here. FYI...this syntax changed slightly in 23.1, make sure you are looking at the correct version number for the syntax.

SHOW RANGE FOR ROW

Next, we can use the SHOW RANGE FOR ROW syntax. There's a couple caveats with this feature:

  1. The feature is technically still in preview and is subject to change.
  2. This syntax will show where a row would live based on the info you provide. It does not actively go and check the existing row. You could feed this syntax a non-existent row, and it will tell you where it will end up but not that it doesn't exist.

All that is needed for this feature is to build up the primary key. Since the bank table PK is made up of a region and an id, that is all we need to provide.

You can read more about this feature here.

us-east4: leaseholder

SELECT lease_holder_locality from [SHOW RANGE FROM TABLE bank FOR ROW ('us-east4', 0)];

us-east4 leaseholder

us-east4: replicas

SELECT replica_localities from [SHOW RANGE FROM TABLE bank FOR ROW ('us-east4', 0)];

us-east4 replicas

both: leaseholder

SELECT lease_holder_locality from [SHOW RANGE FROM TABLE bank FOR ROW ('both', 0)];

both leaseholder

both: replicas

SELECT replica_localities from [SHOW RANGE FROM TABLE bank FOR ROW ('both', 0)];

both: replicas

In the both example, you can see that there would be 2 replicas in us-east4, and 1 in us-central1.

Replication Reports

Finally, we can also utilize Replication Reports. These reports can be run to see if you have any constraint violations across your system. The following statement should return 0 results.

SELECT * FROM system.replication_constraint_stats WHERE violating_ranges > 0;

You can read more about replication reports here

Conclusion

This is just starting to scratch the surface of configuration options available, but should provide a decent starting point for the fundamentals. The docs pages provided in each section will go into much more detail about the additional manipulation you can do.