CockroachDB: v19.1 to v23.1 Migration via CDC-Sink

Using the cdc-sink project to migrate data in CockroachDB v19.1 to v23.1

Overview

In this blog I'll be going over the steps required to use the cdc-sink project to migrate a "Self-Hosted" CockroachDB cluster running v19.1, to a "Dedicated" cluster running 23.1.

Using cdc-sink is an alternative route to the more traditional upgrade path, which would require incremental upgrades. It is not possible to go straight from 19.1 -> 23.1. The complete upgrade path would look like the following:

19.1 -> 19.2 -> 20.1 -> 20.2 -> 21.1 -> 21.2 - > 22.1 -> 22.2 -> 23.1

While this is certainly doable, it's a number of steps with risk involved. An alternative path could be cdc-sink to move the data only. This tool can leverage Change Data Capture (CDC) to pipe data out from a source CockroachDB to a target CockroachDB. I would encourage you to read more about the tool, its feature set, and its limitations in the documentation linked above.

Disclaimer: cdc-sink is a powerful tool but not officially supported by Cockroach Labs except by prior arrangement

High-Level Steps

Create v19.1 Self-Hosted Cluster on AWS
Create v23.1 Dedicated Cluster
Download cdc-sink project
Alter one line in cdc-sink project and create binary
Initialize source database with data and start running a workload (tpcc)
Start cdc-sink in 'immediate' mode and start changefeed in source cluster
Monitor progress in target cluster
After initial dump has made significant progress, stop changefeed
Start new changefeed in 'transactional' mode
Stop workload and cutover to target cluster. Compare source and target cluster data to ensure consistency

Create Clusters

Self-Hosted - v19.1

For my self-hosted setup, I started a 3 nodes cluster with all nodes running on a single EC2 instance in AWS. This isn't a true production setup, but will get the job done for the purposes of this test. You can follow the steps outlined here for this portion of the exercise.

Dedicated - 23.1

The Dedicated offering is a fully managed offering of CockroachDB that is deployed on cloud infrastructure. You can follow the steps outlined here to create a Dedicated cluster. This exercise could also be completed using a Serverless cluster.

I used a Dedicated GCP cluster with 3 nodes, 8vCPU per node, and 400GiB disk per node.

Download and Installing cdc-sink

cdc-sink is a golang binary and can be installed from source:

go install github.com/cockroachdb/cdc-sink@latest
$HOME/go/bin/cdc-sink version

You can read more about installing cdc-sink here.

Alter Regex

Here I will outline one change I did have to make for this process to work. Instead of doing this process manually, feel free to grab the updated binary here.

The file output format underwent a change between v19 and v23, which caused an error in how staging tables were named. I won't go too into the weeds here, but we were able to get around this by changing a regex in the ndjson.go file.

Before:

ndjsonRegex = regexp.MustCompile(`^/(?P<targetDB>[^/]+)/(?P<targetSchema>[^/]+)/(?P<date>\d{4}-\d{2}-\d{2})/(?P<uniquer>.+)-(?P<topic>[^-]+)-(?P<schema_id>[^-]+).ndjson$`)

After:

ndjsonRegex = regexp.MustCompile(`^/(?P<targetDB>[^/]+)/(?P<targetSchema>[^/]+)/(?P<date>\d{4}-\d{2}-\d{2})/(?P<timestamp>[^-]+)-(?P<topic>[^-]+)-(?P<schema_id>[^-]+)-(?P<uniquer>.+).ndjson$`)

Now that we have updated the file, we can build a new binary.

go build .

Special shout out to my colleague, and primary contributor to cdc-sink, Bob Vawter for spoon feeding me this fix.

Prepare Target Database

Create _cdc_sink staging database.

create database _cdc_sink;
ALTER DATABASE _cdc_sink CONFIGURE ZONE USING gc.ttlseconds=300;

Create empty tpcc database and drop foreign key constraints.

cockroach workload init tpcc 'postgresql://root@0.0.0.0:26257?sslmode=disable' 

alter table customer drop constraint customer_c_w_id_c_d_id_fkey;
alter table district drop constraint district_d_w_id_fkey;
alter table history drop constraint history_h_c_w_id_h_c_d_id_h_c_id_fkey;
alter table history drop constraint history_h_w_id_h_d_id_fkey;
alter table new_order drop constraint new_order_no_w_id_no_d_id_no_o_id_fkey;
alter table "order" drop constraint order_o_w_id_o_d_id_o_c_id_fkey;
alter table order_line drop constraint order_line_ol_w_id_ol_d_id_ol_o_id_fkey;
alter table order_line drop constraint order_line_ol_supply_w_id_ol_i_id_fkey;
alter table stock drop constraint stock_s_w_id_fkey;
alter table stock drop constraint stock_s_i_id_fkey;

Prepare Source Database

The source database needs rangefeeds enabled and an enterprise license.

SET CLUSTER SETTING kv.rangefeed.enabled = true;
SET CLUSTER SETTING cluster.organization = 'your-org-here';
SET CLUSTER SETTING enterprise.license = 'your-key-here'

Create tpcc database and start loading data.

cockroach workload init tpcc 'postgresql://root@0.0.0.0:26257?sslmode=disable' --warehouses=100
cockroach workload run tpcc 'postgresql://root@0.0.0.0:26257?sslmode=disable' --warehouses=100 --duration=60m

Start CDC Sink

./cdc-sink start --bindAddr :30004 --disableAuthentication --targetConn "postgresql://root:password@morgan-cdc-sink-k2h.gcp-us-east1.cockroachlabs.cloud:26257/?sslmode=verify-full&sslrootcert=morgan-cdc-sink-ca.crt" --logDestination "cdc-sink-log" --immediate -v  &

Create Changefeed on Source

create changefeed for table tpcc.customer, tpcc.district, tpcc.history, tpcc.history, tpcc.item, tpcc.item, tpcc.new_order, tpcc.order, tpcc.order_line, tpcc.stock, tpcc.warehouse into 'experimental-http://127.0.0.1:30004/tpcc/public?insecure_tls_skip_verify=true' WITH updated, resolved='10s';