SECRET OF CSS

Using CockroachDB Dedicated With Starburst Galaxy


Previous Articles

I’ve written about Presto and CockroachDB in the past, you may find the article below:

Data federation with CockroachDB and Presto

Motivation

I had a customer evaluate CockroachDB Serverless for their workload and one of the requirements was connecting it to Starburst Galaxy. In this article, we’re going to discuss the challenges and the workarounds necessary to integrate Galaxy with Cockroach Cloud.

High-level Steps

  • Start a 9-node multi-region cluster (CockroachDB Dedicated)
  • Create a Galaxy Account
  • Spin up a Trino cluster
  • Conclusion

Step-by-Step Instructions

Start a 9-Node Multi-region Cluster (CockroachDB Dedicated)

I am using a multi-region CockroachDB Dedicated cluster spanning us-east-1, us-east-2 and us-west-2 regions in AWS. You can get a 30-day trial of CockroachDB Dedicated following this link.

While we’re in the Cockroach Cloud console, let’s capture the required info for the next step:

Click “Connect” > select “IP Allowlist” or “AWS PrivateLink” > select user, region and database > click “Next”

Connection info

On the next page, select “Connection parameters”

Connection parameters

The host displayed is pointing to a regional load-balanced endpoint. If this is a multi-region cluster, you need to repeat the step for each region where you want Galaxy to communicate with the CockroachDB cluster.

You can click “Back” once and from the “Region” drop-down, select the region of interest

Region

Once you select the region, you can repeat the steps above to capture the connection parameters

IP allowlist

Here’s what the West 2 region connection parameters look like

Connection parameters

Create a Galaxy Account

You can get a free Galaxy account using the following link. Once you sign up for an account and log in, you will be greeted with an option to create a catalogue.

Connect your data

We are going to use CockroachDB Dedicated for catalogue. Click on configure a catalog button.

Select a data source

Select PostgreSQL

PostgreSQL

Here you will enter your CockroachDB Dedicated information

Enter your desired catalogue name and description > host, which is the host you captured in the previous step, i.e. artem-mr-7xw.aws-us-east-1.cockroachlabs.cloud > port, i.e. 26257 > database name > user > password

PostgreSQL cloud provider

Once complete, click “Test connection”

If the next step fails and you’re greeted with an error like below

test connection

You must add the IP CIDR from Galaxy to the CockroachDB IP Allowlist

add network

Once that’s complete, click “Test connection” again

test connection

Click “Connect catalog”

On the next page, set access roles

set permissions

Click “Save access control”

On the next page, you must choose an existing cluster or create a new cluster

Add to cluster

I don’t have an existing cluster and will create one

Create a new cluster

When you click “Add to cluster” you will get a pop up

Catalog added

I will click “Query my data” to start browsing my data from within Galaxy

At this point, you may see a spinning wheel. I prefer to filter my criteria and will select a specific schema in the query editor

Select schema

At this point, you can query the data

Query the data

At this point, you can create additional catalogues for each participating region in CockroachDB and have the associated Galaxy endpoint talk with the regional CockroachDB cluster.

Conclusion

In conclusion, I’d like to say this is a fairly straightforward setup except for one hang-up. Galaxy requires CIDR IP Allowlisted database services and it is a hard requirement which disqualifies our CockroachDB Serverless offering as there is currently no IP Allowlisting option available. Hopefully, you will find this tutorial useful.



News Credit

%d bloggers like this: