Cassandra is a highly scalable database with an architecture that makes it well suited for multi-region workloads. As companies standardize on Kubernetes, they often want their clusters to span multiple regions. However, multi-region Kubernetes clusters face serious technical challenges before they can be used in production. The challenge of handling stateful workloads on a multi-cluster Kubernetes setup, and more specifically with Kubernetes operators, was heavily discussed at KubeCon EU 2022.
In this presentation, DataStax Software Engineer John Sanda introduces K8ssandra Operator, which has been designed specifically to address the issue of running Cassandra on a Kubernetes multi-cluster deployment. John discusses how it was designed, how to reconcile objects across multiple clusters, how to manage secrets, pitfalls to avoid, and testing strategies. All of this knowledge can be ported to build other multi-cluster-friendly Kubernetes operators.
Bart Farrell 00:00
Welcome, everyone, to another Data on Kubernetes live stream. Not just another one, but a very special one, because we’ve got a hot topic today. I’ve got rhythms. I’ve got rhymes. If operators are your thing, then now is the time. We are joined today by John from DataStax. Before we get into that, just a couple of announcements. If you haven’t seen them yet, all of our DoK Day talks from the KubeCon co-located events are on YouTube. I will drop the playlist right here; please feel free to check it out. Another thing you need to check out, if you haven’t already, is the Data on Kubernetes Community’s report. We interviewed 500 organizations about how they perceive the opportunities as well as the challenges of running data on Kubernetes, to understand exactly what’s going on for end users. So, take a look; there are lots of interesting insights. We’re going to be talking about that more, going deeper with some of the members of our community and bringing on other end users to hear about their experiences. Like I said in the beginning, this is live stream number 97. Operators have been a recurring theme that has come up again and again. For tackling the issue of running stateful workloads such as databases and storage on Kubernetes, operators are one of the best solutions we have found so far. We’re always curious to see how these operators come about and what’s necessary to build them: the skills, the team, the time it takes to get going, and also the fine-tuning that comes after. Cassandra is no stranger to our community. We’re happy to have it back today with K8ssandra, and to talk about that further, we have John. John Sanda, very nice to have you with us today. Could you give us a little bit of background about your experience? 
I know we were talking before we got started, and I don’t want to make you repeat too much, but tell us how you got introduced to this whole concept of data on Kubernetes, and then about your experience with operators beyond that.
John Sanda 01:48
I got involved with Cassandra during my years at Red Hat, working on a system that ran into scale issues and then turned to Cassandra as a solution. Later, that system found its way into OpenShift, which was even pre-StatefulSets. In some of the work I did after Red Hat, I saw people trying to solve the same problems with other tools: Puppet, Chef, Ansible. It became clear that these are hard problems and that Kubernetes potentially offered a better solution. Not necessarily an easier one, but a better one. Around this time, I saw a lot of work in the operator space for different databases, including Cassandra. It became clear to me that making this a reality was a question of when, not if. That got me excited, and eventually it led me to DataStax and the K8ssandra project.
Bart Farrell 03:02
Very good. For folks who don’t know, can you give us a little bit of background on how K8ssandra got started? What’s the deal with that?
John Sanda 03:08
Yes. DataStax has an operator for Cassandra called Cass Operator, which was open-sourced sometime last year. As it turns out, there are quite a few other operators for Cassandra. One of the more noteworthy ones is CassKop, developed by some folks at Orange based out of France. They presented at several conferences, and it has great documentation; it’s groundbreaking work. In the Cassandra community, a lot of people were already running Cassandra on Kubernetes, or were looking to do it but weren’t sure which operator to use. So, I hooked up with Patrick McFadin and started a working group to try to figure out the best solution for the community. I think there was some consensus that we could consolidate and come up with a community-based operator. That was the genesis of K8ssandra. It wasn’t just the operator; there are also other tools involved for managing and monitoring Cassandra. In fact, some of that came out of my work at The Last Pickle, where we would tell clients, ‘we recommend these tools for managing Cassandra to make your life easier,’ all of which are open-source. I thought these tools should exist in Kubernetes as complements to the operators. And that led to the creation of K8ssandra.
Bart Farrell 04:50
Very good. That being said, I don’t want to steal your thunder, so if you want to start sharing your presentation, go for it and we won’t spoil anything. Folks, as usual, feel free to ask questions in the chat. John’s also going to be showing some code; once he switches to that, if you’d like him to zoom in a little more, just let us know in the chat and we’ll do it immediately. John, go ahead and share your screen.
John Sanda 05:10
Hey, everybody. I’m stoked to be here today. I’m going to be presenting ‘Building a Multi-cluster Operator’, with an early look at K8ssandra Operator. For the agenda, I’ll give a brief background on myself, talk about why multi-cluster, and give a short intro to operators. Then we’re going to take a look at controller-runtime and get into the details of some of its components. After that, we’ll look at specifically what we need to do to go from a single-cluster (in-cluster) controller to a multi-cluster one. Then we’ll dig into K8ssandra Operator a bit and wrap up.
So again, my name is John Sanda. I’m located in North Carolina. I’ve been with DataStax since March of last year, and I started working on the K8ssandra project about a year ago. Prior to that, I was at The Last Pickle working as an Apache Cassandra consultant for about a year, and before that, I was at Red Hat for a number of years. It was at Red Hat that I was first introduced to Cassandra, working on a management and monitoring system. That was early on, around Cassandra 1.2. Eventually, a later iteration of that project wound up being used in OpenShift for an early monitoring system. That was my first experience with OpenShift as well as Kubernetes; it was actually pre-StatefulSets, so it was a bit of a rocky road.
For folks in the US, and I’m sure elsewhere, who celebrate Halloween, it’s right around the corner, so I thought I would give a trick or treat. That’s my kids on the left, and on the right is the handiwork of my daughters; I finally found a good use for their Barbies. With that said, let’s talk a little bit about K8ssandra.
K8ssandra is a cloud-native distribution of Cassandra built specifically for Kubernetes. That’s a bit of a mouthful. It includes Cassandra, but it also includes a number of tools to get you up and running with the things you need: tools for observability, such as ServiceMonitors to integrate with Prometheus or any other Prometheus-compatible back end, and Grafana dashboards; Reaper for running repairs, which make sure your data is consistent across the cluster; Medusa for backup and restore; and Stargate, which is an API gateway for Cassandra. K8ssandra includes multiple operators, notably Cass Operator, which provisions and manages your Cassandra nodes. K8ssandra is packaged as a collection of Helm charts, and there’s a good bit of engineering and logic in the Helm templates themselves. We felt we were stretching the boundaries of Helm; that, coupled with the desire to support multi-region Cassandra clusters, led to the decision to start work on an operator sooner rather than later. That’s going to be the basis of K8ssandra 2.0. Lastly, check out the project at k8ssandra.io. We’re trying to build the community, it’s open-source, and I’d love for you to get involved.
Let’s talk briefly about why multi-cluster. Cassandra is designed from the ground up for multi-region. It’s partition tolerant. Nodes are smart and will automatically route traffic to nearby neighbors. The cluster is homogeneous: your clients can send requests to any node in the cluster, and the data will be automatically and asynchronously replicated. It’s considered a good idea to configure clients to route traffic to the local data center. If you’re not familiar with Cassandra, a data center is a logical grouping of nodes, and you can configure replication for each data center. For example, say I have a data center East and a data center West, and a client is sending writes to East. Cassandra will automatically replicate those writes in the background to the nodes in the West data center.
Kubernetes, on the other hand, really wasn’t designed to be multi-region. One of the big challenges is increased latency. The etcd FAQ has a question about running etcd across multiple regions or data centers; part of the answer says, ‘the cost is higher consensus request latency from crossing data center boundaries’. There are a number of tickets dealing with latency and performance problems in etcd. As you know, etcd is the datastore for Kubernetes, so if you start having a lot of latency spikes and performance problems in etcd, that’s going to have a cascading, domino effect. Because of these and other challenges, there’s a lot of exciting work going on in the multi-cluster space in Kubernetes today.
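To make per-data-center replication concrete, here is a small Go sketch that builds the CQL statement a user might run for the East/West example. It assumes Cassandra’s NetworkTopologyStrategy, the replication strategy that takes a replication factor per data center; the keyspace name `orders` and the factor of 3 are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// keyspaceCQL builds a CREATE KEYSPACE statement that sets a separate
// replication factor for each Cassandra data center.
func keyspaceCQL(keyspace string, rfByDC map[string]int) string {
	// Sort data center names so the generated statement is deterministic.
	dcs := make([]string, 0, len(rfByDC))
	for dc := range rfByDC {
		dcs = append(dcs, dc)
	}
	sort.Strings(dcs)

	parts := []string{"'class': 'NetworkTopologyStrategy'"}
	for _, dc := range dcs {
		parts = append(parts, fmt.Sprintf("'%s': %d", dc, rfByDC[dc]))
	}
	return fmt.Sprintf("CREATE KEYSPACE IF NOT EXISTS %s WITH replication = {%s};",
		keyspace, strings.Join(parts, ", "))
}

func main() {
	// Hypothetical keyspace and data center names mirroring the East/West example.
	fmt.Println(keyspaceCQL("orders", map[string]int{"East": 3, "West": 3}))
}
```

With a statement like this in place, a write acknowledged in East is replicated in the background to three replicas in West as well.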
We’ll do a quick intro to operators. Some of this may be familiar territory, so I’ll try to be quick, but I think it’s important to lay the foundation for some of the things we’re going to discuss later on. So, what’s an operator? An operator is a Kubernetes-native application. What I mean by that is it’s specifically designed to run in Kubernetes and work with Kubernetes APIs. It has domain-specific knowledge of an application. I like to think of it this way: I have runbooks for my database for performing upgrades and rolling out configuration updates, and if I’m going to build an operator, I want the operator to automate those runbooks. An operator is typically built with Operator SDK or Kubebuilder, and it consists of custom controllers and Custom Resource Definitions. The diagram below is the Operator Capability Model; it shows the different capability levels of an operator. I included it just to give you a sense of what functionality you might expect from an operator; if you’re not familiar with operators, it’s a pretty useful diagram. I mentioned controllers and Custom Resource Definitions. Kubernetes itself is made up of a lot of controllers, and a controller manages objects of one or more Kubernetes types; for example, there’s a Deployment controller and a StatefulSet controller. There’s a well-known controller pattern, depicted in this diagram: the control loop, where the controller continually checks the desired state against the actual state and does whatever it can to make the actual state of the object match the desired state.
Now, what do we mean by desired state? Here’s a manifest for an NGINX deployment. It represents the desired state: I want to deploy NGINX 1.14.2 with three replicas. In addition to what I declare in the spec, the desired state also includes any default values that get initialized when I create the deployment. Once the deployment is created, it has a status, which gives a summary of the current state of the object. Usually, the end user or some other actor in the system creates the deployment and modifies its spec, while a controller typically updates the status. It’s also worth noting that the status could be recreated at any point just by observing the actual objects currently in the system. That’s something important to be aware of as you’re implementing controllers.
Next, I’m going to look at what I find to be a really good diagram. Just let me know if it’s hard to see, and I’ll zoom in. There’s a lot going on in this diagram, and I’m not going to go through everything; I only want to highlight a few things that are relevant to what we’ll discuss later on. In the upper right, we have the API server. The diagram describes some of the components of client-go and how they interact with our controllers. Initially, the client sends a ListAndWatch request to the API server for the objects it’s interested in, let’s say deployments, and a cache gets populated with those objects. When there are events of interest, such as objects being created or modified, the API server notifies the client. On the client side, there’s a component called an informer, and it does a couple of things: it updates the cache with the object, and then it notifies the event handler. The event handler takes a key that identifies the object and adds that key to a queue. That queue is managed by a controller. When the controller sees something on the queue, it pops that item off and passes it to ProcessItem, the function where the actual work is done to make sure the actual state matches the desired state.
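The control loop can be reduced to a toy reconcile function. This Go sketch is not Kubernetes code: `State` and the action strings are hypothetical, and the only thing reconciled is a replica count. It shows the two properties a reconciler needs: it compares desired against actual state, and it does nothing once they match.

```go
package main

import "fmt"

// State is a deliberately tiny stand-in for a Kubernetes object: the only
// field we reconcile is a replica count.
type State struct {
	Replicas int
}

// reconcile compares desired and actual state and returns the actions a
// controller would take to close the gap. It is idempotent: when the states
// already match, it returns no actions.
func reconcile(desired, actual State) []string {
	var actions []string
	for i := actual.Replicas; i < desired.Replicas; i++ {
		actions = append(actions, fmt.Sprintf("create pod %d", i))
	}
	for i := actual.Replicas; i > desired.Replicas; i-- {
		actions = append(actions, fmt.Sprintf("delete pod %d", i-1))
	}
	return actions
}

func main() {
	desired := State{Replicas: 3}
	actual := State{Replicas: 1}
	// A real control loop never stops; three passes are enough to show it
	// converging and then doing nothing.
	for pass := 1; pass <= 3; pass++ {
		actions := reconcile(desired, actual)
		fmt.Printf("pass %d: %v\n", pass, actions)
		actual = desired // pretend every action succeeded
	}
}
```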
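The point that status can always be rebuilt from observation fits in a few lines of Go. The types below are hypothetical simplifications, not the Kubernetes `apps/v1` types; `computeStatus` derives everything from the pods it sees and never reads a previously stored status.

```go
package main

import "fmt"

// Hypothetical, simplified types: a spec written by the user and a status
// written by the controller.
type DeploymentSpec struct {
	Image    string
	Replicas int
}

type DeploymentStatus struct {
	Replicas      int // pods that currently exist
	ReadyReplicas int // pods that report ready
}

type Pod struct {
	Image string
	Ready bool
}

// computeStatus rebuilds the status purely from the pods it observes. Because
// nothing here depends on a previously stored status, the status can be
// recreated at any time, for example after the controller restarts.
func computeStatus(pods []Pod) DeploymentStatus {
	var s DeploymentStatus
	for _, p := range pods {
		s.Replicas++
		if p.Ready {
			s.ReadyReplicas++
		}
	}
	return s
}

func main() {
	spec := DeploymentSpec{Image: "nginx:1.14.2", Replicas: 3}
	pods := []Pod{{Image: spec.Image, Ready: true}, {Image: spec.Image, Ready: false}}
	status := computeStatus(pods)
	fmt.Printf("desired=%d observed=%d ready=%d\n", spec.Replicas, status.Replicas, status.ReadyReplicas)
}
```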
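The informer, event handler, and queue interplay from the diagram can be modeled with plain Go. This is a simplified, single-goroutine sketch, not client-go: the queue keeps each pending key only once (client-go’s work queue likewise collapses duplicate pending keys), and the controller reads the latest object from the cache rather than from the event.

```go
package main

import "fmt"

// Object is a minimal stand-in for a Kubernetes object.
type Object struct {
	Namespace, Name string
	Replicas        int
}

func key(o Object) string { return o.Namespace + "/" + o.Name }

// queue is a simplified FIFO that holds each pending key only once, so a
// burst of events for one object yields a single work item.
type queue struct {
	items   []string
	pending map[string]bool
}

func newQueue() *queue { return &queue{pending: map[string]bool{}} }

func (q *queue) Add(k string) {
	if q.pending[k] {
		return
	}
	q.pending[k] = true
	q.items = append(q.items, k)
}

func (q *queue) Get() (string, bool) {
	if len(q.items) == 0 {
		return "", false
	}
	k := q.items[0]
	q.items = q.items[1:]
	delete(q.pending, k)
	return k, true
}

// informer mimics the two jobs described above: update the local cache, then
// hand the object to the event handler, which enqueues only its key.
type informer struct {
	cache   map[string]Object
	handler func(Object)
}

func (i *informer) OnEvent(o Object) {
	i.cache[key(o)] = o
	i.handler(o)
}

func main() {
	q := newQueue()
	inf := &informer{cache: map[string]Object{}, handler: func(o Object) { q.Add(key(o)) }}

	// Two updates to the same deployment arrive before the controller runs.
	inf.OnEvent(Object{Namespace: "default", Name: "nginx", Replicas: 2})
	inf.OnEvent(Object{Namespace: "default", Name: "nginx", Replicas: 3})

	for k, ok := q.Get(); ok; k, ok = q.Get() {
		// The controller reads the latest object from the cache, not from the event.
		fmt.Println("reconcile", k, "replicas:", inf.cache[k].Replicas)
	}
}
```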
Now, I want to look at a concrete example of that ProcessItem function. This example is taken directly from the controller implementation in the controller-runtime library, and there’s not a whole lot going on. The function pops an item off the queue and checks whether the queue has been shut down. The next thing I want to highlight is that it calls the reconcileHandler method to do the actual work of making sure the desired state and actual state match. And that’s it.
I also mentioned the control loop pattern, so I wanted to show the corresponding code for it. It’s not terribly exciting: a tight for loop that just processes the next work item. It’s also worth noting that controllers that don’t use controller-runtime, or that predate it, follow this same control loop pattern, even if the code is a bit different. That’s a bit of background on controllers. I also mentioned that an operator consists of Custom Resource Definitions. A Custom Resource Definition is a way to extend the Kubernetes API, and CRDs can be dynamically added and removed; you can add a Custom Resource Definition to a Kubernetes cluster at any point. Operators typically define one or more CRDs, and it’s standard practice to implement a controller for each Custom Resource Definition.
Bart Farrell 17:43
One thing quickly on that, we had a speaker one time, who said that his favorite feature in Kubernetes is CRDs. Would you agree? Or do you have something even better?
John Sanda 17:52
Yes, I would agree. The two words that came to mind when I started learning about CRDs were ‘game changer’. So much of what we do now with Kubernetes builds on them; in my opinion, they’re a total game changer for Kubernetes. Here’s an example of a Custom Resource, a K8ssandraCluster object, whose CRD is installed with K8ssandra Operator. This is why I say it’s a game changer; we’ll get more into this a bit later. This Custom Resource declares the desired state and creates a multi-data center Cassandra cluster with two data centers, dc1 and dc2, with dc1 in the East cluster and dc2 in the West cluster. The declarative model of Kubernetes combined with Custom Resources is an extremely powerful combination. As a user, it gives me a really powerful set of tools without needing to understand all the plumbing underneath that makes it happen.
Now I’m going to shift gears and look at controller-runtime, getting into some of its implementation details so we can understand what’s needed to make a controller multi-cluster. First, what is controller-runtime? It’s a library built on top of client-go for building controllers, and it’s used by both Operator SDK and Kubebuilder. If you’re writing an operator, controller-runtime provides the primary set of APIs you’ll be dealing with in your operator code; in fact, you may not even need to use client-go directly. controller-runtime also provides some really good tools for integration testing.
Now, I want to go through some of the important types in controller-runtime. Everything I mention here is a Go interface. Then we’ll walk through some code, I’ll highlight these types, and we’ll see how they tie in with the multi-cluster controller.
- First is Runnable. Runnable is a simple interface that defines a Start method. If you look at the controller-runtime code, you’ll see that pretty much everything is a Runnable.
- Then we have Cache. If you think back to the diagram I pulled up a little while ago, the Cache corresponds to the cache there: it stores Kubernetes objects, and it provides the informers, which enable event notifications for our controllers.
- Then there’s a Client, which provides APIs for CRUD operations: creating, updating, and deleting objects. A client can be configured to read either from the cache or directly from the API server. By default, it reads from the cache, which is generally what you want.
- Then we have Cluster, which is an abstraction for the actual Kubernetes cluster you’re talking to. The Cluster initializes the Cache and the Client.
- Next, we have the Controller. As I mentioned before, it creates and manages a work queue, and it calls the Reconciler.
- When you implement an operator using Operator SDK or Kubebuilder, you implement Reconcilers; you don’t implement the controller itself. The Reconciler contains the domain-specific logic for making sure the actual state matches the desired state.
- Then we have Source, which represents a source of events. In the code example we’re going to look at, which involves Memcached, a source could be Deployment objects. Source also has the very important role of registering the event handlers that push work onto the controller’s work queue.
- Lastly, there’s the Manager, which is responsible for bootstrapping everything. It initializes dependencies used across all the controllers in the operator, most notably the Cache and the Client, the components you need for interacting with the API server, and it wires up those dependencies. It also starts your controllers. Starting up the various components, the controllers as well as the caches, is a non-trivial operation with a lot of concurrency involved, and the Manager abstracts all of that for us.
Now, a quick code walkthrough. Here’s the link. I don’t have anything on GitHub; I simply followed the first few steps of the Operator SDK tutorial to generate the scaffolding for the sample Memcached Operator project. Let’s take a look. Let me make this bigger. Again, I haven’t written any of this code; it’s all generated from the scaffolding, and it’s everything we need for the purposes of this discussion. The main function is the entry point for starting the operator. The first thing I want to highlight is the call to the NewManager function, which, as its name implies, creates the Manager object. When the Manager is initialized, it also initializes the Cluster object, which creates the Cache and the Client, and those are exposed to us through the Manager. Next, we initialize a MemcachedReconciler, setting its Client field from the Manager, and call its SetupWithManager method. Let’s take a look at that. There are only a few lines of code here, using the builder API from controller-runtime, but a few really key things are happening. First, this is where the controller is actually created and added to the Manager. Second, we set up a watch for Memcached objects. Now, I missed something earlier, so let me go back briefly. I mentioned that creating the Manager creates the Cache and the Client, and here we’re creating the controller. They’re created, but they’re not usable until we call the Manager’s Start method. Start is responsible for making sure everything is started: there’s a cache sync that has to happen, and the controllers all have to be started through the Manager. So that’s a really key piece.
Now let’s look a little more at what we mean by setting up a watch. When a Memcached object is created or updated, I want my controller to get notified and have its Reconcile method called to do whatever work it needs to do. I’m going to jump into the builder code to highlight a couple of key lines that show what’s going on under the covers. If you haven’t seen this code before, don’t worry if it looks a little confusing; I just want to discuss these two highlighted lines. We’re creating a Source, which, as I mentioned before, represents a source of events. The actual source we’re creating is a Kind object; its documentation says a Kind provides a source of events originating inside the cluster. Then we create the event handler for the particular object, in this case Memcached, and call the controller’s Watch function. Watch injects the cache into the source object and checks whether the controller has started. If it hasn’t, it just adds the source to a list; otherwise, it calls the Start method on the source. So, one way or another, Start will get called. There are a couple of things I want to point out in Start. I have a Kind object here for Memcached, and there’s a goroutine where the work is done. Two things I want to draw your attention to: first, I’m getting the informer for the Memcached type. Remember that the informer is responsible for updating the cache and is tied in with the event handlers. Second, at line 133, I’m registering an event handler with the informer. The event handler is initialized with the queue, which is my controller’s work queue. So, when a Memcached object is created, updated, or deleted, the informer gets notified and calls this event handler, which creates a request and adds it to the controller’s work queue. 
The controller then calls the Reconcile function, where the actual work is done. That’s a quick walkthrough of the various components involved in setting up a controller, including the watches.
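The deferred-start behavior of Watch can be sketched in plain Go. The `Source` and `Controller` types are hypothetical models of the controller-runtime concepts, not its API: a source registered before the controller starts is parked in a list, and one registered afterwards is started immediately.

```go
package main

import "fmt"

type Request struct{ Namespace, Name string }

// Source is a toy event source. Start registers the controller's enqueue
// function as an event handler.
type Source struct {
	Kind     string
	handlers []func(Request)
}

func (s *Source) Start(enqueue func(Request)) {
	s.handlers = append(s.handlers, enqueue)
}

// Emit simulates the informer delivering an event for an object of this kind.
func (s *Source) Emit(r Request) {
	for _, h := range s.handlers {
		h(r)
	}
}

type Controller struct {
	started bool
	pending []*Source
	Queue   []Request
}

func (c *Controller) enqueue(r Request) { c.Queue = append(c.Queue, r) }

// Watch parks the source until the controller starts; after that, the source
// is started right away.
func (c *Controller) Watch(s *Source) {
	if !c.started {
		c.pending = append(c.pending, s)
		return
	}
	s.Start(c.enqueue)
}

// Start starts every source that was registered while the controller was stopped.
func (c *Controller) Start() {
	c.started = true
	for _, s := range c.pending {
		s.Start(c.enqueue)
	}
	c.pending = nil
}

func main() {
	c := &Controller{}
	memcached := &Source{Kind: "Memcached"}
	c.Watch(memcached) // set up during SetupWithManager, before Start
	fmt.Println("pending sources:", len(c.pending))

	c.Start()
	memcached.Emit(Request{Namespace: "default", Name: "memcached-sample"})
	fmt.Println("queued:", c.Queue)
}
```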
Now, let’s talk about multi-cluster controllers. What do we need to do to take that controller and make it multi-cluster capable? First, let’s talk about the design for a multi-cluster controller, based on what we’ve discussed. These are some of the things we were trying to figure out during early development of K8ssandra Operator. With K8ssandra Operator, we could be dealing with an arbitrary number of clusters. We don’t want to create a separate controller per cluster, because that would mean multiple work queues, which doesn’t scale and would get really messy; we’d have to coordinate between those controllers and work queues. Similarly, I don’t want to deal with multiple Manager objects. I want a single Manager that can manage the lifecycle of everything needed for multi-cluster, which keeps things easier to maintain. As I mentioned, the Cluster object provides the Cache. For each Kubernetes cluster I want to interact with, there’s going to be a Cluster object, and I want that one Manager to be capable of initializing and starting the caches for each of them. I need to be able to configure watches across those clusters, so that if I have East and West clusters and a Memcached object is modified in West, my controller is notified regardless of where it’s running. Everything should be done with a single work queue and a single reconciler; I want all the events and all the work to flow into that one reconciler.
Next, I want to point out this design document from the controller-runtime library. I’ll pull it up real quick, but I’m not going to go through it. If you’re interested in building a multi-cluster operator or controller, it’s definitely worth the read. The document also provides a really good example of what’s needed to create a multi-cluster controller. In the example, there are two clusters, a reference cluster and a mirror cluster, and the controller manages secrets: it copies secrets from the reference cluster to the mirror cluster. When the reconcile function is invoked, it first looks up the secret in the reference cluster. If the secret exists, it then checks whether it exists in the mirror cluster. If it doesn’t, it creates a copy of the secret and writes it to the mirror cluster. So, it’s pretty straightforward. Next is the NewSecretMirrorReconciler function, which is responsible for wiring up the controller. First, it creates the controller object and sets up the watch for secrets. Here’s the interesting part: it creates a source by calling the NewKindWithCache function. In the Memcached walkthrough, I showed code creating a Kind. This is similar, except there’s a second argument where we pass the cache for the mirror cluster. This effectively allows us to set up a watch, with an event handler, that gets notifications from a different cluster. And here’s the main function that starts everything up. What this example doesn’t show is how we get access to the mirror cluster. We’ll get into that as we start looking at K8ssandra Operator, which is coming up now.
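One way to picture the single-queue design is the Go sketch below. `remoteCluster` is a hypothetical stand-in for a watch on one cluster’s cache, and the names are made up; the key idea is that every cluster’s event handler feeds the same deduplicating queue, so changes in East and West for the same parent produce one reconcile.

```go
package main

import "fmt"

// Request names the parent object to reconcile; the cluster an event came
// from is deliberately not part of it.
type Request struct{ Namespace, Name string }

// workQueue is the one queue shared by every cluster. Pending requests are
// deduplicated, so events from several clusters for the same parent
// collapse into one reconcile.
type workQueue struct {
	order   []Request
	pending map[Request]bool
}

func newWorkQueue() *workQueue { return &workQueue{pending: map[Request]bool{}} }

func (q *workQueue) Add(r Request) {
	if !q.pending[r] {
		q.pending[r] = true
		q.order = append(q.order, r)
	}
}

// remoteCluster is a hypothetical handle on one cluster's cache; its event
// handler pushes into the shared queue instead of a queue of its own.
type remoteCluster struct {
	name  string
	queue *workQueue
}

func (c *remoteCluster) ChildChanged(parent Request) {
	fmt.Printf("event from %s for %s/%s\n", c.name, parent.Namespace, parent.Name)
	c.queue.Add(parent)
}

func main() {
	q := newWorkQueue()
	clusters := []*remoteCluster{{name: "east", queue: q}, {name: "west", queue: q}}

	parent := Request{Namespace: "k8ssandra", Name: "demo"}
	for _, c := range clusters {
		c.ChildChanged(parent)
	}

	// A single reconciler drains the single queue.
	for _, r := range q.order {
		fmt.Printf("reconcile %s/%s\n", r.Namespace, r.Name)
	}
}
```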
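The mirror reconciler’s logic, as described, can be modeled with in-memory clusters. This sketch does not use controller-runtime; `fakeCluster`, the `namespace/name` keys, and the secret contents are assumptions. Like the example, it only creates missing copies and does not propagate later changes.

```go
package main

import "fmt"

// Secret is a minimal stand-in for a Kubernetes Secret.
type Secret struct {
	Namespace, Name string
	Data            map[string]string
}

// fakeCluster is a hypothetical in-memory client for one cluster.
type fakeCluster struct {
	secrets map[string]Secret
}

func (c *fakeCluster) Get(key string) (Secret, bool) {
	s, ok := c.secrets[key]
	return s, ok
}

func (c *fakeCluster) Create(s Secret) {
	c.secrets[s.Namespace+"/"+s.Name] = s
}

type SecretMirrorReconciler struct {
	reference, mirror *fakeCluster
}

// Reconcile looks the secret up in the reference cluster; if it exists but is
// missing from the mirror cluster, it writes a copy there.
func (r *SecretMirrorReconciler) Reconcile(key string) {
	src, ok := r.reference.Get(key)
	if !ok {
		return // deleted, or never existed, in the reference cluster
	}
	if _, exists := r.mirror.Get(key); exists {
		return // already mirrored
	}
	dst := Secret{Namespace: src.Namespace, Name: src.Name, Data: map[string]string{}}
	for k, v := range src.Data {
		dst.Data[k] = v // copy, so the two clusters never share a map
	}
	r.mirror.Create(dst)
}

func main() {
	ref := &fakeCluster{secrets: map[string]Secret{}}
	mir := &fakeCluster{secrets: map[string]Secret{}}
	ref.Create(Secret{Namespace: "default", Name: "db-creds", Data: map[string]string{"password": "s3cret"}})

	r := &SecretMirrorReconciler{reference: ref, mirror: mir}
	r.Reconcile("default/db-creds")
	r.Reconcile("default/db-creds") // idempotent: the second call is a no-op

	s, ok := mir.Get("default/db-creds")
	fmt.Println(ok, s.Data["password"])
}
```

The second watch on the mirror cluster is what makes this robust: if someone deletes the copy in the mirror cluster, that event re-triggers Reconcile and the copy is recreated.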
K8ssandra Operator is, as the name implies, an operator for K8ssandra. In addition to Cassandra, there are a number of components included with K8ssandra, and besides installing, configuring, and managing those components, one of the primary drivers for the operator is supporting multi-data center, multi-region Cassandra clusters. The operator consists of a control plane and a data plane. The control plane only deals with objects in the API server; it doesn’t spin up pods. For now, the control plane can only be installed in a single cluster, while the data plane can be installed in any number of clusters. The control plane cluster can also function as a data plane. For example, our automated tests spin up two clusters: one functions only as a data plane, and the other functions as both the data plane and the control plane.
Here’s the architecture diagram of the operator. We have three clusters here. The first thing to note is that the operator is deployed in each cluster: the control plane cluster at the top, and the data plane clusters at the bottom, where the operator is configured to run as the data plane. There are a couple of other things to point out. First, although these two data plane clusters look the same, there’s no restriction requiring that; they can be completely heterogeneous. The topology in cluster1 can be completely different from cluster2. I could have a Cassandra data center dc1 with 10 nodes spread across several racks, and a completely different number of nodes with different resource requirements in dc2. We have Stargate in each, so it’s completely flexible in that regard. One requirement of K8ssandra Operator, though, is routable IPs: we require routable IPs between pods, because the Cassandra pods need to communicate with one another for the gossip protocol.
Let’s revisit the K8ssandraCluster object in a little more detail. The first thing to note is the datacenters property; the operator creates a CassandraDatacenter object for each entry in that datacenters array. CassandraDatacenter is another Custom Resource, which comes from Cass Operator. Then there’s a stargate property, and the operator creates a Stargate object for each stargate property. Stargate is another Custom Resource, installed by K8ssandra Operator. Lastly, there’s the k8sContext field, which tells the operator in which cluster to create the CassandraDatacenter and Stargate instances.
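The expansion from one K8ssandraCluster into per-cluster child objects can be sketched like this. The Go types are hypothetical and heavily trimmed (the real K8ssandra Operator API has many more fields and a different layout), as is the Stargate naming scheme; the context names `east` and `west` are also made up.

```go
package main

import "fmt"

// Hypothetical, trimmed-down versions of the resources described above.
type StargateTemplate struct {
	Size int
}

type DatacenterTemplate struct {
	Name       string
	K8sContext string // which Kubernetes cluster this data center lives in
	Size       int
	Stargate   *StargateTemplate // optional
}

type K8ssandraCluster struct {
	Name        string
	Datacenters []DatacenterTemplate
}

// PlannedObject is a child object the operator would create in a cluster.
type PlannedObject struct {
	K8sContext, Kind, Name string
	Size                   int
}

// plan expands the K8ssandraCluster: one CassandraDatacenter per datacenters
// entry and one Stargate wherever a stargate block is set, each placed in the
// cluster named by its k8sContext.
func plan(kc K8ssandraCluster) []PlannedObject {
	var out []PlannedObject
	for _, dc := range kc.Datacenters {
		out = append(out, PlannedObject{K8sContext: dc.K8sContext, Kind: "CassandraDatacenter", Name: dc.Name, Size: dc.Size})
		if dc.Stargate != nil {
			name := kc.Name + "-" + dc.Name + "-stargate" // hypothetical naming
			out = append(out, PlannedObject{K8sContext: dc.K8sContext, Kind: "Stargate", Name: name, Size: dc.Stargate.Size})
		}
	}
	return out
}

func main() {
	kc := K8ssandraCluster{Name: "demo", Datacenters: []DatacenterTemplate{
		{Name: "dc1", K8sContext: "east", Size: 3, Stargate: &StargateTemplate{Size: 1}},
		{Name: "dc2", K8sContext: "west", Size: 3, Stargate: &StargateTemplate{Size: 1}},
	}}
	for _, o := range plan(kc) {
		fmt.Printf("%-5s %-20s %s (size %d)\n", o.K8sContext, o.Kind, o.Name, o.Size)
	}
}
```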
In this diagram here, there’s a couple things that I wanted to highlight or start to think about. First, there is a parent-child relationship between the K8ssandraCluster object and those other objects, namely the CassandraDatacenter and Stargate objects. There are other objects that are created as well, but for now, focus on these. Normally, if I’m dealing with… earlier in the slides, I showed an example of a deployment. When I create a deployment for NGINX, the deployment controller creates a replica set, the replica set will create a pod. There is a parent-child relationship there; the owner reference is set on the replica set and then on the pod. Where this comes into play is with deletion and garbage collection. I delete the deployment, there’s a cascading delete that will happen, the replica set, and the pods will get deleted. Well, there’s a stipulation; owner references can only be used for objects that reside in the same namespace. Here, the objects can be in different namespaces/clusters. So, their owner reference doesn’t really help us out. It’s not applicable here. The other thing that I want to mention is, when there’s a change to the K8ssandraCluster object, the operator wants to be notified and perform any work that’s necessary. Also, if there’s some modification or a change to the child objects: the CassandraDatacenter or Stargate object, the operator wants to be notified and make any changes necessary. So, we want to set up watches on those objects. Here’s how I would normally set up a watch for those objects. The top shows an example using the builder API, and it’s in bold here. The Owns method will succinctly set up that watch. That’s equivalent to the code down below, calls the Watches function, which creates the source. Then, specifying the CassandraDatacenter type. Then, creating the event handler, which says, ‘Create enqueue requests when the owner of the object is a K8ssandraCluster’. 
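A small Go model helps show why an owner-based handler cannot find the parent here. `OwnerReference` is trimmed down but, like the real Kubernetes type, has no namespace field, so a handler that maps a child back to its owner has to reuse the child’s namespace; this is roughly how controller-runtime’s owner-based handler behaves for namespaced owners. The namespaces and names are hypothetical.

```go
package main

import "fmt"

// OwnerReference is a trimmed-down stand-in for the Kubernetes type. Like the
// real one, it has no namespace field, and certainly no cluster field.
type OwnerReference struct {
	Kind, Name string
}

type Child struct {
	Namespace, Name string
	Owners          []OwnerReference
}

type Request struct{ Namespace, Name string }

// enqueueForOwner maps a child event to requests for its owners of the given
// kind. Since owner references carry no namespace, the child's namespace is
// reused for the owner.
func enqueueForOwner(ownerKind string, c Child) []Request {
	var reqs []Request
	for _, o := range c.Owners {
		if o.Kind == ownerKind {
			reqs = append(reqs, Request{Namespace: c.Namespace, Name: o.Name})
		}
	}
	return reqs
}

func main() {
	// Hypothetical layout: the K8ssandraCluster "demo" lives in namespace
	// "k8ssandra-operator" in the control plane cluster, while its
	// CassandraDatacenter dc2 lives in namespace "cass" in another cluster.
	parent := Request{Namespace: "k8ssandra-operator", Name: "demo"}
	dc2 := Child{Namespace: "cass", Name: "dc2", Owners: []OwnerReference{{Kind: "K8ssandraCluster", Name: "demo"}}}

	for _, r := range enqueueForOwner("K8ssandraCluster", dc2) {
		fmt.Printf("enqueued %s/%s, parent is %s/%s, match=%v\n",
			r.Namespace, r.Name, parent.Namespace, parent.Name, r == parent)
	}
}
```

The enqueued request points at `cass/demo`, an object that does not exist, so the parent is never reconciled.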
This doesn’t work, though, because the owner needs to be in the same namespace. As it turns out, you can technically set this up when the objects exist in the same namespace but in different clusters, but I learned the hard way that things can go bad, and it doesn’t work as expected. It really only works when the objects exist in the same namespace within the same cluster. So we need a way to set up watches on those child objects so that events get triggered for the parent K8ssandraCluster object.
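The garbage-collection behavior described above can be illustrated with a toy model in Go. This is not the Kubernetes garbage collector; it only encodes the documented rule that a namespaced owner must live in the same namespace as its dependent, and that an owner reference which cannot be resolved there is treated as absent, making the dependent eligible for deletion. Because each cluster runs its own garbage collector over its own objects, a child in a remote cluster pointing at a K8ssandraCluster in the control plane cluster looks like an orphan, which is one plausible way for things to go bad. All UIDs and namespaces are made up.

```go
package main

import (
	"fmt"
	"sort"
)

// Object is a toy stand-in for a Kubernetes object's metadata.
type Object struct {
	UID       string
	Namespace string
	OwnerUID  string // "" means no owner reference
}

// Store is a toy model of ONE cluster's objects, keyed by UID.
type Store map[string]Object

// ownerPresent encodes the documented rule: a namespaced owner counts only
// if it exists in the same cluster (this Store) and the same namespace.
func (s Store) ownerPresent(dep Object) bool {
	owner, ok := s[dep.OwnerUID]
	return ok && owner.Namespace == dep.Namespace
}

// collect runs toy GC passes until nothing changes, deleting every
// dependent whose owner is absent, and returns the deleted UIDs sorted.
func (s Store) collect() []string {
	var deleted []string
	for changed := true; changed; {
		changed = false
		for uid, obj := range s { // deleting during range is safe in Go
			if obj.OwnerUID != "" && !s.ownerPresent(obj) {
				delete(s, uid)
				deleted = append(deleted, uid)
				changed = true
			}
		}
	}
	sort.Strings(deleted)
	return deleted
}

func main() {
	// Same cluster, same namespace: deleting the Deployment cascades.
	local := Store{
		"deploy": {UID: "deploy", Namespace: "web"},
		"rs":     {UID: "rs", Namespace: "web", OwnerUID: "deploy"},
		"pod":    {UID: "pod", Namespace: "web", OwnerUID: "rs"},
	}
	delete(local, "deploy")
	fmt.Println(local.collect()) // [pod rs]

	// Remote cluster: the owner lives in the control plane cluster, so this
	// store never contains it, and the child is collected even though
	// nobody deleted the parent.
	remote := Store{
		"dc1": {UID: "dc1", Namespace: "k8ssandra", OwnerUID: "k8c-uid-in-control-plane"},
	}
	fmt.Println(remote.collect()) // [dc1]
}
```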
The example in the controller-runtime documentation didn’t really show how to make that happen. So there are a couple of things we can do to achieve this: labels and a mapping function. The operator adds a label named k8ssandra.io/cluster to every object it creates, that is, every CassandraDatacenter and Stargate object. Then it uses the EnqueueRequestsFromMapFunc event handler to apply a transformation to the source of the event. The output of that transformation is zero or more reconciliation requests.
Now, let’s take a look at what that looks like in practice. First, I have my mapping function here. It takes a client.Object as an argument, which is an interface from controller-runtime that represents an arbitrary Kubernetes object. We get the labels on the object and check whether it has that cluster label. If it does, I create a request that gets returned. After that, we see NewKindWithCache, where the Kind here is CassandraDatacenter, and we pass it a remote cache. Then we call the Watches function, passing it the source and the EnqueueRequestsFromMapFunc event handler. In the K8ssandra Operator code, this is actually done in a loop for each of the remote clusters, and not only for CassandraDatacenter but for the Stargate objects too; as we add other objects that we need to create and manage, they would be handled there as well. It’s also worth noting that this mapping function doesn’t check whether the object is a CassandraDatacenter. We could, but this way the mapping function can be used for any of the child objects, whether it’s a CassandraDatacenter or a Stargate.
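Below is a stdlib-only sketch of the label-based mapping function. In the real operator, the function would take a controller-runtime `client.Object`, return `[]reconcile.Request`, and be wrapped with `handler.EnqueueRequestsFromMapFunc` (whose `MapFunc` signature has changed across controller-runtime releases); here `Object` and `Request` are toy stand-ins. The `k8ssandra.io/cluster` key comes from the talk, while the `k8ssandra.io/cluster-namespace` label is a hypothetical addition so the request can carry the parent’s namespace.

```go
package main

import "fmt"

const (
	// Label named in the talk: holds the parent K8ssandraCluster's name.
	clusterLabel = "k8ssandra.io/cluster"
	// HYPOTHETICAL label for the parent's namespace (assumption).
	clusterNamespaceLabel = "k8ssandra.io/cluster-namespace"
)

// Object is a toy stand-in for client.Object. Only labels matter here;
// the kind is deliberately ignored, so one function serves every child.
type Object struct {
	Kind   string
	Name   string
	Labels map[string]string
}

// Request mirrors a reconcile request: the namespaced name of the parent
// K8ssandraCluster that should be reconciled.
type Request struct {
	Namespace string
	Name      string
}

// mapToParent transforms a watch event's source object into zero requests
// (label missing) or one request for the owning K8ssandraCluster.
func mapToParent(obj Object) []Request {
	name, ok := obj.Labels[clusterLabel] // reading a nil map is safe
	if !ok || name == "" {
		return nil
	}
	return []Request{{Namespace: obj.Labels[clusterNamespaceLabel], Name: name}}
}

func main() {
	labels := map[string]string{clusterLabel: "demo", clusterNamespaceLabel: "k8ssandra"}
	dc := Object{Kind: "CassandraDatacenter", Name: "dc1", Labels: labels}
	sg := Object{Kind: "Stargate", Name: "dc1-stargate", Labels: labels}
	unrelated := Object{Kind: "CassandraDatacenter", Name: "other"}

	fmt.Println(mapToParent(dc), mapToParent(sg), mapToParent(unrelated))
}
```

Because the function only reads labels, the same function serves CassandraDatacenter and Stargate watches alike, which is the kind-agnostic design choice described above.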
Now let’s shift gears and talk about accessing remote clusters. First, let’s talk about what’s involved in accessing a cluster from within a pod. The service account token and the API server’s root certificate are automatically mounted in the pod at these locations. Environment variables with the API server’s address are injected into the pod as well. This is all the information needed to configure a client connection, and client-go knows to look for these settings automatically. If you remember the example code I showed for Memcached: when I called the function to create the Manager, it internally created the Cluster object, which then initialized the Client. By default, the Client assumes it’s running in-cluster and looks for the service account token and certificate at these locations. It happens pretty seamlessly, so that’s great for running in-cluster.
Now we need to figure out how to make this work when we’re accessing a remote cluster. When we talk about accessing a cluster externally, we usually think about using a Kubeconfig; this is how clients like kubectl work. A Kubeconfig provides all the information we need to access a cluster. We’ll look at a few approaches we’ve tried for K8ssandra Operator: one using a client certificate, one using a service account token, and one using an OAuth token. The first example shows a Kubeconfig using a client certificate. I pulled this from early on in the project, when we used it for automated tests. I created a Kind cluster, Kind generated the Kubeconfig, and then we used that in the project. As I mentioned, we’re no longer using this approach. If you’re interested, let me know; I’ll be happy to explain why later on.
Next is an example Kubeconfig for a GKE cluster using an OpenID Connect or OAuth token. The problem with this is that, at the bottom, we have the auth-provider section with an access token, and that’s an auth-provider-specific token that expires every hour with GKE. So if I use this with the operator, the token expires after an hour, and the operator won’t be able to talk to the remote cluster. To get a new access token, it would need to call the gcloud CLI. That isn’t really practical, because it means my operator’s container needs to have gcloud, and I’d need a bunch of logic there for each cloud provider’s authentication mechanism, with whatever tools or third-party dependencies that involves. That’s really not a viable solution. Lastly, we have a Kubeconfig using a service account token. Recall that with the in-cluster configuration, each pod has its service account token mounted automatically. This approach is nice because it’s agnostic of any particular cloud provider and any provider-specific authentication mechanism. The downside, though, is that we need to create the operator’s service account in the remote cluster in advance, so that its service account token gets generated. In the K8ssandra Operator repo, we have a script that tries to make this a little more seamless. It extracts the service account token from the remote cluster, creates the Kubeconfig file, and then stores it in a secret in the control plane cluster.
We have a script that generates the Kubeconfig, creates a secret, and stores it in the control plane cluster. Now the question is, how does the operator find out about it? The answer is a ClientConfig object, another Custom Resource installed by K8ssandra Operator. A ClientConfig is basically a pointer to one of those Kubeconfig secrets. When the operator starts up, it queries for ClientConfig objects. For each one, it grabs the ClientConfig and the corresponding secret, extracts the Kubeconfig, and parses out the service account token in order to create a Cluster object. Then it adds the Cluster object to the Manager. It stores each remote client in a cache, and those cache entries are keyed off the context name, which we see here in the spec as the contextName property. Right now, this has to be done at startup. In other words, if my operator is running and I want to create a new K8ssandraCluster that includes a cluster I just spun up in EKS, then after I create the ClientConfig, I have to restart the operator for it to become aware of that cluster. That’s certainly not ideal. The reason is that when I add the cluster to the Manager and then start the Manager, the Manager starts the cache. Once the Manager has been started, it won’t start any more caches. So if I create a Cluster object after the Manager has already started and then add the cluster to it, the cache won’t get started, which means the remote cluster’s watches won’t be created and wired up. So I created a ticket in controller-runtime for this. I wasn’t initially sure whether this was by design or a bug. It turned out to be a bug, so I submitted a patch. Hopefully, we’ll see this fixed in an upcoming release of controller-runtime, and then we’ll make the corresponding changes in K8ssandra Operator.
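The startup limitation can be illustrated with a toy model. The `Manager`, `Runnable`, and `Registry` types below are hypothetical and far simpler than controller-runtime’s; they sketch the behavior argued for above, where adding a cluster’s cache after the manager has started starts that cache immediately, and remote caches are keyed by the ClientConfig’s contextName. Stopping a cache, and concurrent registration, are not modeled.

```go
package main

import (
	"fmt"
	"sync"
)

// Runnable (toy) stands in for a remote cluster's cache: something the
// manager must start before its watches deliver events.
type Runnable interface{ Start() }

type remoteCache struct {
	context string
	started bool
}

func (c *remoteCache) Start() { c.started = true }

// Manager is a toy manager, NOT controller-runtime's. Add starts a
// runnable immediately if the manager is already running, so a newly
// registered cluster does not require restarting the operator.
type Manager struct {
	mu        sync.Mutex
	running   bool
	runnables []Runnable
}

func (m *Manager) Add(r Runnable) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.runnables = append(m.runnables, r)
	if m.running {
		r.Start() // the late-add case that motivated the bug report
	}
}

func (m *Manager) Start() {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.running = true
	for _, r := range m.runnables {
		r.Start()
	}
}

// Registry maps a ClientConfig's contextName to its remote cache.
// Not safe for concurrent Register calls; this is a sketch.
type Registry struct {
	mgr    *Manager
	caches map[string]*remoteCache
}

func NewRegistry(mgr *Manager) *Registry {
	return &Registry{mgr: mgr, caches: map[string]*remoteCache{}}
}

// Register is what would run for each ClientConfig, at startup or later.
func (r *Registry) Register(contextName string) *remoteCache {
	c := &remoteCache{context: contextName}
	r.caches[contextName] = c
	r.mgr.Add(c)
	return c
}

func main() {
	mgr := &Manager{}
	reg := NewRegistry(mgr)
	reg.Register("gke-east") // discovered at startup
	mgr.Start()
	late := reg.Register("eks-west") // ClientConfig created while running
	fmt.Println(reg.caches["gke-east"].started, late.started) // true true
}
```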
Now let me wrap up with some closing thoughts. K8ssandra Operator is still early in development. But hopefully, with the example I showed from the controller-runtime design doc, you can see that it’s really not that difficult to implement. There are just a few lines of code you need so that your controller can watch objects in a remote cluster and react to their event notifications, which is really awesome. Are there other changes we might want to see in controller-runtime to better support multi-cluster? I’m probably going to say yes. I just mentioned the issue I raised: the Manager should be able to start caches after it has already been started. It probably also needs complementary support for stopping caches. There are probably other things as well. Managing remote clusters can be challenging. We’ve gone through a lot of work, and I think it’s still safe to say that the work on managing remote clusters in K8ssandra Operator is ongoing: not just configuring access, but also making sure things are secure and locked down. And what do you do when clusters are ephemeral, when they may come and go? How do we determine whether a cluster is actually gone versus there being a network partition, and how do we deal with that? Lastly, take the time and invest the effort in automated testing. This applies especially to multi-cluster controllers. It’s worth noting that you can get a lot of mileage out of testing locally. With K8ssandra Operator, we have full end-to-end tests that run locally with Kind clusters. The multi-cluster tests spin up multiple Kind clusters, and that works really well. So you don’t necessarily need multiple clusters running in the cloud to test multi-cluster behavior; you can do it locally, which really speeds up development and testing. It also helps contributors. So, that is it.
Bart Farrell 55:41
That is it? As if you only talked about two things today.
John Sanda 55:47
Well, sorry, there’s also the future work. I said we’re still early on with the operator. There’s a lot more, in terms of workflows, that we need to support: adding and removing data centers, taking up components. So there’s a lot there; we’ve really just scratched the surface. Then, of course, we want to release K8ssandra 2.0. That’s the big goal: get it out, get adoption, start collaborating. We definitely need more testing and documentation around cloud providers and client tooling. A colleague at one point showed me a nice demo of some client tooling with Linkerd. Linkerd does pre-flight checks, making sure there’s connectivity across clusters. Things like that make for an easy user experience when getting everything set up. Then lastly, if you attended KubeCon North America, there was definitely a lot of buzz around multi-cluster and multi-cluster services. I think that’s something we’re definitely looking to get into post-2.0, to see how we can leverage it for the operator. That’s it.
Bart Farrell 57:05
That’s it, once again. One thing I want to ask you about has come up a fair amount in our community: we’ve seen all these different operators, so do you expect that standards will start to emerge for operators?
John Sanda 57:23
That’s a really good question. I’ve seen different projects out there. I think standards would certainly be beneficial in a number of ways. I think about standards for operators in terms of the CRDs and how they’re structured, what types of fields they have, and so forth; but I can also see a downside, because what I need for Cassandra might be entirely different from what I need for some other system.
Bart Farrell 58:08
Hence the difficulty, I suppose. We see this with certain companies: do we build our own, do we buy it, how do we go about doing this? Part of the appeal of standards would be to make the entry point a little easier, because for now, in the short term, operators seem to be one of the best solutions out there for getting data onto Kubernetes relatively seamlessly. Getting there, though, is our job as a community. Let’s get these conversations out there; let’s hear about the experience of K8ssandra. We’ve heard from folks at Percona. We’ve talked to folks who have a test plan at scale, etc. So you can see how these things are being approached. I think there’s this sort of urge or desire to get there. But as you said, the intricacies and ins and outs of every single database are going to present challenges that probably have to be addressed in a more tailor-made fashion.
John Sanda 58:59
Yes. This goes back to my slide in the operator section about domain-specific knowledge. I think that with controller-runtime, Operator SDK, and Kubebuilder, they’ve done a phenomenal job of distilling things down for somebody who wants to write an operator. Look at that scaffolding and a lot of the details we were covering: if I want to write an operator for some data store I’m working with, I don’t have to worry about any of that; I can get started now. Eventually, yes, it helps to understand what’s going on under the covers. But if I’m just getting started, say I want to create a StatefulSet for my data store and make sure there are some persistent volumes, I don’t have to worry about how to write a controller, how to create the client connection, or how to manage the cache. That’s all done. So it’s a huge step forward: it lets me focus on my problem domain, and I think that gets us a long way.
Bart Farrell 1:00:08
Very good point. With that, can you stop sharing your screen quickly? There’s one final thing I’ve got to share. I think we can all agree that this was a pretty awesome masterclass on the topic of operators. As I said, we’ve had more than a couple of conversations about this, but this one had a lot of depth and breadth. That being said, we have a little bit of a ritual in our community. While you were talking, our amazing graphic recorder, Angel, was creating a visual description of everything that was mentioned. Let me know if you can see my screen. He even managed to squeeze in a Barbie doll dressed as a witch in honor of the first picture you showed, which I really liked. That was a nice way to start. Obviously, a lot of topics are covered here; there’s a lot of stuff to unpack. I was talking to John before we got started: if you arrived a little late and want to check out the slides, we’re going to drop them in Slack, so no worries. And John, we’re definitely going to have you back. The next thing we want to spin up is a panel focused just on operators, to compare and contrast experiences. Maybe we can’t standardize operators, but we can at least standardize the conversation we’re having about them, make it a regular thing, and see what’s going on out there as more folks get onboarded to running data on Kubernetes. John, what’s the best way to follow you: Twitter, LinkedIn, anywhere else we should find you?
John Sanda 1:01:36
Bart Farrell 1:01:55
Yes, drop it in Slack. Anyway, John’s easy to find, and he’s a great speaker. Really appreciate your time today, John, and hopefully we’ll see you soon. Like I said, we’ve got to get that panel going. We’ll definitely let you know.
John Sanda 1:02:06
Thank you. This is awesome.
Bart Farrell 1:02:07
Good. Glad you enjoyed it. Take care everyone.