Open Source Multicloud Control Plane

This post originally appeared in a doc on December 4, 2018.

By: Bassam Tabbara, @bassam, Illya Chekrygin, @ichekrygin, and Jared Watts, @jbw976

https://github.com/crossplaneio/crossplane

Abstract

We introduce Crossplane, an open source multicloud control plane. Crossplane enables workload portability across disparate environments, clusters, regions, and clouds. We model workloads and the resources they consume, including the managed services of existing cloud providers as well as independent cloud offerings. A workload scheduler can optimize where workloads run across cloud providers. Crossplane offers a clear separation of concerns between developers and administrators. It is based on the declarative resource model of the popular Kubernetes project, and applies many of the lessons learned in container orchestration to multicloud workload and resource orchestration.

Status: work in progress

This document contains the work-in-progress design for Crossplane. The content of this document is subject to change as we evolve the design. It does not completely match what’s in the codebase but the two are converging quickly. Your feedback, wisdom and contributions are most welcome.

Overview

Introducing Crossplane
Crossplane and Kubernetes
Use Cases
Goals
Project Status and Roadmap

Crossplane Walk-through

Configuring Crossplane as an Administrator
Deploying Wordpress as a Developer
Peeking behind the scenes

Proposed Architecture

Resources, Claims, Classes and Pools
Resources
Resource Claims
Resource Classes
Resource Lifecycle
Resource Controllers
Resource Pools and Auto-Scalers
Workloads
Workload Scheduler

Appendix A: Related Projects

Appendix B: Challenges and Open Questions

Appendix C: Frequently Asked Questions

Overview

Over the last decade, we’ve witnessed the emergence of cloud computing as the predominant IT paradigm. Cloud computing enables organizations to focus on their core business competencies and quickly respond to changing demands without expending significant resources on infrastructure and maintenance. Organizations can instantly take advantage of a set of world-class managed services across the infrastructure, platform and software layers. They can scale their businesses globally and do so with an efficient pay-per-use cost model.

Yet despite its predominance, cloud computing remains completely under the control of a small set of cloud providers that have achieved the massive economies of scale required to run a cloud computing business. Amazon, Microsoft and Google are at the front of the race and compete aggressively for market share and talent. While these cloud providers are themselves heavy adopters of open source technologies, cloud computing remains predominantly proprietary and closed source.

In recent years, there has been an increased demand for running across cloud providers. By adopting a multicloud strategy, organizations can reduce costs, use differentiated managed services, increase geographic presence, take advantage of credits, avoid vendor lock-in, and meet strict compliance policies. However, running on multiple cloud providers requires a significant amount of engineering, which can be prohibitive for the majority of organizations. No standard has emerged for workload portability across cloud providers, and management tools are sparse and fragmented. Furthermore, numerous approaches to solving this problem at the level of UIs, tools, languages, and libraries have either failed or not reached wide adoption.

One notable exception is Kubernetes, which has emerged as the standard container orchestration platform. Kubernetes has always had a focus on workload portability, and defined a set of portable abstractions for running containers and their related networking and storage. An application that can be confined to just these abstractions can run without change across many different cloud providers, platforms, and distributions. Kubernetes has seen tremendous success and its wide adoption is indicative of the pent-up demand for more workload portability.

Yet, Kubernetes represents a small fraction of the workloads organizations are running in the cloud. Organizations are running more than just container workloads, and rely on many managed services for serverless, databases, object storage, analytics, big data, AI, ML, and others for which Kubernetes has limited support. While many of these managed services are based on popular open source software or have widely adopted interfaces, they are managed today behind proprietary cloud-provider APIs.

With the momentum behind Kubernetes, and the emergence of a strong extensibility story (CRDs and custom controllers or operators), we’ve seen an effort to run platform software like databases, big data, and analytics directly on top of Kubernetes. While doing so helps achieve a higher degree of portability, it means that organizations return to managing platform software themselves rather than using services managed by a cloud provider. Kubernetes custom controllers (operators) can decrease the management burden, but they do not eliminate it, and they offer no SLA. Achieving multicloud portability should not require trading off the use of managed services.

We believe there is a need for a new portability layer on top of existing managed services and cloud offerings. We need a control plane that exposes a universal declarative-style API for cloud computing and offers full lifecycle management, orchestration, scheduling, and policy enforcement. A control plane that can span cloud providers, regions and offerings. A control plane that is driven by the open source community.

Introducing Crossplane

Crossplane is an open source multicloud control plane. It introduces a set of workload and resource abstractions on top of existing managed services and cloud offerings that offer a high degree of workload portability across different cloud providers and vendors. Crossplane focuses primarily on managed services that are based on open source software or have widely adopted open interfaces, but does not exclude closed-source and proprietary managed services.

Crossplane presents a declarative, management-style API that covers a wide range of portable abstractions, including databases, message queues, buckets, data pipelines, serverless, clusters, and many more. A single Crossplane instance enables the provisioning and full-lifecycle management of infrastructure across a wide range of providers, vendors, regions, and offerings. Crossplane strives to become the universal API for cloud computing, and a control plane for smart controllers that can work across clouds.

As a control plane, Crossplane does not have any active components that are on the data path. It’s responsible for maintaining records for all objects managed by it, and runs controllers that automate the management of workloads across cloud providers. These controllers use the declarative configuration and are able to make changes, react to failures, and optimize services without involving an admin. Controllers span cloud providers and offerings, and are able to automate tasks across them.

Crossplane models multiple types of workloads, including containers and serverless (coming soon). Workloads consume resources, which can be based on existing managed services of a cloud provider or on independent cloud offerings. Resources can be statically or dynamically provisioned, and are automatically bound to the workloads that consume them. Crossplane, along with specialized resource controllers, automates binding resources to workloads, passing connection information, and handling network and security access.

Crossplane supports a clean separation of concerns between developers and administrators. Developers define workloads without having to worry about implementation details, environment constraints, and policies. Administrators define environment specifics and policies. This separation of concerns leads to a higher degree of reusability and reduces complexity.

Crossplane includes a workload scheduler that can factor in a number of criteria, including capabilities, availability, reliability, cost, regions, and performance, when deploying workloads and their resources. The scheduler works alongside specialized resource controllers to ensure that policies set by administrators are honored.
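To make the idea of factoring in criteria concrete, here is a minimal sketch of one possible scheduling rule: drop placement targets that lack a required capability or region, then pick the cheapest remaining one. The `Target` and `Requirements` types, the field names, and the cheapest-first rule are all hypothetical illustrations, not Crossplane's scheduler.

```go
package main

import (
	"fmt"
	"sort"
)

// Target is a hypothetical placement option (provider + region) with the
// attributes a scheduler could weigh. It is not a Crossplane API type.
type Target struct {
	Provider     string
	Region       string
	HourlyCost   float64
	Capabilities map[string]bool
}

// Requirements is a hypothetical summary of what a workload needs.
type Requirements struct {
	Capabilities []string
	Regions      []string // empty means any region is acceptable
}

// feasible reports whether t satisfies every hard requirement in r.
func feasible(t Target, r Requirements) bool {
	for _, c := range r.Capabilities {
		if !t.Capabilities[c] {
			return false
		}
	}
	if len(r.Regions) == 0 {
		return true
	}
	for _, reg := range r.Regions {
		if reg == t.Region {
			return true
		}
	}
	return false
}

// Schedule filters out infeasible targets and returns the cheapest remaining
// one; ties are broken by provider name so the result is deterministic.
func Schedule(targets []Target, r Requirements) (Target, bool) {
	var ok []Target
	for _, t := range targets {
		if feasible(t, r) {
			ok = append(ok, t)
		}
	}
	if len(ok) == 0 {
		return Target{}, false
	}
	sort.Slice(ok, func(i, j int) bool {
		if ok[i].HourlyCost != ok[j].HourlyCost {
			return ok[i].HourlyCost < ok[j].HourlyCost
		}
		return ok[i].Provider < ok[j].Provider
	})
	return ok[0], true
}

func main() {
	targets := []Target{
		{"aws", "us-east-1", 0.12, map[string]bool{"mysql": true}},
		{"gcp", "us-east1", 0.10, map[string]bool{"mysql": true}},
		{"azure", "eastus", 0.09, map[string]bool{}},
	}
	t, ok := Schedule(targets, Requirements{Capabilities: []string{"mysql"}})
	fmt.Println(t.Provider, ok) // gcp true
}
```

Separating hard constraints (filtering) from preferences (ordering) keeps policy checks independent of cost optimization; a real scheduler would likely weigh several soft criteria rather than cost alone.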

Finally, Crossplane is designed for extensibility at every level. New APIs, resource controllers, schedulers, and other components can be added easily, without requiring code in-tree. A major goal of Crossplane is to be a platform for multicloud development and to empower the open source community to build on top of it.

Crossplane and Kubernetes

Crossplane is based on the declarative resource management architecture of the popular Kubernetes project. Crossplane uses the unmodified Kubernetes API server (kube-apiserver), etcd, and a few of the core controllers. For convenience, Crossplane can run directly on top of an existing Kubernetes cluster without requiring any changes, even though Crossplane does not necessarily schedule or run any containers on the host cluster.

Despite the similarities, Crossplane should not be confused with Kubernetes. The two operate at different layers and solve different problems. Kubernetes is a container orchestrator and is responsible for managing containers (pods), and the resources they consume (storage volumes, and networking) across a set of nodes. Crossplane is a multicloud workload and resource orchestrator and manages workloads (container, serverless, others) and resources they consume (databases, message queues, buckets, data pipelines, and others) across a set of cloud providers or on-premise environments.

Crossplane can be thought of as a higher-order orchestrator across cloud providers. It’s similar in spirit to the federation project in Kubernetes, but is not limited to just containers and clusters, and varies significantly in scope and approach.

While Crossplane has great support for container workloads, it does not mandate their use. You can use Crossplane for scenarios and workloads that do not involve containers at all. Crossplane supports the managed Kubernetes services of cloud providers just as it supports other managed services for databases and big data. Crossplane can orchestrate container workloads directly on top of managed Kubernetes clusters.

By adopting the Kubernetes declarative resource management model, we hope that Crossplane will be instantly familiar to the cloud-native community. Using the same model and API also means we can leverage the healthy ecosystem of tools, UIs, and language bindings. While some of these tools might work out of the box with Crossplane (like kubectl, Helm, and many others), others that assume Kubernetes constructs will need to be adapted. We believe this is still a big win. Our approach lets us focus on building the things that matter for multicloud scenarios instead of reinventing machinery. Finally, we see Crossplane as strong validation of the modularity and extensibility story of Kubernetes.

Use Cases

The following use cases have motivated the Crossplane project; we list them here to help scope the project and define its direction. Crossplane is in its infancy, and while these scenarios might seem a bit far-reaching, we believe enough pieces are in place to make them a reality soon.

Multicloud Workload Portability

“I would like to author an application that runs without modifications on multiple environments, and cloud-providers”

Today most applications and workloads bake in many assumptions about the cloud provider at development time. Changing cloud providers is costly and sometimes prohibitive because the choice was made at development time instead of at runtime. Alternatively, an in-house abstraction can be created that allows for (limited) portability, at a high cost of development, maintenance, and staff training. We would like a way to author an application or workload that is free of the details of where and how it will be deployed. The developer would focus primarily on their workload and its requirements. When the workload is deployed, the details about cloud providers, environment settings, and policies can take effect without requiring any changes to the workload.

Taking advantage of differentiated features across cloud providers

“I would like to take advantage of the differentiated features and offerings across multiple cloud providers”

Cloud providers have differentiated service offerings, especially in the areas of machine learning, data processing, and artificial intelligence. There is a natural cycle of competition between cloud providers, in which one provider can differentiate temporarily before the others catch up. Organizations don’t want to lock themselves into a single cloud provider’s innovation cycle, and instead want to take advantage of the most advanced cloud services no matter which cloud provider offers them. Other examples are the availability of different hardware instance types, or of services in particular geographic regions.

Furthermore, there are plenty of independent cloud offerings that are not available as managed services of a cloud provider, or that offer differentiated features above and beyond a cloud provider’s managed service. For example, Google’s Spanner is only available within Google Cloud, but CockroachDB is available across all cloud providers. Elasticsearch from Elastic offers more features than Amazon’s managed Elasticsearch service. By using a control plane that is not tied to a single cloud provider, an organization has more choice and can use best-of-breed services, whether inside cloud providers or not.

Migration of Workloads

“I would like to migrate workloads from one environment to another, and even across cloud providers”

Migrating workloads across cloud providers typically means that you would need to engineer workloads to work on another cloud provider, and then deal with migration. Instead of going from one cloud provider to another, you can go from one cloud provider to a portable workload which is likely to be a smaller engineering effort. Migration of data and content can also be achieved by utilizing automated controllers that can move data across regions, clusters or clouds. Having a single control plane that can span multiple cloud providers means that controllers can do their work across clouds and use tried and tested methods for completing long running migration tasks.

Utilizing Multiple Cloud Providers Simultaneously

“I would like to utilize multiple cloud providers at the same time for improved availability, geographic presence and reach.”

By having portable workloads we can schedule work across multiple regions and cloud providers at the same time. You can utilize all the different regions of multiple cloud providers to improve availability, geographic presence, and reach. Workloads could move closer to users and customers.

Hybrid Workloads

“I would like to run my workload across my on-premise infrastructure as well as cloud providers.”

Many organizations still have large on-premise and datacenter footprints. By adopting portable workloads they can normalize on the same infrastructure and tools across their on-premise and cloud environments.

Automating Tasks across multiple regions and cloud providers

“I would like to automate complex tasks across cloud providers or regions”

Many tasks today are hard to achieve across cloud providers or even regions of a single cloud provider. For example, setting up replication across databases, or setting up a data protection solution. By having a single control plane that can span regions and cloud providers, and offers an extensibility model for writing custom controllers, we can automate many of these tasks.

Cost Optimization and Arbitrage

“I would like to run my services on the cloud with the lowest cost, and move workloads when the cost improves elsewhere.”

With portable workloads and a clean separation of concerns, workloads can move across cloud providers and environments. Picking a cloud provider can become a runtime scheduling decision, and can factor in cost, proximity, and other criteria. By having a scheduler that is able to span cloud providers and regions, we can implement advanced scheduling scenarios.

Cloud providers are famous for offering significant credits to win over new business. Some organizations have given their teams mandates to run on different cloud platforms so they can take advantage of these credits.

Goals

Crossplane has the following goals:

Enable a higher degree of workload portability via abstractions.
Define a clean separation of concerns between administrators and developers.
Use existing managed services of a cloud provider.
Use new cloud offerings from independent companies, like CockroachDB, MongoDB, Confluent, Elastic and others, that are not necessarily tied to a cloud provider.
Support static and dynamic provisioning of resources.
Give administrators full control of their clouds using policy, limits and quotas.
Enable full lifecycle management of resources via controllers.
Expose rich extensibility points enabling others to easily build on top of Crossplane.
Full extensibility even for out-of-tree code.
Leverage as much of the Kubernetes declarative management machinery as possible.
Leverage as much of the cloud-native ecosystem as possible.

The following are non-goals:

Directly orchestrate container workloads on nodes.
Replace managed services in cloud providers.
Replace operators and controllers written for platform services.
Favor running platform software on Kubernetes in containers.
Improve the developer experience around YAML and configuration.
Improve the workflow of building, packaging, and deploying applications.
Define a packaging format for workloads or applications.

Project Status and Roadmap

Here’s a proposed roadmap for the project. The project is in its infancy, so this is likely to change dramatically as our engagement with the community expands.

Version 0.1 - Proof of Concept - Wordpress demo running across three cloud providers

AWS, GCP and Azure
Managed Database Services (MySQL and PostgreSQL)
Managed Object Storage Services (S3)
Resources, Claims, and Classes operational
Example container workloads and resource usage
NoOp Workload Scheduler

Version 0.2 - Real World Applications running on Crossplane, More Stateful Services

Pick a few real world applications to run on Crossplane like GitLab.
Additional Stateful Services (Redis, ActiveMQ, Memcached, ElasticSearch, others)
Support for In-Cluster Services (like Rook)
Container Workload Scheduling

Version 0.3 - Serverless and Big Data

Early support for Serverless Workloads
Early Policy support on Resource Classes
Flesh out Networking and Security for Cloud Providers

Crossplane Walk-through

Let’s walk through an example of running a portable workload using Crossplane. We’ll use a container workload of the familiar Wordpress application. Wordpress runs in a container and requires a MySQL database instance.

We picked this example as a starting point since it will be the focus of the v0.1 release of Crossplane. In the future we will show more complex workloads, including serverless and other managed services. Also, the syntax used here is going to change in future releases.

Crossplane supports a clear separation of concerns between developers and administrators, and we will consider both perspectives in this walk-through. Let’s start with the administrators’ perspective.

Configuring Crossplane as an Administrator

To start, we need to install Crossplane, which for convenience can run on an existing Kubernetes cluster. Crossplane can be installed using a Helm chart as follows:

helm install --name crossplane crossplane

We’ll assume that we want to use AWS for this example, even though the same example can run on GCP and Azure too. Let’s add credentials for AWS in a Secret object. We will also create a Provider that references the secret and specifies the AWS region we’d like to use.

apiVersion: v1
data:
 credentials: BASE64ENCODED_AWS_PROVIDER_CREDS
kind: Secret
metadata:
 name: aws-creds
type: Opaque
---
apiVersion: aws.crossplane.io/v1alpha1
kind: Provider
metadata:
 name: aws-provider
spec:
 credentialsSecretRef:
   key: credentials
   name: aws-creds
 region: us-east-1

Most of the configuration created by administrators goes in a system namespace, by default crossplane-system. You can apply this configuration by using kubectl:

> kubectl --namespace crossplane-system apply -f aws-creds.yaml
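Values under a Kubernetes Secret's `data` field must be base64-encoded, which is why the example uses the placeholder BASE64ENCODED_AWS_PROVIDER_CREDS. A minimal Go sketch of producing that value follows; it assumes the provider expects the contents of a standard AWS shared-credentials file, which is an assumption rather than something stated here, and the key values are placeholders.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// encodeSecretValue returns the base64 form required by a Kubernetes
// Secret's `data` field. (Secrets also accept plain text via `stringData`.)
func encodeSecretValue(raw string) string {
	return base64.StdEncoding.EncodeToString([]byte(raw))
}

func main() {
	// Placeholder credentials in AWS shared-credentials-file format.
	// Whether the AWS provider expects exactly this format is an assumption.
	creds := "[default]\n" +
		"aws_access_key_id = EXAMPLE_KEY_ID\n" +
		"aws_secret_access_key = EXAMPLE_SECRET\n"
	// The printed string would replace BASE64ENCODED_AWS_PROVIDER_CREDS.
	fmt.Println(encodeSecretValue(creds))
}
```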

Next we will create a ResourceClass that acts as a template with implementation details and policy for resources that will be dynamically provisioned by the workload. Since the Wordpress example requires a MySQL database instance, we’ll create a ResourceClass that supports dynamically provisioning a database instance using Amazon RDS service:

apiVersion: core.crossplane.io/v1alpha1
kind: ResourceClass
metadata:
 name: standard-mysql
 namespace: crossplane-system
parameters:
 class: db.t2.small
 masterUserName: masteruser
 vpcId: vpc-234311da3f
 size: "100"
default: true
resource: mysqlinstance.database.crossplane.io/v1alpha1
provisioner: rdsinstance.database.aws.crossplane.io/v1alpha1
providerRef:
 name: aws-provider
reclaimPolicy: Delete

The standard-mysql resource class specifies that the RDS provisioner is to be used when a developer requests a MySQLInstance. It’s marked as the default resource class for resource claims of type MySQLInstance. The class also specifies a set of parameters that are specific to the provisioner, implementation, or environment. In the future, a resource class will also include policy that can constrain the kinds of resources created.
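The default-class rule can be sketched as follows: given the resource a claim asks for, use an explicitly referenced class if there is one, otherwise the single class marked `default: true` for that resource. The `ResourceClass` struct mirrors only the `name`, `resource`, and `default` fields of the YAML above; the `selectClass` helper, and the choice to treat multiple defaults as an error, are hypothetical, not Crossplane's code.

```go
package main

import "fmt"

// ResourceClass mirrors a few fields of the ResourceClass YAML.
type ResourceClass struct {
	Name     string
	Resource string // e.g. "mysqlinstance.database.crossplane.io/v1alpha1"
	Default  bool
}

// selectClass returns the class named by explicitRef if set; otherwise the
// unique default class for the requested resource. Hypothetical helper.
func selectClass(classes []ResourceClass, resource, explicitRef string) (ResourceClass, error) {
	var defaults []ResourceClass
	for _, c := range classes {
		if c.Resource != resource {
			continue // this class provisions a different kind of resource
		}
		if explicitRef != "" {
			if c.Name == explicitRef {
				return c, nil
			}
			continue
		}
		if c.Default {
			defaults = append(defaults, c)
		}
	}
	if explicitRef != "" {
		return ResourceClass{}, fmt.Errorf("class %q not found for %s", explicitRef, resource)
	}
	switch len(defaults) {
	case 0:
		return ResourceClass{}, fmt.Errorf("no default class for %s", resource)
	case 1:
		return defaults[0], nil
	default:
		return ResourceClass{}, fmt.Errorf("multiple default classes for %s", resource)
	}
}

func main() {
	classes := []ResourceClass{
		{Name: "standard-mysql", Resource: "mysqlinstance.database.crossplane.io/v1alpha1", Default: true},
		{Name: "standard-cluster", Resource: "kubernetes.compute.crossplane.io/v1alpha1", Default: true},
	}
	c, err := selectClass(classes, "mysqlinstance.database.crossplane.io/v1alpha1", "")
	fmt.Println(c.Name, err) // standard-mysql <nil>
}
```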

Next we will create another ResourceClass, this time for a Kubernetes cluster:

apiVersion: core.crossplane.io/v1alpha1
kind: ResourceClass
metadata:
 name: standard-cluster
 namespace: crossplane-system
parameters:
 vpcId: vpc-234311da3f
default: true
resource: kubernetes.compute.crossplane.io/v1alpha1
provisioner: eks.compute.aws.crossplane.io/v1alpha1
providerRef:
 name: aws-provider
reclaimPolicy: Retain

The standard-cluster resource class specifies a Kubernetes cluster that uses the EKS provisioner, and configures an AWS VPC for it.

Now let’s create an instance of a Kubernetes cluster for use by container workloads:

apiVersion: compute.crossplane.io/v1alpha1
kind: KubernetesCluster
metadata:
 name: demo-cluster
 namespace: crossplane-system
spec:
 clusterVersion: "1.10"
 nodePools:
 - name: default
   machineType: m4.xlarge
   autoScale: true
   minNodes: 1
   maxNodes: 10

Using the information in the standard-cluster resource class and the demo-cluster resource claim, a new EKS cluster is dynamically provisioned that includes the configuration from both. The EKS cluster will be set up with a default node pool that includes the minimum number of worker nodes. We can inspect the EKS object as follows:

> kubectl --namespace crossplane-system get ekscluster
NAME               VERSION   STATUS  RESOURCE                         RESOURCECLASS
eks-h33sc          1.10      Bound   crossplane-system/demo-cluster   standard-cluster

We can see that this EKS cluster is bound and ready to use. Also, if we look at the AWS management console, here’s the EKS cluster that was provisioned:

At this point Crossplane is configured and we can move on to the developer perspective. The administrator has set credentials, defaults, implementation details, and policies. We created a single cluster to host workloads, and in the future even the cluster will be dynamically provisioned using an autoscaler.
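The EKS cluster in this walk-through combines configuration from the standard-cluster class and the demo-cluster claim. One way to picture that combination is an overlay: start from the class's environment-specific parameters and apply the claim's requirements on top. This minimal sketch uses flat string maps, and the precedence rule (claim values win on conflicts) is an assumption; a real class could instead use policy to reject conflicting requests.

```go
package main

import "fmt"

// mergeConfig overlays claim values on top of class parameters and returns
// a new map; neither input is modified. Assumed precedence: claim wins.
func mergeConfig(classParams, claimSpec map[string]string) map[string]string {
	out := make(map[string]string, len(classParams)+len(claimSpec))
	for k, v := range classParams {
		out[k] = v
	}
	for k, v := range claimSpec {
		out[k] = v
	}
	return out
}

func main() {
	class := map[string]string{"vpcId": "vpc-234311da3f"}
	claim := map[string]string{"clusterVersion": "1.10", "machineType": "m4.xlarge"}
	merged := mergeConfig(class, claim)
	fmt.Println(merged["vpcId"], merged["clusterVersion"]) // vpc-234311da3f 1.10
}
```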

Deploying Wordpress as a Developer

With Crossplane, a developer writes portable configuration and is generally not concerned with the cloud provider, credentials, and other details of the environment. Let's start by creating the configuration for the MySQL database instance needed by Wordpress. This configuration acts as a request for a MySQL database server instance; we refer to it as a resource claim.

apiVersion: database.crossplane.io/v1alpha1
kind: MySQLInstance
metadata:
 name: wordpress-db
 namespace: demo
spec:
 version: "5.7"
 highly-available: true
 autoUpgradePolicy: minor
 encrypted: true
 size: 100Gi

The spec defines the requirements for the MySQL instance independent of any implementation. For example, our Wordpress example requires version 5.7 of MySQL, a highly-available instance that is spread across failure domains, encrypted storage, an automatic upgrade policy based on minor versions, and 100 GiB of storage. The rest of the implementation- and environment-specific details come from the default resource class defined by the administrator. To provision the database, we can run the following:

kubectl apply -f wordpress-db.yaml

Next let's create a workload configuration that defines a container workload to run wordpress. A workload is a schedulable unit of work and contains a payload and requirements for where and how the workload should run. A workload scheduler is responsible for deploying the workload and in this case it will be scheduled to run on the EKS Kubernetes cluster provisioned by the administrator.

apiVersion: container.crossplane.io/v1alpha1
kind: Workload
metadata:
 name: wordpress
 namespace: demo
spec:
 # the minimum version of Kubernetes required
 clusterVersion: "1.10"
 # the namespace to create on a target cluster
 targetNamespace: wordpress
 # this is a template for a deployment to be created on a target cluster
 targetDeployment:
   metadata:
     name: wordpress
     labels:
       app: wordpress
   strategy:
     type: Recreate
   template:
     metadata:
       labels:
         app: wordpress
     spec:
       containers:
       - name: wordpress
         image: wordpress:4.6.1-apache
         env:
         - name: WORDPRESS_DB_HOST
           valueFrom:
             secretKeyRef:
               name: wordpress-db-connection
               key: endpoint
         - name: WORDPRESS_DB_USER
           valueFrom:
             secretKeyRef:
               name: wordpress-db-connection
               key: username
         - name: WORDPRESS_DB_PASSWORD
           valueFrom:
             secretKeyRef:
               name: wordpress-db-connection
               key: password
         ports:
         - containerPort: 80
           name: wordpress
 # this is a template for a service to be created on a target cluster
 targetService:
   metadata:
     name: wordpress
   spec:
     ports:
       - port: 80
     selector:
       app: wordpress
     type: LoadBalancer
 # these are the resources used by this workload. The secret holding the connection
 # information will be generated and propagated to the target cluster.
 resources:
 - name: wordpress-db
   kind: MySQLInstance
   secretName: wordpress-db-connection

This workload specifies a Kubernetes deployment, service, and namespace to propagate to the target cluster. It also requests a minimum version of Kubernetes, but can include other requests that will be matched against the capabilities of the cluster.
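Matching a workload's `clusterVersion` request against a cluster can be sketched as a numeric comparison of major and minor versions. The sketch also shows why versions are best written as quoted strings in YAML: unquoted, `1.10` parses as the number `1.1`, and even as a string, a naive lexical comparison ranks "1.9" above "1.10". The helper names are hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseMajorMinor parses "major.minor[.patch...]" (e.g. "1.10") into two integers.
func parseMajorMinor(v string) (int, int, error) {
	parts := strings.SplitN(v, ".", 3)
	if len(parts) < 2 {
		return 0, 0, fmt.Errorf("invalid version %q", v)
	}
	major, err := strconv.Atoi(parts[0])
	if err != nil {
		return 0, 0, err
	}
	minor, err := strconv.Atoi(parts[1])
	if err != nil {
		return 0, 0, err
	}
	return major, minor, nil
}

// satisfiesMinVersion reports whether clusterVersion >= required.
func satisfiesMinVersion(clusterVersion, required string) (bool, error) {
	cMaj, cMin, err := parseMajorMinor(clusterVersion)
	if err != nil {
		return false, err
	}
	rMaj, rMin, err := parseMajorMinor(required)
	if err != nil {
		return false, err
	}
	if cMaj != rMaj {
		return cMaj > rMaj, nil
	}
	return cMin >= rMin, nil
}

func main() {
	// A string comparison would wrongly rank "1.9" above "1.10".
	ok, _ := satisfiesMinVersion("1.9", "1.10")
	fmt.Println(ok) // false
}
```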

The workload also references all resources it consumes in the resources: section. This helps Crossplane set up connectivity between the workload and its resources, and create objects that hold connection information. In this example, a secret named wordpress-db-connection is created containing the database endpoint, username, and password. This secret is propagated to the target cluster so that the wordpress container can use it. Let’s go ahead and create the workload:

kubectl apply -f wordpress-workload.yaml

At this point the scheduler assigns the workload to a cluster, and after a few minutes Wordpress should be up and running in an EKS cluster, consuming an RDS database.

Note that the developer created a portable configuration of Wordpress. There was no reference to AWS, RDS, or EKS in any of the configuration authored by the developer. They did not have to set up VPCs, security groups, IAM policies, or other details of the environment and cloud provider. They were solely concerned with their workload and its requirements. The same configuration runs on Azure or GCP without any changes.

Peeking behind the scenes

Let’s peek behind the scenes to understand all the different persistent objects and their interactions:

As soon as the wordpress-db claim was created, a resource controller noticed it and, using the default standard-mysql resource class configured by the administrator, provisioned an RDSInstance object named rds-34faea. We can look at the RDS instance as follows:

> kubectl --namespace crossplane-system get rdsinstance
NAME                  ENGINEVERSION   STATUS  RESOURCE           RESOURCECLASS
rds-34faea            5.7             Bound   demo/wordpress-db  standard-mysql

The RDS resource controller is responsible for creating the corresponding external resource in AWS and managing it from this point. If we look at the AWS management console, here’s the corresponding external resource:

Similarly, when the wordpress workload was created, the workload scheduler noticed it and scheduled it to run on the only available Kubernetes cluster. The scheduler propagates the workload payload, along with connection information, to the target cluster. If we connect directly to the EKS cluster, we can see the deployment, service, and secret that were generated and propagated:

> kubectl --namespace wordpress get all
NAME                              READY   STATUS    RESTARTS   AGE
pod/wordpress-74649b4db6-p4gmh    1/1     Running   0          6d

NAME                 TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
service/wordpress    ClusterIP   10.96.0.1    <none>        443/TCP   84d

The usage object references a secret that holds the connection information for the RDS instance; this secret is ultimately propagated to the target cluster along with the workload. The resource usage also enables us to modify security groups in AWS so that the wordpress container can connect to the RDS instance.

Proposed Architecture

Crossplane is based on the resource management model of Kubernetes. We use a declarative management approach in which we model workloads and their resources as portable abstractions that capture the user’s desired state, and rely on asynchronous controllers to continuously reconcile desired and observed state. All interactions between clients and controllers go through the same API server, which exposes CRUD operations on persistent objects that are stored in an etcd store.
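The reconcile pattern at the heart of this model can be sketched in a few lines: observe the external state, compare it with the desired spec, and act only on the difference, so repeated runs are harmless. The `ExternalAPI` interface, the in-memory `fakeCloud`, and the single `SizeGi` field are hypothetical stand-ins for a cloud provider API; real controllers are built on Kubernetes client machinery and requeue on errors.

```go
package main

import "fmt"

// Spec is the desired state recorded in the API server (hypothetical).
type Spec struct{ SizeGi int }

// ExternalAPI is a hypothetical cloud-provider client.
type ExternalAPI interface {
	Get(name string) (Spec, bool)
	Create(name string, s Spec)
	Update(name string, s Spec)
}

// fakeCloud is an in-memory ExternalAPI used for illustration.
type fakeCloud struct{ store map[string]Spec }

func (f *fakeCloud) Get(n string) (Spec, bool) {
	s, ok := f.store[n]
	return s, ok
}

func (f *fakeCloud) Create(n string, s Spec) { f.store[n] = s }

func (f *fakeCloud) Update(n string, s Spec) { f.store[n] = s }

// reconcile drives the external resource toward the desired spec and
// reports which action it took. Calling it again with no changes is a no-op.
func reconcile(api ExternalAPI, name string, desired Spec) string {
	observed, exists := api.Get(name)
	switch {
	case !exists:
		api.Create(name, desired)
		return "created"
	case observed != desired:
		api.Update(name, desired)
		return "updated"
	default:
		return "in-sync"
	}
}

func main() {
	cloud := &fakeCloud{store: map[string]Spec{}}
	fmt.Println(reconcile(cloud, "rds-445fab", Spec{SizeGi: 50}))  // created
	fmt.Println(reconcile(cloud, "rds-445fab", Spec{SizeGi: 50}))  // in-sync
	fmt.Println(reconcile(cloud, "rds-445fab", Spec{SizeGi: 100})) // updated
}
```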

Resources, Claims, Classes and Pools

Crossplane supports a rich model for managing resources that enables a clean separation of concerns between two roles: developers or workload owners, who want to consume resources without knowing the details of how they are provisioned and managed, and administrators or infrastructure owners, who want to tightly control the details of the implementation and define policies. This separation of concerns enables a higher degree of reusability and portability.

In Crossplane, a resource (orange rectangle above) represents an external piece of infrastructure, ranging from low-level services like clusters and servers to higher-level infrastructure like databases, message queues, buckets, and more. Resources are represented as persistent objects within Crossplane, and they typically manage one or more pieces of external infrastructure (green rectangle above) within a cloud provider or cloud offering. For example, the RDSInstance resource in Crossplane corresponds to an external RDS instance within AWS.

To support workload portability we expose the concepts of a resource claim and a resource class. A resource claim is a persistent object that captures the desired configuration of a resource from the perspective of a workload or application. Its configuration is independent of cloud providers and cloud offerings, and it is free of implementation and environmental details. A resource claim can be thought of as a request for an actual resource and is typically created by a developer or application owner. We use the term “claim” because it also acts as a “claim check” for the actual resource and controls its lifetime. When the “claim” is deleted, the corresponding resource is deleted or reclaimed.
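The claim-check behavior pairs with the `reclaimPolicy` field used by the resource classes in the walk-through (`Delete` or `Retain`). A minimal sketch of what a controller might do when a claim is deleted follows; the `Resource` type and `onClaimDeleted` helper are hypothetical, and the semantics of `Retain` (release the binding but keep the external resource) are an assumption modeled on Kubernetes persistent volumes.

```go
package main

import "fmt"

// ReclaimPolicy mirrors the reclaimPolicy values used in the walk-through.
type ReclaimPolicy string

const (
	ReclaimDelete ReclaimPolicy = "Delete"
	ReclaimRetain ReclaimPolicy = "Retain"
)

// Resource is a hypothetical, minimal view of a bound resource.
type Resource struct {
	Name          string
	ClaimRef      string // "namespace/name" of the bound claim, "" if unbound
	ReclaimPolicy ReclaimPolicy
}

// onClaimDeleted releases the binding and reports whether the external
// resource should also be deleted. Any policy other than Delete keeps the
// external resource, which is the safer default.
func onClaimDeleted(r *Resource) (deleteExternal bool) {
	r.ClaimRef = ""
	return r.ReclaimPolicy == ReclaimDelete
}

func main() {
	db := &Resource{Name: "rds-34faea", ClaimRef: "demo/wordpress-db", ReclaimPolicy: ReclaimDelete}
	del := onClaimDeleted(db)
	fmt.Println(del, db.ClaimRef == "") // true true
}
```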

A resource class is configuration that contains implementation details specific to a certain environment or deployment, and policies related to a kind of resource. A resource class is typically created by an admin or infrastructure owner.

The spec of the resource claim and the information in the resource class are needed to create a resource, and once a resource is created, a resource controller is responsible for provisioning and managing the full lifecycle of the external resource. Resources can also be statically provisioned by administrators and in that case do not need a resource claim or class.

Finally, a resource pool can group together a set of similar resource claims and manage their scaling based on a set of criteria.

Resources

A resource is a persistent object that has a “spec” that represents its desired configuration, and a “status” that represents its actual or observed state. This configuration is typically created in a system namespace. For example, let's look at the resource for an RDS instance in AWS:

apiVersion: database.aws.crossplane.io/v1alpha1
kind: RDSInstance
metadata:
 name: rds-445fab
 namespace: crossplane-system
spec:
 engine: mysql
 version: "5.7"
 masterUsername: masteruser
 instance-type: db.m4.xlarge
 preferredMaintenanceWindow: weekly
 autoMinorVersionUpgrade: true
 multizone: true
 size: 50Gi
 vpc: vpc-223-551
 vpcSecurityGroups:
   - sg-2323-4445

The “spec” of the resource represents the desired configuration and typically includes all the configuration required to provision and manage the full lifecycle of the external resource.

By convention we organize resources in API groups by category corresponding to the cloud provider or cloud offering, like “database.aws.crossplane.io”, and give them names that match the specific product or offering, for example, RDSInstance.

Resource Claims

A resource claim is a persistent object that captures the desired configuration of a resource from the perspective of a workload or application. It represents a portable resource that is free of implementation and environmental details. A resource claim can be thought of as a request for an actual resource and is typically created by a developer or application owner. This configuration can be created in a system or user namespace.

apiVersion: database.crossplane.io/v1alpha1
kind: MySQLInstance
metadata:
 name: wordpress-db
spec:
 version: 5.7
 highly-available: true
 autoUpgradePolicy: minor
 encrypted: true
 size: 100Gi

A resource claim is a persistent object that has a “spec” that represents its desired configuration, and a “status” that represents its actual or observed state. The “status” of a resource claim is closely correlated with the “status” of the bound resource.

By convention we organize claims in API groups by category, like “database.crossplane.io”, and give them descriptive names that are not tied to a cloud provider or offering, for example, MySQLInstance. We create a new kind of resource claim when the consumption requirements from the perspective of the workload are different. For example, even though RDS supports MySQL and PostgreSQL in the same managed service, their consumption from an application is very different. As a result we define two independent resource claims, MySQLInstance and PostgreSQLInstance.

As stated above, we use the term “claim” because the resource claim also acts as a “claim check” for the actual resource and controls its lifetime. When the “claim” is deleted, the corresponding resource is deleted or reclaimed. Resource claims have a 1:1 relationship with the resources they are bound to.

Resource Classes

A resource class is configuration that contains implementation details specific to a certain environment or deployment, and policies related to a kind of resource. Different classes might map to quality of service levels, or other classes of service as determined by the administrator or infrastructure owner. This configuration is typically created in a system namespace.

For example, let's look at the resource class for MySQL database instances that configures them to use RDS instances in AWS:

apiVersion: core.crossplane.io/v1alpha1
kind: ResourceClass
metadata:
 name: standard-mysql
 namespace: crossplane-system
parameters:
 class: db.t2.small
 masterUserName: masteruser
 size: "100"
default: true
claim: mysqlinstance.database.crossplane.io/v1alpha1
provisioner: rdsinstance.database.aws.crossplane.io/v1alpha1
providerRef:
 name: aws-provider
reclaimPolicy: Delete

The claim field references the kind of resource claim that this class supports, and the provisioner field references the kind of resource that will be provisioned. The parameters field defines configuration that is specific to the provisioner.
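The claim and provisioner values above follow a “kind.group/version” pattern. As a minimal sketch, assuming that format holds in general, a controller could split these references and look up the class to use for a claim kind. The helper names are hypothetical, and treating `default: true` as “use this class when a claim does not name one” is an assumption about its semantics, not something this design states.

```python
from typing import Dict, List, NamedTuple, Optional


class KindRef(NamedTuple):
    kind: str     # lowercase kind, e.g. "rdsinstance"
    group: str    # API group, e.g. "database.aws.crossplane.io"
    version: str  # API version, e.g. "v1alpha1"


def parse_kind_ref(ref: str) -> KindRef:
    """Split 'kind.group/version' into its parts (format assumed from the example)."""
    qualified, _, version = ref.partition("/")
    kind, _, group = qualified.partition(".")
    if not (kind and group and version):
        raise ValueError(f"malformed reference: {ref!r}")
    return KindRef(kind, group, version)


def default_class_for(claim_kind: str, classes: List[Dict]) -> Optional[Dict]:
    """Hypothetical lookup: the class marked default that supports this claim kind."""
    for cls in classes:
        if parse_kind_ref(cls["claim"]).kind == claim_kind and cls.get("default"):
            return cls
    return None


ref = parse_kind_ref("rdsinstance.database.aws.crossplane.io/v1alpha1")
print(ref.kind, ref.group, ref.version)  # rdsinstance database.aws.crossplane.io v1alpha1
```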

Resource Lifecycle

Resources adhere to the following lifecycle:

Provisioning

A resource can be statically or dynamically provisioned. Static provisioning is when an administrator creates the resource manually. They set the configuration required to provision and manage the corresponding external resource within a cloud provider or cloud offering. Once provisioned, resources are available to be bound to resource claims.

Dynamic provisioning occurs when a resource claim does not find a matching resource, so a new one is provisioned instead. The newly provisioned resource is automatically bound to the resource claim. To enable dynamic provisioning, the administrator needs to create one or more resource class objects.
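The choice between static and dynamic provisioning can be sketched as follows. This is a minimal illustration, assuming a claim matches a resource when every field in the claim spec equals the same field in the resource spec; the matching rule, the `-dyn` naming scheme, and all function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Resource:
    name: str
    spec: Dict[str, str]
    bound_to: Optional[str] = None  # name of the bound claim, if any


def find_or_provision(claim_name: str,
                      claim_spec: Dict[str, str],
                      resources: List[Resource],
                      class_parameters: Optional[Dict[str, str]]) -> Optional[Resource]:
    # Static provisioning: reuse an unbound resource whose spec satisfies the claim.
    for res in resources:
        if res.bound_to is None and all(res.spec.get(k) == v for k, v in claim_spec.items()):
            res.bound_to = claim_name
            return res
    # Dynamic provisioning is only possible if the administrator defined a class.
    if class_parameters is None:
        return None  # the claim stays pending
    new = Resource(name=f"{claim_name}-dyn",
                   spec={**class_parameters, **claim_spec},
                   bound_to=claim_name)  # automatically bound to the claim
    resources.append(new)
    return new
```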

Binding

A resource claim can be bound to a resource explicitly by setting resourceRef in its spec, or implicitly by matching the resource claim’s configuration against the configurations of the available resources. A control loop attempts this matching and binds the resource.

We support delayed binding of resources to enable topology and region aware workload scheduling. This is accomplished by setting the bindingMode to WaitForFirstConsumer.
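A binding control loop might decide what to do with an unbound claim roughly as below. This is a sketch under stated assumptions: the `Immediate` mode name is borrowed from Kubernetes StorageClass `volumeBindingMode` and is not defined in this design, and letting an explicit resourceRef bypass the delay is one possible choice, not a documented rule.

```python
from enum import Enum
from typing import Optional


class BindingMode(Enum):
    IMMEDIATE = "Immediate"  # assumed default, borrowed from Kubernetes StorageClass
    WAIT_FOR_FIRST_CONSUMER = "WaitForFirstConsumer"


def next_binding_step(resource_ref: Optional[str],
                      mode: BindingMode,
                      scheduled: bool) -> str:
    """What a binding control loop might do next for one unbound claim."""
    if resource_ref is not None:
        return f"bind:{resource_ref}"  # explicit binding via resourceRef
    if mode is BindingMode.WAIT_FOR_FIRST_CONSUMER and not scheduled:
        return "wait"  # delay until a consuming workload has been scheduled
    return "match-or-provision"  # match an available resource, else use the class
```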

Usage

Workloads use resource claims and can connect to the resources bound to them. Depending on the resource, usage might require provisioning additional external infrastructure, such as ACLs or security groups.

Reclaiming and Deletion

When the user is done with a resource they must delete the resource claim. This allows the reclamation of the resource to proceed. We support two reclaim policies: Delete and Retain. Delete deletes the resource and the associated external resource. Retain supports manual reclamation of the resource by an administrator.
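The two reclaim policies can be sketched as a hook that runs after the bound claim is deleted. This is illustrative only: the `reclaim` function, the `external_exists` flag, and the `Released` phase name (borrowed from Kubernetes PersistentVolumes) are assumptions, not part of this design.

```python
from dataclasses import dataclass
from enum import Enum


class ReclaimPolicy(Enum):
    DELETE = "Delete"
    RETAIN = "Retain"


@dataclass
class ManagedResource:
    name: str
    reclaim_policy: ReclaimPolicy
    external_exists: bool = True  # stands in for the real cloud resource
    phase: str = "Bound"


def reclaim(resource: ManagedResource) -> None:
    """Hypothetical hook called after the bound resource claim is deleted."""
    if resource.reclaim_policy is ReclaimPolicy.DELETE:
        resource.external_exists = False  # e.g. delete the RDS instance in AWS
        resource.phase = "Deleted"
    else:
        # Retain: keep the external resource so an administrator can reclaim it manually.
        resource.phase = "Released"
```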

Resource Controllers

A resource controller is responsible for the entire lifecycle of a resource including provisioning, health, scaling, failover, and actively responding to external changes that deviate from the desired configuration.

A resource controller watches resource claims and can dynamically provision a resource based on the configuration in the claim and the corresponding class. The controller is responsible for matching the requests in the resource claim to the parameters in the resource class, and for performing mappings as necessary. If there are any conflicts, or if the resource claim cannot be satisfied, the controller will not bind it.
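The matching and mapping step can be pictured as combining the class parameters with the claim’s requests, translated into provider terms, and refusing to bind on conflict. A minimal sketch follows; the `FIELD_MAP` table (portable `version` to a provider-side `engineVersion`) and the function name are hypothetical, since real mappings are specific to each controller.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical mapping from portable claim fields to provider-specific fields.
FIELD_MAP: Dict[str, str] = {"version": "engineVersion"}


def map_claim(claim_spec: Dict[str, str],
              class_params: Dict[str, str]) -> Tuple[Optional[Dict[str, str]], List[str]]:
    """Combine class parameters with mapped claim requests.
    Returns (resource_spec, []) on success or (None, conflicting_fields)."""
    spec = dict(class_params)
    conflicts: List[str] = []
    for key, value in claim_spec.items():
        target = FIELD_MAP.get(key, key)
        if target in spec and spec[target] != value:
            conflicts.append(target)  # the class already pins a different value
        else:
            spec[target] = value
    if conflicts:
        return None, conflicts  # the controller leaves the claim unbound
    return spec, []
```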

Multiple resource controllers can run at the same time and work collaboratively. A resource controller only watches the resource claims it supports, and the resource classes whose provisioner it supports. A single controller can support multiple kinds of resource claims and resources, or just one. Resource controllers can also live out of tree and rely only on the API to communicate and do their work, so there is no need for a plugin scheme.

Resource controllers run reconcile loops to ensure that the desired state of the resource is reflected in the external resource. They are able to react to external changes and misconfigurations, as well as handle complex scenarios for failover, or recovery. For example, a MySQLInstance controller can notice a slow replica and recreate it if needed.
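A single pass of such a reconcile loop can be sketched as a comparison between the desired spec and the observed state of the external resource. This is a simplified illustration: the `observe` and `apply` callables stand in for hypothetical cloud API calls, and a real controller would run this repeatedly on watch events and periodic resyncs, with failover and recovery logic well beyond it.

```python
from typing import Callable, Dict


def reconcile(desired: Dict[str, object],
              observe: Callable[[], Dict[str, object]],
              apply: Callable[[str, object], None]) -> Dict[str, object]:
    """One level-triggered pass: correct every field of the external resource
    that drifted from the desired spec. Returns the drift that was corrected."""
    observed = observe()
    drift = {k: v for k, v in desired.items() if observed.get(k) != v}
    for key, value in drift.items():
        apply(key, value)
    return drift


# Usage: a dict stands in for the external RDS instance.
external = {"multizone": False, "autoMinorVersionUpgrade": True}
desired = {"multizone": True, "autoMinorVersionUpgrade": True}
reconcile(desired, lambda: dict(external), external.__setitem__)
```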

Finally, resource controllers help workloads connect to the resources they consume. This is accomplished with the resource usage object, which the controller watches in order to configure the external resources for connectivity from a given workload.

Resource Pools and Auto-Scalers

A resource pool is a named template for a resource claim, together with criteria for automatically provisioning and scaling those claims. Resource pools and auto-scalers can be used to automatically provision Kubernetes clusters that host workloads, as well as other frameworks for hosting other types of workloads.

Auto-scalers can notice that a workload is unschedulable, or that existing capacity or quota has been exhausted, and respond by creating a new resource claim for a cluster to host the workload.
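A resource pool and its auto-scaler might be sketched as below. Everything here is hypothetical: the `ResourcePool` shape, the use of `max_claims` as a stand-in for quota, the claim naming scheme, and the simple rule of one new cluster claim per unschedulable workload.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ResourcePool:
    """Hypothetical pool: a named claim template plus a scaling limit."""
    name: str
    claim_template: Dict[str, str]
    max_claims: int  # stands in for capacity or quota limits
    claims: List[str] = field(default_factory=list)

    def scale_up(self) -> bool:
        if len(self.claims) >= self.max_claims:
            return False  # quota exhausted
        self.claims.append(f"{self.name}-{len(self.claims)}")
        return True


def autoscale(pool: ResourcePool, unschedulable_workloads: int) -> int:
    """Create one new cluster claim per unschedulable workload, up to the pool limit.
    Returns how many claims were created."""
    created = 0
    for _ in range(unschedulable_workloads):
        if not pool.scale_up():
            break
        created += 1
    return created
```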

Workloads

We model workloads as schedulable units of work that the user intends to run on a cloud provider. Crossplane supports multiple types of workloads, including container workloads and, coming soon, serverless workloads.

Every type of workload has a different kind of payload. For example, a container workload can include a set of objects that will be deployed on a managed Kubernetes cluster, or a reference to a Helm chart. A serverless workload could include a function that will run on a serverless managed service.

Workloads can contain requirements for where and how the workload can run, including regions, providers, affinity, cost, and others that the scheduler can use when assigning the workload.

For example, the following is a container workload that includes a Kubernetes Deployment and Service. This model is not complete, and we are still working out how best to model container workloads. A workload with a single Deployment and Service is obviously not universal, and it excludes other objects that people commonly use, such as ConfigMaps. More to come in this area.

apiVersion: container.crossplane.io/v1alpha1
kind: Workload
metadata:
 name: wordpress
 namespace: demo
spec:
 # the version of Kubernetes desired (quoted so YAML keeps "1.10" instead of reading 1.1)
 clusterVersion: "1.10"
 # the namespace to create on a target cluster
 targetNamespace: wordpress
 # this is a template for a deployment to be created on a target cluster
 targetDeployment:
   ...
 # this is a template for a service to be created on a target cluster
 targetService:
   ...
 # these are the resources used by this workload. The secret holding the connection
 # information will be generated and propagated to the target cluster.
 resources:
 - name: wordpress-db
   kind: MySQLInstance
   secretName: wordpress-db-connection

A workload can consume a set of resource claims like databases, queues, buckets, and others. Workloads and the resources they consume are presented as portable abstractions that are not tied to a specific cloud provider or implementation. We use resource claims and classes to implement workload portability and enable an extensible model for different resource implementations.

Workload Scheduler

A workload scheduler is responsible for scheduling workloads on workload hosts. The workload hosts are themselves resource claims in our world, and can represent managed services that are dynamically provisioned within cloud providers.

For example, a container workload can be scheduled to run on a KubernetesCluster which is itself a resource claim, and can map to a managed Kubernetes service in a cloud provider like EKS, GKE and AKS. A scheduler deals with portable workloads and assigns them to portable workload claims. It’s a generic scheduler that does not understand the implementation details of any given resource.

The workload scheduler can consider numerous criteria during scheduling, including workload affinity/anti-affinity, taints/tolerations, selectors, capacity, quotas, policy, topology, and cost. Many of these techniques are based on lessons learned from pod scheduling in Kubernetes. The scheduler is extensible, and multiple schedulers can run within the same crossplane.
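Borrowing the filter-then-score structure of the Kubernetes pod scheduler, a workload scheduler could be sketched as follows. This covers only a few of the criteria listed above: provider and region constraints and capacity act as filters, and cost is the only score. The `Host` fields, the cost figures in the usage example, and the function names are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Host:
    name: str
    provider: str
    region: str
    free_capacity: int
    hourly_cost: float


@dataclass
class WorkloadRequirements:
    providers: Set[str]  # empty set means any provider
    regions: Set[str]    # empty set means any region
    capacity: int


def schedule(req: WorkloadRequirements, hosts: List[Host]) -> Optional[Host]:
    # Filter: drop hosts that violate hard constraints.
    feasible = [h for h in hosts
                if (not req.providers or h.provider in req.providers)
                and (not req.regions or h.region in req.regions)
                and h.free_capacity >= req.capacity]
    # Score: prefer the cheapest feasible host (one of many possible criteria).
    return min(feasible, key=lambda h: h.hourly_cost, default=None)


hosts = [Host("eks-1", "aws", "us-east-1", 4, 0.20),
         Host("gke-1", "gcp", "us-east1", 8, 0.10),
         Host("aks-1", "azure", "eastus", 1, 0.05)]
chosen = schedule(WorkloadRequirements(set(), set(), 2), hosts)
```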

The workload scheduler also understands resource usage and is able to schedule the resources consumed by workloads. This is accomplished by delayed binding of resource claims to resources. You can set the bindingMode to WaitForFirstConsumer on a resource class to enable delayed binding. With this mode, the scheduler is able to influence where resources are placed, for example, by placing an RDS instance in the same region or zone as the EKS cluster that hosts the workload consuming it.
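Once a workload has been assigned to a host, delayed-binding claims could be given that host’s location before provisioning. A minimal sketch, assuming claims are dictionaries with an optional `region` key and that an explicitly requested region is left untouched; both the key and the precedence rule are hypothetical.

```python
from typing import Dict, List


def place_pending_claims(host_region: str,
                         pending_claims: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """For claims waiting on their first consumer, default the region to the one
    hosting the workload so the resource is provisioned next to it."""
    placed = []
    for claim in pending_claims:
        spec = dict(claim)
        spec.setdefault("region", host_region)  # keep a region the claim set explicitly
        placed.append(spec)
    return placed
```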

The workload scheduler generates resource usage objects as it schedules workloads, and waits for the corresponding resources to be provisioned. This ensures that workloads do not start until their resources and connection details are available.
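The wait-before-start behavior can be sketched as a readiness gate over the workload’s resource usages. This assumes, hypothetically, that a usage counts as ready once the name of its connection secret is known; the `ResourceUsage` fields and `start_if_ready` are illustrative names only.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ResourceUsage:
    claim: str
    connection_secret: Optional[str] = None  # set once connection details exist


def start_if_ready(usages: List[ResourceUsage], start: Callable[[], None]) -> List[str]:
    """Start the workload only when every consumed resource has connection details.
    Returns the claims still being waited on."""
    pending = [u.claim for u in usages if u.connection_secret is None]
    if not pending:
        start()
    return pending
```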

Appendix A: Related Projects

In this section we will walk through a number of related projects and products. Our goal is not to provide a deep analysis of each, but instead to help motivate why we created Crossplane.

Open Service Broker and Service Catalog

The Open Service Broker and the Kubernetes Service Catalog are able to dynamically provision managed services in multiple cloud providers from Kubernetes. As a result, they share similar goals with Crossplane. However, the service broker model is not designed for workload portability, does not have a clean separation of concerns, and does not offer any integration with workload and resource scheduling. Service brokers also cannot span multiple cloud providers at once.

Kubernetes Federation

The federation-v2 project offers a single control plane that can span multiple Kubernetes clusters. It’s being incubated in SIG-multicluster. Crossplane shares some of the goals of managing multiple Kubernetes clusters and also the core principles of creating a higher level control plane, scheduler and controllers that span clusters. While the federation-v2 project is scoped to just Kubernetes clusters, Crossplane supports non-container workloads, and orchestrating resources that run as managed services including databases, message queues, buckets, and others. The federation effort focuses on defining Kubernetes objects that can be templatized, and propagated to other Kubernetes clusters. Crossplane focuses on defining portable workload abstractions across cloud providers and offerings. We have considered taking a dependency on the federation-v2 work within Crossplane, although it’s not clear at this point if this would accelerate the Crossplane effort.

AWS Service Operator

The AWS Service Operator is a recent project that implements a set of Kubernetes controllers that are able to provision managed services in AWS. It defines a set of CRDs for managed services like DynamoDB, and controllers that can provision them via AWS CloudFormation. It is similar to Crossplane in that it can provision managed services in AWS. Crossplane goes a lot further by offering workload portability across multiple cloud providers, separation of concerns, and a scheduler for workloads and resources.

AWS CloudFormation, GCP Deployment Manager, and Others

These products offer a declarative model for deploying and provisioning infrastructure in each of the respective cloud providers. They only work for one cloud provider and do not solve the problem of workload portability. These products are generally closed source, and offer little or no extensibility points. We have considered using some of these products as a way to implement resource controllers in Crossplane.

Terraform

Terraform is a popular tool for provisioning infrastructure across cloud providers. It offers a declarative configuration language with support for templating, composability, referential integrity and dependency management. Terraform can dynamically provision infrastructure and perform changes when the tool is run by a human. Unlike Crossplane, Terraform does not support workload portability across cloud providers, and does not have any active controllers that can react to failures or make changes to running infrastructure without human intervention. Terraform attempts to solve multicloud at the tool level, while Crossplane is at the API and control plane level. Terraform is open source under the Mozilla Public License (MPL) and follows an open core business model, with a number of its features closed source. We are evaluating whether we can use Terraform to accelerate the development of resource controllers in Crossplane.

Pulumi

Pulumi is a product that is based on Terraform and uses most of its providers. Instead of using a configuration language, Pulumi uses popular programming languages like TypeScript to capture the configuration. At runtime, Pulumi generates a DAG of resources just like Terraform and applies it to cloud providers. Pulumi has an early model for workload portability that is implemented using language abstractions. Unlike Crossplane, it does not have any active controllers that can react to failures or make changes to running infrastructure without human intervention, nor does it support workload scheduling. Pulumi attempts to solve multicloud scenarios at the language level, while Crossplane is at the API and control plane level. Pulumi is open source under the Apache 2.0 license, but a number of features require using their SaaS offering.

Appendix B: Challenges and Open Questions

Modeling non-portable resources

Not all resources are portable or have open wire protocols. For example, DynamoDB is a popular database in AWS. Supporting DynamoDB within Crossplane makes a lot of sense for two reasons: 1) it enables Crossplane to be the universal API for cloud computing regardless of which managed services or offerings are used, and 2) it makes the task of migrating to other kinds of services easier.

Removing unnecessary core types from API machinery

Crossplane uses the Kubernetes API server today without any modifications. The API server has a number of APIs that do not apply in the Crossplane context, for example, Deployment and ReplicaSet. Ideally we would use a slimmer Kubernetes API server that does not have these in the future. There is already good work within the Kubernetes community in this direction.

Normalizing regions, zones, and units

Each cloud provider uses different terminology for its regions and zones. We would like to create a common language across all of them, so that a workload can specify its region and zone requirements in a portable way. A similar thing needs to be done for machine and instance types, although Kubernetes has already done some great initial work here.
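One possible shape for such a common language is a lookup table from portable names to provider-specific identifiers. In this sketch the provider region IDs (us-east-1, us-east1, eastus, and so on) are real, but the portable names `us-east` and `us-west` and the coarse grouping are hypothetical and only illustrate the idea; they are not a Crossplane vocabulary.

```python
from typing import Dict

# Hypothetical portable region names mapped to each provider's own region ID.
REGIONS: Dict[str, Dict[str, str]] = {
    "us-east": {"aws": "us-east-1", "gcp": "us-east1", "azure": "eastus"},
    "us-west": {"aws": "us-west-2", "gcp": "us-west1", "azure": "westus2"},
}


def to_provider_region(portable: str, provider: str) -> str:
    """Translate a portable region name into a provider's region ID."""
    return REGIONS[portable][provider]


def to_portable_region(provider: str, region: str) -> str:
    """Reverse lookup: find the portable name for a provider's region ID."""
    for portable, ids in REGIONS.items():
        if ids.get(provider) == region:
            return portable
    raise KeyError(f"unknown {provider} region {region!r}")
```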

Authentication and Authorization

Each cloud provider uses different user authentication and authorization mechanisms. In addition, there is sometimes a significant degree of variance in authorized access between managed resources. For example, AWS relies heavily on security groups, whereas Google leverages Authorized Networks in addition to Google-specific components like `cloudsql-proxy`. We would like to provide a uniform abstraction for authentication/authorization in terms of user credentials and/or service accounts to ensure that client applications and services access managed resources in the same manner, operating on the same abstract constructs.

Appendix C: Frequently Asked Questions

Question: Where did the name Crossplane come from?

The name is a fusion of “cross-cloud control plane”. We wanted a noun that refers to the entity responsible for connecting different cloud providers and acting as a control plane across them. “Cross” implies “cross-cloud” and “plane” brings in “control plane”.

Question: What's up with popsicle?

We believe in a multi-flavor cloud.

Question: Why is Upbound open sourcing this project? What are Upbound’s monetization plans?

Upbound’s mission is to create a more open cloud-computing platform, with more choice and less lock-in. We believe Crossplane is an important step towards this vision, and that it’s going to take a village to solve this problem. We believe that the multicloud control plane is a new category of open source software that will ultimately disrupt closed source and proprietary models. Upbound aspires to be a commercial provider of a more open cloud-computing platform.

Question: What kind of governance model will be used for Crossplane?

Crossplane will be an independent project, and we plan on making it a community-driven project rather than a vendor-driven one. It will have an independent brand, GitHub organization, and open governance model. It will not be tied to a single organization or individual.

Question: Will Crossplane be donated to an open source foundation?

We don’t know yet. We are open to doing so but we’d like to revisit this after the project has gotten some end-user community traction.

Question: Does using multicloud mean you will use the lowest common denominator across clouds?

Not necessarily. There are numerous best-of-breed cloud offerings that run on multiple clouds. For example, CockroachDB and Elasticsearch are world-class implementations of platform software and run well across cloud providers. They compete with managed services offered by the cloud providers themselves. We believe that by having an open control plane for them to integrate with, and by providing a common API, CLI, and UI for all of these services, more of these offerings will exist and get a first-class experience in the cloud.

Question: How are resources and claims related to PersistentVolumes in Kubernetes?

We modeled resources, resource claims, and resource classes after PersistentVolumes, PersistentVolumeClaims, and StorageClasses in Kubernetes. We believe many of the lessons learned from managing volumes in Kubernetes apply to managing resources within cloud providers. One notable exception is that we avoided creating a plugin model within Crossplane.

Question: How is workload scheduling related to pod scheduling in Kubernetes?

We modeled workload scheduling after the Pod scheduler in Kubernetes. We believe many of the lessons learned from Pod scheduling apply to scheduling workloads across cloud providers.

Question: Can I use Crossplane to consistently provision and manage multiple Kubernetes clusters?

Crossplane includes a portable API for Kubernetes clusters that will include common configuration such as node pools, auto-scalers, taints, and admission controllers. These will be applied to the specific implementations within cloud providers, like EKS, GKE, and AKS. We expect this portable Kubernetes cluster API to be used by administrators rather than developers.

Question: Other attempts at building a higher-level API on top of a multitude of inconsistent lower-level APIs have not been successful. Won’t Crossplane have the same issues?

We agree that building a consistent higher-level API on top of a multitude of inconsistent lower-level APIs is well known to be fraught with peril (for example, dumbing down to the lowest common denominator, or producing an API so loosely defined that it is impossible to build real portable applications on top of it).

Crossplane follows a different approach. The portable API captures only the pieces that are common across all implementations, expressed from the perspective of the workload. The remaining implementation details are captured in full fidelity by the administrator in resource classes. The combination of the two results in a full configuration that can be deployed. We believe this is a reasonable tradeoff that avoids the lowest-common-denominator problem while still enabling portability.