In-Cluster Building
One of Garden's most powerful features is the ability to build images in your Kubernetes development cluster, thus avoiding the need for local Kubernetes clusters. This guide covers the requirements for in-cluster building and how to set it up.
This guide assumes you've already configured the Remote Kubernetes plugin.
tl;dr
If in doubt, use the following setup for builds:
kaniko build mode, which works well for most scenarios.
Use the project namespace for build pods.
Connect a remote deployment registry to use for built images. Note: You can also skip this and use the included in-cluster registry while testing, but be aware that you may hit scaling issues as you go.
Here's a basic configuration example:
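A minimal sketch of such a setup, using the field names that appear elsewhere in this guide (buildMode, kaniko.namespace, deploymentRegistry, imagePullSecrets). The project name, environment name, registry hostname and Secret name are placeholders, not values from a real project:

```yaml
kind: Project
name: my-project                 # placeholder project name
environments:
  - name: dev                    # placeholder environment name
providers:
  - name: kubernetes
    buildMode: kaniko
    kaniko:
      namespace: null            # start build pods in the project namespace
    # Recommended: a remote deployment registry for built images
    deploymentRegistry:
      hostname: my-private-registry.com   # placeholder registry hostname
      namespace: my-project               # placeholder namespace within the registry
    imagePullSecrets:
      - name: my-deployment-registry-secret   # placeholder Secret name
        namespace: default
```

Leaving out deploymentRegistry and imagePullSecrets falls back to the in-cluster registry, which may be enough while testing.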
The only tricky bit would be connecting the remote registry, so we suggest reading more about that below.
Security considerations
First off, you should only use in-cluster building in development and testing clusters! Production clusters should not run the builder services, for both resource and security reasons.
You should also avoid using in-cluster building in clusters where you don't control/trust all the code being deployed, i.e. multi-tenant setups (where tenants are external, or otherwise not fully trusted).
General requirements
In-cluster building works with most Kubernetes clusters, provided they have enough resources allocated and meet some basic requirements. We have tested it on GKE, AKS, EKS, DigitalOcean, and various other custom installations.
The specific requirements vary by the build mode used, and whether you're using the optional in-cluster registry or not.
In all cases you'll need at least 2GB of RAM on top of your own service requirements. More RAM is strongly recommended if you have many concurrent developers or CI builds.
For the cluster-docker mode, and the (optional) in-cluster image registry, support for PersistentVolumeClaims is required, with enough disk space for layer caches and built images. The in-cluster registry also requires support for hostPort, and for reaching hostPorts from the node/Kubelet. This should work out-of-the-box in most standard setups, but clusters using Cilium for networking may need to configure this specifically, for example.
You can—and should—adjust the allocated resources and storage in the provider configuration, under resources and storage. See the individual modes below as well for more information on how to allocate resources appropriately.
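As a rough sketch of what such an adjustment might look like: the nested keys under resources and storage (builder, requests, limits, size) and the units are assumptions recalled from the provider reference, and the numbers are arbitrary, so check both against the reference for your Garden version before using them:

```yaml
providers:
  - name: kubernetes
    resources:
      builder:
        requests:
          cpu: 100        # assumed unit: millicpu
          memory: 512     # assumed unit: MB
        limits:
          cpu: 4000
          memory: 8192
    storage:
      builder:
        size: 20480       # assumed unit: MB
```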
We also strongly recommend a separate image registry to use for built images. Garden can also—and does by default—deploy an in-cluster registry. The latter is convenient to test things out and may be fine for individual users or small teams. However, we generally recommend using managed container registries (such as ECR, GCR etc.) since they tend to perform better, they scale more easily, and don't need to be operated by your team. See the Configuring a deployment registry section for more details.
Build modes
Garden supports multiple methods for building images and making them available to the cluster:
kaniko: An individual Kaniko Pod created for each image build.
cluster-buildkit: A BuildKit deployment created for each project namespace.
cluster-docker: (Deprecated) A single Docker daemon installed in the garden-system namespace and shared between users/deployments. It is no longer recommended and we will remove it in future releases.
local-docker: Build using the local Docker daemon on the developer/CI machine before pushing to the cluster/registry.
The local-docker mode is set by default. You should definitely use that when using Docker for Desktop, Minikube and most other local development clusters.
The other modes—which are why you're reading this guide—all build your images inside your development/testing cluster, so you don't need to run Docker on your machine, and avoid having to build locally and push build artifacts over the wire to the cluster for every change to your code.
The remote building options each have some pros and cons. You'll find more details below but here are our general recommendations at the moment:
kaniko is a solid choice for most cases and is currently our first recommendation. It is battle-tested among Garden's most demanding users (including the Garden team itself). It also scales horizontally and elastically, since individual Pods are created for each build. It doesn't require privileged containers to run and requires no shared cluster-wide services.
cluster-buildkit is a new addition and replaces the older cluster-docker mode. A BuildKit Deployment is dynamically created in each project namespace and much like Kaniko requires no other cluster-wide services. This mode also offers a rootless option, which runs without any elevated privileges, in clusters that support it.
We recommend picking a mode based on your usage patterns and scalability requirements. For ephemeral namespaces, kaniko is generally the better option, since the persistent BuildKit deployment won't have a warm cache anyway. For long-lived namespaces, like the ones a developer uses while working, cluster-buildkit may be a more performant option.
Let's look at how each mode works in more detail, and how you configure them:
kaniko
This mode uses an individual Kaniko Pod for each image build.
The Kaniko project provides a compelling alternative to a Docker daemon because it can run without special privileges on the cluster, and is thus more secure. It also scales better because it doesn't rely on a single daemon shared across multiple users and/or builds; builds are executed in individual Pods and thus scale horizontally and elastically.
In this mode, builds are executed as follows:
Your code (build context) is synchronized to a sync service in the cluster, which holds a cache of the build context, so that each change can be uploaded quickly.
A Kaniko pod is created, which pulls the build context from the sync service, and performs the build.
Kaniko pulls caches from the deployment registry, builds the image, and then pushes the built image back to the registry, which makes it available to the cluster.
Configuration and requirements
As of Garden v0.12.22, the kaniko build mode no longer requires shared system services or an NFS provisioner, nor running cluster-init ahead of usage.
Enable this by setting buildMode: kaniko in your kubernetes provider configuration.
As of Garden v0.12.22, we also recommend setting kaniko.namespace: null in the kubernetes provider configuration, so that builder pods are started in the project namespace instead of the garden-system namespace, which is the current default. This will become the default in Garden v0.13.
Note the difference in how resources for the builder are allocated between Kaniko and the other modes. For this mode, the resource configuration applies to each Kaniko pod. See the builder resources reference for details.
If you're using ECR on AWS, you may need to create a cache repository manually for Kaniko to store caches.
That is, if you have a repository like my-org/my-image, you need to manually create a repository next to it called my-org/my-image/cache. AWS ECR supports immutable image tags, see the announcement and documentation. Make sure to set the cache repository's image tag mutability setting to mutable. By default, Kaniko's TTL on old cache layers is two weeks, and every layer of the image cache must be rebuilt after that if the image tags are immutable.
You can also select a different name for the cache repository and pass the path to Kaniko via the --cache-repo flag, which you can set via the extraFlags field. See this GitHub comment in the Kaniko repo for more details.
This does not appear to be an issue for GCR on GCP. We haven't tested this on other container repositories.
You can provide extra arguments to Kaniko via the extraFlags field. Users with projects with a large number of files should take a look at the --snapshotMode=redo and --use-new-run options as these can provide significant performance improvements. Please refer to the official docs for the full list of available flags.
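A sketch of how the Kaniko settings discussed in this section could be combined. It assumes extraFlags sits under the kaniko key next to namespace, and the cache repository path in the comment is a placeholder:

```yaml
providers:
  - name: kubernetes
    buildMode: kaniko
    kaniko:
      namespace: null              # build in the project namespace
      extraFlags:
        - "--snapshotMode=redo"    # may speed up projects with many files
        - "--use-new-run"
        # Only needed with a custom cache repository (placeholder path):
        # - "--cache-repo=<registry-hostname>/my-org/my-image-cache"
```

The flags are passed through to Kaniko as written, so their exact spelling should be checked against the Kaniko version in use.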
The Kaniko pods will always have the following toleration set:
This allows you to set corresponding Taints on cluster nodes to control which nodes builder deployments are deployed to. You can also configure a nodeSelector to serve the same purpose.
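The following sketch shows the nodeSelector route. The toleration values in the comments are recalled from the Garden docs and should be treated as an assumption to verify, the node label is hypothetical, and placing nodeSelector under the kaniko key is likewise assumed:

```yaml
# Toleration believed to be set on builder pods (verify before relying on it):
#   key: garden-build, operator: Equal, value: "true", effect: NoSchedule
# A matching taint on a dedicated build node would then look like:
#   kubectl taint nodes <node-name> garden-build=true:NoSchedule
providers:
  - name: kubernetes
    buildMode: kaniko
    kaniko:
      nodeSelector:
        workload: build            # hypothetical node label
```

Taints keep other workloads off the build nodes, while the nodeSelector pins builder pods to them. Using both gives dedicated build nodes.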
cluster-buildkit
With this mode, a BuildKit Deployment is dynamically created in each project namespace to perform in-cluster builds.
Much like kaniko (and unlike cluster-docker), this mode requires no cluster-wide services or permissions to be managed, and thus no permissions outside of a single namespace for each user/project.
In this mode, builds are executed as follows:
BuildKit is automatically deployed to the project namespace, if it hasn't already been deployed there.
Your code (build context) is synchronized directly to the BuildKit deployment.
BuildKit imports caches from the deployment registry, builds the image, and then pushes the built image and caches back to the registry.
Configuration and requirements
Enable this mode by setting buildMode: cluster-buildkit in your kubernetes provider configuration.
In order to enable rootless mode, add the following to your kubernetes provider configuration:
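A minimal sketch, assuming the rootless option lives under the clusterBuildkit key of the provider:

```yaml
providers:
  - name: kubernetes
    buildMode: cluster-buildkit
    clusterBuildkit:
      rootless: true    # run BuildKit without elevated privileges
```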
Note that not all clusters can currently support rootless operation, and that you may need to configure your cluster with this in mind. Please see the BuildKit docs for details.
You should also set the builder resource requests/limits. For this mode, the resource configuration applies to each BuildKit deployment, i.e. for each project namespace. See the builder resources reference for details.
The BuildKit deployments will always have the following toleration set:
This allows you to set corresponding Taints on cluster nodes to control which nodes builder deployments are deployed to. You can also configure a nodeSelector to serve the same purpose.
Caching
By default, cluster-buildkit will have two layers of cache:
A local file cache, maintained by the cluster-buildkit instance. The cache is shared for all builds in the same namespace.
A _buildcache image tag in the configured deploymentRegistry will be used as an external cache. This is useful for fresh namespaces, e.g. preview environments.
You can customize the cache configuration with the cache option. You can list multiple cache layers. They are tried in order, and the first one that produces a cache hit is used for all following layers.
In a large team it might be beneficial to use a more complicated cache strategy, for example the following:
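One way such a strategy could look, as a sketch. It assumes the cache entries accept type, tag and export fields, that the git.branch template key and the kebabCase and slice template helpers are available in your Garden version, and that the main branch is called main:

```yaml
providers:
  - name: kubernetes
    buildMode: cluster-buildkit
    clusterBuildkit:
      cache:
        # Per-branch cache: imported and exported by feature-branch builds
        - type: registry
          tag: _buildcache-${slice(kebabCase(git.branch), "0", "30")}
        # Main-branch cache: imported as a fallback, never written to
        - type: registry
          tag: _buildcache-main
          export: false
```

The slice call truncates long branch names so that the resulting tag stays within registry tag length limits.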
With this configuration, every new feature branch will benefit from the main branch cache, while not polluting the main branch cache (via export: false). Any subsequent builds will use the feature branch cache.
Please keep in mind that you should also configure a garbage collection policy in your Docker registry to clean old feature branch tags.
Multi-stage caching
If your Dockerfile has multiple stages, you can benefit from mode=max caching. It is automatically enabled if your registry is not in our list of unsupported registries. Currently, those are AWS ECR and Google GCR. If you are using GCR, you can switch to the Google Artifact Registry, which supports mode=max.
You can also configure a different cache registry for your images. That way you can keep using ECR or GCR, while getting a better cache hit rate with mode=max:
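A sketch of a separate cache registry, assuming each cache entry accepts a registry block with hostname and namespace fields. Both values below are placeholders:

```yaml
providers:
  - name: kubernetes
    buildMode: cluster-buildkit
    clusterBuildkit:
      cache:
        - type: registry
          registry:
            hostname: hub.docker.com      # placeholder cache registry
            namespace: my-team-cache      # placeholder namespace
```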
For this mode of operation you need secrets for all the registries configured in your imagePullSecrets.
cluster-docker
The cluster-docker build mode has been deprecated and will be removed in an upcoming release. Please use kaniko or cluster-buildkit instead.
The cluster-docker mode installs a standalone Docker daemon into your cluster, which is then used for builds across all users of the cluster, along with a handful of other supporting services.
In this mode, builds are executed as follows:
Your code (build context) is synchronized to a sync service in the cluster, making it available to the Docker daemon.
A build is triggered in the Docker daemon.
The built image is pushed to the deployment registry, which makes it available to the cluster.
Configuration and requirements
Enable this mode by setting buildMode: cluster-docker in your kubernetes provider configuration.
After enabling this mode, you will need to run garden plugins kubernetes cluster-init --env=<env-name> for each applicable environment, in order to install the required cluster-wide services. Those services include the Docker daemon itself, as well as an image registry, a sync service for receiving build contexts, two persistent volumes, an NFS volume provisioner for one of those volumes, and a couple of small utility services.
Optionally, you can also enable BuildKit to be used by the Docker daemon. This is not to be confused with the cluster-buildkit build mode, which doesn't use Docker at all. In most cases, this should work well and offer a bit of added performance, but it remains optional for now. If you have cluster-docker set as your buildMode you can enable BuildKit for an environment by adding the following to your kubernetes provider configuration:
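A minimal sketch, assuming the option is named enableBuildKit and lives under the clusterDocker key of the provider:

```yaml
providers:
  - name: kubernetes
    buildMode: cluster-docker
    clusterDocker:
      enableBuildKit: true    # let the in-cluster Docker daemon build with BuildKit
```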
Make sure your cluster has enough resources and storage to support the required services, and keep in mind that these services are shared across all users of the cluster. Please look at the resources and storage sections in the provider reference for details.
Local Docker
This is the default build mode. It is usually the least efficient one for remote clusters, but requires no additional services to be deployed to the cluster. For remote clusters, you do however need to explicitly configure a deployment registry, and to have Docker running locally. For development clusters, you may in fact get set up quicker if you use the in-cluster build modes.
When you deploy to your environment (via garden deploy or garden dev) using the local Docker mode, images are first built locally and then pushed to the configured deployment registry, where the K8s cluster will then pull the built images when deploying. This should generally be a private container registry, or at least a private project in a public registry.
Similarly to the below TLS configuration, you may also need to set up auth for the registry using K8s Secrets, in this case via the kubectl create secret docker-registry helper. You can read more about using and setting up private registries here.
Note that you do not need to configure the authentication and imagePullSecrets when using GKE along with GCR, as long as your deployment registry is in the same project as the GKE cluster.
Once you've created the auth secret in the cluster, you can configure the registry and the secrets in your garden.yml project config like this:
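A sketch of such a project config for the local Docker mode. The project, environment, context, registry and Secret names are all placeholders:

```yaml
kind: Project
name: my-project
environments:
  - name: dev
providers:
  - name: kubernetes
    environments: [dev]
    context: my-dev-context          # placeholder kubectl context
    buildMode: local-docker          # the default, shown here for clarity
    deploymentRegistry:
      hostname: my.registry.io       # placeholder registry hostname
      namespace: my-project-id       # placeholder namespace in the registry
    imagePullSecrets:
      - name: my-registry-secret     # the auth Secret created in the cluster
        namespace: default
```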
You also need to log in to the docker CLI, so that images can be pushed to the registry. Please refer to your registry's documentation on how to do that (for Docker Hub you simply run docker login).
Configuring a deployment registry
To deploy a built image to a remote Kubernetes cluster, the image first needs to be pushed to a container registry that is accessible to the cluster. We refer to this as a deployment registry. Garden offers two options to handle this process:
An in-cluster registry.
An external registry, e.g. a cloud provider managed registry like ECR or GCR. (recommended)
The in-cluster registry is a simple way to get started with Garden that requires no configuration. To set it up, leave the deploymentRegistry field on the kubernetes provider config undefined, and run garden plugins kubernetes cluster-init --env=<env-name> to install the registry. This is nice and convenient, but is not a particularly good approach for clusters with many users or lots of builds. When using the in-cluster registry you need to take care of cleaning it up routinely, and it may become a performance and redundancy bottleneck with many users and frequent (or heavy) builds.
So, for any scenario with a non-trivial number of users and builds, we strongly suggest configuring a separate registry outside of your cluster. If your cloud provider offers a managed option, that's usually a good choice.
To configure a deployment registry, you need to specify at least the deploymentRegistry field on your kubernetes provider, and in many cases you also need to provide a Secret in order to authenticate with the registry via the imagePullSecrets field:
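A sketch of the two fields together. The hostname and namespace are example values, and the Secret name is a placeholder for a Secret that already exists in the cluster:

```yaml
providers:
  - name: kubernetes
    deploymentRegistry:
      hostname: my-registry.com      # example registry hostname
      namespace: my-project-id       # example namespace within the registry
    imagePullSecrets:
      - name: my-registry-secret     # placeholder Secret name
        namespace: default           # namespace where the Secret lives
```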
For example, if you specify hostname: my-registry.com and namespace: my-project-id for the deploymentRegistry field, and you have a container module named some-module in your project, it will be tagged and pushed to my-registry.com/my-project-id/some-module:v-<module-version> after building. That image ID will then be used in Kubernetes manifests when running containers.
For this to work, in most cases you also need to provide the authentication necessary both for the cluster to read the image and for the builder to push to the registry. We use the same format and mechanisms as Kubernetes imagePullSecrets for this. See this guide for how to create the secret, but keep in mind that for this context, the authentication provided must have write privileges to the configured registry and namespace.
See below for specific instructions for working with ECR.
Note: If you're using the kaniko or cluster-docker build mode, you need to re-run garden plugins kubernetes cluster-init any time you add or modify imagePullSecrets, for them to work.
Using in-cluster building with ECR
For AWS ECR (Elastic Container Registry), you need to enable the ECR credential helper once for the repository by adding an imagePullSecret for your ECR repository.
First create a config.json somewhere with the following contents (<aws_account_id> and <region> are placeholders that you need to replace for your repo):
Next create the imagePullSecret in your cluster (feel free to replace the default namespace, just make sure it's correctly referenced in the config below):
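As a declarative alternative to a local config.json plus a kubectl command, the same Secret could be sketched as a Kubernetes manifest. The credHelpers entry follows the standard Docker config format for the ECR credential helper. The Secret name ecr-config is an assumption chosen for illustration, and <aws_account_id> and <region> remain placeholders:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: ecr-config          # assumed name; must match the imagePullSecrets reference
  namespace: default
type: kubernetes.io/dockerconfigjson
stringData:
  .dockerconfigjson: |
    {
      "credHelpers": {
        "<aws_account_id>.dkr.ecr.<region>.amazonaws.com": "ecr-login"
      }
    }
```

Applying this with kubectl apply -f creates a Secret of the same type as the imperative route.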
Finally, add the secret reference to your kubernetes provider configuration:
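A sketch of that reference, assuming the Secret was created under the name ecr-config in the default namespace (use whatever name and namespace you chose):

```yaml
providers:
  - name: kubernetes
    imagePullSecrets:
      - name: ecr-config      # assumed Secret name
        namespace: default
```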
Configuring Access
To grant your service account the right permission to push to ECR, add this policy to each of the repositories in the container registry that you want to use with in-cluster building:
To grant developers permission to push and pull directly from a repository, see the AWS documentation.
Using in-cluster building with GCR
To use in-cluster building with GCR (Google Container Registry) you need to set up authentication, with the following steps:
Create a Google Service Account (GSA).
Give the GSA the appropriate permissions.
Create a JSON key for the account.
Create an imagePullSecret for using the JSON key.
Add a reference to the imagePullSecret in your Garden project configuration.
First, create a Google Service Account:
Then, to grant the Google Service account the right permission to push to GCR, run the following gcloud commands:
Next create a JSON key file for the GSA:
Then prepare the imagePullSecret in your Kubernetes cluster. Run the following command, replacing gcr.io with the correct registry hostname if appropriate (e.g. eu.gcr.io or asia.gcr.io):
Finally, add the created imagePullSecret to your kubernetes provider configuration:
Using in-cluster building with Google Artifact Registry
To use in-cluster building with Google Artifact Registry you need to set up authentication, with the following steps:
Create a Google Service Account (GSA).
Give the GSA the appropriate permissions.
Create a JSON key for the account.
Create an imagePullSecret for using the JSON key.
Add a reference to the imagePullSecret to your Garden project configuration.
First, create a Google Service Account:
The service account needs write access to the Google Artifact Registry. You can either grant write access to all repositories with an IAM policy, or you can grant repository-specific permissions to selected repositories. We recommend the latter, as it follows the pattern of granting the least-privileged access needed.
To grant access to all Google Artifact Registries, run:
To grant access to one or more repositories, run for each repository:
Next create a JSON key file for the GSA:
Then prepare the imagePullSecret in your Kubernetes cluster. Run the following command and replace docker.pkg.dev with the correct registry hostname (e.g. southamerica-east1-docker.pkg.dev or australia-southeast1-docker.pkg.dev):
Finally, add the created imagePullSecret and deploymentRegistry to your kubernetes provider configuration:
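A sketch of what this could look like. The Secret name, the GCP project ID and the repository name are hypothetical, and the use of <project-id>/<repository-name> as the registry namespace is an assumption based on how Artifact Registry image paths are structured:

```yaml
providers:
  - name: kubernetes
    deploymentRegistry:
      hostname: australia-southeast1-docker.pkg.dev    # your regional registry hostname
      namespace: my-gcp-project/my-repository          # hypothetical <project-id>/<repository-name>
    imagePullSecrets:
      - name: gar-config                               # hypothetical Secret name
        namespace: default
```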
Publishing images
You can publish images that have been built in your cluster, using the garden publish command. See the Publishing images section in the Container Modules guide for details.
Note that you currently need to have Docker running locally even when using remote building, and you need to have authenticated with the target registry. When publishing, we pull the image from the remote registry to the local Docker daemon, and then go on to push it from there. We do this to avoid having to (re-)implement all the various authentication methods (and by extension key management) involved in pushing directly from the cluster, and because it's often not desired to give clusters access to directly push to production registries.
Cleaning up cached images
In order to avoid disk-space issues in the cluster when using the in-cluster registry and/or either of the kaniko or cluster-docker build modes, the kubernetes provider exposes a utility command:
The command does the following:
Looks through all Pods in the cluster to see which images/tags are in use, and flags all other images as deleted in the in-cluster registry.
Restarts the registry in read-only mode.
Runs the registry garbage collection.
Restarts the registry again without the read-only mode.
When using the cluster-docker build mode, we additionally untag in the Docker daemon all images that are no longer in the registry, and then clean up the dangling image layers by running docker image prune.
There are plans to do this automatically when disk-space runs low, but for now you can run this manually or set up your own cron jobs.
You can avoid this entirely by using a remote deployment registry and the cluster-buildkit build mode.
Pulling base images from private registries
The in-cluster builder may need to be able to pull base images from a private registry, e.g. if your Dockerfile starts with something like this:
where my-private-registry.com requires authorization.
For this to work, you need to create a registry secret in your cluster (see this guide for how to create the secret) and then configure the imagePullSecrets field in your kubernetes provider configuration:
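A sketch with two entries, to illustrate that several secrets can be listed. Both Secret names are placeholders for Secrets you have created yourself:

```yaml
providers:
  - name: kubernetes
    imagePullSecrets:
      - name: my-private-registry-secret    # placeholder: auth for my-private-registry.com
        namespace: default
      - name: my-other-registry-secret      # placeholder: auth for another registry
        namespace: default
```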
This registry auth secret will then be copied and passed to the in-cluster builder. You can specify as many as you like, and they will be merged together.
Note: If you're using the kaniko or cluster-docker build mode, you need to re-run garden plugins kubernetes cluster-init any time you add or modify imagePullSecrets, for them to work when pulling base images!