
Linkerd—GitOps Deployment with ArgoCD and Cert-Manager, Automatic Certificate Rotation


A service mesh in Kubernetes is a useful tool. It offers efficient networking, connection routing, encryption, and enhanced monitoring of communication inside the cluster. However, deployment and certificate handling can sometimes be a pain, especially in the GitOps era. This article describes how to deploy Linkerd without the CLI tool, with full Cert-Manager and Trust-Manager integration, using Argo CD in a Kubernetes cluster.

Background—Argo Rollouts

Every DevOps engineer I’ve talked with asks the same question: “Why do we even need a service mesh?! Just for show?” In our case, no, we have an actual, valid reason: traffic management. In the ContextAI project, we decided to introduce a canary deployment pipeline using Argo Rollouts. From the Argo Rollouts website:

“Argo Rollouts is a Kubernetes controller and set of CRDs which provide advanced deployment capabilities such as blue-green, canary, canary analysis, experimentation, and progressive delivery features to Kubernetes.”

Argo Rollouts uses the service mesh in Kubernetes to route connections to new deployments according to the configured deployment strategy. The decision about the next stage of the deployment is made based on response-code metrics. Each step of the deployment propagation is defined by the developer in the application manifest.

Argo Rollouts works both with the native Kubernetes Ingress Controller and with multiple types of service meshes. Native traffic management in Kubernetes doesn’t have the tools needed to handle fine-grained traffic routing, like redirecting some percentage of traffic to specific pods. For that type of task, we’ve decided to implement a service mesh.
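To make this concrete, here is a minimal sketch of what such a Rollout manifest might look like. The names, image, and step weights are purely illustrative, and the SMI-based traffic routing shown here assumes the cluster supports TrafficSplit objects (for example via the linkerd-smi extension):

apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: my-app                       # hypothetical application
spec:
  replicas: 4
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
      annotations:
        linkerd.io/inject: enabled   # join the mesh
    spec:
      containers:
        - name: my-app
          image: registry.example.com/my-app:1.0.0   # placeholder image
          ports:
            - containerPort: 8080
  strategy:
    canary:
      canaryService: my-app-canary   # pre-existing Services selecting canary/stable pods
      stableService: my-app-stable
      trafficRouting:
        smi: {}                      # route via an SMI TrafficSplit handled by the mesh
      steps:
        - setWeight: 10              # send 10% of traffic to the canary
        - pause: { duration: 5m }    # wait (or run analysis) before promoting further
        - setWeight: 50
        - pause: { duration: 5m }

At each setWeight step, Argo Rollouts shifts the declared percentage of traffic to the canary pods and waits at each pause before promoting further.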

Service mesh—Linkerd

From Wikipedia:

“Linkerd is Cloud Native Computing Foundation’s fifth member project and the project that coined the term “service mesh.” Linkerd adds observability, security, and reliability features to applications by adding them to the platform rather than the application layer, and features a “micro-proxy” to maximize speed and security of its data plane. Linkerd graduated from CNCF in July 2021.”

Linkerd is one of the most well-known service meshes. Benchmarks regularly show it to be among the fastest and least resource-hungry service meshes available. Meshing new services is as easy as adding an annotation to the workload or namespace manifest. Linkerd Viz offers a dashboard that can be used to monitor traffic in a cluster.
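For example, opting a workload into the mesh can be as simple as the (hypothetical) namespace below; the same linkerd.io/inject annotation can also be set on a pod template instead:

apiVersion: v1
kind: Namespace
metadata:
  name: my-app                    # hypothetical namespace
  annotations:
    linkerd.io/inject: enabled    # Linkerd's proxy injector adds the sidecar to every pod created here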

Installation by CLI is also simple and straightforward, but nowadays, with GitOps and tools like Argo CD, this kind of installation may feel a little bit out of date. Another thing is in-cluster certificate handling, which can be managed more effectively by Cert-Manager.

Cert-Manager and Trust-Manager are designed to simplify the issuing and renewal of certificates in a Kubernetes cluster. They are most widely used in combination with Let's Encrypt certificates.
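For context, a typical Let's Encrypt integration is just a ClusterIssuer like the rough sketch below (the issuer name, e-mail, and ingress class are placeholders); our Linkerd setup, however, relies on self-signed issuers instead:

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod                 # hypothetical issuer name
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: ops@example.com               # placeholder contact e-mail
    privateKeySecretRef:
      name: letsencrypt-prod-account-key # secret holding the ACME account key
    solvers:
      - http01:
          ingress:
            ingressClassName: nginx      # assumes an nginx ingress controller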

Deployment with Argo CD

Argo CD is a declarative GitOps continuous delivery tool for Kubernetes. In short, Argo CD keeps your cluster in sync with your repositories. It also offers monitoring features. At RTB House, it's the default way to deploy services to Kubernetes.

All charts used in this article are from the https://helm.linkerd.io/stable and https://charts.jetstack.io repositories. Our Linkerd deployment is split into four namespaces:

  • cert-manager—contains deployments for cert-manager and trust-manager.
  • linkerd—contains deployments from the linkerd-control-plane and linkerd-jaeger charts.
  • linkerd-crd—contains deployments from the linkerd-crds chart.
  • linkerd-viz—contains deployments from the linkerd-viz chart.

Each namespace is a separate application in Argo CD. Each application has its own chart, which includes Linkerd charts as a dependency and additional objects for this namespace (like cert-manager certificates).

# Chart.yaml for the cert-manager application
apiVersion: v2
name: cert-manager
description: Cert-Manager chart for Kubernetes
type: application
version: 1.12.3
appVersion: "1.12.3"
dependencies:
  - name: cert-manager
    version: "v1.16.1"
    repository: "https://charts.jetstack.io"
  - name: trust-manager
    version: "v0.12.0"
    repository: "https://charts.jetstack.io"

# Chart.yaml for the linkerd application
apiVersion: v2
name: linkerd
description: Linkerd chart for Kubernetes
type: application
version: 1.16.11
appVersion: "1.16.11"

dependencies:
  - name: linkerd-control-plane
    version: "1.16.11"
    repository: "https://helm.linkerd.io/stable"
  - name: linkerd-jaeger
    version: "30.12.11"
    repository: "https://helm.linkerd.io/stable"

# Chart.yaml for the linkerd-crds application
apiVersion: v2
name: linkerd-crds
description: Linkerd CRDs chart for Kubernetes
type: application
version: 1.8.0
appVersion: "1.8.0"

dependencies:
  - name: linkerd-crds
    version: "1.8.0"
    repository: "https://helm.linkerd.io/stable"

# Chart.yaml for the linkerd-viz application
apiVersion: v2
name: linkerd-viz
description: Linkerd Viz chart for Kubernetes
type: application
version: 30.12.11
appVersion: "30.12.11"

dependencies:
  - name: linkerd-viz
    version: "30.12.11"
    repository: "https://helm.linkerd.io/stable"

All of the applications are deployed at once, with the auto-sync option, using the ApplicationSet CRD from Argo CD. The initial deployment may take a few minutes because Argo CD first needs to finish applying the Linkerd CRDs and then retry deploying the control plane. Any updates after the first deployment are instant and seamless.

apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: infra-prod
  namespace: argocd
spec:
  generators:
    - git:
        repoURL: https://**************/charts.git
        revision: _prod
        directories:
          - path: '*'   # assumption: one application per top-level chart directory in the repo
  template:
    metadata:
      name: '{{path.basename}}'
      finalizers:
      - resources-finalizer.argocd.argoproj.io
    spec:
      project: cai
      source:
        repoURL: https://**************/charts.git
        targetRevision: _prod
        path: '{{path.path}}'
        helm:
          releaseName: "{{path.basename}}"
          valueFiles:
          - values.yaml
          - values-prod.yaml
          ignoreMissingValueFiles: true
      destination:
        server: https://**************
        namespace: '{{path.basename}}'
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
          allowEmpty: false
        retry:
          limit: 3
          backoff:
            duration: 5s
            factor: 2

Cert-Manager certificates and Trust-Manager bundle for Linkerd

Linkerd depends on a root certificate called the Trust Anchor. Normally, it would be fetched from secret storage to the CI/CD machine and used during deployment. In our setup, we didn't want to put secrets on random machines; we wanted everything to happen inside the cluster.

First, we'll need to create new Cluster Issuers and Certificates in the cert-manager namespace. This specific part of our system doesn't use external certificates, so we can rely on self-signed ones.

If you have a Trust Anchor certificate that you want to use, store it in a Kubernetes secret and pass it as the Certificate Authority directly to the linkerd-trust-anchor Cluster Issuer that will be created in the next steps.
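(For reference, such a secret could look roughly like the sketch below; the data values are placeholders, and the secret name simply has to match the ca.secretName of that Cluster Issuer. The rest of this article follows the self-signed path.)

apiVersion: v1
kind: Secret
metadata:
  name: linkerd-trust-anchor      # must match ca.secretName on the Cluster Issuer
  namespace: cert-manager
type: kubernetes.io/tls
data:
  tls.crt: <base64-encoded trust anchor certificate>   # placeholder
  tls.key: <base64-encoded trust anchor private key>   # placeholder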

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: linkerd-self-signed
  namespace: cert-manager
spec:
  selfSigned: {}

With a new, self-signed issuer, we can now create a Trust Anchor certificate that will be used by Linkerd. We’ll also create a Webhook Issuer certificate.

apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: linkerd-trust-anchor
  namespace: cert-manager
spec:
  isCA: true
  duration: 336h
  renewBefore: 24h
  issuerRef:
    name: linkerd-self-signed
    kind: ClusterIssuer
  secretName: linkerd-trust-anchor
  secretTemplate:
    labels:
      app.kubernetes.io/part-of: Linkerd
  commonName: identity.linkerd.cluster.local
  dnsNames:
    - identity.linkerd.cluster.local
  privateKey:
    algorithm: ECDSA
    size: 256
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: linkerd-webhook-issuer
  namespace: cert-manager
spec:
  isCA: true
  duration: 336h
  renewBefore: 24h
  issuerRef:
    name: linkerd-self-signed
    kind: ClusterIssuer
  secretName: linkerd-webhook-issuer
  secretTemplate:
    labels:
      app.kubernetes.io/part-of: Linkerd
  commonName: webhook.linkerd.cluster.local
  privateKey:
    algorithm: ECDSA
    size: 256

New certificates will be generated and stored in secret objects in K8s. With the Trust Anchor certificate, we can now create the Trust Anchor Cluster Issuer. The same needs to be done for the Webhook Issuer.

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: linkerd-trust-anchor
  namespace: cert-manager
spec:
  ca:
    secretName: linkerd-trust-anchor
---
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: linkerd-webhook-issuer
  namespace: cert-manager
spec:
  ca:
    secretName: linkerd-webhook-issuer

Now, we need to generate an identity issuer certificate using the new linkerd-trust-anchor Cluster Issuer.

apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: linkerd-identity-issuer
  namespace: cert-manager
spec:
  secretName: linkerd-identity-issuer
  secretTemplate:
    labels:
      app.kubernetes.io/part-of: Linkerd
  duration: 336h
  renewBefore: 24h
  issuerRef:
    name: linkerd-trust-anchor
    kind: ClusterIssuer
  commonName: identity.linkerd.cluster.local
  dnsNames:
  - identity.linkerd.cluster.local
  isCA: true
  privateKey:
    algorithm: ECDSA
  usages:
    - cert sign
    - crl sign
    - server auth
    - client auth

The last part of the cert-manager setup is creating a certificate bundle. This can be done with Trust-Manager. Installation is straightforward; once Trust-Manager is installed, we'll need to declare an identity trust roots certificate bundle. This bundle will be synced into all namespaces and used to validate certificates.

apiVersion: trust.cert-manager.io/v1alpha1
kind: Bundle
metadata:
  name: linkerd-identity-trust-roots
  namespace: cert-manager
spec:
  sources:
  - secret:
      name: linkerd-identity-issuer
      key: "ca.crt"
  target:
    configMap:
      key: "ca-bundle.crt"

That completes the list of objects that need to be created in the cert-manager namespace. Is that all the certificates we need?

Nope.

Now we need to declare the certificates that will be used by Linkerd itself. We'll need to create five certificates in the linkerd namespace and two in the linkerd-viz namespace. The important part is to add the “app.kubernetes.io/part-of: Linkerd” label to all secret objects; otherwise, the rotation cronjob described later won't be able to select them.

Certificates in the linkerd namespace:

apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: linkerd-identity-issuer
  namespace: linkerd
spec:
  secretName: linkerd-identity-issuer
  secretTemplate:
    labels:
      app.kubernetes.io/part-of: Linkerd
  duration: 168h
  renewBefore: 24h
  issuerRef:
    name: linkerd-trust-anchor
    kind: ClusterIssuer
  commonName: identity.linkerd.cluster.local
  dnsNames:
  - identity.linkerd.cluster.local
  isCA: true
  privateKey:
    algorithm: ECDSA
  usages:
  - cert sign
  - crl sign
  - server auth
  - client auth
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: linkerd-policy-validator
  namespace: linkerd
spec:
  secretName: linkerd-policy-validator-k8s-tls
  secretTemplate:
    labels:
      app.kubernetes.io/part-of: Linkerd
  duration: 168h
  renewBefore: 24h
  issuerRef:
    name: linkerd-webhook-issuer
    kind: ClusterIssuer
  commonName: linkerd-policy-validator.linkerd.svc
  dnsNames:
    - linkerd-policy-validator.linkerd.svc
  isCA: false
  usages:
    - server auth
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: linkerd-proxy-injector
  namespace: linkerd
spec:
  secretName: linkerd-proxy-injector-k8s-tls
  secretTemplate:
    labels:
      app.kubernetes.io/part-of: Linkerd
  duration: 168h
  renewBefore: 24h
  issuerRef:
    name: linkerd-webhook-issuer
    kind: ClusterIssuer
  commonName: linkerd-proxy-injector.linkerd.svc
  dnsNames:
    - linkerd-proxy-injector.linkerd.svc
  isCA: false
  usages:
    - server auth
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: linkerd-sp-validator
  namespace: linkerd
spec:
  secretName: linkerd-sp-validator-k8s-tls
  secretTemplate:
    labels:
      app.kubernetes.io/part-of: Linkerd
  duration: 168h
  renewBefore: 24h
  issuerRef:
    name: linkerd-webhook-issuer
    kind: ClusterIssuer
  commonName: linkerd-sp-validator.linkerd.svc
  dnsNames:
    - linkerd-sp-validator.linkerd.svc
  isCA: false
  usages:
    - server auth
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: jaeger-injector
  namespace: linkerd
spec:
  secretName: jaeger-injector-k8s-tls
  secretTemplate:
    labels:
      app.kubernetes.io/part-of: Linkerd
  duration: 168h
  renewBefore: 24h
  issuerRef:
    name: linkerd-webhook-issuer
    kind: ClusterIssuer
  commonName: jaeger-injector.linkerd-jaeger.svc
  dnsNames:
    - jaeger-injector.linkerd-jaeger.svc
  isCA: false
  usages:
    - server auth

Certificates in the linkerd-viz namespace:

apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: tap-injector
  namespace: linkerd-viz
spec:
  secretName: tap-injector-k8s-tls
  secretTemplate:
    labels:
      app.kubernetes.io/part-of: Linkerd
  duration: 168h
  renewBefore: 24h
  issuerRef:
    name: linkerd-webhook-issuer
    kind: ClusterIssuer
  commonName: tap-injector.linkerd-viz.svc
  dnsNames:
    - tap-injector.linkerd-viz.svc
  isCA: false
  usages:
    - server auth
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: tap
  namespace: linkerd-viz
spec:
  secretName: tap-k8s-tls
  secretTemplate:
    labels:
      app.kubernetes.io/part-of: Linkerd
  duration: 168h
  renewBefore: 24h
  issuerRef:
    name: linkerd-webhook-issuer
    kind: ClusterIssuer
  commonName: tap.linkerd-viz.svc
  dnsNames:
    - tap.linkerd-viz.svc
  isCA: false
  usages:
    - server auth

Ok! At this point, we have all required certificates in place, in a structure required by Linkerd. Now, let’s use them in Linkerd itself.

Linkerd and Linkerd Viz integration with Cert-Manager

Linkerd has an option to use external certificates, which we'll use in this case. To use the certificates created by cert-manager, we need to point all components of Linkerd to the corresponding secrets.

Values.yaml for linkerd deployment:

linkerd-control-plane:
  identityTrustAnchorsPEM: ~
  identity:
    externalCA: true
    issuer:
      scheme: kubernetes.io/tls

  proxyInit:
    runAsRoot: true

  enablePodDisruptionBudget: true

  deploymentStrategy:
    rollingUpdate:
      maxUnavailable: 1
      maxSurge: 25%

  proxyInjector:
    externalSecret: true
    injectCaFrom: linkerd/linkerd-proxy-injector

  profileValidator:
    externalSecret: true
    injectCaFrom: linkerd/linkerd-sp-validator

  policyValidator:
    externalSecret: true
    injectCaFrom: linkerd/linkerd-policy-validator
  enablePodAntiAffinity: true
  controllerReplicas: 2
  webhookFailurePolicy: Fail

linkerd-jaeger:
  webhook:
    externalSecret: true
    injectCaFrom: linkerd/jaeger-injector

Values.yaml for linkerd-viz deployment:

linkerd-viz:
  dashboard:
    enforcedHostRegexp: .*

  tap:
    externalSecret: true
    injectCaFrom: linkerd-viz/tap

  tapInjector:
    externalSecret: true
    injectCaFrom: linkerd-viz/tap-injector

Certificate rotation in Linkerd

Here's the part with a little hack. All certificates should have a defined expiry date, and all certificates should be rotated regularly. Usually this is done with the Linkerd CLI tool or with a Helm deployment (with the new certificate in the values). In both cases, new certificate secrets are created, and the Linkerd pods are restarted to load them; without a restart, Linkerd pods will not pick up new certificates, because once loaded, the certificates are kept in pod memory.

But we can do the same set of tasks automatically inside the cluster.

Just to be sure, we'll also force the recreation of the cert-manager certificate secrets by deleting them. Cert-manager will instantly issue fresh certificates and store them in the secrets. After that, the last thing to do is a rolling restart of the Linkerd deployments. This way, we achieve zero-downtime certificate rotation.

This cronjob needs to be created in both the linkerd and linkerd-viz namespaces.

apiVersion: batch/v1
kind: CronJob
metadata:
  name: linkerd-restarter
  namespace: {{ .Release.Namespace }}
spec:
  concurrencyPolicy: Forbid
  schedule: '25 */6 * * *'
  jobTemplate:
    spec:
      backoffLimit: 2
      activeDeadlineSeconds: 600
      template:
        spec:
          serviceAccountName: linkerd-restarter
          restartPolicy: Never
          containers:
            - name: rotate-secrets
              image: bitnami/kubectl
              command:
                - 'kubectl'
                - 'delete'
                - 'secret'
                - '--selector'
                - 'app.kubernetes.io/part-of=Linkerd'
                - '-n'
                - '{{ .Release.Namespace }}'
            - name: restart-linkerd
              image: bitnami/kubectl
              command:
                - 'kubectl'
                - 'rollout'
                - 'restart'
                - 'deployment'
                - '--selector'
                - 'app.kubernetes.io/part-of=Linkerd'
                - '-n'
                - '{{ .Release.Namespace }}'

The restarter cronjob requires a dedicated service account, role, and role binding.

kind: ServiceAccount
apiVersion: v1
metadata:
  name: linkerd-restarter
  namespace: {{ .Release.Namespace }}
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: linkerd-restarter
  namespace: {{ .Release.Namespace }}
rules:
  - apiGroups: ["apps"]
    resources: ["deployments"]
    verbs: ["list", "patch"]
  - apiGroups: [""]
    resources: ["secrets"]
    verbs: ["list", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: linkerd-restarter
  namespace: {{ .Release.Namespace }}
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: linkerd-restarter
subjects:
  - kind: ServiceAccount
    name: linkerd-restarter
    namespace: {{ .Release.Namespace }}

That's all! Commit all objects to the repository, let Argo CD deploy them, and verify that all Linkerd pods are up. The restart occurs every six hours, but it won't be noticeable.

Final thoughts

With a few tricks, we were able to create a “fire-and-forget” deployment of Linkerd service mesh. Our certificates are temporary and are rotated regularly, without any intervention from operators. All operations happen in the background. We’ve been using this setup for a few months now without any significant issues. Linkerd works great for our canary and blue/green deployments and gives us additional insight into communication inside the cluster.