🚨 The Problem

After installing the aws-ebs-csi-driver via Helm, the ebs-csi-node DaemonSet shows 0 pods running:

kubectl get daemonset ebs-csi-node -n kube-system -o wide

Bad output:

NAME           DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR            ...
ebs-csi-node   0         0         0       0            0           kubernetes.io/os=linux   ...

🔍 Root Cause

A DESIRED count of 0 means the DaemonSet controller considers no node eligible, so it never creates pods. One common reason is that node taints are not tolerated by the DaemonSet's pod template.
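To see which taints are in play, list them per node (only the taint keys are shown here; use kubectl describe node for values and effects):

kubectl get nodes -o custom-columns='NAME:.metadata.name,TAINTS:.spec.taints[*].key'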

Check current tolerations

Since the DaemonSet has created no pods yet, inspect the toleration list in its pod template directly:

kubectl get daemonset ebs-csi-node -n kube-system -o jsonpath='{.spec.template.spec.tolerations}' | jq

Expected (a healthy deployment has a catch-all toleration):

[
  {"operator": "Exists"}
]

If the list is empty or contains only narrowly scoped tolerations, the pods won't be scheduled onto tainted nodes.
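For reference, this is what the catch-all looks like in YAML form; a toleration with an empty key and operator Exists matches every taint, regardless of value or effect:

tolerations:
  - operator: Exists   # empty key + Exists tolerates all taints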

✅ Solution

Patch the Helm release values to explicitly tolerate all taints:

resource "helm_release" "ebs_csi_driver" {
  name       = "aws-ebs-csi-driver"
  chart      = "aws-ebs-csi-driver"
  repository = "https://kubernetes-sigs.github.io/aws-ebs-csi-driver"
  version    = local.ebs_csi_driver_version
  namespace  = "kube-system"

  set {
    name  = "node.tolerateAllTaints"
    value = true
  }

  set {
    name  = "node.tolerations[0].operator"
    value = "Exists"
  }
}

Then upgrade the release:

terraform apply
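If you manage the release with the Helm CLI instead of Terraform, the equivalent upgrade is (assuming the chart repository was added under the alias aws-ebs-csi-driver):

helm upgrade aws-ebs-csi-driver aws-ebs-csi-driver/aws-ebs-csi-driver \
  --namespace kube-system \
  --set node.tolerateAllTaints=true \
  --set 'node.tolerations[0].operator=Exists'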

🔎 Validation

Re-check the daemonset:

kubectl get daemonset ebs-csi-node -n kube-system

Expected (the counts equal the number of schedulable Linux nodes; a single-node cluster is shown here):

NAME           DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR            AGE
ebs-csi-node   1         1         1       1            1           kubernetes.io/os=linux   ...
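You can also block until the rollout completes:

kubectl rollout status daemonset/ebs-csi-node -n kube-system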

And verify the tolerations:

kubectl get pod -n kube-system -l app=ebs-csi-node -o jsonpath='{.items[0].spec.tolerations}' | jq
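Expected (the catch-all toleration, plus entries the DaemonSet controller adds to every pod automatically; the exact list may include further node.kubernetes.io/* entries):

[
  {"operator": "Exists"},
  {"effect": "NoExecute", "key": "node.kubernetes.io/not-ready", "operator": "Exists"},
  {"effect": "NoSchedule", "key": "node.kubernetes.io/disk-pressure", "operator": "Exists"}
]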

✅ Outcome

Once the catch-all toleration is in place, the node pods are scheduled on every node and volume provisioning should work as expected.

Make sure you also verify volume attachment behavior afterwards; the smoke test below is one way to do that.
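A minimal smoke test, assuming the cluster has a default EBS-backed StorageClass (the resource names here are arbitrary):

cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ebs-smoke-test
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: ebs-smoke-test
spec:
  containers:
    - name: app
      image: busybox:1.36
      command: ["sh", "-c", "echo ok > /data/probe && sleep 3600"]
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: ebs-smoke-test
EOF

# The PVC should become Bound and the pod Running once the volume attaches:
kubectl get pvc ebs-smoke-test
kubectl get pod ebs-smoke-test

# Clean up:
kubectl delete pod/ebs-smoke-test pvc/ebs-smoke-test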