Friday, August 18, 2023

Breaking/fixing my K8S controller

Just a bit of blowing my own horn...

I managed to break the home lab's K8S config while attempting to troubleshoot a friend's cluster, a week or so back. The primary symptom (other than Multus not working) was showing up as a "NoExecute" status for the controller, when listing taints for the nodes. There were also log entries, complaining about not being able to delete sandboxes. This was also causing issues with Falco, which was deploying only 4 of an expected 6 pods (i.e., the DS wasn't installing on the controller), when trying to deploy it with Helm (a story for another time, I think).

In any case, after a number of Google searches and using "kubectl describe" against a few resources, I backtraced it to "Network plugin returns error: cni plugin not initialized". This turned out to be Multus.

Uninstalling and re-installing Multus corrected the issue. K8S then woke up and destroyed the old sandboxes, fired up the missing Falco pods, and the taint on the controller went back to its normal "NoSchedule" status.

Two things learned today:

  1. Piping "kubectl describe ..." into /bin/less is a good troubleshooting tool.
  2. The same YAML file, that you use to install something, can be used to delete it. In other words: "kubectl create -f multus-thick.yaml" for installing and "kubectl delete -f multus-thick.yaml" for uninstalling.

No comments:

Post a Comment