Sunday, August 4, 2024

Kubelet refuses to start because of a hidden swap partition

I was experiencing issues where coredns refused to start even after I installed a CNI. Worse, kubelet refused to start after a reboot. The logs stated that swap was still enabled, even after I'd removed it via the normal means (run swapoff and edit /etc/fstab).
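
For reference, "normal means" amounted to something like this (a sketch; the fstab line shown uses the swap UUID from the blkid output below, so yours will differ):

    # turn off any active swap
    swapoff -a

    # then comment out (or remove) the swap entry in /etc/fstab, e.g.:
    #   UUID=71988c4b-914e-4022-8b47-7e90423c9e60  none  swap  sw  0  0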

The problem turned out to be a leftover swap partition on the disk. It was created as part of the OS install, and deleting its entry from /etc/fstab didn't remove the partition itself. In other words, somewhere along the line the system checks the partition table and complains about swap, even though it isn't actually in use (it no longer shows up in /etc/fstab or via "swapon --show").

You can spot the leftover swap partition via the "blkid" command. Example:

root@cf1:~# blkid
/dev/nvme0n1p1: UUID="8F39-9A16" BLOCK_SIZE="512" TYPE="vfat" \
  PARTUUID="b218aaa5-0fe2-4c78-8962-08ef25d7a85c"
/dev/nvme0n1p2: UUID="0f363b61-a996-4a19-9bee-9280360a68b0" \
  BLOCK_SIZE="4096" TYPE="ext4" PARTUUID=\
  "b30c1cff-9030-4211-bd58-ba7b07c6ef96"
/dev/nvme0n1p3: UUID="71988c4b-914e-4022-8b47-7e90423c9e60" \
  TYPE="swap" PARTUUID="608469c4-05"

The fix is to edit the partition table and delete the swap partition, using fdisk. Then reboot.

In my case, since the machine was NVMe-only, I had to run:

fdisk /dev/nvme0n1
and delete the /dev/nvme0n1p3 partition.

Don't forget to press "w" at the end to write the new partition table to disk.
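
For the record, the fdisk session looks roughly like this (a sketch based on the layout above; the partition number will differ on other installs):

    root@cf1:~# fdisk /dev/nvme0n1

    Command (m for help): p     <- print the table and confirm which partition is swap
    Command (m for help): d     <- delete a partition
    Partition number (1-3, default 3): 3
    Command (m for help): w     <- write the new table to disk and exit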

Note: this may have been caused by deleting the swap line from /etc/fstab instead of just commenting it out. On the worker nodes, where I only commented out the fstab line, the problem didn't occur.

Sunday, April 28, 2024

Fix yer own jank, Google!

Just eff'in awesome. Google is sending me complaints about how my blog is indexed. Since Google owns Blogger (which hosts this blog), you'd think that Google would recognize where the problem lies.

I've had enough. Between the new unfriendliness with email (new as in Cox moved their email to Yahoo, and Google doesn't play nice with Yahoo) and now this, I'm looking at alternatives to Google's products. At this point, I'd rather pay for subscriptions than spend any more time developing work-arounds.

List me as: Meh.

Friday, December 29, 2023

Tailscale switch

As always, the documentation leaves a bit unexplained. I was interested in using "tailscale switch" to move between a small non-shared tailnet (managed by Tailscale) and a shared cyberclub tailnet (managed by Headscale). The unmentioned part: never use "tailscale logout", which expires the authentication key. Instead, use the following procedure to set up the multiple networks:

    tailscale login
    tailscale status
    tailscale down
    tailscale login --login-server=[headscale URL]
    tailscale status

In other words, first authenticate to the Tailscale-hosted network. Then run "tailscale down" and authenticate to the second network.

You can then run the following to list the available networks:

    tailscale switch --list

The output will look something like:

    ID    Tailnet     Account
    cde0  bob.github  bob@github*
    41da  othernet    othernet

The currently active network will be denoted by the asterisk at the end of the line. You can switch between the two with:

    tailscale switch ACCOUNTNAME
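
For example, with the (example) accounts listed above, switching to the Headscale-managed tailnet and back would look something like:

    tailscale switch othernet
    tailscale switch bob@github

Running "tailscale switch --list" again afterwards should show the asterisk having moved to the active account.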

My reasoning for needing the Tailscale-hosted account: I periodically need access to a less-technical family member's network for troubleshooting. I gave them a GL.iNet Slate AX wifi router, which runs a Tailscale client (you have to add it). You can configure the physical switch (on the side of the router) to turn the tailnet on and off. End result: if they're having trouble with something in their network, they flip the switch on, call me, and I can remotely troubleshoot their house network.

Tuesday, November 28, 2023

Tasking for self...

Note to self: certs for the house cluster expire in early March. You'll want the info from: https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-certs/
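
The short version from that page, assuming a reasonably recent kubeadm, run on the control-plane node:

    # show when each certificate expires
    kubeadm certs check-expiration

    # renew everything (then restart the control-plane static pods)
    kubeadm certs renew all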

Sunday, October 22, 2023

Ouch

I'm guessing that there are others whose muscle-memory, when typing, is a bit munged. I finally did a full re-indexing of the document search engine (a little over 34K docs). I then tested the engine but misspelled the search term ("epbf" instead of "ebpf"). This produced 4 answers. I then misspelled "falco" by typing "flaco" (which Google indicates is Spanish for "skinny" or "thin"). This produced two documents with "falco" misspelled and two Spanish-language files. I'm thinking I need to research whether Recoll can do fuzzy searches.

Friday, August 18, 2023

Breaking/fixing my K8S controller

Just a bit of blowing my own horn...

I managed to break the home lab's K8S config a week or so back while attempting to troubleshoot a friend's cluster. The primary symptom (other than Multus not working) was a "NoExecute" taint on the controller when listing taints for the nodes. There were also log entries complaining about not being able to delete sandboxes. This was also causing issues with Falco, which was deploying only 4 of an expected 6 pods (i.e., the DaemonSet wasn't installing on the controller) when trying to deploy it with Helm (a story for another time, I think).
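
Listing the taints was along these lines (a sketch; either form works):

    # quick and dirty
    kubectl describe nodes | grep -A2 Taints

    # or a more compact view
    kubectl get nodes -o custom-columns='NODE:.metadata.name,TAINT-EFFECTS:.spec.taints[*].effect'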

In any case, after a number of Google searches and running "kubectl describe" against a few resources, I traced it back to "Network plugin returns error: cni plugin not initialized". This turned out to be Multus.

Uninstalling and re-installing Multus corrected the issue. K8S then woke up and destroyed the old sandboxes, fired up the missing Falco pods, and the taint on the controller went back to its normal "NoSchedule" status.

Two things learned today:

  1. Piping "kubectl describe ..." into /bin/less is a good troubleshooting tool.
  2. The same YAML file that you use to install something can be used to delete it. In other words: "kubectl create -f multus-thick.yaml" for installing and "kubectl delete -f multus-thick.yaml" for uninstalling.
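
Putting those together, the actual fix amounted to something like this (a sketch; multus-thick.yaml is the manifest I'd originally installed from, and <controller-node> is a placeholder for the controller's node name):

    # remove and re-create Multus from the same manifest
    kubectl delete -f multus-thick.yaml
    kubectl create -f multus-thick.yaml

    # then confirm the controller's taint drops back to NoSchedule
    kubectl describe node <controller-node> | less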

Sunday, August 13, 2023

Prototyping my Falco install

Just spent a couple of hours getting Falco + Sidekick + UI + Redis figured out. The following works. Next up: getting it to work in K8s.

#!/bin/bash

# RediSearch backend used by the Sidekick UI for event storage and search
docker run -d -p 6379:6379 redislabs/redisearch:2.2.4

# Falco itself (driver-less image, modern eBPF probe), forwarding events
# to Falcosidekick via the HTTP output
docker run -itd --name falco \
           --privileged \
           -v /var/run/docker.sock:/host/var/run/docker.sock \
           -v /proc:/host/proc:ro \
           -e HTTP_OUTPUT_URL=http://192.168.2.22:2801 \
           falcosecurity/falco-no-driver:latest falco --modern-bpf

# Falcosidekick, listening on 2801 and pushing events to the UI on 2802
docker run -itd --name falcosidekick -p 2801:2801 \
           -e WEBUI_URL=http://192.168.2.22:2802 \
           falcosecurity/falcosidekick

# The Sidekick UI, backed by the Redis container above
docker run -itd --name fs-ui -p 2802:2802 \
           -e FALCOSIDEKICK_UI_REDIS_URL=192.168.2.22:6379 \
           falcosecurity/falcosidekick-ui falcosidekick-ui