Reimagine Kubernetes networking: let's get a KNI (Kubernetes Networking Interface/reImagined) demo going!

Today we’re going to get a demo for Kubernetes Networking Interface (KNI) rolling! If you wanna skip the preamble and get to the terminal, skip down to the requirements.

What is KNI? It’s a “Foundational gRPC Network API specific for Kubernetes”, and it “revisit[s] past decisions for location of pod network setup/teardown”, and in my words, it’s an amazing way for us to think about deeper integration for network plugins in the context of Kubernetes, a kind of next level for what we can only solve today using CNI (the container networking interface).

Mike Zappa, sig-network chair, CNI maintainer, containerd contributor and sport climber/alpinist and trail runner, has been spearheading an effort to bridge the gap for network implementations for Kubernetes, and if you want to see some more of what Zappa is thinking, check out this presentation: “KNI [K8s Network Interface]”.

And maybe, if I’m lucky and Mike likes crack climbing, someday I can get him to climb Upper West in the “tough schist” of my neighborhood. I’ll just hike to the base though – I’m just your average granola-crunching telemark hippy, but I do love alpine travel myself.

Remember when I gave a talk on “CNI 2.0: Vive la revolution!”? In it, I wrote that:

Did you know CNI is container orchestration agnostic? It’s not Kubernetes specific. Should it stay that way? People are looking for translation layers between Kubernetes and CNI itself.

What I’m talking about is that the Container Networking Interface (CNI) (which I know and love!) is not purpose-built for Kubernetes. It’s orchestration-engine agnostic – remember when people talked about different orchestration engines for containers? Like Mesos, or… wait, I can’t think of more for the life of me. It’s for a good reason I can’t think of another right now: Kubernetes is the container orchestration engine. CNI predates the meteoric rise of Kubernetes, and CNI has lots of great things going for it – it’s modular, it has an ecosystem, and it’s got a specification that I think is simple to use and to understand. I love this. But I’m both a network plugin developer and a Kubernetes developer: I want to write tools that do the networking jobs I need done, and that also integrate with Kubernetes. I need a layer that enables this, and… KNI sure looks like just the thing to bring the community forward. I think there’s a lot of potential here for how we think about extensibility for networking in Kubernetes with KNI, and it might be a great place to do a lot of integrations for Kubernetes – such as Kubernetes Native Multi-networking [KEP], dynamic resource allocation, and maybe even Gateway API – and my gears are turning on how to use it to further the technology created by the Network Plumbing Working Group community.

As a maintainer of Multus CNI – which can provide multiple interfaces to pods in Kubernetes by letting users specify CNI configurations in Kubernetes custom resources (there’s an example of one of those resources just below) – I can say we have a project which does both of these things:

  • It can execute CNI plugins
  • It can operate within Kubernetes
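
To ground that a little: here’s roughly what one of those custom resources looks like – a NetworkAttachmentDefinition carrying a CNI config. This is just an illustrative macvlan example; the name, master interface, and range are made up:

cat <<EOF | kubectl create -f -
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: macvlan-example
spec:
  config: '{
    "cniVersion": "0.3.1",
    "type": "macvlan",
    "master": "eth0",
    "ipam": {
      "type": "whereabouts",
      "range": "192.0.2.0/24"
    }
  }'
EOF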

Creative people that are looking to couple richer Kubernetes interaction with their CNI plugins look at Multus as a way to potentially act as a Kubernetes runtime. I love this creative usage, and I encourage it as much as it makes sense. But it’s not really what Multus is designed for; Multus is designed specifically for multi-networking (e.g. giving multiple interfaces to a pod). It just happens to do both of these things well. What we really need is something that’s lofted up another layer, with deeper Kubernetes integration – and that something… is KNI! And this is just the tip of the iceberg.

But on to today: Let’s get the KNI demo rocking and rolling.

Disclaimer! This does use code and branches that could see significant change. But, hopefully it’s enough to get you started.

Requirements

  • A machine running Fedora 38 (should be easy enough to pick another distro, though)
  • A basic ability to surf around Kubernetes.
  • A rooibos latte (honestly you don’t need coffee for this one, it’s smooth sailing)

What we’re gonna do…

For this demo, we actually replace a good few core components with modified versions…

  • The kubelet (as part of Kubernetes)
  • containerd (replaced with a modified build)
  • A new “network runtime” component

Under KNI, a “network runtime” is your implementation where you do the fun stuff that you want to do. In this case, we just have a basic runtime that Zappa came up with that calls CNI. So, it essentially exercises stuff that you should already have, but we’ll get to see where it’s hooked in when we’ve got it all together.

Ready? Let’s roll.

System basics setup

First I installed Fedora 38. Then you’ll want to install a few tools you’ll need…

dnf install -y wget make task

Then, install go 1.21.6

sudo tar -C /usr/local -xzf go1.21.6.linux-amd64.tar.gz

Set up your path, etc.
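
The tarball comes from the official downloads page at https://go.dev/dl/, and the PATH setup is roughly this (a sketch – adjust for your shell and profile file of choice):

# Put the Go toolchain (and anything you `go install`, like kind) on your PATH
echo 'export PATH=$PATH:/usr/local/go/bin:$HOME/go/bin' >> ~/.bashrc
source ~/.bashrc
go version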

Install kind

go install sigs.k8s.io/kind@v0.20.0

Install docker

…From their steps.

And add your user to the docker group per the post-install docs
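
From memory, that post-install step boils down to adding your user to the docker group, something like:

sudo usermod -aG docker $USER
newgrp docker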

Install kubectl

curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
sudo mv kubectl /usr/bin/
sudo chmod +x /usr/bin/kubectl

Make sure kind can run with:

kind create cluster
kubectl cluster-info --context kind-kind
kind delete cluster

Now let’s spin up Zappa’s Awesomeness

Alright, next we’re going to install and build from a bunch of repositories.

Demo repo

Mike’s got a repo together which spins up the demo for us… So, let’s get that first.

Update! There’s a sweet new way to run this that saves a bunch of manual steps, so we’ll use that now.

Thanks Tomo for making this much easier!!

git clone https://github.com/MikeZappa87/kni-demo.git
cd kni-demo
task 01-init 02-build 03-setup

Let it rip and watch the KNI runtime go!
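
While it’s coming up, a couple of quick sanity checks (the cluster and node names are whatever the demo created – mine was test1):

kind get clusters
kubectl get nodes -o wide
kubectl get pods -A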

Then create a pod…

cat <<EOF | kubectl create -f -
apiVersion: v1
kind: Pod
metadata:
  name: samplepod
spec:
  containers:
  - name: samplepod
    command: ["/bin/ash", "-c", "trap : TERM INT; sleep infinity & wait"]
    image: alpine
EOF

Let it come up, and you can see the last log item from kni…

$ docker exec -it test1-worker systemctl status kni
● kni.service
[...]
Jan 12 18:20:16 test1-worker network-runtime[576]: {"level":"info","msg":"ipconfigs received for id: e42ffb53c0021a8d6223bc324e7771d31910e6973c7fea708ee3f673baac9a1f ip: map[cni0:mac:\"36:e0:1f:e6:21:bf\" eth0:ip:\"10.244.1.3\" mac:\"a2:19:92:bc:f1:e9\" lo:ip:\"127.0.0.1\" ip:\"::1\" mac:\"00:00:00:00:00:00\" vetha61196e4:mac:\"62:ac:54:83:31:31\"]","time":"2024-01-12T18:20:16Z"}

Voila! We can see that KNI processed the pod setup – it has all that information about the pod networking, and it’s showing it to us!

But, this is only the tip of the iceberg! While we’re not doing a lot here other than saying “Cool, you can run Flannel”, for the next episode… We’ll look at creating a Hello World for a KNI runtime!

Using Robocniconfig to generate CNI configs with an LLM

Time to automate some secondary networks automagically by having configurations produced using a large language model (LLM)!

I saw ollama at Kubecon and I had to play with it :D It’s really fun and I had good luck. I love the idea of having a containerized LLM and then hooking up my own toys to it! …I’ve also played with doing this myself – maybe a little that I’ve mentioned on this blog, or you’ve seen a Dockerfile or two from me. But! It’s really nice to have an opinionated installation.

Here’s the concept of Robo CNI

  • You (the user) give an instruction as a short blurb about what kind of CNI configuration you want, like “I’ll take a macvlan CNI with an eth0 master, and IP addresses in the 192.0.2.0/24 range”
  • RoboCNI runs that through a large language model, with some added context to help the LLM figure out how to produce a CNI configuration
  • Then, we test that a bunch of times to see how effectively it works.

I’m doing something I heard data scientists tell me not to do with it (paraphrased): “Don’t go have this thing automatically configure stuff for you.” Well… I won’t be doing it in production. I’ve had enough pager calls at midnight in my life without some robot making it worse, but I will do it in a lab like… whoa!

So I wrote an application, Robo CNI Config! It basically generates CNI configurations based on your prompts, so you’d use the core application like:

$ ./robocni --json "dude hook me up with a macvlan mastered to eth0 with whereabouts on a 10.10.0.0/16"
{
    "cniVersion": "0.3.1",
    "name": "macvlan-whereabouts",
    "type": "macvlan",
    "master": "eth0",
    "mode": "bridge",
    "ipam": {
        "type": "whereabouts",
        "range": "10.10.0.225/28"
    }
}

I basically accomplished this by writing some quick and dirty tooling to interface with it, and then a context for the LLM that has some CNI configuration examples and some of my own instructions on how to configure CNI. You can see the context itself here in github.

It has the kind of stuff you’d imagine something like ChatGPT is “pre-prompted” with, such as:

Under no circumstance should you reply with anything but a CNI configuration. I repeat, reply ONLY with a CNI configuration.
Put the JSON between 3 backticks like: ```{"json":"here"}```
Respond only with valid JSON. Respond with pretty JSON.
You do not provide any context or reasoning, only CNI configurations.

And stuff like that.

I also wrote a utility to automatically spin up pods given these configs and test connectivity between them. I then kicked it off for 5000 runs over a weekend. Naturally my kube cluster died from something else (VMs decided to choke), so I had to spin up a new cluster, and then start it again, but… It did make it through 5000 runs.

Amazingly: it was able to get a ping across the automatically created pods approximately 95% of the time. Way better than I expected, and I even see some mistakes that could be corrected, too.

It’s kinda biased towards macvlan, bridge and ipvlan CNI plugins, but… You gotta start somewhere.

So, let’s do it!

Pre-reqs…

There’s a lot, but I’m hopeful you can find enough pointers from other stuff in my blog, or… even a google search.

  • A working Kubernetes install (and access to the kubectl command, of course)
  • A machine with a GPU where you can run ollama
  • Multus CNI is installed, as well as Whereabouts IPAM CNI.
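
For the Multus and Whereabouts pieces, follow each project’s quickstart – at the time of writing they both boil down to applying daemonset manifests from the repos, roughly like this (double-check the current paths in each README before you paste):

kubectl apply -f https://raw.githubusercontent.com/k8snetworkplumbingwg/multus-cni/master/deployments/multus-daemonset-thick.yml
kubectl apply -f https://raw.githubusercontent.com/k8snetworkplumbingwg/whereabouts/master/doc/crds/daemonset-install.yaml
kubectl apply -f https://raw.githubusercontent.com/k8snetworkplumbingwg/whereabouts/master/doc/crds/whereabouts.cni.cncf.io_ippools.yaml
kubectl apply -f https://raw.githubusercontent.com/k8snetworkplumbingwg/whereabouts/master/doc/crds/whereabouts.cni.cncf.io_overlappingrangeipreservations.yaml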

I used my own home lab. I’ve got a Fedora box with an Nvidia GeForce 3090, and it seems to be fairly decent for running LLMs and Stable Diffusion training (which is the main thing I use it for, honestly!)

Installing and configuring ollama

Honestly, it’s really as easy as following the quickstart.

Granted – the main thing I did need to change was making it accept queries from outside the local host, so I did go and mess with the systemd units. You can find most of what you need in the installation-on-Linux section, and the port info here in the FAQ.
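
For reference, the FAQ route is a systemd override that sets OLLAMA_HOST so the service binds beyond localhost – roughly:

sudo systemctl edit ollama.service

# ...then in the override that opens, add:
# [Service]
# Environment="OLLAMA_HOST=0.0.0.0"

sudo systemctl daemon-reload
sudo systemctl restart ollama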

I opted to use the LLaMA 2 model with 13B params. Use whatever you like (read: Use the biggest one you can!)
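
Pulling (and smoke-testing) the model is a one-liner each – tags shift over time, but llama2:13b was the 13B-parameter tag when I did this:

ollama pull llama2:13b
ollama run llama2:13b "say hello in five words"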

Using RoboCNI Config

First clone the repo @ https://github.com/dougbtv/robocniconfig/

Then kick off ./hack/build-go.sh to build the binaries.

Generally, I think this is going to work best on uniform environments (e.g. same interface names across all the hosts). I ran it on a master in my environment, which is a cluster with 1 master and 2 workers.

You can test it out by running something like…

export OLLAMA_HOST=192.168.2.199
./robocni "give me a macvlan CNI configuration mastered to eth0 using whereabouts ipam ranged on 192.0.2.0/24"

That last part in the quotes is the “hint” which is basically the user prompt for the LLM that gets added to my larger context of what the LLM is supposed to do.

Then, you can run it in a loop.

But first! Make sure robocni is in your path.
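
For example (assuming the robocni binary ended up in the repo root, where you’ve been running it from):

sudo cp ./robocni /usr/local/bin/
# ...or just: export PATH=$PATH:$(pwd)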

./looprobocni --runs 5000

My first run of 5000 was… better than I expected!

Run number: 5000
Total Errors: 481 (9.62%)
Generation Errors: 254 (5.08%)
Failed Pod Creations: 226 (4.52%)
Ping Errors: 0 (0.00%)
Stats Array:
  Hint 1: Runs: 786, Successes: 772
  Hint 2: Runs: 819, Successes: 763
  Hint 3: Runs: 777, Successes: 768
  Hint 4: Runs: 703, Successes: 685
  Hint 5: Runs: 879, Successes: 758
  Hint 6: Runs: 782, Successes: 773

In this case, there are 5% generation errors, but… those can be trapped. So discounting those, 95% of the runs were able to spin up pods, and – amazingly – whenever those pods came up, a ping between them worked.

Insert:<tim-and-eric-mind-blown.jpg>

I had more fun doing this than I care to admit :D

Installing Oobabooga LLM text webui on Fedora 38 (with podman!)

Today we’re going to run Oobabooga – the text generation web UI – to run large language models (LLMs) on your local machine. We’ll make it containerized, so everything else on your box can keep sitting pretty right where it is.

Requirements

Looks like we’ll need podman compose if you don’t have it…

  • Fedora 38
  • An nVidia GPU
  • Podman (typically included by default)
  • podman-compose (optional)
  • The nVidia drivers

If you want podman compose, pick up:

pip3 install --user podman-compose

Driver install

You’re also going to need to install the nVidia driver, and the nVidia container tools

Before you install CUDA, do a dnf update (otherwise I wound up with mismatched deps), then install CUDA Toolkit (link is for F37 RPM, but it worked fine on F38)

And the container tools:

curl -s -L https://nvidia.github.io/libnvidia-container/centos8/libnvidia-container.repo | sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo
sudo dnf install nvidia-container-toolkit nvidia-docker2

(nvidia-docker2 might not be required.)
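
Before moving on, it’s worth a quick check that containers can actually see the GPU – the CUDA image tag here is just an example, pick one that matches your driver:

podman run --rm --gpus all docker.io/nvidia/cuda:12.2.0-base-ubi8 nvidia-smi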

If you need more of a reference for GPUs on Red Hat-flavored Linuxes, this article from the Red Hat blog is very good.

Let’s get started

In my experience, you’ve gotta use podman for GPU support in Fedora 38 (and probably a few versions earlier, is my guess).

Go ahead and clone oobabooga/text-generation-webui.

From their README, you’ve gotta set this up to do the container build…

ln -s docker/{Dockerfile,docker-compose.yml,.dockerignore} .
cp docker/.env.example .env
# Edit .env and set TORCH_CUDA_ARCH_LIST based on your GPU model
docker compose up --build

Importantly – you’ve got to set TORCH_CUDA_ARCH_LIST. You can check that you’ve got the right value from this grid on Wikipedia. DOUBLE CHECK everything, but especially that you’re editing the right .env file – I really made this take longer than it should have when I got that wrong.

TORCH_CUDA_ARCH_LIST=8.6+PTX

First, try building it with podman – it worked for me on the second attempt. I’m unsure what went wrong the first time, but I built with…

podman build -t dougbtv/oobabooga .

WARNING: These are some BIG images. I think mine came out to ~16 gigs.

And then I loaded that image into podman…

I need to make a few mods before I can run it… Copy the .env file into the docker folder as well (we could probably improve this with a symlink in an earlier step). And while we’re here, we’ll need to copy the template prompts and presets, too.

cp .env docker/.env
cp prompts/* docker/prompts/
cp presets/* docker/presets/

Now you’ll need at least one model, so to download one leveraging the container image…

podman-compose run --entrypoint "/bin/bash -c 'source venv/bin/activate; python download-model.py TheBloke/stable-vicuna-13B-GPTQ'" text-generation-webui

Naturally, change TheBloke/stable-vicuna-13B-GPTQ to whatever model you want.

You’ll find the model in…

ls ./docker/models/

I also modified docker/.env to change this line to…

CLI_ARGS=--model TheBloke_stable-vicuna-13B-GPTQ --chat --model_type=Llama --wbits 4 --groupsize 128 --listen

However, I run it by hand with:

podman run \
--env-file /home/doug/ai-ml/text-generation-webui/docker/.env \
-v /home/doug/ai-ml/text-generation-webui/characters:/app/characters \
-v /home/doug/ai-ml/text-generation-webui/extensions:/app/extensions \
-v /home/doug/ai-ml/text-generation-webui/loras:/app/loras \
-v /home/doug/ai-ml/text-generation-webui/models:/app/models \
-v /home/doug/ai-ml/text-generation-webui/presets:/app/presets \
-v /home/doug/ai-ml/text-generation-webui/prompts:/app/prompts \
-v /home/doug/ai-ml/text-generation-webui/softprompts:/app/softprompts \
-v /home/doug/ai-ml/text-generation-webui/docker/training:/app/training \
-p 7860:7860 \
-p 5000:5000 \
--gpus all \
-i \
--tty \
--shm-size=512m \
localhost/dougbtv/oobabooga:latest

(If you’re smarter than me, you can get it running with podman-compose at this point)

At this point, you should be done, grats!

It should give you a web address, fire it up and get on generating!

Mount your models somewhere

I wound up bind mounting some directories…

sudo mount --bind /home/doug/ai-ml/oobabooga_linux/text-generation-webui/models/ docker/models/
sudo mount --bind /home/doug/ai-ml/oobabooga_linux/text-generation-webui/presets/ docker/presets/
sudo mount --bind /home/doug/ai-ml/oobabooga_linux/text-generation-webui/characters/ docker/characters/

Bonus note: I also wound up changing my Dockerfile to install torch built for cu118, in case that helps you.

So I changed out two lines that looked like this diff:

-    pip3 install torch torchvision torchaudio && \
+    pip3 install torch==2.0.1+cu118 torchvision==0.15.2+cu118 torchaudio==2.0.2 -f https://download.pytorch.org/whl/cu118/torch_stable.html && \

I’m not sure how much it helped, but, I kept this change after I made it.

I’m hoping to submit a patch for https://github.com/RedTopper/Text-Generation-Webui-Podman (which isn’t building for me right now), integrating what I learned from this – and then have the whole thing in podman, later.

Don’t make my stupid mistakes

I ran into an issue where I got:

RuntimeError: CUDA error: no kernel image is available for execution on the device

I tried messing with the TORCH_CUDA_ARCH_LIST in the .env file – changing it to 8.6+PTX, 8.0, etc., the whole list, even commented out – no luck.

I created an issue in the meanwhile: https://github.com/oobabooga/text-generation-webui/issues/2002

I also found this podman image repo!

https://github.com/RedTopper/Text-Generation-Webui-Podman

and I forked it.

It looks like it could possibly need updates.

I’ll try to contribute my work back to this repo at some point.

Installing Stable Diffusion on Fedora 38

In today’s tutorial, we’re going to install Stable Diffusion on Fedora 38.

I’m putting together a lab machine for GPU workloads. And the first thing I wanted to do was get Stable Diffusion running; I’m also hopeful to start using it for training LoRAs, embeddings, maybe even a fine-tuning checkpoint (we’ll see).

Fedora is my default home server setup, and I didn’t find a direct guide on how to do it, although it’s not terribly different from other distros.

…Oddly enough I actually fired this up with Fedora Workstation.

Requirements

  • An install of Fedora 38
  • An nVidia GPU (if someone has insight on AMD GPUs, and wants to provide instructions, hit me up and I’ll update the article)

Installing Automatic Stable Diffusion WebUI on Fedora 38

I’m going to be using Vladmandic’s fork of the Automatic1111 SD webui: https://github.com/vladmandic/automatic

Clone it.

Fedora 38 ships with Python 3.11, but some dependency for stable diffusion requires Python 3.10, which will require a few extra steps.

Install python 3.10

dnf install python3.10

Also, before you install CUDA, do a dnf update (otherwise I wound up with mismatched deps for NetworkManager and couldn’t boot off a new kernel, and I had to wheel up a crash cart – just kidding, I don’t have a crash cart or a KVM for my Linux lab, so it’s much more annoying: I have to move my server over to my workstation area. Luckily I just have a desktop server lab.)

Install CUDA Toolkit (link is for F37 RPM, but it worked fine on F38)

And – follow the instructions there. You might need to reboot now.

Make a handler script to export the correct python version… I named mine user-webui.sh

#!/bin/bash
export python_cmd=python3.10
screen -S webui ./webui.sh --listen

NOTE: I fire it up in screen. If you don’t have Stockholm Syndrome for screen, you can decide not to be a luddite and modify it to use tmux. And if you need a cheat sheet for screen, there you go. I also use the --listen flag because I’m going to connect to this from other machines on my network.
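
And the thirty-second screen cheat sheet, if it helps (the session name matches the script above):

screen -ls          # list sessions
screen -r webui     # reattach to the session the script started
# Ctrl-a then d     # detach again and leave it running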

Then run ./user-webui.sh once to get the venv created; it will likely fail at this point. Or, if you’re a smarter Python user, create the venv yourself.
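
If you go the DIY route, it’s just this (assuming the python3.10 you installed above, run from the repo root):

python3.10 -m venv venv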

Then enter the venv.

 . venv/bin/activate

Then ensurepip…

python3.10 -m ensurepip

And now you can fire up the script!

./user-webui.sh

Running LLMs locally and interacting with an API (Using Oobabooga Web UI)

Have you played with ChatGPT yet? Ummm, yeah, who hasn’t!? I have pirate-styled rap battles to make! So let’s get right to the point so we can get back to generating rap-battles as soon as possible!

Today we’re going to run a LLM (Large Language Model) locally on one of our own machines, and we’re going to set it up so that we can interface with it via API, and we’ll even write a small program to test it out.

I have some ideas where I want to take some software I’m building and hook it up to one of these, later I’d like to train it on custom data and then query it. Maybe even have some like real-ish-time data fed to it and then be able to query it in near-real-time too. I also have some ideas for populating it with logs and then being like “yo, tell me what’s up with this machine?”

But yeah, instead of relying on a GPT service, I want to run the LLM myself, using open source tools.

Pre-requisites

Ok, so I’m not going to go deep into the details of the installation; I’m just going to give some pointers. It’s not necessarily rocket science.

First up, we’re going to install a webUI, OobaBooga: https://github.com/oobabooga/text-generation-webui

This is one of the few times I’m going to say the word “windows” on this blog, but I actually installed mine on windows, because it’s a windows box that’s an art and music workstation where I’ve got my decent GPU (for gaming, and also for stable diffusion and the associated windoze-y art tools). I followed this YouTube video by @TroubleChute. I even used his opinionated script to automatically install Vicuna!

But you can also just install it with the instructions in the README, which appear to be really straightforward.

The model we’re going to use, Vicuna, can be found @ https://vicuna.lmsys.org/ – the interesting thing about Vicuna is that it’s trained on crowd-sourced GPT output, and it claims to be 90% as good as GPT, which seems like a lofty statement. But so far it does seem pretty decent, even if it does parrot a lot of the kind of “walled garden” stuff that ChatGPT says (“As an AI language model, I can’t tell you what you asked me for” kind of stuff).

Quick tip: After you install it and you’re playing with it on the WebUI, assuming you’ve installed Vicuna… Go to the bottom of the chat and find the radio button for “mode” and switch it to “instruct” and then in the “instruction template” drop down, select “Vicuna” – this will parse the output from the LLM so it makes more sense in context with your queries/conversation/prompts to it.

Configure the start up script to launch the API.

Well, first, in your text-generation-webui git clone, go and check out the ./extensions/api folder and make sure there’s a script there – there should be by default.

Next, we’re going to enable that by adding a flag to our startup script.

In your startup script, add an --extensions api flag, and also a --listen flag (if you want to listen on all addresses and not just your loopback, so you can access the API from your LAN/WAN).

So now, for example, mine looks like:

python server.py --extensions api  --listen --auto-devices --chat --model anon8231489123_vicuna-13b-GPTQ-4bit-128g --wbits 4 --groupsize 128

Now your api should be ready to query.

Calling the API

Then I whipped together a node.js script (with ChatGPT’s help!) to query the API.

I’ve got the script here in this github gist.

Lucky you, the script asks for suggestions for cheese. If you ask me, I’d tell you to have a Bayley Hazen Blue, or maybe the Cabot Clothbound, both from Jasper Hill Farms (they’ve got the raddest cheese cave, super good cheese).

And you can just run that with

node oobabooga_api_query.js

And away you go. Or if you don’t like Node, just have ChatGPT re-write it for you in assembly or something!

(The script still needs a little help parsing, but it’s close! I think it’s missing the “mode” + “template” settings from the UI, so that’s next to figure out.)

B-b-b-bonus hip hop battle

Just for you, I asked it to:

write a rap battle about api clients written in node.js vs. assembly

And it gave me:

Verse 1 (Node.js):

I'm Node, I'm fast and sleek,
When it comes to APIs, I can't be beat!
With JavaScript at my core,
I make building apps a breeze, no need to feel bore-ot.

Chorus:

API clients, we rule the land,
Node.js or Assembly, which one takes the stand?
We may not be as old school,
But when it comes to speed and coolness, we'll never lack cool.

Verse 2 (Assembly):

I may be older than you think,
But don't underestimate me, I still have some tricks.
With raw power and precision,
My code runs smoothly without any friction.

Chorus:

API clients, we rule the land,
Node.js or Assembly, which one takes the stand?
We may not be as old school,
But when it comes to speed and coolness, we'll never lack cool.

Outro:

In the end, it doesn't matter who wins this battle,
Because both Node.js and Assembly are great for different purposes, so let's celebrate!

Thanks for your “everyone gets an award!”-style indifference, LLM!

…But not too shabby for something running on my local!