Revisiting my Personal Platform in 2023

@kjuulh

2023-07-23

The tech landscape moves fast, and as a Platform/Developer Experience engineer I like to stay up to date with recent technology and approaches, so that I can deliver the best and most solid approach for my engineers. In this blog post I will explore what my personal development stack looks like, how I want to kick it up a notch, and reflect on the challenges I've had over the previous year. Strap in, because this is gonna be a janky ride. But first I will dig into why I've got a personal platform and why that might be useful for you as well.

What do I mean by personal platform?

You may've heard the term self-hosted thrown around, or homelab. These terms overlap, but are also somewhat orthogonal. A homelab is a personal or small deployment of stuff you can tinker with, experiment on and enjoy using. Parts of it usually consist of HomeAssistant, Plex/Emby, various vms and such. Self-hosted basically means off-the-shelf tools you can host yourself, whether for personal use or for enterprise.

When I say personal platform, part of it means a homelab, but taken a step further and specialized for development usage. The goal is to develop a platform like a small-to-medium-sized company would: one capable of rolling out software and giving you the amenities you want to select for (more on that later). It should be useful and not just an experiment; you should actually use the platform to roll out software. One of the most important parts of developing a platform is actually using it yourself (dogfooding), otherwise you will never learn where the sharp edges are and where your requirements break down.

So for me the basic requirements for a platform are:

  1. A place to host deployments. This may be a vm, a raspberry pi, fly.io or aws; it doesn't matter too much, it all depends on your needs and what you want to develop.
  2. A place to store source code. Again the easiest option is just to choose GitHub, but you can also go a step further and actually host the code yourself in the spirit of a homelab. I do this personally.
  3. A domain, or a way to interact with the services and deployments you build. You want the things you build to be accessible to however wide an audience you choose, whether that is only yourself, your closest family and friends, or the public. I personally do a mix: some stuff, like the platform internals, is only accessible internally; other services are public; and some are invite only.

If that is difficult to picture, you can think of the platform as giving you the same things you would get if you used fly.io, aws, gcp or any of the Platform as a Service solutions out there.

Why build a platform only for yourself?

This is a question I get a lot; I seemingly spend a lot of effort building tools, services and whatnot, which is incredibly overkill for my personal needs. I think of it like so:

Get comfortable with advanced tooling and services, so that when you actually need to do it in practice, it is easy.

It is partly personal development, but also building up a certain expertise that can be difficult to acquire in a job. It is also incredibly fun and filled with challenges.

It should also be noted that a personal platform may seem like incredible overkill, but it is an incremental process, and you may already have parts of it, just implicitly.

The beginning

My personal platform began as an old workstation running Linux (the distro doesn't really matter), with docker and docker-compose installed. Then I ran various homelab deployments, such as gitea, drone-ci, plex, etc.

My workflow would be to simply build a docker image for the service I was working on: make ci, which would docker build . and docker push, and finally I would ssh into the workstation and bump the image version using image:latest. It was a fairly basic platform, and a lot of the details weren't documented or automated. In the beginning everything was only accessible internally, and I would just use the hostname given by dhcp, such as http://home-server:8081/todo-list or something like that.
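
Roughly, make ci expanded to something like the following, with the registry and image name being placeholders rather than my actual setup:

# build and push the image for whatever service I was standing in
docker build -t registry.example.com/todo-list:latest .
docker push registry.example.com/todo-list:latest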

It worked fine for a while, but I began to want to actually use some of those tools when I left the house. And as my tool stack grew, and there were more hostnames and ports to remember, I began to look for enhancements to the stack.

This is actually the most important part of building a personal platform. Start small, and grow in the direction of your requirements and needs. Do not start with a self-hosted kubernetes with all the bells and whistles. And don't copy another person's stack; it will not fit your needs and you won't be able to maintain it.

In the beginning I chose tools such as upnp and ngrok to expose these services, as well as a dashboard service for discoverability. However, that didn't work out. First of all, ngrok and upnp weren't the most stable, and I didn't want to expose my home network to the internet in that way. I also didn't use the dashboard service much; just that extra step made me avoid the tools I'd built. I would only reach for the ones whose hostname and port I remembered, and not the more niche ones.

Getting a VPS

Getting my first vps for personal use was a decision I made once I figured out how many amenities I would get out of the box: a stable machine running nearly 24/7, with a public static ip, reachable from anywhere.

I chose hetzner, because it was the cheapest option I could get where I am, with the required bandwidth cap and such.

I chose namecheap for a domain, and cloudflare for dns. Cloudflare technically isn't needed, but the tooling is nice.

At this point my stack looked like this:

namecheap -> cloudflare -> hetzner vps

This was sort of useful, but only to a point: I could host some things on the vps, but I wanted to use the cheap compute I had at home and still make it reachable. I then began searching for a mesh vpn. I looked at openvpn and a bunch of other options, but finally landed on wireguard, because it seemed to be the most performant and suited my needs quite perfectly.

In the beginning I wanted to just use the vpn as a proxy.

namecheap -> cloudflare -> hetzner vps -> wireguard -> home workstation

However, setting up iptables rules and such turned out to be a nightmare, so I kept it simple and just installed caddy and nginx on the vps: caddy for TLS certificates, and nginx for TCP load balancing and reverse proxying. (Caddy doesn't officially support TCP load balancing, only via a plugin, which I didn't want to use because of ergonomics.)
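
As a sketch, the nginx side of that, forwarding raw TCP (for example gitea's ssh port) over wireguard, could look something like this; the ports and wireguard ip are examples, not my exact config:

stream {
    # forward ssh (git clone over ssh) to gitea on the home workstation
    server {
        listen 2222;
        proxy_pass 10.0.9.2:22;
    }
}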

So now the stack was like this:

namecheap -> cloudflare -> hetzner vps -> caddy/nginx -> wireguard -> home workstation

I was really happy with this stack, and actually still use it.

The wireguard setup is set up as a bunch of point-to-point connections, all pointing at the ingress node.

home workstation (interface) -> hetzner ingress vps (peer)
hetzner ingress vps (interface) -> home workstation (peer)

Home workstation:

[Interface]
PrivateKey = <home-workstation-priv-key>
Address = 10.0.9.2
ListenPort = 55107

[Peer]
PublicKey = <ingress-vps-public-key>
AllowedIPs = 10.0.0.0/16 # allows receiving traffic from the whole wireguard range
Endpoint = <ingress-vps-public-static-ip>:51194
PersistentKeepalive = 25

Hetzner vps:

[Interface]
Address = 10.0.9.0
ListenPort = 51194
PrivateKey = <ingress-vps-private-key>

# packet forwarding
PreUp = sysctl -w net.ipv4.ip_forward=1

[Peer]
PublicKey = <home-workstation-public-key>
AllowedIPs = 10.0.9.2/32 # this peer should only provide a single ip
PersistentKeepalive = 25

It is incredibly simple and effective. I even have entries on the vps for my android phone, my mac, you name it. Super easy to set up, but it requires some manual handling. Tailscale can be used to automate this, but when I set this up it wasn't really a mature solution; if I started today I would probably use it.

The important part is that registration is only needed between a peer and the hetzner ingress vps. So if I add another vps at some point, only that vps and the ingress vps will need registration, but my phone would still be able to talk to it, because of the 10.0.0.0/16 in AllowedIPs. That is of course as long as they share the subnet, i.e. 10.0.9.1 and 10.0.9.2.

Now caddy can just reverse proxy to my home workstation, without it needing a public port.

hetzner ingress vps -> caddy -> wireguard ip for home workstation and port for service -> home workstation -> docker service

Because of docker bridge networking, even if caddy is running in a docker container, it can still use the wireguard network interface and reverse proxy over it. This is what bound, and still binds, all my own services together, even if they don't share a physical network subnet.
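
As an example, a site in the Caddyfile then looks roughly like this, with the domain and port being placeholders:

todo.example.com {
    # caddy terminates TLS here; the upstream is the home workstation's wireguard ip
    reverse_proxy 10.0.9.2:8081
}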

Hosting

My hosting of personal services is now a mix: the home workstation for plex and other compute-intensive services, and hetzner, where I've rented a few more vpses for services I use frequently, like gitea, grafana and so on.

Infra

As you may imagine, plex, drone, grafana etc. shouldn't be exposed to the internet, but I'd still like the convenience, so I've set up caddy to only allow the wireguard subnet, and to use wildcard certs for the domains, such that it can still provision internal https certificates using lets encrypt.
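
A sketch of what that looks like in the Caddyfile; note that wildcard certs need the DNS challenge, which in caddy means a DNS provider module such as the cloudflare one, and the domain and upstream here are made up:

*.internal.example.com {
    tls {
        # wildcard certs require the dns challenge
        dns cloudflare {env.CLOUDFLARE_API_TOKEN}
    }

    # only answer requests arriving from the wireguard subnet
    @wireguard remote_ip 10.0.0.0/16
    handle @wireguard {
        reverse_proxy 10.0.9.2:3000
    }

    # everything else gets rejected
    respond 403
}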

There are a bunch more services I've left out, especially my own home-built things. However, the deployment model is still as hand-held as I mentioned in the beginning; now the services are just spread across the vps and private nodes.

Development

My next iteration for development was using an open-source tool I've helped develop at work: https://github.com/lunarway/shuttle. The idea is to eliminate the need for sharing shell scripts, makefiles and configuration between different repositories. Now, just initialize a shuttle.yaml file, point it at a parent template plan, and you've got all you need. I usually develop a mix of nextjs, sveltekit, rust-axum, rust-cron, rust-cli and finally go-service. All of these plans contain everything needed to build a docker image, prepare a docker-compose file and publish it. These again aren't public, because they specifically suit my needs.
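
A shuttle.yaml is deliberately small; a hypothetical one for a rust-axum service could look like this (the plan url and vars are made up for illustration):

plan: https://git.front.kjuulh.io/kjuulh/rust-axum-plan
vars:
  service: todo-api
  registry: registry.example.com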

I've ended up building my own incarnation of shuttle called cuddle (https://git.front.kjuulh.io/kjuulh/cuddle). It isn't made for public consumption, and was one of the first projects I built when I was learning rust.

My workflow has changed to simply be cuddle x ci, which will automatically build, test and prepare configs for deployment. It won't actually do the deployment step; that is left for CI in drone, which runs cuddle x ci --dryrun=false. I've developed a homegrown docker-compose gitops approach, where a deployment is simply a commit to a central repository containing a docker-compose file with a proper image version set, usually a prefix plus a uuid.
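
So a deployment boils down to a commit like this hypothetical docker-compose file, where the image tag is the generated prefix-plus-uuid version:

version: "3.8"
services:
  todo-api:
    image: registry.example.com/todo-api:main-1b2f4e8a9c3d4f5e
    restart: unless-stopped
    ports:
      - "8081:8080"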

My vps simply has a cronjob that runs the following script every 5 minutes, which pulls the git repo and re-applies every docker-compose file:

#!/bin/bash

set -e

LOG="/var/log/docker-refresh/refresh.log"
GIT_REPO="/home/<user>/git-repo"

exec > >(tee -i ${LOG})
exec 2>&1

echo "##### docker refresh started $(date) #####"

cd "$GIT_REPO" || return 1

git fetch origin main
git reset --hard origin/main

# note: docker-compose up doesn't accept -v (that flag belongs to down)
command_to_execute="/usr/local/bin/docker-compose up -d --remove-orphans"

find "$GIT_REPO" -type f \( -name "docker-compose.yml" -o -name "docker-compose.yaml" \) -print0 | while IFS= read -r -d '' file; do
    dir=$(dirname "$file")
    cd "$dir" || return 1
    echo "Executing command in $dir"
    $command_to_execute
done

# Monitor health check
curl -m 10 --retry 5 <uptime-kuma endpoint>

echo "##### docker refresh ended $(date) ##### "

This is simply run by cron and works just fine. I've set up uptime kuma to send me a slack message if it hasn't run within the hour.
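
For reference, the crontab entry is just something along these lines, with the script path being an assumption:

*/5 * * * * /usr/local/bin/docker-refresh.sh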

The problems

This is my current state, except for some small experiments; you can never capture everything in a blog post.

The main problems now are mostly related to the manual tasks I've got to do when creating a new web service, i.e. axum, nextjs, svelte, go etc.:

  1. Create a new repository (manual)
  2. Git push first (manual)
  3. Enable CI in drone (manual)
  4. GitOps repo update (automated)
  5. Hostname inserted into caddy (manual)
  6. If using authentication: zitadel setup (manual)
  7. Prometheus setup (manual registration)
  8. Uptime kuma setup (manual registration)
  9. Repeat for production deployment from step 5

Cuddle actually gives a lot out of the box, and I would quite easily be able to automate most of it if a lot of the configuration for drone, prometheus etc. were driven by GitOps, but it isn't.

For a service such as this blog, which is a rust-zola deployment, I also always have downtime on deployments, because I only run a single replica. This isn't the end of the world, but I'd like the option of a more declarative platform.

Visions of the future

I want to focus the next good while on converting as many of the manual tasks as possible into automated ones.

The plan is to solve the root of the issues, and that is the deployment of the services, along with service discovery. I could continue with docker-compose and simply build more tooling around it, maybe some heuristics on what is in the docker gitops repo. However, I could also venture down the path that is kubernetes.

We already maintain a fully declarative cluster setup at my dayjob, using ClusterAPI and flux. So that is the option I will go with.

Kubernetes

After some investigation and experiments, I've chosen to go with Talos and Flux. I simply have to copy a vm, register it, and I've got controller or worker nodes. I sadly have to run some Talos stuff imperatively, but to avoid the complexity around ClusterAPI this is a suitable approach for now. Flux simply points at a gitops repo with a cluster path, and it maintains the services I'd want to run.
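
The imperative part boils down to a handful of talosctl invocations whenever a node joins; something like the following, where the ips and file names are examples:

# generate machine configs for the cluster (run once)
talosctl gen config personal-cluster https://10.0.9.10:6443

# apply the worker config to a freshly booted vm
talosctl apply-config --insecure --nodes 10.0.9.11 --file worker.yaml

# bootstrap etcd on the first control plane node (run once)
talosctl bootstrap --nodes 10.0.9.10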

This means I can run fluentbit, prometheus, traefik and such in kubernetes and automatically get deployments rolled out.
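
To illustrate, the flux side is roughly a Kustomization pointing at a path in the gitops repo; the names and paths here are hypothetical:

apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: cluster-services
  namespace: flux-system
spec:
  interval: 5m
  path: ./clusters/prod
  prune: true
  sourceRef:
    kind: GitRepository
    name: gitops-repo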

Cuddle

From the development point of view, I simply change the docker-compose templates to kubernetes templates, and I get the same benefits. Not much to say here. A release to master will automatically release to prod, and a release to a branch will create a preview environment for that deployment, which will automatically be pruned a while after the branch has been deleted.

A prometheus and grafana dashboard maintains a list of which preview environments are available, and how long they've been active.

Future list of steps

  1. Create a new repository (manual)
  2. Git push first (manual)
  3. Enable CI in drone (manual)
  4. GitOps repo update (automated)
  5. Hostname inserted into caddy (automated)
  6. If using authentication: zitadel setup (manual)
  7. Prometheus setup (automated)
  8. Uptime kuma setup (automated)
  9. Repeat for production deployment from step 5

I've got some ideas for 3, but that will have to rely on a kubernetes operator or something. The same goes for 6, as long as both have sufficient apis.

I've moved some of the operations from manual work into kubernetes, but that also means that maintaining kubernetes is a bigger problem, as docker-compose didn't really have that many day 2 operations.

Instead, I will have to rely on a semi-automated talos setup for automatically creating vm images, and doing cluster failovers for maximum uptime and comfort.

Conclusion

I've designed a future setup which will move things into kubernetes to relieve a lot of manual tasks. I will still need to develop tooling for handling kubernetes and various pain points around it, as well as think up new solutions for the last manual tasks. Some may move into kubernetes operators, others into either chatops or clis.