Booting Go on Bare Metal
These days Docker has pretty much taken over how you deploy software. When
dealing with compiled languages, the ideal is to deploy a statically compiled
binary via a multi-stage Dockerfile whose final stage is a scratch container
running the binary, for the smallest possible image footprint.
I was thinking, what if instead of deploying a Docker image as a scratch container, I could deploy a statically compiled binary to a minimal Linux image and have it just run. No container runtime, no orchestration, just UEFI firmware loading a binary.
So I built a few-line main.go that runs an HTTP server handling
GET / and responding with a simple Hello, World!. The goal: build it into a
minilinux.img that I can boot in QEMU. The question: how small can I get
this image?
The final result is a ~7 MiB bootable image where the only userspace binary is a static Go executable running as PID 1. Oh yeah, secure.
What Does It Take to Boot?
As I’m developing this on Apple Silicon, I didn’t want to mess around with cross compilation toolchains. So we are going to be using Docker to build everything, like a madman.
What does it take to actually have a Linux image boot our binary? Well, we need a bootloader, a kernel, a way to set up the initramfs, and something to act as init to run our binary as a service.
- Bootloader: GRUB is used by almost everything, surely this will work well
- Kernel: Might as well use the Debian linux-image-arm64 kernel since it is designed for general use and should have all the drivers (assuming all my problems will be driver related).
- Initramfs: Busybox for a shell, widely used for embedded Linux systems
- Init system: Runit as a lightweight service supervisor
If you haven’t encountered initramfs before, it’s a compressed cpio archive
that the kernel extracts into a tmpfs filesystem at boot. tmpfs is a filesystem
that lives entirely in RAM, so it’s fast but volatile. Everything in it
disappears on reboot. The kernel executes /init from this temporary root. On
a normal Linux system, initramfs contains just enough tools to find and mount
the real root filesystem (maybe it needs to load storage drivers or unlock an
encrypted volume), then switch_root hands control to the actual init system
on disk. But nothing says you have to switch. If your initramfs contains
everything you need, you can just stay there. That’s the key insight that makes
this whole project work.
Something else we need to deal with is the network interface and DHCP. So we
will use ip to bring the links up, then udhcpc to get an IP address, so we
can send a request to <host>:8080.
The First Attempt
My first attempt had a partition layout of:
img (GPT)
|- boot (FAT32, ESP)
| |- EFI/BOOT/BOOTAA64.EFI (GRUB)
| |- vmlinuz
| |- initramfs.img
|- data (ext4)
| |- rootfs.squashfs
Getting here was not straightforward. The first few boots dropped me into a GRUB rescue shell, or showed nothing at all, or kernel panicked with cryptic messages about not finding init.
The console output was the first hurdle. QEMU with -nographic expects the
guest to output on the serial port, but the kernel defaults to the graphical
console. Without console=ttyS0 (x86) or console=ttyAMA0 (arm64) in the
kernel command line, the kernel boots silently and you have no idea what’s
happening. Debugging a black screen is not fun.
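For reference, the QEMU invocation looks roughly like this. A hedged example: the machine flags and firmware filename are assumptions that vary by platform and QEMU version, not the exact command from my setup:

```shell
# -nographic wires the guest serial port to stdio, which is why
# console=ttyAMA0 must also be on the kernel command line.
qemu-system-aarch64 \
    -machine virt -cpu cortex-a72 -m 512M \
    -bios edk2-aarch64-code.fd \
    -drive file=minilinux.img,format=raw,if=virtio \
    -nic user,model=virtio-net-pci,hostfwd=tcp::8080-:8080 \
    -nographic
```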
GRUB configuration was the next puzzle. GRUB needs to know where the kernel
and initramfs are, what command line to pass, and which device to boot from.
Getting the paths right when your ESP is partition 1 of a loop mounted image
built inside Docker took some trial and error. The grub.cfg ended up looking
something like:
set timeout=0

menuentry "minilinux" {
    linux /vmlinuz console=ttyAMA0 root=/dev/vda2 ro
    initrd /initramfs.img
}
Then the initramfs /init script had to actually work. It needed to mount the
ext4 partition, find the squashfs inside it, mount that as an overlay, then
switch_root into the new root and exec the real init. Each step is a chance
for something to go wrong, and when it does, you get “Kernel panic: not syncing:
Attempted to kill init!” which tells you absolutely nothing about which step
failed.
Eventually it all clicked. GRUB loaded the kernel, the kernel unpacked the
initramfs, busybox ran /init which mounted the squashfs, switch_root’d into
it, runit started, and my Go binary came up serving HTTP. Success!
Except the image was 180 MiB. The Debian kernel alone was 80 MiB. That seemed excessive for a “Hello, World!” HTTP server.
Stripping It Down
Looking at what was actually in the image, the breakdown was roughly:
| Component | Size |
|---|---|
| Debian kernel + modules | ~80 MiB |
| GRUB + EFI files | ~15 MiB |
| Squashfs rootfs | ~40 MiB |
| Initramfs (busybox) | ~5 MiB |
| My Go binary | ~2 MiB |
| Partition overhead | ~38 MiB |
Note: This is what I think it was, I don’t actually remember and it feels right. I’m not going back, you can’t make me go back.
The squashfs layer was the first to go. Why did I even have it? The original idea was that the initramfs would be tiny (just enough to mount the real root), and the squashfs would contain the actual system. But my “actual system” was just busybox, runit, and a Go binary. The complexity of having an initramfs that mounts an ext4 partition, finds a squashfs file, mounts that, then switch_roots into it was solving a problem I didn’t have.
Instead, I moved everything directly into the initramfs and stayed there. No ext4 partition, no squashfs, no switch_root. The new layout:
img (GPT)
|- boot (FAT32, ESP)
| |- EFI/BOOT/BOOTAA64.EFI (GRUB)
| |- vmlinuz
| |- initramfs.img (contains busybox + runit + Go binary)
The initramfs went from a minimal “find the real root” setup to being the entire userspace. Busybox provided the shell and basic utilities, runit supervised services, and my Go binary was the only actual service.
Down to ~120 MiB. Still mostly kernel.
Next I switched from Debian’s kernel to Alpine’s linux-virt. Debian’s kernel
is built for physical hardware: it includes drivers for SATA controllers, USB
devices, graphics cards, sound cards, network adapters from a dozen vendors.
None of that matters in a VM where everything is virtio. Alpine’s kernel is
configured for virtual machines and strips out all that hardware support. The
image dropped to ~60 MiB.
Better, but I started wondering, do I need busybox? The Go binary is statically linked anyway and I could probably just do the syscalls on startup.
Going Full Go
The turning point was realising that everything busybox and runit were doing could be done in Go. Looking at what my init script actually did:
- Mount virtual filesystems (/proc, /sys, /dev)
- Bring up network interfaces
- Run DHCP to get an IP address
- Start the API server
- Supervise the service (restart if it crashes)
None of that requires a shell. Go can make syscalls directly, and there are pure Go libraries for networking. The only reason I had busybox was because “that’s how you do embedded Linux.” But statically compiled Go binaries don’t need libc, don’t need a shell, don’t need anything except the kernel™.
So I rewrote the init system in Go. The entire program structure is simple:
func main() {
    log.SetFlags(0)
    runSysinit()
    runAPI()
}
The runSysinit function replaces the shell script that used to do system
initialisation:
func runSysinit() {
    mountFilesystems()
    bringUpInterfaces()
    lease := configureDHCP()
    setHostname(lease)
    log.Println(":: System ready")
}
Let me break down each piece.
Mounting Virtual Filesystems
The first thing any init does is mount the virtual filesystems that make Linux actually usable:
// source target fstype
var mounts = [][3]string{
    {"proc", "/proc", "proc"},        // Process info
    {"sysfs", "/sys", "sysfs"},       // Hardware/driver info
    {"devtmpfs", "/dev", "devtmpfs"}, // Device nodes
    {"devpts", "/dev/pts", "devpts"}, // Pseudoterminals
    {"tmpfs", "/dev/shm", "tmpfs"},   // Shared memory
    {"tmpfs", "/tmp", "tmpfs"},       // Temp files
    {"tmpfs", "/run", "tmpfs"},       // Runtime data
}

func mountFilesystems() {
    for _, m := range mounts {
        os.MkdirAll(m[1], 0o755)               // Create mount point
        syscall.Mount(m[0], m[1], m[2], 0, "") // Mount it
    }
}
Most of these aren’t real filesystems with files stored in RAM. They’re virtual filesystems that act as interfaces to the kernel:
| Mount | What it actually is |
|---|---|
| /proc | Process information and kernel parameters. Reading /proc/cpuinfo doesn’t read a file, it asks the kernel to generate CPU info on the fly |
| /sys | Hardware and driver information. This is where we find the ACPI power button device |
| /dev | Device nodes. The kernel populates this with entries like /dev/null, /dev/urandom, and network interfaces |
| /dev/pts | Pseudoterminal devices for SSH sessions (not that we have SSH) |
| /tmp, /run, /dev/shm | These actually are tmpfs, used for runtime scratch space |
The initramfs itself only contains the Go binary and empty directories for these mount points. Everything else appears at runtime when we mount these virtual filesystems.
Networking with Netlink
Normally you’d use ip link set eth0 up to bring up a network interface. But
ip is a binary that comes from iproute2, and we don’t have that. Under the
hood, ip just talks to the kernel via netlink sockets. Go can do that too.
I used vishvananda/netlink, a pure Go implementation of the netlink protocol.
This is the same library Docker uses for container networking, so it’s well
tested:
var ifaces = []string{
    "lo",   // Loopback interface
    "eth0", // Default interface
}

func bringUpInterfaces() {
    for _, name := range ifaces {
        link, _ := netlink.LinkByName(name) // Get interface by name
        netlink.LinkSetUp(link)             // Equivalent to `ip link set <name> up`
    }
}
The interface is now up, but it doesn’t have an IP address yet.
DHCP Without dhclient
DHCP clients like dhclient or udhcpc are typically written in C and shell
scripts. But DHCP is just UDP packets with a specific format:
- send DISCOVER
- receive OFFER
- send REQUEST
- receive ACK
Simple enough to implement from scratch, but I decided to be lazy and use
insomniacslk/dhcp which implements the protocol in pure Go and handles all
the edge cases I’d inevitably get wrong. I could’ve just done static IPs, but
why not be somewhat versatile.
One gotcha. The library uses getrandom(2) to generate transaction IDs, which
blocks until the kernel’s random number generator is initialised. In a minimal
VM with no hardware entropy source, this can hang forever. The fix is setting
UROOT_NOHWRNG=1 before calling the library, which makes it fall back to
/dev/urandom. I absolutely didn’t get affected by this.
func configureDHCP() *dhcpLease {
    os.Setenv("UROOT_NOHWRNG", "1") // Don't block waiting for hardware entropy
    lease, _ := dhcpExchange("eth0")

    // Assign the leased IP to eth0 (like `ip addr add`)
    eth0, _ := netlink.LinkByName("eth0")
    netlink.AddrAdd(eth0, &netlink.Addr{
        IPNet: &net.IPNet{IP: lease.IP, Mask: lease.Mask},
    })

    // Add default route via gateway (like `ip route add default via`)
    netlink.RouteAdd(&netlink.Route{Gw: lease.Gateway})

    // Write DNS config so resolution works
    os.WriteFile("/etc/resolv.conf",
        fmt.Appendf(nil, "nameserver %s\n", lease.DNS),
        0o644,
    )

    return lease
}
Once we have a lease, we use netlink to assign the IP address and add the
default route. Finally, we write /etc/resolv.conf so DNS resolution works.
Technically we don’t need this since we’re just serving HTTP and never making
outbound requests that need DNS, but it’s nice to have if you ever want to
extend the binary to fetch something. I do end up removing this later.
Graceful Shutdown
The last piece was graceful shutdown. When QEMU or a hypervisor sends an ACPI power button event, you want the VM to shut down cleanly rather than just dying. Normally this is handled by acpid or systemd, but we have neither.
The ACPI power button shows up as a Linux input device. The Go binary scans
/sys/class/input/ to find the device named “Power Button”, then reads input
events from /dev/input/eventN.
Each event is a 24 byte input_event struct:
| timestamp (16 bytes) | type (2) | code (2) | value (4) |
We’re looking for type=EV_KEY, code=KEY_POWER, value=1 (pressed):
const (
    EV_KEY    = 0x01
    KEY_POWER = 116
)

func listenPowerButton(shutdownCh chan<- struct{}) {
    f, _ := os.Open(findPowerButtonDevice())
    defer f.Close()

    buf := make([]byte, 24)
    for {
        f.Read(buf)
        evType := binary.LittleEndian.Uint16(buf[16:18])
        evCode := binary.LittleEndian.Uint16(buf[18:20])
        evValue := binary.LittleEndian.Uint32(buf[20:24])
        if evType == EV_KEY && evCode == KEY_POWER && evValue == 1 {
            shutdownCh <- struct{}{}
            return
        }
    }
}
When the power button is pressed, the binary syncs filesystems and calls
syscall.Reboot(syscall.LINUX_REBOOT_CMD_POWER_OFF). Clean shutdown, no
external tools required.
The Result
Now the initramfs contained exactly one file: /init, my Go binary. No
busybox, no shell, no nothing.
Removing GRUB
With the Go init working, I turned my attention to the bootloader. GRUB is
powerful but it’s also “large” and complex. Modern UEFI firmware can load
executables directly from the ESP if they’re placed at the fallback path
/EFI/BOOT/BOOT{X64,AA64}.EFI.
systemd-boot is much simpler than GRUB. It reads a config file, loads a kernel
and initramfs, and hands off. That’s it. So I replaced GRUB with systemd-boot
and the image got smaller again.
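Its entire configuration is a couple of plain text files on the ESP. A hedged example using the usual systemd-boot layout, not copied from my build:

```
# loader/loader.conf
default minilinux.conf
timeout 0

# loader/entries/minilinux.conf
title   minilinux
linux   /vmlinuz
initrd  /initramfs.img
options console=ttyAMA0
```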
But then I discovered Unified Kernel Images. A UKI bundles the kernel, initramfs, and command line into a single PE executable. UEFI firmware loads it directly. No bootloader configuration, no separate files, just one blob.
ukify build \
    --linux=/uki/vmlinuz \
    --initrd=/uki/initramfs.img \
    --cmdline="console=tty0 console=ttyS0 random.trust_cpu=on" \
    --output=/uki/BOOT.EFI
Now the ESP contains exactly one file: EFI/BOOT/BOOTX64.EFI, which is the
kernel, initramfs, and command line all in one.
Building Our Own Kernel
At this point I was still using Alpine’s kernel, and it was still the largest thing in the image. The kernel has loadable modules, but I wasn’t loading any of them. The kernel has thousands of drivers, but I only needed virtio.
Time to build a custom kernel.
I started with Linux 6.18 LTS and a minimal config. The key insight is that
CONFIG_MODULES=n means there’s no module loading at all. Every driver is
compiled directly into the kernel binary. This sounds limiting, but it means
no /lib/modules directory, no modprobe, no depmod. The kernel is entirely
self contained, and we know exactly what’s in it.
What we need (compiled in):
| Config | Why |
|---|---|
| CONFIG_VIRTIO_NET, CONFIG_VIRTIO_BLK | VirtIO drivers for QEMU networking and block devices |
| CONFIG_VIRTIO_PCI, CONFIG_PCI | PCI bus support (virtio devices attach via PCI) |
| CONFIG_EFI_STUB | Kernel can boot directly from UEFI firmware |
| CONFIG_BLK_DEV_INITRD, CONFIG_RD_XZ | Initramfs support with XZ decompression |
| CONFIG_PROC_FS, CONFIG_SYSFS, CONFIG_DEVTMPFS | Virtual filesystems we mount |
| CONFIG_PACKET, CONFIG_INET, CONFIG_UNIX | Networking stack for DHCP and sockets |
| CONFIG_INPUT_EVDEV, CONFIG_ACPI_BUTTON | ACPI power button for graceful shutdown |
| CONFIG_HW_RANDOM_VIRTIO, CONFIG_RANDOM_TRUST_CPU | Entropy sources so DHCP doesn’t block |
| CONFIG_SERIAL_8250 / CONFIG_SERIAL_AMBA_PL011 | Serial console (arch dependent) |
What we disable:
| Config | Why |
|---|---|
| CONFIG_MODULES | No module loading, everything built in |
| CONFIG_ETHERNET | Disables all vendor NIC drivers (virtio_net is separate) |
| CONFIG_EXT4_FS, CONFIG_XFS_FS, etc | No disk filesystems, rootfs is tmpfs |
| CONFIG_SCSI, CONFIG_ATA, CONFIG_NVME | No storage drivers, boot from initramfs |
| CONFIG_USB_SUPPORT | No USB in a VM |
| CONFIG_SOUND, CONFIG_DRM, CONFIG_FB | No audio or graphics |
| CONFIG_NETFILTER, CONFIG_WIRELESS | No firewall or wifi |
| CONFIG_KVM, CONFIG_XEN | We’re a guest, not a hypervisor host |
| CONFIG_FTRACE, CONFIG_DEBUG_KERNEL | No debugging or tracing overhead |
There are more options than this (TTY support, size optimisations, arch specific drivers, etc.) but these are the important ones. Many options also pull in dependencies automatically, so the actual config ends up longer than you’d expect.
Setting CONFIG_EXPERT=y is crucial here. Many of these options are forced on
by default in non-expert mode. Without it, the kernel config system ignores
your attempts to disable things like CONFIG_INPUT or CONFIG_HID, which I
didn’t realise until I’d wasted copious amounts of time on repeated kernel
compilations.
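For illustration, a hypothetical excerpt of what a config.common fragment like this might contain, using options from the tables above (kconfig spells disabled options as comments):

```
CONFIG_EXPERT=y
# CONFIG_MODULES is not set
CONFIG_VIRTIO_PCI=y
CONFIG_VIRTIO_NET=y
CONFIG_VIRTIO_BLK=y
CONFIG_EFI_STUB=y
CONFIG_BLK_DEV_INITRD=y
CONFIG_RD_XZ=y
# CONFIG_USB_SUPPORT is not set
# CONFIG_SOUND is not set
# CONFIG_DRM is not set
```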
The build is a Docker stage that downloads the kernel source, applies my config fragments, and compiles:
cd "linux-${KERNEL_VERSION}"
make defconfig
scripts/kconfig/merge_config.sh -m .config /configs/config.common
make -j"$(nproc)" bzImage
The resulting kernel is ~3.7 MiB for x86_64. Combined with the ~2 MiB Go binary (compressed in the initramfs), the final image comes in around 8 MiB for amd64.
The Final Architecture
UEFI firmware
|-- loads /EFI/BOOT/BOOTX64.EFI (the UKI)
|-- kernel unpacks initramfs into tmpfs
|-- execs /init (the Go binary)
    |-- mounts proc, sys, dev
    |-- brings up eth0, runs DHCP
    |-- listens for ACPI power button
    |-- starts HTTP server on :8080
    |-- on power button: sync, poweroff
No bootloader menu. No module loading. No shell. No package manager. The only userspace binary is the Go program, and it does everything.
The disk layout is equally simple:
minilinux.img (GPT)
|-- Partition 1: FAT32 ESP
|-- EFI/BOOT/BOOTX64.EFI (UKI: kernel + initramfs + cmdline)
That’s it. One partition, one file.
The Result
$ just build
... terrifying docker logs ...
SUCCESS: output/minilinux.amd64.img (7.7M)
$ just run
... kernel logs ...
:: Mounting filesystems...
:: Configuring network...
:: Running DHCP on eth0...
:: Lease: 10.0.2.15/24 gw 10.0.2.2 dns 10.0.2.3
:: Setting hostname...
:: Hostname: minilinux-525400
:: System ready
:: HTTP server listening on :8080
# From another terminal:
$ curl http://localhost:8080/
Hello, World!
A 7.7 MiB bootable image serving HTTP. For comparison, the scratch Docker
image with just the Go binary would be about 2 MiB. So we’re paying ~5.7 MiB
for a custom kernel and the ability to boot on bare metal (kind of… VM metal).
The arm64 build is even smaller at ~7 MiB because the aarch64 kernel seems to compress better with EFI zboot.
When Would You Actually Use This?
Honestly? Probably never. Containers exist for a reason and orchestration tools like Kubernetes solve real problems. But there are some edge cases:
- Embedded appliances: When you’re shipping a physical device and want the smallest possible attack surface
- Single purpose VMs: When you want one VM to do one thing with no shared kernel
- Learning: Understanding how Linux boots is genuinely useful knowledge
For me, the real value was the journey. I now understand initramfs, kernel configuration, UEFI boot, and systemd-boot in a way I never did before. The next time I debug a boot failure, I’ll actually know what’s happening.
I’m also toying with the idea of building some form of software based router using nftables (well, the netlink interface of it) and having a web controller/UI.
What’s Next
The project is functional but there’s more to explore:
- A/B system partitions: Have two system partitions (system-a, system-b) so upgrades can be written to the inactive partition and switched atomically. If the new version fails to boot, fall back to the previous one
- Squashfs system partition: Move the rootfs to a readonly squashfs partition instead of living entirely in tmpfs. Saves RAM and enables larger systems
- Persistent storage: Add an ext4 read/write partition for config, logs, and data that survives reboots
- ISO installer: Build an ISO that boots, presents a minimal UI, and installs itself to a storage volume. Would make Proxmox deployment much cleaner than uploading ISOs manually
- A proper init system: The current Go binary is a monolith that does everything. I’ve started working on coffey, a minimal init system in Go that could handle service supervision, logging, and process management properly
- Something actually useful: Replace the “Hello, World!” with a real service. The software router idea with nftables and a web UI is tempting
The code will be at github.com/lcox74/bingo
once I clean it up. The only requirement is Docker and just.