LAN Improvements
Hello there again. No, I didn’t go and forget about you all after a single post. I’ve just had a crazy last couple of months. Since the last time I posted, I interviewed with Google, got hired, and moved two timezones away to California. Talk about a busy period. But the project is still ongoing, and even has a name now: Ravenna.
One advantage of living in San Jose, the heart of Silicon Valley, is that high-speed internet is ubiquitous. I’m ostensibly getting a gigabit per second through Comcast. Unfortunately, when I ran a speed test from my laptop over wifi, I was only getting somewhere around 10 Mbps. Oh man…
So I troubleshot the problem as best I could. I tested a wired device downstream of my router – same speed. Then I eliminated the router altogether and tested the speed plugged directly into the modem. Now I was getting around 600 Mbps. Not quite a gig, but not too shabby. Certainly much better than I was getting through WoW back in Huntsville.
So what was the problem, then? The first culprit in my mind was the custom router firmware I was running – dd-wrt. I use a Nighthawk R7000, a router I bought a few years ago without too much thought. After a while, I decided I wanted finer-grained control over my network than the default Netgear GUI would allow; I wanted to be able to automate things using scripts. Unfortunately, the R7000 isn’t particularly well supported under dd-wrt.
After I got under the hood, I could see that the router was basically an underpowered dual-core ARM processor with under a gigabyte of RAM, hooked up to a Broadcom ASIC. The ASIC handles the data plane while the CPUs handle the control plane. However, the ASIC’s API is proprietary, so the authors of dd-wrt had to reverse-engineer its interface in order to support hardware-accelerated NAT.
I saw a lot of dark secrets while playing around with dd-wrt, including a 2.X series Linux kernel. But it met my needs. I didn’t notice a decrease in connection speed and I gained the flexibility of running scripts directly on my router. So I left it the way it was, for the most part.
Fast forward to a couple of weeks ago when I moved in. Now I could really see it: this performance was unacceptable. So I bit the bullet and reflashed Netgear’s locked-down Duplo firmware… and all of a sudden I was getting 600 Mbps wired connections and 200 Mbps wireless connections! Not bad!
But switching away from dd-wrt meant losing all of the flexibility I had gained, so I decided to break out the sidecar pattern. I’d let my router do what it’s good at – routing, firewalling, and NATing – and let another, more flexible component do the rest. So I set up a Raspberry Pi with a static IP and made it both the DNS server and the DHCP server for the network; dnsmasq let me do both.
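The relevant chunk of dnsmasq configuration is small. This is just a sketch – the interface name, address range, and local domain are placeholders for my real values:

```
# /etc/dnsmasq.conf (sketch; interface, range, and domain are placeholders)
interface=eth0                               # listen on the Pi's LAN interface
domain=lan                                   # local domain for DHCP'd hostnames
expand-hosts                                 # qualify bare hostnames with that domain
dhcp-range=192.168.1.100,192.168.1.200,12h   # hand out leases on the LAN
dhcp-authoritative                           # this is the only DHCP server around
```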
But I had a laundry list of things I needed the DNS to do. I needed hostnames to be resolvable as soon as a device picked up a DHCP lease. I needed service names to be resolvable once they’d started up (e.g. jenkins, gitlab). And I needed all outbound DNS traffic to be encrypted. A tall order for just dnsmasq. So I decided to set up a bit of a DNS pipeline. After all was said and done, it looked a bit like this:
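```
client
  └─▶ dnsmasq (port 53 on the Pi)  – answers names it knows from DHCP leases
        └─▶ consul                 – answers registered service names
              └─▶ cloudflared      – wraps everything else in DNS over HTTPS
                    └─▶ 1.1.1.1
```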
consul is a great tool for service discovery. It offers a REST API both for registering and querying services and their associated data. It also offers a read-only DNS API. It was a perfect fit for my use case.
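As a sketch of what registering and querying look like – the service name, port, and agent address here are hypothetical, and current consul docs use PUT for the registration endpoint:

```
# Register a (hypothetical) Jenkins service with the local consul agent.
curl -X PUT http://127.0.0.1:8500/v1/agent/service/register \
     -d '{"Name": "jenkins", "Port": 8080}'

# Ask consul's DNS interface (default port 8600) where it lives.
dig @127.0.0.1 -p 8600 jenkins.service.consul
```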
cloudflared is a service created by the eponymous Cloudflare that wraps plaintext UDP DNS queries in DNS over HTTPS. Since all of my outbound queries proxy through it, none of my external DNS traffic should be plaintext. This service uses Cloudflare’s relatively recent public DNS server 1.1.1.1, which has a new-fangled HTTPS interface for secure name resolution.
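Running the proxy itself is roughly a one-liner. The listen port below is just the value I’m assuming for illustration, not something cloudflared forces on you:

```
# Listen for plaintext DNS locally and forward it upstream as DNS over HTTPS.
cloudflared proxy-dns --port 5053 --upstream https://1.1.1.1/dns-query
```

In my chain it’s consul that hands anything it can’t answer to that local listener; consul’s recursors setting exists for exactly that kind of upstream forwarding.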
PSA: 1.1.1.1 is not suitable for use as a fake IP in tests. I’ve seen integration tests fail because people thought it was. If you must use a fake IP, the IETF has set aside test subnets for just this purpose: 192.0.2.0/24, 198.51.100.0/24, and 203.0.113.0/24 (RFC 5737).
So let’s say I want to resolve node2. The request is sent to the sidecar, since the system’s nameserver was configured via DHCP. dnsmasq is listening on port 53 of the sidecar and will immediately resolve the request, since node2 is one of its active DHCP leases.
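Spelled out as a query (the sidecar’s address here is made up):

```
# The client's resolver already points at the sidecar thanks to DHCP, but
# querying it explicitly shows who's answering. 192.168.1.2 stands in for
# the Pi's static address.
dig node2 @192.168.1.2 +short
```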
How about jenkins? Before I even make the request, Jenkins should have registered with consul on startup with a REST POST request. Now, dnsmasq will receive the DNS request and, since it has no record of jenkins, it will proxy it to consul. consul will return the appropriate result. I’m actually glossing over things a little bit here: I had to make use of a search path to make this work, since consul returns its DNS records in the form of <SERVICE_NAME>.service.<DOMAIN>.
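Concretely, that meant two small pieces of glue on the Pi: dnsmasq forwards anything it can’t answer to consul’s DNS port, and DHCP clients get a search domain so a bare jenkins is retried as the fully qualified name. A sketch, assuming consul’s default port and domain:

```
# /etc/dnsmasq.conf (continued sketch)
no-resolv                       # don't pick up upstreams from /etc/resolv.conf
server=127.0.0.1#8600           # forward unanswered queries to consul's DNS API

# Push a DNS search list (DHCP option 119) so "jenkins" gets retried as
# "jenkins.service.consul" before the lookup fails.
dhcp-option=option:domain-search,service.consul
```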
Finally, suppose I try to resolve google.com. This sort of request should be the average case. The request will make its way through dnsmasq and consul, neither of which will know the answer, and each will pass the request along to the next link in the chain. cloudflared will receive the request, translate it into an equivalent HTTPS request, and query 1.1.1.1.
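A quick way to convince yourself the last hop really is encrypted is to resolve something external while watching the sidecar for plaintext DNS headed anywhere outside the LAN; the addresses and interface name below are placeholders:

```
# Resolve an external name through the whole chain...
dig google.com @192.168.1.2 +short

# ...while watching for port-53 traffic that isn't staying inside the LAN.
# With cloudflared doing the last hop over HTTPS, this should stay quiet.
sudo tcpdump -ni eth0 'port 53 and not net 192.168.1.0/24'
```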
This all might sound quite circuitous, but in practice, I haven’t noticed much of an impact on my name resolution latency. Certainly not enough to abandon the level of flexibility this affords me. With this model, I can plug a compute node into my server farm and, with zero manual configuration, the services hosted on it magically become available. Getting Jenkins up is as simple as plugging in power and a Cat5 cable.
After messing around with the configurations of these various services, I wrapped them all up into Debian packages to make the setup real and reproducible. Stay safe, kids. Practice immutable infrastructure.
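For the curious, a config-only package doesn’t need much ceremony – it’s just a file tree plus a control file, built with dpkg-deb. Names and versions here are made up:

```
# Minimal layout for a config-only Debian package.
mkdir -p ravenna-dns/DEBIAN ravenna-dns/etc/dnsmasq.d
cat > ravenna-dns/DEBIAN/control <<'EOF'
Package: ravenna-dns
Version: 0.1.0
Architecture: all
Maintainer: me <me@example.com>
Description: dnsmasq/consul/cloudflared glue for the home LAN
EOF
cp pipeline.conf ravenna-dns/etc/dnsmasq.d/    # the dnsmasq snippets from above
dpkg-deb --build ravenna-dns                   # produces ravenna-dns.deb
```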
There are a couple of warts to this system. I haven’t integrated with consul’s health-check system yet, so if I ever migrate a service to another node, or a node changes addresses, I’ll have a stale DNS record hanging around for that service. I’ll need to fix that in the near future; for the moment, I’m marking it down as technical debt and moving along. A lot of changes have been made to the simulation since I last posted, and I really want to get back to working on the meat of the project – geometry, physics, and procedural generation. Stay tuned for more.