So let’s begin with the ending: if you have one or more internal DNS servers that you wish to use with pfSense’s DNS Resolver and its Domain Overrides function, you must include an interface capable of communicating with those servers in the Outgoing Network Interfaces configuration item, even if the servers are accessible on another LAN interface. If you don’t, you will be unable to resolve any internal names, and you’ll see the RTO of your internal servers slowly creep up until it caps out at 120000 on the resolver status page. Now, on with the story.

Why are you doing this?

Mostly to make my home network less frustrating for my wife, but also a little bit for ad blocking. I recently decided to enable pfBlockerNG to do whole-house ad removal, which requires moving from the DNS Forwarder to the more fully featured DNS Resolver on pfSense. I also have a cluster of FreeIPA servers which handle name resolution for all the various machines and services running on my LAN. Unfortunately, because I wasn’t very smart when I set it all up, these services are running under a subdomain of another domain I use out on the wider net. So the SOA stuff is... screwy. For those local services I have to make sure my FreeIPA servers get asked DNS questions first, otherwise I’ll get an NXDOMAIN from the “real” nameservers out on the net.

Previously, the pfSense router was configured with the DNS Forwarder to look at each of the FreeIPA servers in order and then at 1.1.1.1 as a last resort when resolving DNS names. This “worked” but had a whole host of downsides. If any of the FreeIPA servers went down, every request took ages, because I had to configure pfSense to query sequentially to keep my FreeIPA boxes “first” in the lookup list. This was still an improvement over the previous system, where everything always went through the FreeIPA servers, meaning that if they went down DNS resolution died for the whole house. Not great.

As a final hiccup, there are certain DNS names that have to be resolvable even if the FreeIPA servers are offline, because they are integral to bringing the cluster which hosts FreeIPA online. So those have to be handled somehow, and were previously static host entries on the pfSense DNS Forwarder.

What other questionable choices did you make?

Well my home network isn’t a simple WAN/LAN split either. There’s a wired LAN segment, a segment for my VMs and VM hosts, and a segment for WiFi devices. This is actually pretty neat for stuff like security and keeping rude IoT devices contained, but it can cause problems when you need to reach out to services across a segment boundary.

This had manifested previously with my poor choice of putting the VM NFS host on the LAN segment and the VM host which used it for VM disk storage on the VM segment. This led to really fun events like the router going down causing every VM in my cluster to tip over until force rebooted with possibly corrupted disks. Thankfully, this mistake was recently remedied.

All of this to say that the FreeIPA servers live on the VM segment of my home network and are only accessible from inside my home network.

When did you discover something was wrong?

Since it is approximately the temperature of the sun at home right now, hurray for dry AZ heat, I wanted to make sure all my weather station stuff was working right. I went to check on the gateway unit and found that I couldn’t get the name to resolve. Begin panic. Check the VM cluster, everything seems to be running. Check the VM consoles, everything is good and services are running as expected. Log into the FreeIPA console, look up the unit’s static IP and try that, connects just fine by IP. Try dig @freeipa weather.example.com and get back the right response. Try dig @router weather.example.com and... nothing, a SERVFAIL. So we know where the problem is, now to discover what the problem is.
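That side-by-side dig comparison is easy to script if you want to repeat it for several names. What follows is a minimal Python sketch, assuming dig is installed and that freeipa and router are names your machine can reach; the function names are made up for this example and are not part of any library.

```python
import re
import subprocess

# dig prints a header line such as:
#   ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 4242
STATUS_RE = re.compile(r"status:\s*([A-Z]+)")


def parse_status(dig_output):
    """Return the response code from dig's header line.

    If dig printed no header at all (for example because the query
    timed out), return the placeholder string NO_RESPONSE.
    """
    match = STATUS_RE.search(dig_output)
    return match.group(1) if match else "NO_RESPONSE"


def query_status(server, name):
    """Ask one server for one name and return the status (hypothetical helper)."""
    result = subprocess.run(
        ["dig", "@" + server, name, "+time=2", "+tries=1"],
        capture_output=True,
        text=True,
        check=False,
    )
    return parse_status(result.stdout)


def compare(servers, name):
    """Map each server to the status it returned for the same name."""
    return {server: query_status(server, name) for server in servers}


# Example call (needs network access, so it is left commented out):
# compare(["freeipa", "router"], "weather.example.com")
```

If the two servers disagree on the status for the same name, the problem sits between them rather than in the zone data.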

How did you fix it?

First order of business was to set the local zone type in pfSense to Inform and the log level to 3 to try and get a better idea of what the heck was going on. I also pulled up the DNS Resolver status page and took a look at the server stats that it was reporting. One thing I noticed right away was that when Unbound was restarted and the first query to an internal domain was made, the RTO number for each of the internal DNS servers started low and steadily climbed till it capped out at 120000. Now, I can’t seem to find anywhere what the RTO number actually is, but I’m going to guess it’s something like Remote Time Out. I’ll also note that the forwarding servers I have configured, 1.1.1.1 and 1.0.0.1, both have RTO values in the range of 1-600.
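Starting low and climbing to a hard ceiling is what a retransmission timeout with exponential backoff looks like, so that is probably a closer reading of RTO than my guess. The Python sketch below only illustrates that pattern and is not Unbound’s actual code: the 376 ms starting value and the plain doubling on every lost query are assumptions, and only the 120000 ceiling comes from the status page.

```python
from typing import List

RTO_CAP_MS = 120_000  # ceiling seen on the resolver status page


def backoff(start_ms: int, lost_queries: int, cap_ms: int = RTO_CAP_MS) -> List[int]:
    """Return the timeout after each unanswered query.

    Assumes the timeout simply doubles on every loss and is clamped
    at cap_ms; the real resolver may use a different rule.
    """
    values = [start_ms]
    current = start_ms
    for _ in range(lost_queries):
        current = min(current * 2, cap_ms)
        values.append(current)
    return values


# Assumed starting value of 376 ms, nine unanswered queries in a row.
print(backoff(376, 9))
# → [376, 752, 1504, 3008, 6016, 12032, 24064, 48128, 96256, 120000]
```

Under those assumptions a server that never answers hits the cap after only nine lost queries, which fits how quickly the number ran away after each restart.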

Next I checked the logs and watched the requests get parsed and sent out. I could see regular requests to the rest of the net exiting my network cleanly and getting responses back, but anything for my internal domain seemed to just hang and go nowhere. So maybe I managed to firewall myself off from the FreeIPA boxes? Unlikely, but since the ad blocker injects a bunch of firewall rules, it’s possible. I tried the dig @freeipa weather.example.com again, but from the router’s console this time. Instant response with the correct IP. Definitely not a firewall issue then.

At a bit of a loss, I went back to read through the DNS Resolver configuration options in more detail and stumbled upon the key to the whole thing. I had, like a giant idiot, overridden the default value for Outgoing Network Interfaces, changing it from all available interfaces to only my two WAN uplinks. Since my FreeIPA servers aren’t visible from the internet and aren’t on my WAN segment... surprise, pfSense couldn’t communicate with them! Adding my VM segment to the list of allowed interfaces and restarting the service got everything patched up instantly.
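The failure boils down to a reachability check: a Domain Override target is only usable if at least one of the selected outgoing interfaces can talk to it. Here is a rough Python sketch of that check. It treats “can talk to” as “sits in the same subnet”, which is a simplification since routed paths count too, and every interface name and address in it is made up for illustration.

```python
import ipaddress


def usable_interfaces(outgoing, server_ip):
    """Return the outgoing interfaces whose subnet contains server_ip.

    outgoing maps an interface name to its subnet in CIDR notation.
    Same-subnet membership stands in for reachability here.
    """
    server = ipaddress.ip_address(server_ip)
    return [
        name
        for name, cidr in outgoing.items()
        if server in ipaddress.ip_network(cidr)
    ]


# Hypothetical addressing, not my real network.
WAN_ONLY = {"WAN1": "203.0.113.0/28", "WAN2": "198.51.100.0/28"}
WITH_VM = dict(WAN_ONLY, VM="10.0.20.0/24")
FREEIPA = "10.0.20.10"

print(usable_interfaces(WAN_ONLY, FREEIPA))  # → []
print(usable_interfaces(WITH_VM, FREEIPA))   # → ['VM']
```

An empty list is the broken configuration: the resolver has no interface from which it can reach the internal server.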

What did you learn?

Read The Furnished Documentation. Beyond that, DNS is hard and tracking network issues takes an iterative approach both in discovery and repair. Don’t be like me, go slow and pay attention to default values. They might just be there for a reason!