OPNsense Secondary Gateway Fails

I run OPNsense as my router and have done so for years. I have a primary and a secondary gateway: the primary is cable internet and the secondary is Starlink. Key piece of info: the Starlink is not in bypass mode. I do this on purpose in order to keep the Starlink Wi-Fi as a tertiary backup.

This week my primary internet decided to turn to trash at night. This is quite common, and with it being high season the number of tourists is too much for the cable company. What this means is I have high latency at night. Failover to Starlink is the only solution, but for some reason it stopped working. I also use individual firewall rules to force certain devices to use Starlink. Again, crappy internet calls for crappy solutions. Even when I forced a device to use Starlink, the web still would not work on it.

To troubleshoot this I spent several hours going through the steps by myself and then several more hours with Google’s thinking model in AI Studio. I also brought in DeepSeek. After all that troubleshooting I was still stuck. What was weird is that I could ping, run dig, and even curl from the command line, but nothing would load in any web browser.

The only changes I could think of were a DNS change and the major upgrade to the latest OPNsense, 25.1.1. Fast forward to the present and I remembered that I had made one other change about two weeks ago: I changed the local IP range on Starlink from 192.168.1.1/24 to 10.0.0.1/16. I guess I had not used the Starlink since then, because that little nugget was key. While Gemini had me confirm that my outbound NAT rules were correct, I think OPNsense never truly released the old IP range. I changed Starlink back to 192.168.1.0/24 and then it started to work.
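A check along these lines could flag that kind of mismatch. This is only a minimal sketch of my suspicion, not something that talks to OPNsense: the function name `stale_after_renumber` is made up, and the interface address 192.168.1.50 in the example is a hypothetical value standing in for whatever OPNsense shows on the Starlink-facing interface. It uses nothing but Python's standard `ipaddress` module to test whether the address and gateway the firewall is holding still fall inside the range the Starlink router now hands out.

```python
import ipaddress


def stale_after_renumber(interface_addr, gateway_addr, router_lan):
    """Return a list of problems; an empty list means everything is consistent.

    interface_addr: address the firewall holds on the Starlink-facing interface
    gateway_addr:   gateway address configured for that interface
    router_lan:     the Starlink router's LAN setting, e.g. "10.0.0.1/16"
    """
    # "10.0.0.1/16" is a host address plus prefix; .network gives 10.0.0.0/16.
    lan = ipaddress.ip_interface(router_lan).network
    problems = []
    if ipaddress.ip_address(interface_addr) not in lan:
        problems.append(f"interface address {interface_addr} is outside {lan}")
    if ipaddress.ip_address(gateway_addr) not in lan:
        problems.append(f"gateway {gateway_addr} is outside {lan}")
    return problems


if __name__ == "__main__":
    # Hypothetical values: the firewall still holds old 192.168.1.x settings
    # while the Starlink router has been moved to 10.0.0.1/16.
    for problem in stale_after_renumber("192.168.1.50", "192.168.1.1", "10.0.0.1/16"):
        print(problem)
```

If both addresses fall inside the new range the list comes back empty, which would point the finger somewhere other than a stale lease or gateway.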

I still want to move off that range, but next time I will remove the interface and gateway first and then change the Starlink LAN range. I will then add the interface back, and my money is on it working.

One quick note on using LLMs. I was disappointed with Gemini 2.0 Flash Thinking Experimental. Near the end, while still within its context window, it began to unravel. I have had this happen with other LLMs, but I thought the large context would help. It sometimes just wouldn’t look at the full picture. The best example of this is that I told it my failover didn’t work and neither did my firewall rule. It was very determined to focus on the failover not working and would ignore the firewall rule. It should have deduced that the firewall rule really has nothing to do with the gateway failover, and thus it should have focused on troubleshooting that could have solved both situations.