Sometimes, doing nothing is the best approach.
This will be a different kind of article. I’m going to detail a (somewhat geeky) problem I recently faced, and how I “solved” it by doing nothing.
I want you to remember that “nothing” is often a useful option.
Waiting as a solution
Sometimes the problems we experience are out of our control. After attempting to diagnose a problem with two of my three servers, I determined the problem was probably not mine, and elected to simply wait, assuming the issue would be resolved elsewhere. It was.
The problem was simple: attempting to connect to a website on one of my servers failed. The error varied, but it was most often a timeout error, meaning that my browser (currently Firefox) reached out to the site, made a connection of some sort, and then didn’t get a timely response.
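For the technically inclined, the difference between "couldn't connect at all" and "connected, but got no timely response" can be checked with a few lines of Python. This is only an illustrative sketch, not a record of the actual diagnosis. The function name `classify_fetch` and its return labels are made up for this example. It also speaks plain HTTP, so an HTTPS site would need the connection wrapped using Python's `ssl` module first.

```python
import socket


def classify_fetch(host, port=80, timeout=5.0):
    """Return 'connect_failed', 'response_timeout', 'closed_without_reply', or 'ok'.

    Hypothetical helper: the labels are illustrative, not a standard.
    """
    try:
        # Step 1: can a TCP connection be opened at all?
        # Refusals, unreachable hosts, and connect timeouts all land here.
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        return "connect_failed"

    with sock:
        sock.settimeout(timeout)
        request = f"HEAD / HTTP/1.0\r\nHost: {host}\r\n\r\n".encode("ascii")
        try:
            # Step 2: send a minimal request and wait for the first byte back.
            sock.sendall(request)
            data = sock.recv(1)
        except socket.timeout:
            # Connected, but nothing came back in time: the symptom described above.
            return "response_timeout"
        except OSError:
            return "connect_failed"

    return "ok" if data else "closed_without_reply"
```

A result of `response_timeout` matches the symptom described here: the connection is made, but the reply never arrives in time.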
When you’re responsible for a website’s operation, that can be a big deal. Fortunately, this was on one of my “second tier” servers, so while it impacted some sites (newsletter.askleo.com, 7takeaways.com, heroicstories.com, and others), it didn’t affect askleo.com, which has its own dedicated server.
And then it did.
I started to get the same error attempting to make some changes to content on this site.
That officially made it a very big deal.
But it was weird. I could sign in to the server’s administrative interface¹ just fine. It appeared to be running normally; nothing seemed amiss. There wasn’t anything like excessive load, which can sometimes lead to this symptom (albeit typically only on one server, not two).
I rebooted the server. It’s a common piece of advice, but it had no effect. Once the server was back, the administrative interface still showed the server running normally, but connections were still timing out.
What made it all the more confusing was that sometimes things would work. The connection would eventually be made, although more slowly than normal. Other times, nothing.
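An intermittent problem like this is easier to reason about with numbers than with impressions. The sketch below repeats a check several times and counts successes, failures, and the slowest successful attempt. It is an assumption-laden illustration: `sample_probe` is a hypothetical name, and `probe` stands for any function you supply that returns a true value on success.

```python
import time


def sample_probe(probe, attempts=10, pause=0.0):
    """Call probe() repeatedly; return (successes, failures, slowest_success_seconds).

    Hypothetical helper: 'probe' is any zero-argument callable that returns
    a true value on success, and a false value or raises OSError on failure.
    """
    successes, failures, slowest = 0, 0, 0.0
    for _ in range(attempts):
        start = time.monotonic()
        try:
            ok = bool(probe())
        except OSError:
            # Timeouts and refused connections count as failures.
            ok = False
        elapsed = time.monotonic() - start

        if ok:
            successes += 1
            slowest = max(slowest, elapsed)
        else:
            failures += 1

        if pause:
            time.sleep(pause)
    return successes, failures, slowest
```

A mix of successes and failures, with some slow successes, is the "sometimes it works" pattern described above, as opposed to a server that is simply down.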
With the server running normally and all other websites I use, like Gmail.com, operating properly, I started to suspect a network issue between myself and my servers. I say suspect because most networking issues that would impact my ability to view a webpage on my server should normally affect all connections to the server… and yet my administrative interface connected quickly and reliably.
I fired up a VPN to reach the server by a different route. No change.
I started to suspect a networking issue at my hosting provider, Amazon Web Services.
I have three servers running at AWS. Only two of them seemed affected by this: askleo.com and my second-tier server. The third server, interestingly, was unaffected.
AWS infrastructure is amazingly complex. Wonderfully efficient and powerful, but complex. It’s not for the faint of heart.
I noted that the two affected servers were in the same data center and the other was elsewhere.²
Two factors made me suspect Amazon.
- Some traffic, such as my administrative connections, passed normally, while other traffic, such as HTTPS connections to the sites, did not.
- The problem was limited to servers that appeared to be co-located.
I suspected a misbehaving Amazon router was involved in the traffic to those servers. (Routers routinely operate on some kinds of traffic differently than others, and indeed, the networking infrastructure at AWS is, once again, amazingly complex.)
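The reasoning behind that suspicion can be written down as a small rule: flag a location when more than one server lives there and every one of them shows the same split, with administrative access working and web access failing. The sketch below is only an illustration of that logic. The function name, the dictionary keys, and the location labels `dc-1` and `dc-2` are all hypothetical, and a match is a hint about where to look, not proof.

```python
from collections import defaultdict


def suspect_shared_infrastructure(results):
    """Return the sorted locations where every server shows 'admin ok, web failing'.

    Hypothetical helper. Each item in 'results' is a dict with the
    illustrative keys 'server', 'location', 'admin_ok', and 'web_ok'.
    """
    by_location = defaultdict(list)
    for row in results:
        by_location[row["location"]].append(row)

    suspects = []
    for location, rows in by_location.items():
        # Clue 1: some traffic passes (admin) while other traffic fails (web).
        same_split = all(r["admin_ok"] and not r["web_ok"] for r in rows)
        # Clue 2: more than one server in the same place is affected.
        if len(rows) > 1 and same_split:
            suspects.append(location)
    return sorted(suspects)


# Made-up observations mirroring the situation described above.
observations = [
    {"server": "askleo", "location": "dc-1", "admin_ok": True, "web_ok": False},
    {"server": "second-tier", "location": "dc-1", "admin_ok": True, "web_ok": False},
    {"server": "third", "location": "dc-2", "admin_ok": True, "web_ok": True},
]
print(suspect_shared_infrastructure(observations))
```

A single failing server would not be flagged by this rule, since one machine misbehaving on its own points at that machine rather than at anything shared.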
Interestingly, AWS Service Status showed no issues. However, it reports only on AWS services and not the networking infrastructure.
I had no practical way to raise the issue with Amazon, but the clues were all pointing to them.
So, we wait
With that as my gut feeling, I elected to simply… wait.
It was late, so I called it a day and stepped away. I did note that the second-tier server continued to have issues into the evening, but interestingly, askleo.com started to behave.³
I woke up the next day to everything working properly. Crisis averted.
I can’t say for certain that my analysis and suspicions were correct, but the problem certainly behaved, and resolved, as if they were.
The takeaway here is simple: sometimes problems resolve themselves. Certainly not all, and perhaps not even most, but many. It’s usually because those problems are elsewhere and are being addressed by others.
I’m not saying waiting is always the right answer, but keep it as a tool in your toolbox as you encounter issues, particularly online.
Footnotes & References
2: It could well be the same data center or building, but they’re identified as being in different “zones”, which implies to me that they are in physically different locations from one another.
3: This may be due in part to the fact that it’s fronted by a content delivery network, or CDN. This means that many requests to view pages were provided by caches around the world, rather than hitting the actual server every time. In fact, it’s quite possible that most askleo.com visitors never saw a problem at all.