Wait.

Sometimes, doing nothing is the best approach.

In which I diagnose an issue online and come to an interesting conclusion.
Hourglass
(Image: canva.com)

This will be a different kind of article. I’m going to detail a (somewhat geeky) problem I recently faced, and how I “solved” it by doing nothing.

I want you to remember that “nothing” is often a useful option.

TL;DR:

Waiting as a solution

Sometimes the problems we experience are out of our control. After attempting to diagnose a problem with two of my three servers, I determined the problem was probably not mine, and elected to simply wait, assuming the issue would be resolved elsewhere. It was.

Can’t connect

The problem was simple: attempting to connect to a website on one of my servers failed. The error varied, but it was most often a timeout error, meaning that my browser (currently Firefox) reached out to the site, made a connection of some sort, and then didn’t get a timely response.
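The distinction in that last sentence matters: "couldn't connect at all" points to a different kind of problem than "connected, but got no timely response." This wasn't part of the diagnosis described here. It's an illustrative Python sketch of how the two cases can be told apart. The function name `probe` and its result labels are hypothetical, and it speaks plain HTTP only; a real HTTPS site adds a TLS handshake on top, which this sketch deliberately skips.

```python
import socket


def probe(host: str, port: int, timeout: float = 5.0) -> str:
    """Send a minimal plain-HTTP request and classify what happens.

    Returns one of:
      "connect timeout" - the TCP connection was never established in time
      "no connection"   - refused, unreachable, or the name didn't resolve
      "read timeout"    - connected, but no reply arrived within `timeout`
      "closed"          - connected, but the server closed/reset without replying
      "ok"              - connected and received at least one byte back
    """
    try:
        # create_connection also applies `timeout` to later send/recv calls.
        sock = socket.create_connection((host, port), timeout=timeout)
    except socket.timeout:  # must come before OSError (it's a subclass)
        return "connect timeout"
    except OSError:
        return "no connection"
    with sock:
        try:
            # Assumes an ASCII hostname; we only care that *something* comes back.
            request = f"HEAD / HTTP/1.0\r\nHost: {host}\r\n\r\n".encode("ascii")
            sock.sendall(request)
            first_byte = sock.recv(1)
        except socket.timeout:
            return "read timeout"
        except OSError:
            return "closed"
    return "ok" if first_byte else "closed"
```

Calling something like `probe("example.com", 80)` several times in a row during an intermittent problem like this one would be expected to show a mix of results (say, "ok" and "read timeout") rather than one consistent answer.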

When you’re responsible for a website’s operation, that can be a big deal. Fortunately, this was on one of my “second tier” servers, so while it impacted some sites (newsletter.askleo.com, 7takeaways.com, heroicstories.com, and others), it didn’t affect askleo.com, which has its own dedicated server.

And then it did.

I started to get the same error when attempting to make changes to content on this site.

That officially made it a very big deal.

Odd symptoms

But it was weird. I could sign in to the server’s administrative interface[1] just fine. It appeared to be running normally; nothing seemed amiss. There wasn’t anything like excessive load, which can sometimes lead to this symptom (albeit typically only on one server, not two).

I rebooted the server. It’s a common piece of advice, but it had no effect. Once the server was back, the administrative interface still showed the server running normally, but connections were still timing out.

Sometimes.

That made it all the more confusing: sometimes things would work. The connection would eventually be made, just more slowly than normal. Other times, nothing.

With the server running normally and all other websites I use, like Gmail.com, operating properly, I started to suspect a network issue between myself and my servers. I say suspect because most networking issues that would impact my ability to view a webpage on my server should normally affect all connections to the server… and yet my administrative interface connected quickly and reliably.

I fired up a VPN so as to come at the server from a different route. No change.

I started to suspect a networking issue at my hosting provider, Amazon Web Services.

AWS

I have three servers running at AWS. Only two of them seemed affected by this: askleo.com and my second-tier server. The third server, interestingly, was unaffected.

AWS infrastructure is amazingly complex. Wonderfully efficient and powerful, but complex. It’s not for the faint of heart. :)

I noted that the two affected servers were in the same data center and the other was elsewhere.[2]

The location of my servers at AWS. (Screenshot: askleo.com)

Two factors made me suspect Amazon.

  1. Some traffic, such as my administrative connections, passed normally, while other traffic, such as HTTPS connections to the sites, did not.
  2. The problem was limited to servers that appeared to be co-located.

I suspected a misbehaving Amazon router was involved in the traffic to those servers. (Routers routinely operate on some kinds of traffic differently than others, and indeed, the networking infrastructure at AWS is, once again, amazingly complex.)
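The first factor, some traffic passing while other traffic doesn't, can be checked roughly by comparing how reliably different ports on the same server accept connections over several attempts. The Python sketch below is purely illustrative: `sample_ports` is a hypothetical helper, and the ports mentioned after it (22 for SSH, 443 for HTTPS) are the standard defaults, assumed rather than taken from the servers described here. It only tests whether the TCP connection itself can be made, so it won't by itself catch a connection that opens and then goes silent.

```python
import socket
import time


def sample_ports(host, ports, attempts=5, timeout=3.0):
    """Try a TCP connection to each port several times.

    Returns {port: (successes, attempts, average_seconds_or_None)}.
    """
    results = {}
    for port in ports:
        times = []
        for _ in range(attempts):
            start = time.monotonic()
            try:
                # Only the TCP handshake is tested; the socket closes right away.
                with socket.create_connection((host, port), timeout=timeout):
                    times.append(time.monotonic() - start)
            except OSError:
                pass  # refused, unreachable, or timed out: counted as a failure
        avg = sum(times) / len(times) if times else None
        results[port] = (len(times), attempts, avg)
    return results
```

For example, `sample_ports("example.com", [22, 443])` (a placeholder host) returns a per-port tally. A pattern where one port succeeds every time while another fails or succeeds only sometimes is the kind of asymmetry that points away from the server itself and toward something in the network path.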

Interestingly, AWS Service Status showed no issues. However, it reports only on AWS services and not the networking infrastructure.

I had no practical way to raise the issue with Amazon, but the clues were all pointing to them.

So, we wait

With that as my gut feeling, I elected to simply… wait.

It was late, so I called it a day and stepped away. I did note that the second-tier server continued to have issues into the evening, but interestingly, askleo.com started to behave.[3]

I woke up the next day to everything working properly. Crisis averted.

I can’t say for certain that my analysis and suspicions were correct, but the problem certainly behaved, and resolved, as if they were.

Do this

The takeaway here is simple: sometimes problems resolve themselves. Certainly not all, and perhaps not even most, but many. It’s usually because those problems are elsewhere and are being addressed by others.

I’m not saying waiting is always the right answer, but keep it as a tool in your toolbox as you encounter issues, particularly online.

Footnotes & References

1: As well as the Linux equivalent of a command line, via SSH.

2: The third server could well be in the same data center or building, but it’s identified as being in a different “zone”, which implies to me that it’s in a physically different location from the other two.

3: This may be due in part to the fact that it’s fronted by a content delivery network, or CDN. This means that many requests to view pages were provided by caches around the world, rather than hitting the actual server every time. In fact, it’s quite possible that most askleo.com visitors never saw a problem at all.

2 comments on “Wait.”

  1. That’s similar to the problem I wrote to you about: an error message when I tried to download the new Windows Outlook to test it out. The message indicated the problem was on their end, but I’ve often seen error messages get it wrong. (Maybe that’s why error messages sometimes simply say, “Oops, something went wrong.” You can’t go wrong with that message.)
    Half a day later it just worked.
    I also get a “Not responding. Wait or End program” message. I always click Wait, and the page eventually loads. Sometimes it takes up to half a dozen clicks on Wait. If I simply wait without clicking, the page also eventually loads.

    I’m giving those examples because Leo’s example is probably more complex than the average user would run into.

  2. When I encounter an issue connecting to a website or service, I first check that I can connect elsewhere (I troubleshoot my Internet connection). Then, if everything appears to be O.K. on my end, I simply wait a while, or until the next day. I did that this past weekend when I was unable to connect to Oracle’s VirtualBox website. I was able to connect to other sites, but not the VirtualBox site. It was getting late, so I closed my web browser and went to bed. The next day, the site was back up and running normally. I usually chalk this sort of issue up to “Stuff happens.” :)

    Ernie (Oldster)

