What to look for, what to check.
This simple question opens up a veritable Pandora’s box when it comes to understanding URLs and what is safe to click on.
The concepts are simple, but how those concepts can be combined is complex, particularly if someone is attempting to deceive you.
I’ll try to make some sense of it all.
A URL or web address beginning with “http” has several components, the most important being the server or website address between the leading “//” and the next single “/”. This is often obfuscated by making trusted websites, like paypal.com, appear as subdomains of a hacker’s domain (paypal.com.hackersdomain), or by using confusing encoding (hackersdomain%2fpaypal.com). Always examine the right-most portion of the full domain to confirm it’s going where you think. Https is available to fraudulent domains as well, so you cannot rely on it alone as an indicator of safety.
Three basic URL components
“URL” is short for Uniform Resource Locator. The most common one we know of is the web address — something like “https://askleo.com”.
There are three primary components to a URL. We’ll use this URL as our example for discussion:
http://www.somerandomservice.com/folder/page?parameter1=value1&parameter2=value2
- http://www.somerandomservice.com – Server. This identifies the protocol (http or https — the language of webpages) and the server that hosts the webpage. www.somerandomservice.com identifies a specific server on the internet from which what follows will be requested.
- folder/page – Page. The page specifies exactly what you are requesting from the server. Typically it’s a webpage, perhaps within a folder on that server, but it could also be a program to run on the server or a file to be downloaded.
- ?parameter1=value1&parameter2=value2 – Parameters. The question mark indicates that the rest of the URL contains parameters: additional information supplied to the page. Since “pages” can often be small (or large) computer programs, information from this part of a URL is given to those programs to use in ways that can affect the content of the page ultimately displayed.
Important note: The Server specification ends at the first “/” that occurs after the “http://” or “https://” start of the URL, and the Page specification ends at the first question mark after that. This rule is important to understanding whether a URL is valid, bogus, or misleading.
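If you’re comfortable with a little code, here’s a minimal sketch of that split using Python’s standard urllib.parse module. The URL is simply the three components above joined together; the variable names are mine, chosen for illustration.

```python
from urllib.parse import urlsplit, parse_qsl

# Example URL assembled from the three components described above.
url = "http://www.somerandomservice.com/folder/page?parameter1=value1&parameter2=value2"

parts = urlsplit(url)

protocol = parts.scheme              # "http"
server = parts.hostname              # "www.somerandomservice.com"
page = parts.path                    # "/folder/page"
parameters = parse_qsl(parts.query)  # list of (name, value) pairs

print("Protocol:  ", protocol)
print("Server:    ", server)
print("Page:      ", page)
print("Parameters:", parameters)
```

The server stops at the first “/” after “http://”, and the page stops at the first question mark, just as the rule says.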
The server matters, part 1
I’ll restate the first part of that rule to focus on what we care about (I’ll use “http” from here, but this all applies to both “http” and “https” unless otherwise specified):
The server being contacted begins after the “http://” and ends at the next “/”.
Or, in this URL, the part that’s highlighted:
That’s the part that matters, because that’s the part that tells your browser what internet server to connect to. Everything else is secondary. Important, yes, but not nearly as important.
Let’s look at one of the ways that phishing attempts try to fool you.
It might be tempting to look at that quickly and say “oh, that ends in paypal.com, so it’s PayPal!”
No, it’s not. Look again:
That URL loads a page named “www.paypal.com” (a valid page name) from the server www.somerandomservice.com.
Now, my example is pretty lame, as “www.somerandomservice.com” is big and obvious at the front of that URL. But scammers use all sorts of variations on this theme to make it look like you’re going someplace you trust when you’re not, and you won’t notice unless you look closely.
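To see the same thing the way software sees it, here’s a small sketch. The lookalike URL is my reconstruction of the kind of link described above (a page named “www.paypal.com” on somerandomservice.com), and server_of is a hypothetical helper name, not part of any library.

```python
from urllib.parse import urlsplit

def server_of(url):
    """Return the server (host) named in the URL, ignoring everything after it."""
    return urlsplit(url).hostname

# Hypothetical lookalike link: it ends in "paypal.com" but is not PayPal.
lookalike = "http://www.somerandomservice.com/www.paypal.com"

print(server_of(lookalike))         # www.somerandomservice.com
print(urlsplit(lookalike).path)     # /www.paypal.com
```

The “www.paypal.com” text lands in the page portion, which has no say in which server gets contacted.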
The server matters, part 2
For this point, we need to pick apart the way server names are created and used.
Server names are built from right to left, and the individual components are separated by periods. Consider “www.somerandomservice.com”.
- “.com” is the top-level domain, and indicates which registry service is used to register the domain initially.
- “somerandomservice” is the domain name. This is the part you purchase when you register a domain name.
- “www.” is the subdomain. Once you own the domain, you can create as many of these subdomains as you like.
In general, fully qualified domain names like “www.somerandomservice.com” identify a server on the internet. “photos.somerandomservice.com” would typically be a different server, but it doesn’t have to be.
The choice between using something like “photos.somerandomservice.com” versus “somerandomservice.com/photos” is purely one of site design, and has no security implications. That’s just how the person building the website chose to do it. There are geeky pros and cons to each, but for a typical web user, it doesn’t really matter.
What does matter is how subdomains can be abused. For example, it’s perfectly possible for this to be a valid domain:
Once again, with only a quick glance, you might think it was actually paypal.com, since it starts with “http://www.paypal.com”.
In that example, “www.paypal.com.” is just a subdomain created by the owner of “somerandomservice.com” and has nothing at all to do with the real paypal.com.
Here’s a worse example:
Once again, it’s designed to fool you into thinking it’s paypal.com, but in fact it’s not. That’s especially true if the URL is so long that your browser happens to show you only the first part of it in the status bar.
Scammers use many different variations of this technique to trick you.
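Here’s a rough sketch of reading a server name from the right, and of checking whether it really sits inside a domain you trust. Both function names are hypothetical helpers of my own, and the check is deliberately simple: the name either is the trusted domain or ends with a period followed by it.

```python
from urllib.parse import urlsplit

def labels_right_to_left(host):
    """Split a server name on periods and list the pieces starting from the right."""
    return host.lower().rstrip(".").split(".")[::-1]

def is_host_within(host, trusted_domain):
    """True only if host is trusted_domain itself or one of its subdomains."""
    host = host.lower().rstrip(".")
    trusted_domain = trusted_domain.lower()
    return host == trusted_domain or host.endswith("." + trusted_domain)

fake = urlsplit("http://www.paypal.com.somerandomservice.com").hostname
real = urlsplit("http://www.paypal.com").hostname

print(labels_right_to_left(fake))
# ['com', 'somerandomservice', 'com', 'paypal', 'www']

print(is_host_within(fake, "paypal.com"))   # False
print(is_host_within(real, "paypal.com"))   # True
```

Reading from the right, the first two pieces of the fake name are “com” and “somerandomservice”. That’s who owns it, no matter what appears further to the left.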
A slash is a slash is a … %2F?
This was brought up by a comment on this article (thanks, Ken!).
Characters in URLs can be “encoded” with a special representation that acts the same as the character it encodes. The format is a percent sign followed by a two-digit hexadecimal number (individual digits will be 0-9 or A-F).
A space character, for example, is %20, and you’ll actually see that in legitimate URLs from time to time, since an actual space character cannot be used.
%2F is the slash character “/”.
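If you want to see the decoding for yourself, Python’s standard urllib.parse.unquote turns these codes back into characters. This is only a small illustration; the sample file name is made up.

```python
from urllib.parse import unquote

# The two hex digits are simply the character's numeric code.
print(int("20", 16) == ord(" "))   # True
print(int("2F", 16) == ord("/"))   # True

print(unquote("%2F"))                      # /
print(unquote("%2f"))                      # / (upper or lower case hex both work)
print(unquote("my%20vacation%20photos"))   # my vacation photos
```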
So this rule:
The server being contacted begins after the “http://” and ends at the next “/”.
still applies, but %2F could be seen in place of “/”. More correctly:
The server being contacted begins after the “http:” and the two slashes that follow it (each of which may appear as “/” or “%2F”), and ends at the next “/” or “%2F”.
It gets ugly, but the thing to remember is just this: %2F is exactly the same as “/”.
Here’s an example of how it might be abused:
That is not PayPal. Replace the %2F with “/” and you’ll see instead:
Clearly, it goes to www.somerandomservice.com.
Any URL with a % notation in the server portion (between that first “http://” and the next “/”) is suspect. A % notation after the server portion (in the page, or more commonly the parameters) is typically OK.
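That rule of thumb is easy to sketch in code. The function name below is a hypothetical helper of mine, and the suspicious URL is made up, modeled on the encoding trick described above. Note that this only looks at the text between “//” and the next real “/”; it flags a link as worth a closer look, nothing more.

```python
from urllib.parse import urlsplit

def has_encoded_server(url):
    """True if a % notation appears between the leading '//' and the next '/'."""
    return "%" in urlsplit(url).netloc

# Hypothetical link with an encoded slash hiding in the server portion.
suspicious = "http://www.somerandomservice.com%2Fwww.paypal.com"

# A % notation in the page portion is the typically harmless case.
ordinary = "http://www.somerandomservice.com/folder/my%20page"

print(has_encoded_server(suspicious))   # True
print(has_encoded_server(ordinary))     # False
```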
Https and secure websites
All of the above applies whether the URL begins with http or https.
Https adds two important things:
- It encrypts the data flowing between your computer and the server.
- It validates that the server you connect to is, in fact, the server you requested.
Important: https doesn’t validate you’re connecting to the server you think you are; it validates that you’re connecting to the server you requested. Those are two different things.
For example, let’s say you fall for one of my lame examples above, and click on a link like this:
That’s an https connection. It is very easy for the owner of somerandomservice.com to install a completely valid https certificate for www.paypal.com.somerandomservice.com.
Thus, when you click on that link, your browser will confirm that you are indeed connecting to what you asked for: www.paypal.com.somerandomservice.com. That might not be what you think you asked for, if you fell for a scammer’s trick, but that’s all that https can validate for you: you got what you asked for.
It’s unfortunate that something fairly simple is quite complex once you assume people will attempt to deceive you.
I’ll sum it up with this:
For any URL you are about to click on, pay close attention to the domain name: everything between “http://” or “https://” and the next “/”. Remember that domain names build from the right, so if it ends in, for example, “.paypal.com”, you can be assured that it’s a domain or subdomain owned by paypal.com.