There are many dates in play, but none of them are great or guaranteed.
You want to know if the article you’re reviewing — say, some medical advice — is current. Was it written recently, or is it 10-year-old information that’s out of date by now?
It seems like such a simple question, but in an absolute sense, you can’t reliably tell.
Surprising as it might seem, there’s no standard way to absolutely, positively say on what date a webpage or its content was created.
There are clues we can use to estimate how old a page might be, though none of them is absolute or 100% reliable.
Let’s look at some of those.
When was a webpage created?
There’s no technological solution for determining when a webpage was created or updated. Your best bet is to look for information on the page itself, but that’s not guaranteed to be present or accurate. Another approach is to use archive.org’s “Wayback Machine” to look at the page’s history.
First, I need to be clear that Google really has nothing to do with this. Google is only a way for you to find websites and webpages, and doesn’t really factor into the “when was it written” question. So I will not be talking explicitly about Google — or any search engine — at all.
Second, what we really care about here are not websites, but webpages. A site, like https://askleo.com, is a collection of webpages. It may have a creation date (see footnote 1), but in reality, what we care about is the recency of a particular webpage, like this one you’re looking at now.
The best source: the page itself
As silly as it sounds, the most authoritative source for when a webpage was written is the page itself. Many pages include an “updated” or “posted” date somewhere on the page. Here on Ask Leo! you’ll see dates on all the articles, near the bottom.
On an Ask Leo! article, for example, two dates are listed:
- The “posted” date is the date of the most recent major update to the article.
- The “originally posted” date is the date the article was first published.
There are, unfortunately, several problems.
- I could lie. There’s nothing that forces these dates to be accurate.
- There’s no standard location. The most common spots are above or below the main content, or perhaps in the page footer.
- The date may not be listed at all.
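If you're comfortable with a little Python, here is a minimal sketch of how you might scan a page's HTML for date hints. It assumes the page uses one of a few common but entirely optional conventions: a meta tag such as article:published_time, or a time element with a datetime attribute. The class and function names are my own, for illustration only, and the same caveats apply: the hints may be missing, and nothing forces them to be accurate.

```python
from html.parser import HTMLParser


class DateHintParser(HTMLParser):
    """Collect date hints from common (but optional) HTML conventions."""

    # Meta names/properties that some sites use for dates. This list is an
    # assumption for illustration; a site is free to use none of them.
    META_KEYS = {"article:published_time", "article:modified_time", "date"}

    def __init__(self):
        super().__init__()
        self.hints = []  # list of (source, value) tuples, in page order

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta":
            key = (attrs.get("property") or attrs.get("name") or "").lower()
            if key in self.META_KEYS and attrs.get("content"):
                self.hints.append((key, attrs["content"]))
        elif tag == "time" and attrs.get("datetime"):
            self.hints.append(("time", attrs["datetime"]))


def find_date_hints(html):
    """Return every date hint found; an empty list means the page offers none."""
    parser = DateHintParser()
    parser.feed(html)
    return parser.hints
```

An empty result doesn't mean the page is undated. It may simply show its date in ordinary visible text, which a sketch like this won't recognize.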
What most people want is some kind of magic date, tucked into the information the web server returns along with the page, that says when the webpage was written. We figure it must be there, and we just can’t see it.
Servers sometimes return an HTTP header called Last-Modified, which is intended to reflect the date the page was last altered.
Once again, there are problems:
- It’s not required and is sometimes just wrong.
- There’s no standard as to exactly what it means.
- If present, it’s the last date (and time) the file you’re accessing was altered. This usually has no relationship to when the content was written. For example, a page is “altered” every time someone leaves a comment. That is completely unrelated to article creation.
- It could lie.
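To see what a server reports for yourself, the sketch below reads the HTTP Last-Modified header (the "Last Modified" information described above) using only Python's standard library. The function names are hypothetical, and the code assumes the server answers a HEAD request at all. Treat whatever it returns as a clue, not proof: the header may be absent, wrong, or unrelated to when the content was written.

```python
from email.utils import parsedate_to_datetime
from urllib.request import Request, urlopen


def parse_last_modified(headers):
    """Return the Last-Modified header as a datetime, or None if absent or malformed."""
    value = None
    for name, raw in headers.items():
        # HTTP header names are case-insensitive.
        if name.lower() == "last-modified":
            value = raw
            break
    if not value:
        return None
    try:
        return parsedate_to_datetime(value)
    except (TypeError, ValueError):
        # The server sent something that isn't a valid HTTP date.
        return None


def fetch_last_modified(url):
    """Send a HEAD request and parse whatever Last-Modified the server reports."""
    request = Request(url, method="HEAD")
    with urlopen(request, timeout=10) as response:
        return parse_last_modified(dict(response.headers.items()))
```

Calling fetch_last_modified("https://example.com/") returns either a datetime or None. On many dynamically generated pages you should expect None, or a date that simply reflects the moment the page was last regenerated.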
So the closest technological resource we have is inadequate.
OK, I lied: I’ll mention Google one more time.
Search engines could track historical changes to pages as they periodically spider the internet. If a page appears one week and the next week it changes, it seems like search engines could track this activity.
To the best of my knowledge, they rarely make the information available.
The Internet Archive, on the other hand, does exactly that.
Using the Internet Archive’s Wayback Machine, you can view webpages as they were in the past, assuming the Internet Archive had spidered and captured that webpage on that date.
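You can do the same lookup from a script. The sketch below assumes the Internet Archive's availability API at archive.org/wayback/available still answers with JSON containing an archived_snapshots.closest entry; the function names are mine, and the response shape is an assumption that could change. It asks for the capture closest to a timestamp you supply, in YYYYMMDD form.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint of the Internet Archive's availability API (assumed unchanged).
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def build_availability_url(page_url, timestamp=None):
    """Build the query URL; timestamp is an optional YYYYMMDD string."""
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urlencode(params)


def closest_snapshot(payload):
    """Return (timestamp, snapshot_url) from a decoded response, or None."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return None
    return closest["timestamp"], closest["url"]


def lookup(page_url, timestamp=None):
    """Query the API over the network and return the closest capture, if any."""
    with urlopen(build_availability_url(page_url, timestamp), timeout=10) as response:
        return closest_snapshot(json.load(response))
```

A result of None only tells you the Archive has no capture to offer. It says nothing about whether the page existed at that time.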
Sadly, the Internet Archive also has limitations.
- It’s spotty. Within a site, not all pages may be included.
- It’s spotty. Not all sites are included. In fact, webmasters can request that they not be included.
- It’s spotty. Not all dates are included. The Internet Archive’s spider checks “periodically”, at what appears to be about once every few weeks. Changes occurring faster than that are not captured.
- It can’t track many changes. The old and new copies of a page moved from one domain to another (as I have done in the past, moving articles from ask-leo.com to askleo.com) will appear to be completely unrelated in the archive.
- It may not be current. The Archive states it may take up to six months for pages to appear.
Even with all those limitations, if the website is present, the Archive can provide useful data for researching approximately when a webpage changed.
Archiving of this sort is challenging because of the sheer quantity of data involved. An ideal archive would keep a complete copy of the entire World Wide Web every so often… and that’s more data than can be reasonably managed.
Combining these approaches can get you interesting information, but as we’ve seen, each approach has limitations. Use the information you discover with the knowledge that it might not be accurate.
In the end, the absolute answer remains “No”: there’s no definitive way to determine when a webpage was written.
Footnotes & References
1: For AskLeo!, 2003. Or it could be something else:
- When askleo.pugetsoundsoftware.com became ask-leo.com (2004 or thereabouts).
- When ask-leo.com became askleo.com (2013).
- When http://askleo.com became https://askleo.com (2014).
Even before we begin, things get confusing.