There are many dates in play, but none of them are great or guaranteed.
You want to know if the article you're reviewing -- say, some medical advice -- is current. Was it written recently, or is it 10-year-old information that's out of date by now?
It seems like such a simple question, but in an absolute sense, you can't reliably tell.
Surprising as it might seem, there's no standard way to absolutely, positively say on what date a webpage or its content was created.
While not absolute and not 100% reliable, there are clues we can use to determine how old a page might be.
Let's look at some of those.
When was a webpage created?
There's no technological solution for determining when a webpage was created or updated. Your best bet is to look for information on the page itself, but that's not guaranteed to be present or accurate. Another approach is to use archive.org's "Wayback Machine" to look at the page's history.
Definitions
First, I need to be clear that Google really has nothing to do with this. Google is only a way for you to find websites and webpages, and doesn't really factor into the "when was it written" question. So I will not be talking explicitly about Google -- or any search engine -- at all.
Second, what we really care about here are not websites, but webpages. A site is a collection of webpages like, say, https://askleo.com. It may have a creation date (see footnote 1), but in reality, what we care about is the recency of a particular webpage, like this one you're looking at now.
The best source: the page itself
As silly as it sounds, the most authoritative source for when a webpage was written is the page itself. Many pages include an "updated" or "posted" date somewhere on the page. Here on Ask Leo! you'll see dates on all the articles, near the bottom.
On Ask Leo! articles, two dates are listed:
- The "posted" date is the date of the most recent major update to the article.
- The "originally posted" date is the date the article was first published.
There are, unfortunately, several problems.
- I could lie. There's nothing that forces these dates to be accurate.
- There's no standard location. The most common spots are above or below the main content, or perhaps in the page footer.
- The date may not be listed at all.
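Some pages also repeat their dates in machine-readable form. As a minimal sketch, assuming the page happens to use the `article:published_time` / `article:modified_time` meta properties or a `<time datetime="...">` element (common conventions, but nothing requires them), the hints can be collected like this. The names `DateHintParser` and `find_date_hints` are made up for this illustration.

```python
from html.parser import HTMLParser
from typing import List, Tuple

# Meta keys some publishing tools emit. This set is an assumption, not a standard
# every site follows, and the values can be wrong or missing like any other date.
DATE_META_KEYS = {"article:published_time", "article:modified_time"}


class DateHintParser(HTMLParser):
    """Collects (source, value) pairs for date-like hints found in the markup."""

    def __init__(self) -> None:
        super().__init__()
        self.hints: List[Tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta":
            # Sites use either property="..." or name="..." for these keys.
            key = attributes.get("property") or attributes.get("name")
            if key in DATE_META_KEYS and attributes.get("content"):
                self.hints.append((key, attributes["content"]))
        elif tag == "time" and attributes.get("datetime"):
            self.hints.append(("time", attributes["datetime"]))


def find_date_hints(html: str) -> List[Tuple[str, str]]:
    """Return every date hint in document order; an empty list means none were found."""
    parser = DateHintParser()
    parser.feed(html)
    return parser.hints
```

Anything this returns is still only what the page's author chose to publish, so the same caveats apply: it may be absent, and it may not be true.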
HTTP Headers
What most people want is some kind of magic date, buried in the information the web server returns along with the page, that says when the page was written. We figure it must be there, and we just can't see it.
There is sometimes an HTTP response header returned called Last-Modified, which is intended to reflect the date the page was last altered.
Once again, there are problems:
- It's not required and is sometimes just wrong.
- There's no standard as to exactly what it means.
- If present, it's the last date (and time) the file you're accessing was altered. This usually has no relationship to when the content was written. For example, a page is "altered" every time someone leaves a comment. That is completely unrelated to article creation.
- It could lie.
So the closest technological resource we have is inadequate.
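If you want to see what a server reports anyway, here is a minimal sketch using only Python's standard library. It sends a HEAD request and reads the `Last-Modified` header if one comes back. The function names are made up for this illustration, and everything said above still applies: the header may be missing, may reflect an unrelated change, or may simply be wrong.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional
from urllib.request import Request, urlopen


def parse_last_modified(header_value: Optional[str]) -> Optional[datetime]:
    """Convert a Last-Modified header value to a datetime; None if absent or unparseable."""
    if not header_value:
        return None
    try:
        return parsedate_to_datetime(header_value)
    except (TypeError, ValueError):
        # Older Python versions raise TypeError on bad input, newer ones ValueError.
        return None


def fetch_last_modified(url: str) -> Optional[datetime]:
    """Ask the server for headers only and return whatever Last-Modified it reports."""
    request = Request(url, method="HEAD")
    with urlopen(request, timeout=10) as response:
        return parse_last_modified(response.headers.get("Last-Modified"))
```

Calling `fetch_last_modified("https://askleo.com")` returns either a date or `None`. Treat a date as a clue, not an answer.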
History
OK, I lied: I'll mention Google one more time.
Search engines could track historical changes to pages as they periodically spider the internet. If a page appears one week and the next week it changes, it seems like search engines could track this activity.
To the best of my knowledge, they rarely make the information available.
The Internet Archive, on the other hand, does exactly that.
Using the Internet Archive's Wayback Machine, you can view webpages as they were in the past, assuming the Internet Archive had spidered and captured that webpage on that date.
Sadly, the Internet Archive also has limitations.
- It's spotty. Within a site, not all pages may be included.
- It's spotty. Not all sites are included. In fact, webmasters can request that they not be included.
- It's spotty. Not all dates are included. The Internet Archive's spider checks "periodically" at what appears to be a rate of every few weeks. Changes occurring faster than that are not captured.
- It can't track many changes. When a page moves from one domain to another (as I have done in the past, moving articles from ask-leo.com to askleo.com), the old and new copies will appear completely unrelated in the archive.
- It may not be current. The Archive states it may take up to six months for pages to appear.
Even with all those limitations, if the website is present, the Archive can provide useful data for researching approximately when a webpage changed.
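The same lookup can be done from a script. This is a minimal sketch, assuming the Internet Archive's public availability endpoint at `https://archive.org/wayback/available` and the JSON shape it returns (an `archived_snapshots` object with a `closest` entry). The function names and the default timestamp of `19960101` are choices made for this illustration. Asking for the capture closest to a very early date is one way to approximate "first seen by the Archive", which is not necessarily when the page was written.

```python
import json
from datetime import datetime
from typing import Optional, Tuple
from urllib.parse import urlencode
from urllib.request import urlopen

AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_url(page_url: str, timestamp: str = "19960101") -> str:
    """Build the query URL; timestamp is YYYYMMDD (optionally followed by hhmmss)."""
    query = urlencode({"url": page_url, "timestamp": timestamp})
    return f"{AVAILABILITY_ENDPOINT}?{query}"


def closest_capture(payload: dict) -> Optional[Tuple[datetime, str]]:
    """Pull (capture time, archived URL) out of a response, or None if nothing was captured."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return None
    captured = datetime.strptime(closest["timestamp"], "%Y%m%d%H%M%S")
    return captured, closest["url"]


def lookup(page_url: str, timestamp: str = "19960101") -> Optional[Tuple[datetime, str]]:
    """Query the Archive for the capture closest to the given timestamp."""
    with urlopen(availability_url(page_url, timestamp), timeout=10) as response:
        return closest_capture(json.load(response))
```

A result of `None` only means the Archive has no capture to offer. It says nothing about whether the page existed.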
The reason archiving of this sort is so challenging is simply the sheer quantity of data involved. An ideal archive would keep a complete copy of the entire World Wide Web every so often... and that's more data than can be reasonably managed.
Do this
Combining these approaches can get you interesting information, but as we've seen, each approach has limitations. Use the information you discover with the knowledge that it might not be accurate.
In the end, the absolute answer remains "No": there's no definitive way to determine when a webpage was written.
Footnotes & References
1: For AskLeo!, 2003. Or it could be something else:
-
- When askleo.pugetsoundsoftware.com became ask-leo.com (2004 or thereabouts).
- When ask-leo.com became askleo.com (2013).
- When http://askleo.com became https://askleo.com (2014).
Even before we begin, things get confusing.
This may work:
javascript:alert(document.lastModified)
Just copy and paste the line above into your address bar and hit your ENTER key, and you’ll know the date and time the page you’re viewing was last updated!
Please comment on this
Irv
14-Jul-2010
It is possible to detect page age with a simple Google hack. View: http://www.labnol.org/internet/search/find-publishing-date-of-web-pages/8410/
Also, there is an installable tool that does it:
http://www.linkdiagnosis.com/
17-Jul-2010
Right, Irving, that was the first thing that popped into my head. But, what you have to remember is if that webpage is generated differently each time, it will give you a time that doesn’t seem right. For example, try that line on http://www.google.com . It will probably return a time a few seconds before you checked.
While knowing when a page was written may be important, sometimes the date you read it is just as critical. Specifically, citations for papers and articles call for an article’s retrieval or access date more often than its publication date, and sometimes for both, if they’re available.
Thanks for dating your articles, Leo. I consider it an integral part of a professional article. It’s always very frustrating when you think you’re reading something very current until it references a “current event” that happened many years ago. I’ve even seen this on some news sites.
I completely missed the date at the end of each of your articles. Well done! I do wish there were more like you (but of course Ask Leo is inimitable!).
This will work in most cases. On the page in question, type this in the address bar:
Javascript:document.lastModified
Hit RETURN and look in the upper left of the screen.
To get back to the page, click Refresh or Reload, whichever your browser offers. BACK doesn’t do it.
If the date is current, the timestamp may be off because of the time zone it originates from.
No way, that simply shows you the timestamp of the instant when your browser loaded the page.
That has nothing to do with the page’s creation or modification date.
regards
Actually, the date written can be important depending on the person’s reason for looking. I’m part of a writing group, and I just found this linked on my Facebook feed: http://the-digital-reader.com/2015/03/03/scammers-now-trolling-indie-authors-with-bogus-dmca-noticest/
I’m also a Computer Science major and it really bugs me that there is no easy way to prove how old a file/post really is. I came across your blog post in the process of trying to find out if there WAS such a solution.
Now, I’m not so paranoid that I expect to be stolen from (but not so naive that I’ll rule it out), but the fact is, it is very easy to put in a fake date on your blog post, or to change the clock on your computer to back date a file, and that makes it all too easy to make the thief in these cases look like the victim.
As you said: “I could lie. There’s nothing that forces these dates to be accurate.” And there is nothing that proves them to be false, either… and there are times I think that there really needs to be.
My problem is with ads selling something: not being able to find a posted date makes them useless. This is true of any ad that is abandoned rather than removed when no longer valid.
Dates are actually a very important piece of information for readers because they affect the credibility of an article, especially if it was written a long time ago, like 5 years ago. Things might have changed since then. So may I ask where the date of an article can usually be found, especially for research purposes? Thank you.
As the article states, placing the date on an article is totally at the discretion of the person posting the article. If they don’t post it, there’s no way to find it.
PLEASE READ THE ARTICLE – dates you have access to can easily be completely meaningless. Dates for my articles are at the bottom of each, and I try to keep them accurate, but not all web sites do, or care to.
Thanks, Leo. I have no particular reason to need to know the date of an article, but it is quite logical to want to know when it was written to assess its relevance in time. For example, reading an article about a film camera, let’s say a Canon, when you are researching a 2017 product would not do. Obviously I’m being extreme here just to make the point. However, many articles are a little less obvious as to their origin in time.
I can imagine this will be available one day…with a “premium” access! It’s not perfect now but it’s free!
Now and then I access Forums (Fora I’m told) and I’m not too good with them so start to read things that are a decade or more old…Eventually realise what I’m doing!
I don’t know if this could be part of this topic but actually it is what got me to your article.
I was browsing the web for a replacement remote control and found a “spare parts” site not very far from me. I tried to call them, but the number listed (on a few different sites) kept telling me their number had changed and I had to ring another number. Yet the address and specifics of the business were quite accurate. The number did not make much sense, so I called phone directory assistance, who could not assist me as they could not see the listing in their data. It turned out that the business was now operating from another state, and when I contacted them (by email) they could not explain to me why this site was still up although it had been inoperative for more than 12 months! I wonder how these things get updated? I’d thought businesses have to pay for it.
I have frequently found that websites that advertise a variety of businesses (the one mentioned above was not one of these; it was stand-alone) often don’t update their databases, and when you call the number they display, you get “the number you have called is invalid”! The business does not exist anymore! Well, thanks again for this enlightening article, yet that leaves us in relatively the same obscurity.
Like so many others, I wonder how often we read the story of an event – friendly or terrifying – and give no thought as to whether it’s recent or from months ago.
Today I attempted a few “date” searches before finding your site….and I’m glad I took the time to read your pages. Thanks.