The internet is forever, except when it's not.
Um.... no. There's no magical "scramble code" to recover anything.
But that does raise a very interesting conundrum. We often say "the internet is forever", while at the same time saying, "Be sure to back up, because once you delete it, it's gone."
The ways of both the internet and deletion are more complex than most people realize. While these two statements appear to be diametrically opposed, they're both very true -- often at exactly the wrong time.
Deleting or retrieving from the internet
- Trying to hide things online often only brings more attention. It's called the Streisand Effect.
- Anything you post can be copied, and likely is being copied within moments, by search engines, archives, and other third parties.
- Sharing information online, even with privacy restrictions, puts you at the mercy of those you share with.
- Services that back up their data make copies when they do, which includes your data on that service.
- Once posted, it's pragmatically impossible to delete all copies of your data.
- You can't access the remaining copies, but they could still come back to haunt you.
The public internet & Barbra Streisand
As many have discovered, one of the fastest ways to spread a rumor is to call it a secret. That's true in life, but nowhere is it more true than on the internet.
Just ask Barbra Streisand. There's a reason there's something now called "the Streisand effect".
In 2003, she attempted to prevent photographs of her home from being published on the internet. This drew more attention, rather than less, to the images. It prompted people to copy and re-post the photos elsewhere, again and again and again. Indeed, the Wikipedia article that describes "the Streisand effect" includes the picture she originally attempted to suppress. There's simply no way she or anyone else could find and delete all the copies of that photo that have been made, and that continue to be made to this day. Indeed, any attempt to do so would probably just spur another round of copying and re-posting.
We see this all the time with information posted publicly that is later removed or altered. Be it a tweet, a photo, a database, or something else, usually someone, somewhere, has a copy of the original. On discovering the attempts to rewrite or delete history, they're quick to call shenanigans on the effort by posting the original as proof. Depending on the perceived relevance of the information, the original may be re-posted once or many times in many places, making its removal from the public internet as impossible as removing the photo of Ms. Streisand's home.[1]
This is one reason we say "the internet is forever". Anything you post publicly can be copied. There's no easy way to know by whom, or by how many, but you can never assume that the number of copies is zero. Never. Once it's on the public internet, you lose all control over it the instant that someone makes a copy.
Your posts are being copied
OK, so you're not Babs[2]. To the best of your knowledge, no one cares about your tweets, photos, or whatever you care to post online. No one's making copies of what you post publicly.
You're wrong.
While it's likely that you as an individual aren't that interesting, that doesn't mean what you're posting online isn't being copied, and probably quite quickly. You're "interesting" in the sense that you're a user of Twitter, Google Photos, Flickr, or whatever service you use. Those sites are mirrored regularly. Why?
- Search engines like to keep local, cached copies in case the site goes offline.
- Understanding how to "spider the web," as it's called, is something computer science students learn by writing spiders that pick a target and mirror it. Others just do it for fun.
- The Internet Archive attempts to keep copies of all public websites to preserve our digital history. (Pictured to the right, my company's website as of 2003.)
- Many sites exist specifically to mirror other sites (or portions of other sites). Pick a prominent politician on Twitter, for example, and I can pretty much guarantee you there's a site keeping copies of that person's tweets.
And that's before we even consider corporations, malicious agents, and governments copying public information for their records, analysis, and uses unknown.
Is something you've posted in there? Probably. Will it matter someday? There's no way to know. I personally publish a lot, but I don't expect it to be a problem.
I hope I'm right.
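For the curious, the core of the "spider" mentioned above is smaller than you might expect. What follows is only a minimal sketch in Python, not how any particular search engine or archive actually works. The names LinkCollector and extract_links are made up for illustration. It shows just one step: finding every link on a page that has already been downloaded.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the absolute URL of every <a href="..."> on one page.

    Hypothetical helper for illustration; not taken from any real spider.
    """

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page's own address.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return the links found in an already-downloaded page, in page order."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links
```

A spider would save the page, then repeat the same step for each link it found. That loop, pointed at a public site, is all it takes to produce one more copy.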
Sharing is copying
I keep using the phrase "public internet" because it's an important distinction many people fail to keep in mind. Public is public, and as we've seen, public must be considered to be "forever".
So we ratchet up our privacy settings, restricting who is allowed to access our stuff, or perhaps only emailing certain things to certain people. We keep it "private" -- or so we think.
Still, we remain at the mercy of everyone with whom we choose to share our data. Each could be copying what we give them access to, intentionally or otherwise. On top of that, they could have really bad security; should their accounts or computers be compromised, whatever we share with them could be in the hands of a hacker in moments.
While that last scenario is not very likely (unless you're a "high-value target" in the hacker's eyes, and he's used your friends to get access to you), it underscores something that is vital to understand: every time you share information with someone, you're giving them a copy, and you're giving them the ability to make more copies, and perhaps even post one of those copies publicly.
Sharing and exchanging data "privately" turns out to be less private than it seems, since there really is no "private" internet at all.
Backing up is copying
The internet is nothing more than a collection of computers that store data and know how to talk to each other. When you use a service like Twitter, send email, upload a photo, or even post a comment on a website like Ask Leo!, that information is stored on a computer not unlike your own[3]. Those services all take steps to back up the data they contain (hopefully like you do).
Backing up makes a copy of all their data -- including all of your data.
Even if you're the only one using an internet-hosted service -- perhaps your email, cloud storage, online password vault, or who-knows-what -- there's a good chance the service provider is regularly backing up their servers in case something goes wrong. In fact, we hope that's exactly what they do.
How long do they keep the backups? They're not saying. It could be moments or years. But it's possible that whatever you've shared online or stored online for yourself has been backed up somewhere, somehow, in some way. That's yet another copy of your data that's effectively impossible for you to erase -- which brings us to the reason for all this "internet is forever" kind of talk.
Deleting doesn't delete all
You delete an email. You delete a file from your cloud storage. You delete a photo from your social media account. You delete a tweet. As you can see by now, regardless of exactly what that looks like to you, it's very likely you've deleted only one of many copies of your data.
Yet you can't get it back. Once you delete it, it's gone.
The "catch" is, you've deleted the copy under your control. Perhaps it's the copy most obviously visible to everyone, but it's probably not the only copy.
Unless you have access to those other copies, or you've kept a copy on your own machine, you've lost your data. The online services generally will not restore from their backups (the backups are to recover from their issues, not yours). Hackers certainly aren't going to share with you, even if you can track them down (they're probably overseas anyway). And the NSA isn't going to respond to your request to restore your data from their backups (assuming they've been watching you, of course).
This is why we say "Once you delete it, it's gone." There may be other copies, but there is likely no way to access them.
If it was public, maybe you'll get lucky and find a copy on The Internet Archive; I've recovered an occasional website or web page from there. If it was private, perhaps someone with whom you shared it still has a copy. If it was yours and yours alone, and it was stored in only one place, then you weren't backed up. It's likely gone forever, regardless of how many actual copies there might be out there somewhere.
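If you'd rather script that Internet Archive check than click through by hand, here is a rough sketch in Python. It assumes the Archive's public "availability" lookup at archive.org/wayback/available, and that its JSON reply has the shape I describe in the comments; verify both against the Archive's own documentation before relying on them. The function names are mine, chosen for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint of the Internet Archive's availability lookup.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def build_query(page_url, timestamp=None):
    """Build the lookup URL; timestamp is an optional YYYYMMDD string."""
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urlencode(params)


def closest_snapshot(response_text):
    """Return the archived copy's URL from a JSON reply, or None.

    Assumes a reply shaped like:
    {"archived_snapshots": {"closest": {"available": true, "url": "..."}}}
    """
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None


def find_archived_copy(page_url, timestamp=None):
    """Ask the Archive (over the network) for the snapshot nearest timestamp."""
    with urlopen(build_query(page_url, timestamp), timeout=30) as response:
        return closest_snapshot(response.read().decode("utf-8"))
```

Calling find_archived_copy with the address of the lost page and a rough date returns the address of the nearest snapshot, or None when the Archive has nothing. As with everything else here, whether a copy exists is a matter of luck.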
Unless you have sufficient resources (read: money), a compelling reason, an attorney, and a court order to force an online service to retrieve it, whatever you deleted is gone.
And then it gets weird.
Deleted isn't deleted, except when it is
Whatever you deleted is gone from your grasp. You deleted it, and you can't recover it -- unless you had a backup, of course.
But it's not really gone, now, is it? As frustrating as it is, copies continue to exist: system backups, at a minimum, and possibly archive/mirror copies, research copies, malicious copies, and more.
All out of your reach and out of your control.
There are only two things you can count on, really:
- You can't get it. ("Once you delete it, it's gone.")
- It could still come back to haunt you. ("The internet is forever.")
The solutions are equally simple:
- Back up everything you keep online.
- Don't put anything online that might "haunt" you, for whatever definition of "haunt" you care to assume.
These are exciting times, to be sure, but they're complex and often frustrating times, as well.
Footnotes & References
1: In an exceptionally interesting and geeky note, the technology underlying Bitcoin, called “blockchain”, was used to “Irrevocably Mirror” 20,000 lectures before a legal issue forced the university hosting them to remove them from its site.
2: If English is not your native language, “Babs” is one of the many short forms of “Barbara”. It’s not common, and I’m guessing Ms. Streisand’s not a fan of it. But once published, it’s out there. Forever.
3: Seriously. They might have more cores, more RAM, more disk space, or more whatever, and they might run different operating systems (or not), but the majority of the internet runs on computers that aren’t that different from the desktop computer nearest you.
Since the NSA is supported by our tax dollars, they belong to us and should make their backup services available to us :-) .
Sadly, doing so would also admit that those backups exist. :)
Then they’d have to kill you, as the old line went :)
Interesting, because in my business I am often required to sign NDAs (Non-Disclosure Agreements) that are obviously drawn up by lawyers who have no idea how the internet works. I am often required, after completing a project, to “return all documents and irrevocably delete all copies”, despite the fact that I am also required to properly back up everything, and all documents are sent to and from clients by email. Much of my work is done on the cloud, furthermore. It always amuses me that the lawyers who draft these NDAs seem to think that “returning” a digital document means that I only have “copies” left, as if they behave like paper originals.
Remember, these are the same people that insist on a footer in emails that says (in effect) “if you’re not the person intended, this email is confidential and you should forget everything you read”. Or something like that. :-)
They seem to be forgetting the Streisand effect ;-)
Trying to cover a story up brings more attention to that story.
I first encountered the notion of “cloud” (I detest this term) storage when I installed Microsoft Office 2010. I asked myself then whether I felt like having Uncle Bill’s minions perusing my work, and elected to not use it. Since “cloud” = Internet, and Internet is constantly and repeatedly backed up (We live in a digital landfill), my stuff would be “out there” and out of my control, save for whatever level of security was applied by the storage entity. Choosing storage is a different level of trust than choosing software. I learned to manage my files in an old-school manner, and I consider a thumb drive a perfectly acceptable replacement for “clouds”.
There are programs which can prevent people from seeing what you have stored on the “cloud”. I use Cryptomator which uses military grade encryption. Can it eventually be cracked? Possibly, but at great time and expense, and seriously, I’m not that interesting.
Just remember, if you use email AT ALL, you’re using cloud storage. :-)
The Internet Archive isn’t all it’s cracked up to be. It’s mostly a snapshot of what a site looked like at a certain point in time, and most of the links are long dead. Whether or not you can locate the information again is a matter of luck. This is obvious with commercial websites, which would not keep outdated information up unless it were a matter of historical importance. The current website represents the company’s newest and most pertinent data for the purpose of creating transactions. If the data or elements were to be stored for later use, say for nostalgia or an anniversary celebration, then they would probably be stored offline or recreated from classic documents.
One time someone wanted me to recreate an old website they had created and let slip into oblivion by not renewing the URL and letting their server contract lapse. I was able to register the URL as if it were new and I got about 90% of the content restored from the Wayback machine of the Internet Archive.
There is a variation of this rule which I find useful: if you find something interesting on the Internet, and you think you might need it later, save it immediately to a place you control, such as your computer. Don’t count on search engines to retrieve it later. Because, in all likelihood, you won’t be able to.
I say this from experience. Once upon a time, search engines (all right, Google) had that uncanny capability to immediately find whatever you were looking for. However obscure. This time is long gone.
Unless you’re looking for things everybody else is also looking for, meaning gossip about the latest news, the meaning of a word, or something that obvious, it’s likely you will need a lot of filters and attempts to retrieve that Web page you know exists (because you saw it a few years ago), or you won’t be able to retrieve it at all.
Not counting active censorship of some pages or media by Google (or other search engines).
That’s why browsers have bookmarks, sometimes called favorites. If you forget to save the bookmark and want to go back to an article, it will likely be found in your browser history. Some people clear history and cookies because they think it gives them more privacy. It really doesn’t, as those are only deleted from the browser; your ISP and others, like Google, still have a record of your web history.
I’ve spent literally my entire life as a mindless consumer, and have just recently been opening the hood on the technology that dominates my life. I’m blown away. Not that this is simple stuff by any means (though your explanation is), but I’m shocked to find that it’s FAR more simple than I thought it was. I was basically imagining magic, but this is all like common sense-style stuff.
Thanks Leo,
A corollary to Murphy’s law as regards the Web: If you need it, it will be gone. If you need it gone, it will remain forever.