
The Internet Is Forever, Except When It’s Not

I was taught that it is impossible to delete, I mean totally delete, anything on the internet. The protocol of the computer or network simply buries it deeper in the systems and scrambles a random password to recall it. Find that scramble code and Presto! You can recall the deleted item(s).

Um…. no. There’s no magical “scramble code” to recover anything.

But that does raise a very interesting conundrum. We often say “the internet is forever”, while at the same time saying, “be sure to back up, because once you delete it, it’s gone”.

The ways of both the internet and deletion are more complex than most people realize. While these two statements appear to be diametrically opposed, they’re both very, very true, often at exactly the wrong time.


The public internet & Barbra Streisand

As many people have discovered, one of the fastest ways to spread a rumor is to call it a secret. That’s true in life, but nowhere is it more true than on the internet.

Just ask Barbra Streisand. There’s a reason there’s something now called “the Streisand effect”.

In 2003, she attempted to prevent photographs of her home from being published on the internet. This drew more attention, rather than less, to the images. It prompted people to copy and re-post the photos elsewhere, again and again and again. Indeed, the very Wikipedia article that describes “the Streisand effect” includes the picture she originally attempted to suppress. There’s simply no way she, or anyone else, could find and delete all the copies of that photo that were made, and have continued to be made in the intervening years. Indeed, any attempt to do so would probably just spur another round of copying and re-posting.

We see this all the time with information posted publicly that is later removed or altered. Be it a tweet, a photo, a database, or something else, usually someone, somewhere, has made a copy of its original form. On discovering the attempts to rewrite or delete history, they’re quick to call shenanigans on the effort by posting the original as proof. Depending on the perceived relevance of the information, the original may be re-posted once or many times in many places, making its removal from the public internet as impossible as removing the photo of Ms. Streisand’s home.1

This is one reason we say “the internet is forever”. Anything you post publicly can be copied. There’s no easy way to know by whom, or how many times, but you can never assume that the number of copies is zero. Never. Once it’s on the public internet, you lose all control over it – whatever it might be – the instant someone makes that first copy.

The public internet & you

OK, so you’re not Babs2. To the best of your knowledge, no one cares about your tweets, photos, or whatever it is you care to post online. No one’s making copies of what you post publicly.

You’re wrong.

While it’s likely that you, as an individual, aren’t that interesting, that doesn’t mean that what you’re posting online isn’t being copied, and probably quite quickly. You’re “interesting” in the sense that you’re a user of Twitter, Google Photos, Flickr, or whatever public service you happen to use. Those sites are mirrored regularly. Why?

[Image: Puget Sound Software, in 2003]

  • Search engines like to keep local, cached copies in case the site goes offline.
  • Understanding how to “spider the web,” as it’s called, is something computing science students learn by writing spiders that pick a target and mirror it. Others just do it for fun.
  • The Internet Archive attempts to keep copies of all public websites to preserve our digital history. (Pictured to the right, my company’s website as of 2003.)
  • Many sites exist specifically to mirror other sites, or portions of other sites. Pick a prominent politician on Twitter, for example, and I can pretty much guarantee you there’s a site keeping copies of that person’s tweets.
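To give a sense of how little it takes to “spider” a site, here is a minimal sketch of the idea: start at one page, keep a copy, follow every link, and repeat. It is not how any particular search engine or archive works. The three-page site (`FAKE_SITE`) and the `mirror` function are hypothetical, and the pages live in memory so nothing touches the real web.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag found in one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def mirror(start_url, fetch):
    """Breadth-first crawl from start_url; returns {url: html} for each page reached.

    `fetch` is any callable that takes a URL and returns the page's HTML,
    or None if the page is unavailable.
    """
    copies = {}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        if url in copies:
            continue  # already copied this page
        html = fetch(url)
        if html is None:
            continue  # nothing to copy
        copies[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links such as "/about" against the current page.
            queue.append(urljoin(url, href))
    return copies


# Hypothetical site held in memory, standing in for the public web.
FAKE_SITE = {
    "http://example.test/": '<a href="/about">About</a> <a href="/photos">Photos</a>',
    "http://example.test/about": '<a href="/">Home</a>',
    "http://example.test/photos": "<p>Vacation pictures</p>",
}

mirrored = mirror("http://example.test/", FAKE_SITE.get)
```

After this runs, `mirrored` holds a second copy of every page, and deleting a page from `FAKE_SITE` would do nothing to it. A real spider would swap in a function that downloads pages, stay within one site, and honor the site’s crawling rules, but the core loop is about this simple.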

And that’s before we even consider corporations, malicious agents, and governments copying public information for their records, analysis, and uses unknown.

Is something you’ve posted in there? Probably. Will it matter someday? There’s no way to know. I personally publish a lot, but I don’t expect it to be a problem.

Hope I’m right. 🙂

The private internet

I keep using the phrase “public internet” because it’s an important distinction many people fail to keep in mind. Public is public, and as we’ve seen, public must be considered to be “forever”.

So instead, we ratchet up our privacy settings, restricting who is allowed to access our stuff, or perhaps only emailing certain things to certain people. We keep it “private” – or so we think.

Yet we remain at the mercy of everyone with whom we choose to share our data. Each could be copying what you’ve given them access to, intentionally or otherwise. On top of that, they could have really bad security; should their accounts or computers be compromised, whatever they have could be in the hands of a hacker in moments.

While that last scenario is not very likely (unless you’re a “high value target” in the hacker’s eyes, and he’s used your friends to get access to you), it underscores something that is vital to understand: every time you share information with someone, you’re giving them a copy, or you’re giving them the ability to make a copy.

Sharing and exchanging data over the “private” internet might not be quite as private as it seems, since there really is no “private” internet at all.

Backing up the internet

The internet isn’t a “series of tubes” at all; it’s nothing more than a collection of computers that store data and know how to talk to each other. When you use a service like Twitter, send email, upload a photo, or even post a comment on a web site like Ask Leo!, that information is stored on a computer not unlike your own3. Those services are all taking steps to back up the data they contain (hopefully like you are).

Backing up makes a copy of all the data – including all your data.

Even if you’re the only one using an internet-hosted service – perhaps your email, cloud storage, online password vault, or who-knows-what – there’s a good chance the service provider is regularly backing up their servers in case something goes wrong. In fact, we hope that’s exactly what they do.

How long do they keep their backups? They’re not saying. It could be moments or years. But it’s possible that whatever you’ve shared online, or stored online for yourself, has been backed up somewhere, somehow, in some way. That’s yet another copy of your data that’s effectively impossible for you to erase – which brings us to the reason for all this “internet is forever” kind of talk.

Deleting files online

You delete an email. You delete a file from your cloud storage. You delete a photo from your social media account. You delete a tweet. As you can see by now, regardless of exactly what that looks like to you, it’s very likely you’ve deleted only one of many copies of your data.

Yet, you can’t get it back. Once you delete it, it’s gone.

The “catch” is that you’ve deleted the copy that’s under your control. Perhaps it’s the copy that’s most obviously visible to everyone, but it’s probably not the only copy.

Unless you have access to those other copies, or you’ve kept a copy on your own machine, you’ve lost your data. The online services generally will not restore from their backups (the backups are to recover from their issues, not yours). Hackers certainly aren’t going to share with you, even if you can track them down (they’re probably overseas anyway). And the NSA isn’t going to respond to your request to restore your data from their backups (assuming they’ve been watching you, of course).

This is why we say “once you delete it, it’s gone”. There may be other copies, but there is likely no way to access them.

If it was public, maybe you’ll get lucky and find a copy on The Internet Archive – I’ve occasionally recovered a website or web page from there. If it was private, perhaps someone with whom you shared it still has a copy. If it was yours and yours alone, and it was stored in only one place, then you weren’t backed up. It’s likely gone forever, regardless of how many actual copies there might be.
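If you want to check The Internet Archive for a page, its Wayback Machine has a public “availability” lookup. The sketch below only builds the lookup address. The function name `wayback_lookup_url` is my own, and the endpoint and its `url` and `timestamp` parameters are as I understand the Archive’s published API, so treat the details as an assumption and check the current documentation before relying on them.

```python
from urllib.parse import urlencode


def wayback_lookup_url(page_url, timestamp=None):
    """Build a lookup address for the Wayback Machine availability endpoint.

    timestamp is optional, in YYYYMMDD form; as I understand it, the service
    reports the snapshot closest to that date if one exists.
    """
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return "https://archive.org/wayback/available?" + urlencode(params)


# Ask about a page as it looked around the start of 2003.
lookup = wayback_lookup_url("example.com", "20030101")
```

Opening the resulting address in a browser should return a small JSON answer saying whether a snapshot exists and where to find it. Whether there is a snapshot of the page you lost is, as always, a matter of luck.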

Unless you have sufficient resources (read: money), a compelling reason, an attorney, and a court order to force an online service to retrieve it, whatever it is you deleted is gone.

And then it gets weird.

Deleted isn’t deleted except when it is

Whatever you deleted is gone from your grasp. You deleted it, and you can’t recover it. Unless you had a backup, of course.

But it’s not really gone, now, is it? As frustrating as it is, copies continue to exist: system backups, at a minimum, and possibly archive/mirror copies, research copies, malicious copies, and more.

All out of your reach and out of your control.

There are only two things you can count on, really:

  • You can’t get it. (“Once you delete it, it’s gone.”)
  • It could still come back to haunt you. (“The internet is forever.”)

The solutions are equally simple:

  • Back up everything you keep online.
  • Don’t put anything online that might “haunt” you, for whatever definition of “haunt” you care to assume.
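As a rough illustration of the first point, here is a minimal sketch of what “back up” boils down to: make a second copy somewhere else, and confirm that the copy really matches. The file name, the folder names, and the `back_up` function are all hypothetical, and the “original” is a stand-in created in a temporary folder rather than anything downloaded from a real service.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 fingerprint of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def back_up(source, backup_dir):
    """Copy source into backup_dir and confirm the copy matches the original."""
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    destination = backup_dir / Path(source).name
    shutil.copy2(source, destination)  # copies contents and timestamps
    if sha256_of(source) != sha256_of(destination):
        raise OSError(f"backup of {source} does not match the original")
    return destination


with tempfile.TemporaryDirectory() as work:
    # Stand-in for a file you keep with an online service.
    original = Path(work) / "photo.jpg"
    original.write_bytes(b"pretend this came from a cloud service")

    copy = back_up(original, Path(work) / "backups")

    # Simulate deleting the only copy under your control.
    original.unlink()
    survived = copy.exists()
    recovered = copy.read_bytes()
```

Once the original is deleted, the backup is the only copy you can actually reach. A real backup would live on a different drive or service than the original, which this sketch deliberately leaves out.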

These are exciting times, to be sure, but they’re complex and often frustrating times, as well.


Footnotes & references

1: In an exceptionally interesting and geeky note, the technology underlying Bitcoin, called “blockchain”, was used to “Irrevocably Mirror” 20,000 lectures before a legal issue forced the university hosting them to remove them from its site.

2: If English is not your native language, “Babs” is one of the many short forms of “Barbara”. It’s not common, and I’m guessing Ms. Streisand’s not a fan of it. But once published, it’s out there. Forever.

3: Seriously. They might have more cores, more RAM, more disk space, or more whatever, and they might run different operating systems (or not), but the majority of the internet runs on computers that aren’t that different from the desktop computer nearest you.

10 comments on “The Internet Is Forever, Except When It’s Not”

  1. Since the NSA is supported by our tax dollars, they belong to us and should make their backup services available to us 🙂 .

  2. Interesting, because in my business I am often required to sign NDAs (Non-Disclosure Agreements) that are obviously drawn up by lawyers who have no idea how the internet works. I am often required, after completing a project, to “return all documents and irrevocably delete all copies”, despite the fact that I am also required to properly back up everything, and all documents are sent to and from clients by email. Much of my work is done on the cloud, furthermore. It always amuses me that the lawyers who draft these NDAs seem to think that “returning” a digital document means that I only have “copies” left, as if they behave like paper originals.

    • Remember, these are the same people that insist on a footer in emails that says (in effect) “if you’re not the person intended, this email is confidential and you should forget everything you read”. Or something like that. 🙂

  3. I first encountered the notion of “cloud” (I detest this term) storage when I installed Microsoft Office 2010. I asked myself then whether I felt like having Uncle Bill’s minions perusing my work, and elected to not use it. Since “cloud” = Internet, and Internet is constantly and repeatedly backed up (We live in a digital landfill), my stuff would be “out there” and out of my control, save for whatever level of security was applied by the storage entity. Choosing storage is a different level of trust than choosing software. I learned to manage my files in an old-school manner, and I consider a thumb drive a perfectly acceptable replacement for “clouds”.

    • There are programs which can prevent people from seeing what you have stored on the “cloud”. I use BoxCryptor which uses military grade encryption. Can it eventually be cracked? Possibly, but at great time and expense, and seriously, I’m not that interesting.

  4. The Internet Archive isn’t all it’s cracked up to be, it’s mostly a snapshot of what a site looked like at a certain point in time and most of the links are long dead. Whether or not you can locate the information again is a matter of luck. This is obvious with commercial websites, who would not be keeping up outdated information unless it were a matter of historical importance. The current website represents the company’s newest and most pertinent data for the purpose of creating transactions. If the data or elements were to be stored for later use, say nostalgic or an anniversary celebration then they would probably be stored offline or recreated from classic documents.
