You Can’t Un-Ring a Bell
Hi everyone! Leo Notenboom here for askleo.com. What I want to talk to you about today relates to privacy in a slightly different way. I want to give you an overview of some of the things that happen to your data whenever you post pretty much anything online.
I’m going to use social media as my example, but in reality, just about any type of public or even private kind of data sharing or data storage online will have some interesting ramifications that I’ll be going over here in a moment. I think what most people don’t realize is that their data gets copied and archived way quicker and in many more places than they realize.
Now, realize also, one of the things I talked about, I think it was last week, is that one of the joys of digital data is how easy it is to copy. This is what makes backups both possible and in some ways trivial. It’s what makes digital data, to me, so much preferable to its analog counterpart.
That being said, though, digital data is easy to copy and copying happens a lot. So, let’s take a look at a very simple example. You post something: a statement to Twitter, a photograph to Instagram, a picture on Flickr – any number of different ways that you are making some amount of content available, probably to the world. And even if you restrict access, in many ways you’ll find that you’re still making it accessible to more people than you realize.
So, you upload your picture to this service. The very first thing that happens is that your picture gets immediately replicated across multiple servers that service uses to provide their service. I’m sure you realize that places like Facebook or Instagram or Flickr, they’re not just one machine in a closet somewhere, these are literally hundreds, if not thousands of computers each with their own hard disk, each with some amount of shared storage on which all of these photographs exist.
The photographs or your tweet or your text or your whatever gets replicated as quickly as possible across multiple devices to account for failure. In a lot of ways, it’s a form of backup, because what these services don’t want is for a failure at some point in time to take out data that they could have saved had they simply replicated it as soon as it arrived.
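To make the replication idea concrete, here is a minimal toy sketch in Python. It is not how any particular service works; the `Server` class, the `upload` and `read` functions, the server names, and the choice of three copies are all made up for illustration. It only shows the basic principle: write the same data to several machines right away, so that losing one machine doesn't lose the data.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    """A hypothetical storage machine: a name, a dict of files, and an on/off flag."""
    name: str
    storage: dict = field(default_factory=dict)
    online: bool = True


def upload(servers, key, data, copies=3):
    """Store `data` on up to `copies` online servers and return their names."""
    written = []
    for server in servers:
        if len(written) == copies:
            break
        if server.online:
            server.storage[key] = data
            written.append(server.name)
    return written


def read(servers, key):
    """Return the data from the first online server that has it, or None."""
    for server in servers:
        if server.online and key in server.storage:
            return server.storage[key]
    return None


# Five pretend servers; the upload lands on three of them immediately.
servers = [Server(f"server-{i}") for i in range(5)]
holders = upload(servers, "dogs.jpg", b"...photo bytes...")

# One machine fails, but the photo is still readable from another copy.
servers[0].online = False
survived = read(servers, "dogs.jpg")
print(holders, survived is not None)
```

The flip side, which is the point of this whole discussion, is that the same three copies still exist after you think you're done with the photo.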
So your photograph gets uploaded and boom, it’s on a couple of dozen, a couple hundred servers on that service’s hardware already. Next, well, people look at it. As you might expect, that was kind of the whole point of your sharing this data. So what happens when somebody looks at a picture that’s stored on the internet? The picture gets downloaded to their machine. In other words, it’s copied to their machine. That means that everybody who takes a look at your photograph now has a copy of it on their PC. Typically, that copy lives in the browser cache. It can live there for a few minutes, it can live there for several weeks depending on exactly how busy that particular person is with their internet and how much space they’ve allocated to their cache, how many sites they visit, how much room it all takes up.
You get the idea. The point is that a copy of your photograph now lives on their machine, and that’s true for every single person who views your photograph. OK, so that kind of, sort of makes sense. While this is maybe more copies than we might have expected, it makes sense, because we’ve uploaded to a service that’s trying to provide a solid and stable service, and we’re letting people look at it, and they need to be able to do whatever it means to look at a photograph downloaded from the internet.
But wait. There’s more. People can copy your photograph and I mean by more than just seeing it in their cache, people can do things like right-click on a photograph and “Save As” or take a screenshot of the photograph. There are many sites that try to prevent photographs or texts from being copied in various ways and yet, if you can see it on the screen, you can copy it somehow.
There are ways; they’re not necessarily always elegant, but there are absolutely ways to copy whatever can be seen. So, what that means is that if someone likes your photograph, they like the picture that you posted of some dogs, for example. They can download it to their machine and save it for themselves.
You’ve just lost all control over that for sure, because they now have a copy that’s completely in their control that you know nothing about. They can then go ahead and do things with it later, like turn it into a meme, or who knows what else. People who can view your information can copy your information. They can download your information.
So that kind of sort of makes sense. People have access to the data; they can copy it; they can do things with it on their computers. Another source of surprise for many people is search engines. So we uploaded our photograph to a place like, say, Flickr or Instagram or whatever, and many of these sites are open to the search engines, which means if you were to search for something, you might find your photograph on one of these hosting services because a search engine came along some time prior to that and said, “Oh, here’s a photograph. There are these words associated with it. I’ll return it in some search results.”
Now, there are what I’ll call “gentleman’s agreements” that allow a site to say, “No, don’t do this to me. Don’t index my content.” But it’s a gentleman’s agreement and not all search engines are gentlemen, and there are a lot more search engines than you realize.
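The best-known form of this kind of agreement is a `robots.txt` file that a site publishes to tell crawlers what not to fetch. As a rough sketch of why it's only an agreement, here is how a well-behaved crawler might check it using Python's standard `urllib.robotparser` module. The rules, the `ExampleBot` name, the `polite_crawler_may_fetch` helper, and the example.com URLs are all assumptions made up for this illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content a site might publish.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def polite_crawler_may_fetch(robots_txt, user_agent, url):
    """Return True if the rules in robots_txt allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


allowed = polite_crawler_may_fetch(
    ROBOTS_TXT, "ExampleBot", "https://example.com/photos/dogs.jpg"
)
blocked = polite_crawler_may_fetch(
    ROBOTS_TXT, "ExampleBot", "https://example.com/private/dogs.jpg"
)
print(allowed, blocked)
```

Notice that the check happens entirely inside the crawler. Nothing on the site's side enforces it, so a crawler that simply never calls a function like this can fetch the "disallowed" page anyway.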
We tend to think of the “Big Two” right now: Google and Bing. But in reality, there are hundreds if not thousands of search engines around the planet. Once you post content online, it could be getting indexed by any, if not all, of these different search engines.
Worse. Many of these search engines create what are called caches. All that really means is they take a copy of what they’ve indexed, so rather than just necessarily pointing at the original site like Facebook or Instagram or Flickr, they actually copy your photograph or copy your text and put it on their servers in a cache. Google does this.
You can request the cached copy of a website, of a web page when you find it in the search results. There are many reasons for doing that. Google does it in case the site goes away, but other services, other search engines do it for a variety of reasons. Again, what that means is your content, your text, your photograph has just now been copied on to other services around the planet that you don’t even know of.
But wait … it gets worse. We talk about search engines, but in reality, what’s out there are these things called “spiders”, and what a spider does is essentially what a search engine does: it goes out and indexes the web; it tries to find all of the different pages on the internet, see what’s there, and index them so that you can find them.
Those aren’t the only reasons a spider might exist. One very legitimate example is research. A lot of universities, a lot of computer science programs, a lot of educational institutions have their students write spiders that go out and index the web, retrieve content from the web, archive content from the web for various and sundry research purposes. Your data might be part of that.
But wait … we’re not done. There’s more! Archiving is a really, really interesting word and especially concept when it comes to the internet. Because the internet is always changing in one way or another, there are various and sundry services that attempt to archive what’s out on the internet at any point in time.
The most famous, perhaps, is archive.org. You can find old versions of askleo on archive.org to see what it was like 13 years ago. The point being, though, that these sites specifically take copies of what they find on the internet for archival purposes.
Well, your information may be part of that. If your information is visible on one of these sites that is getting archived, it will get archived along with everything else. Ok, great. We’ve got search engines, we’ve got research spiders, we’ve got archives, what else is there?
Well, we’re not done. At this high level, there’s one more thing that I talk about all the time, and yet everybody forgets when they think of online data, and that is backing up. And this is particularly true even for data you consider to be private and online – your email, your files that you use in a file sharing service. In addition to replicating the data as soon as they get it, so that they can provide a high level of availability immediately, these services are backing up your data.
What that means is that they’re creating copies of your data and storing them somewhere. And we don’t know for how long. It could be a few days; it could be a few weeks; it could be for years. There’s literally no way to know because these services don’t tell us.
And if the people who are viewing what it is we’re sharing with the world take the time and trouble to back up their computer, they’ve potentially backed up your photographs, maybe in their internet cache, maybe in their explicit saving of your pictures. Who knows? The bottom line is your information could be backed up there as well.
The point that I’m trying to make here is that when you share something publicly, and even when you share something privately, the data that you share online or using online services is getting replicated in dozens if not hundreds of different places whether or not you realize it and, to be honest, whether or not you really want it to be, because it’s all out of your control.
The bottom line is that once you share something publicly and even when you share something privately, you lose a tremendous amount of control over what happens to that data. Now, there are two pushbacks I always get when I talk about this.
One is, “Well, can I just ask people to remove the data?” The answer is you can ask all you want. This is actually partly what the so-called “right to be forgotten” is all about. It’s an attempt at some legislation that would force the search engines to not point to data on request of the person who the data is about. For example, if I didn’t like something that was posted about me, I could request that the search engines not point to it. Note that doesn’t remove the data. The data is still there in the original source.
You would then actually have to go to each and every individual source that might have replicated that data and ask them to remove it. Do you know who all those sources are? Of course not. Neither do I. There’s no way to know who made copies of the data once it was published online. Second, asking for something to be removed just calls attention to it. This is now referred to as “The Streisand Effect”: the Hollywood star Barbra Streisand made a big fuss about getting photographs of her home removed from the internet, and all that really did was cause those pictures to be duplicated and posted and reposted again and again. So we now refer to calling attention to something by requesting its removal as “The Streisand Effect”. You can see it happening to tweets a lot these days, particularly from politicians who maybe speak or tweet a little bit without thinking.
The other question I get all the time is, “Well, if there are all these copies out there, why can’t I get a copy of my data if I’ve lost it for some reason? Like, I’ve lost my email, or I’ve lost all my photographs. Can’t I go out and get a copy of it from these services that have all of these backups and replications?”
Theoretically, you could, but generally they won’t do it, and the reason is a very simple one. It’s, I’ll say, cost-effective for them to duplicate, replicate and back up absolutely everything all the time. It is not cost-effective for them to go looking for your needle in their haystack. To actually go out and, say, retrieve backup copies? No, they’re not going to do that; not on request.
What they will do or what they may do or what they might be required to do is respond to a court order, a subpoena, legal action that might require them to take the time to incur the expense to go out and retrieve that information, but to just get it for you especially if the service is free, to just get it for you, because you happen to lose something?
You’ll find it explicitly stated in their Terms of Service: your data is your responsibility. If you lose it, they’re not going to find it for you no matter how many copies of it they may have stored away in various places. So, the bottom line here really is be aware that when you post something online, you really are losing all control over it. You’re basically setting it free to go out and live a life of its own out on the internet. Where it ends up, what happens to it, what people will do with it, whether they steal it, copy it or completely ignore it, there’s simply no way to tell, but it is important to realize that every time you post anything online.
What do you think? Is this too scary? I mean, we’re doing it every day, and it doesn’t really seem to hurt us very often. What really are the ramifications that have you concerned about this massive data replication that happens whenever we do something? Let me know. As always, here’s a link to this article out on askleo.com. That’s where I have the video posted along with moderated comments where I read every comment. We keep the trolls out. We keep the discussion civil. I’d really be interested in what you have to say. Until next time, I’m Leo Notenboom for askleo.com. Remember, be safe, have fun, and don’t forget to make a few copies of your own. Don’t forget to back up. Take care.
Was that video interesting? Helpful even? Well, then I could use your help. I’ve got a Patreon project under way. You’ve got an opportunity to contribute and help support askleo.com to help me do what I do: Help more people, answer more questions, produce more information about technology that hopefully can help you and others use it more effectively and with more confidence. Visit Patreon.com to learn more. Among other things, you get rewards depending on the level of your patronage so check out patreon.com/askleo to learn more and help contribute to askleo.com. Thanks.