Transcript
You Can’t Un-Ring a Bell
Hi everyone! Leo Notenboom here for askleo.com. What I want to talk to you about today relates to privacy, but in a slightly different way. I want to give you an overview of some of the things that happen to your data whenever you post pretty much anything online.
I’m going to use social media as my example, but in reality, just about any kind of data sharing or data storage online, public or even private, has some interesting ramifications that I’ll go over in a moment. I think what most people don’t realize is that their data gets copied and archived much more quickly, and in many more places, than they expect.
Now, as I talked about, I think it was last week, one of the joys of digital data is how easy it is to copy. That’s what makes backups possible and, in some ways, trivial. It’s what makes digital data, to me, so much preferable to its analog counterpart.
That being said, digital data is easy to copy, and copying happens a lot. So let’s take a look at a very simple example. You post something: a statement to Twitter, a photograph to Instagram, a picture on Flickr. There are any number of ways you make some amount of content available, usually to the world. And even if you restrict access, in many ways you’ll find that you’re still making it accessible to more people than you realize.
So, you upload your picture to this service. The very first thing that happens is that your picture is immediately replicated across the multiple servers that the service uses. I’m sure you realize that places like Facebook, Instagram, or Flickr aren’t just one machine in a closet somewhere. They’re literally hundreds, if not thousands, of computers, each with its own hard disk, each with some amount of shared storage on which all of these photographs exist.
Your photographs, your tweets, your text, or whatever it is, get replicated as quickly as possible across multiple devices to account for failure. In a lot of ways, it’s a form of backup. What these services don’t want is for a failure at some specific point in time to lose data they could have saved had they simply replicated it as soon as it arrived.
So your photograph gets uploaded and, boom, it’s already on a couple of dozen, maybe a couple of hundred, servers on that service’s hardware. Next, well, people look at it. As you might expect, that was kind of the whole point of sharing it. So what happens when somebody looks at a picture that’s stored on the internet? The picture gets downloaded to their machine. In other words, it’s copied to their machine. That means that everybody who looks at your photograph now has a copy of it on their PC. Typically, that copy lives in the browser cache. It can stay there for a few minutes or for several weeks, depending on how much that person uses the internet, how much space they’ve allocated to their cache, how many sites they visit, and how much room it all takes up.
You get the idea. The point is that a copy of your photograph now lives on their machine, and that’s true for every single person who views it. OK, so that kind of makes sense. While this may be more copies than we expected, it makes sense: we’ve uploaded it to a service that’s trying to provide a solid and stable service, we’re letting people look at it, and they need to be able to do whatever it takes to view a photograph downloaded from the internet.
But wait. There’s more. People can copy your photograph, and I mean more than just the copy in their cache. People can do things like right-click on a photograph and choose “Save As,” or take a screenshot of it. Many sites try to prevent photographs or text from being copied in various ways, and yet, if you can see it on the screen, you can copy it somehow.
There are ways. They’re not always elegant, but there are absolutely ways to copy whatever can be seen. What that means is that if someone likes the photograph you posted, say, a picture of some dogs, they can download it to their machine and save it for themselves.
You’ve just lost all control over that copy, because they now have one that’s completely in their control and that you know nothing about. They can go ahead and do things with it later, like turn it into a meme, or who knows what else. People who can view your information can copy your information. They can download your information.
So that kind of makes sense, too. People have access to the data; they can copy it; they can do things with it on their computers. Another source of surprise for many people is search engines. Say we uploaded our photograph to a place like Flickr or Instagram. Many of these sites are indexed by search engines, which means that if you search for something, you might find your photograph on one of these hosting services, because a search engine came along some time earlier and said, “Oh, here’s a photograph. There are these words associated with it. I’ll return it in some search results.”
Now, there are what I’ll call “gentleman’s agreements” that allow a site to say, “No, don’t do this to me. Don’t index my content.” But it’s a gentleman’s agreement and not all search engines are gentlemen, and there are a lot more search engines than you realize.
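The most common of these “gentleman’s agreements” is a site’s robots.txt file. As a rough sketch (the rules, the example.com URLs, and the “ExampleBot” name below are made up for illustration), here’s how a well-behaved crawler written in Python might check those rules with the standard library’s urllib.robotparser before fetching a page. Nothing forces a crawler to run this check at all, which is exactly why it’s only an agreement.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, as a site might publish it at https://example.com/robots.txt.
# It asks every crawler ("*") to stay out of anything under /photos/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /photos/
"""

def polite_crawler_may_fetch(url: str, user_agent: str = "ExampleBot") -> bool:
    """Return True if the robots.txt rules above allow user_agent to fetch url.

    A polite crawler calls something like this before every fetch; an impolite
    one simply skips the check and downloads the page anyway.
    """
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())  # parse the rules without any network access
    return parser.can_fetch(user_agent, url)

print(polite_crawler_may_fetch("https://example.com/photos/dogs.jpg"))  # False
print(polite_crawler_may_fetch("https://example.com/about.html"))       # True
```

Note that the file only expresses a request: the site has no way to enforce it against a crawler that ignores it.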
We tend to think of the “Big Two” right now: Google and Bing. But in reality, there are hundreds if not thousands of search engines around the planet. Once you post content online, it could be indexed by any, if not all, of them.
Worse. Many of these search engines create what are called caches. All that really means is they take a copy of what they’ve indexed, so rather than just necessarily pointing at the original site like Facebook or Instagram or Flickr, they actually copy your photograph or copy your text and put it on their servers in a cache. Google does this.
You can request the cached copy of a web page when you find it in the search results. There are many reasons for keeping those copies. Google does it in case the site goes away, and other search engines do it for a variety of reasons. Again, what that means is that your content, your text, your photograph, has now been copied onto other services around the planet that you don’t even know about.
But wait … it gets worse. We talk about search engines, but in reality, what’s out there are programs called “spiders,” and what they do is essentially what a search engine does. They go out and crawl the web, trying to find all of the different pages on the internet, see what’s there, and index them so that you can find them.
Search engines aren’t the only reason a spider might exist. One very legitimate example is research. A lot of universities, computer science programs, and other educational institutions have their students write spiders that go out and index the web, retrieve content from the web, and archive content from the web for various research purposes. Your data might be part of that.
But wait … we’re not done. There’s more! Archiving is a really interesting word, and especially a concept, when it comes to the internet. Because the internet is always changing in one way or another, there are various services that attempt to archive what’s on the internet at any point in time.
The most famous, perhaps, is archive.org. You can find old versions of askleo on archive.org to see what it was like 13 years ago. The point being, though, that these sites specifically take copies of what they find on the internet for archival purposes.
Well, your information may be part of that. If your information is visible on one of these sites that is getting archived, it will get archived along with everything else. OK, great. We’ve got search engines, research spiders, and archives. What else is there?
Well, we’re not done. At this high level, there’s one more thing that I talk about all the time, and yet everybody forgets when they think of online data: backing up. This is particularly true even for data you consider private, like your email or the files you keep in a file-sharing service. In addition to replicating your data as soon as they get it, so that they can provide a high level of availability immediately, these services are also backing it up.
What that means is that they’re creating copies of your data and storing it somewhere. And we don’t know for how long. It could be a few days; it could be a few weeks; it could be for years. There’s literally no way to know because these services don’t tell us.
And if the people viewing what we’re sharing with the world take the time and trouble to back up their computers, they’ve potentially backed up your photographs too, maybe in their internet cache, maybe in pictures they explicitly saved. Who knows? The bottom line is your information could be backed up there as well.
The point that I’m trying to make here is that when you share something publicly, and even when you share something privately, the data you share online or through online services is getting replicated in dozens if not hundreds of different places, whether or not you realize it and, to be honest, whether or not you want it to be, because it’s all out of your control.
The bottom line is that once you share something publicly, and even when you share something privately, you lose a tremendous amount of control over what happens to that data. Now, there are two kinds of pushback I always get when I talk about this.
One is, “Well, can’t I just ask people to remove the data?” The answer is you can ask all you want. This is actually partly what the so-called “right to be forgotten” is all about. It’s an attempt at legislation that would force search engines to stop pointing to data at the request of the person the data is about. For example, if I didn’t like something that was posted about me, I could request that the search engines not point to it. Note that this doesn’t remove the data. The data is still there at the original source.
You would then have to go to each and every individual source that might have replicated that data and ask them to remove it. Do you know who all those sources are? Of course not. Neither do I. There’s no way to know who made copies of the data once it was published online. Second, asking for something to be removed just calls attention to it. This is now referred to as “the Streisand Effect,” because the Hollywood star Barbra Streisand made a big fuss about getting photographs of her home removed from the internet. All that really did was cause those pictures to be duplicated, posted, and reposted again and again, so that we now use “the Streisand Effect” to mean calling attention to something by requesting its removal. You can see it happening a lot with tweets these days, particularly from politicians who speak or tweet a little without thinking.
The other question I get all the time is, “Well, if there are all these copies out there, why can’t I get a copy of my data if I’ve lost it for some reason? Like, I’ve lost my email, or I’ve lost all my photographs. Can’t I go and get a copy from these services that have all of these backups and replications?”
Theoretically, you could, but generally they won’t, and the reason is a very simple one. It’s cost-effective for them to duplicate, replicate, and back up absolutely everything all the time. It is not cost-effective for them to go looking for your needle in their haystack and actually retrieve backup copies. No, they’re not going to do that, not on request.
What they will do, or may do, or might be required to do, is respond to a court order, a subpoena, or some other legal action that requires them to take the time and incur the expense to retrieve that information. But to go get it for you, especially if the service is free, just because you happened to lose something?
You’ll find it explicitly stated in their Terms of Service that your data is your responsibility. If you lose it, they’re not going to find it for you, no matter how many copies of it they may have stored away in various places. So the bottom line here really is: be aware that when you post something online, you really are losing all control over it. You’re basically setting it free to go out and live a life of its own on the internet. Where it ends up, what happens to it, what people do with it, whether they steal it, copy it, or completely ignore it, there’s simply no way to tell. But it’s important to realize that every time you post anything online.
What do you think? Is this too scary? I mean, we’re doing it every day, and it doesn’t really seem to hurt us very often. What are the ramifications of this massive data replication that concern you? Let me know. As always, here’s a link to this article on askleo.com. That’s where I have the video posted along with moderated comments. I read every comment, we keep the trolls out, and we keep the discussion civil. I’d really be interested in what you have to say. Until next time, I’m Leo Notenboom for askleo.com. Remember: be safe, have fun, and don’t forget to make a few copies of your own. Don’t forget to back up. Take care.
Was that video interesting? Helpful even? Well, then I could use your help. I’ve got a Patreon project under way. You’ve got an opportunity to contribute and help support askleo.com to help me do what I do: help more people, answer more questions, and produce more information about technology that hopefully can help you and others use it more effectively and with more confidence. Among other things, you get rewards depending on the level of your patronage, so check out patreon.com/askleo to learn more and help contribute to askleo.com. Thanks.
Hi Leo – did you realise there is no transcript of this video on the website?
As it is 16 mins long, I was hoping to be able to skim read the transcript, but when I click on the button, it just says “coming soon”. But 24 hours later there is no sign of it.
Is it coming?
Many thanks for all your great advice by the way!
Cheers, Helen.
Transcripts usually come a day or 2 after the video is uploaded.
Coming soon.
It’s up now.
Hi Leo, I was going to post anyway to THANK YOU for going to the trouble to give transcripts of your videos. It annoys me immensely that so much info is now only available in video. That’s fine for demonstrating something that needs, or at least benefits from, video (like how to collimate a telescope), but in so many cases it’s the author being lazy, I think: it’s easier to make an informal video than to type something out, where little matters like spelling, grammar and syntax traditionally matter more! With you we get to have the choice! What more could one want!
For the record, I do sympathise with videographers who choose not to post a transcript. It’s not free. It’s not expensive, but it definitely costs to have someone (Hi Andrea!) do the work. Machine-generated closed captions, like those available on most YouTube videos, aren’t quite there yet either, and of course they’re not presented as a transcript.
Well, I always appreciate the transcripts, and I’m certain many other hearing-impaired and deaf folks do as well. So thanks, Leo (and Andrea). Your work is treasured.
Nine times out of ten I will only read the transcript and skip the video – just my preference. So thank you for going the extra mile Leo (and Andrea).
I never watch the videos. I only read the transcripts.
Yes, quite an eye opener, Leo. Makes perfect sense, all of it.
I know that you can’t address every solution to the photo-sharing problem. But what I find interesting, most of the time, is that people don’t resize their photos when putting them on services such as Flickr and Facebook. Posting the full-size photo gives everyone the best possible image to use, particularly far-off commercial companies.
I see what you mean, but the point of sending a photo to my daughter is that she can see it in all its glory and copy and print it.
Given what’s described in the article, it makes no difference which medium you use, email or Flickr; they all copy it.
The only other means is snail mail, which rather defeats the purpose of instant communication, not to mention the time and cost.
Dan’s comments apply particularly to social media sites where unknown parties can copy your photos. Emailing or sharing them through a site like Dropbox or OneDrive would prevent third party copying unless, of course, those accounts were hacked.
Makes me want to use what I understand is a search engine that does not track me. And maybe I’ll change privacy settings in Windows 10. But at the same time I do like the Cortana option.
Something confirming ABSOLUTELY what you were talking about.
A person I know had a paid-up search job in the USA, and he asked me what laptop he should buy there, since they are less expensive than in Canada.
I sent him an e-mail on AOL with the specifications of what he should try to get: Lenovo, 8 GB memory, quad-core AMD processor, 1 TB HDD.
To my surprise, while doing something on a browser (not AOL!!), I got a message on my screen that I DID NOT ASK FOR, giving specifications EXACTLY OF WHAT I DESCRIBED and stating that for one day (ONLY!!) there is $70 off and a price better than average in Canada and the price in CANADIAN DOLLARS although it is shipped from USA!!!!
I wrote about it to numerous friends and colleagues and more surprisingly, some of them wrote about similar experiences!!!!!!! One described it as “Target Marketing!!!!”
Believe it or not…This is a true VERY REAL HAPPENING!!!
That seems unrelated to this article. But yes, targeted marketing is a very real thing. If you visit, say, a company’s web page looking for product specifications, then wander off elsewhere on the net, it’s extremely possible, even likely, that you will see ads for what you were just looking for. A little creepy, but totally legit. More here:
http://ask-leo.com/why_do_these_ads_keep_following_me_around_the_internet.html
Leo
Do all these apply to all cloud services as well?
I am a bit worried now
It depends on exactly what you’re worried about, but in short: yes.
Does it mean that for whatever data I stored in the cloud (e.g. OneDrive, Dropbox, Google Drive, Amazon…) (data only accessible with my own login), after I remove the file/data, there is still a copy somewhere on the cloud server and someone else can see the file/data? Is it only a copy on that cloud company’s server, or is that copy also available to the public?
Thanks
If you delete the files on your computer, they will be removed from the Dropbox, OneDrive, etc. websites. However, if you log on to your account, the files will sometimes still be available in the trash folders. And as the article states, those files will reside somewhere in the form of archives and backups for some indeterminate time, possibly forever. Those files will likely be encrypted and generally only accessible by law enforcement officials.
None of those files on cloud storage servers are ever made available to the public if those companies are doing their job correctly. The only files which are publicly available are the ones you make public by creating a link to them, and even then those files are only available to those who know the link (not that secure, so be careful not to create non-encrypted public links to sensitive files). Any public links you created to those files and sent to people would no longer work once you’ve deleted the files.
Should only be available to the cloud service. It could be as simple as backups they took of their servers while your data was on it.
I seem to remember that, some years ago, I read somewhere that Facebook records every keystroke made by any given user, and that record is retained, EVEN IF THAT USER DELETES ALL THE TEXT THAT HE/SHE HAS TYPED!
I understood from the article that this applies, EVEN IF THE USER CANCELLED THE POST – I don’t mean deleted it after posting: I mean, that the user changed his/her mind, and decided not to go ahead and post.
I also understood from the article that the reason Facebook records the text as it’s being typed is to improve the targeting of advertisements to any given user.
Just my 2¢ worth… :)
Does anyone else remember reading this article, or something similar? I’m fairly sure that it could be anything up to 5 years ago when I saw the article, and I seriously doubt that I could find a link to it now, but if I do, I’ll post it here.
I don’t believe that Facebook records every keystroke. That’s a little too “conspiracy theory” for me. They do, obviously, work with the text as you type (it’s how they automatically turn names of Facebook friends and pages into links, for example), so they could, of course. But man that’d be a lot of text. I doubt very seriously that they’re recording it.
Comes back to another of my maxims: don’t use services you don’t trust. (Or, don’t use them further than you trust them.) If even the possibility concerns you, then your only recourse is to get off of Facebook.
I don’t believe in most conspiracy theories either, but it does seem like they record all keystrokes. They also have the ‘Friend is typing’ feature on articles, which is additional evidence that they are recording it. Whether they save and use them is another story, although if there’s a way to monetize those keystrokes, I wouldn’t be surprised if they took advantage of it. No conspiracy, just monetization.
“Friend is typing” only means that they notice the typing happens. That’s NO indication that they’re being recorded. (Recorded to me means saving them somewhere.)
I don’t believe that they record it either but, as you say, they certainly could. In fact, Facebook’s data scientists did exactly that in a study on self-censorship behaviour…..
“For our purposes, we operationalize “self-censorship” as any non-trivial content that users began to write on Facebook but ultimately did not post. This method is fast and lightweight enough to not affect users’ experience of Facebook. We also believe that this approach captures the essence of self-censorship behavior: The users produced content, indicating intent to share, but ultimately decided against sharing.”
http://www.aaai.org/ocs/index.php/ICWSM/ICWSM13/paper/viewFile/6093/6350
Fascinating. Thanks for that.
Very many thanks for the informative advice Leo. Although I know that once posted it’s there for all to see, I didn’t realise just how wide scale it is!! Forewarned is forearmed and I shall be thinking twice in future thanks to your kind help.
A very Merry Christmas to you, a safe, healthy and Happy New Year too!
Best wishes, Sue
Some years ago a client hired me to write several articles including specific SEO phrases and post them on a number of free article websites. These are copyright free websites where anyone can post articles and anyone can copy the info as presented to other websites. I was amazed at how quickly these proliferated around the world (which was just what the client wanted). Although they no longer appear (thankfully) at the top if you search on my name, they are still out there. I just searched on a few of my titles and see some have been turned into YouTube videos (and no longer include my byline).
Worse are podcasts, even including my own. Music download (aka theft) sites will go around and suck up anything that even looks like an “mp3”, and copy it to their sites. So I regularly get reports of my podcast episodes on all sorts of questionable sites.
Leo, Very interesting video! Firefox has an app that will retrieve a deleted website or page thereof. I lost pages of my website and am able to access them with the Firefox app. I assume it accesses a cache from some search engine.
Now that you have explained in great detail what happens to data uploaded online, I can tell you with a big “YES” that it is scary enough. By the way, I am taking the opportunity to wish you and your family a prosperous, happy and especially HEALTHY (the most important) New Year!