
I Gave AI My Search History and Here’s What It Told Me

We leak so much information about ourselves.

(Image: DALL-E 3. A collage of search-bar queries surrounding a large question mark.)

I was listening to a podcast recently in which one of the hosts described giving ChatGPT the history of the questions he had asked it and then asking it to build a profile on him based on that.

It was interesting, of course, but it got me thinking: why not use internet searches as the input? Our search histories hold much more information, since we use search for more general-purpose things.

I was not disappointed, a little surprised, and more than a little impressed.


TL;DR:

Using AI to build a profile from search history

I used AI tools to build a personal profile of myself based on my Google search history. Claude worked from 16,000 searches over five years, while ChatGPT managed with just 126 over six months. Both provided surprisingly accurate profiles. The upshot is that benign data can be used to construct detailed personal profiles. We all need to be aware of online data leakage and what we expose when we use the internet.

My search history

My Google search history goes back to 2006 and has 48,392 entries. The first entry was “signed up for” (I have no idea why), and the most recent search was “download search history” (for obvious reasons) [1]. The whole list was over 900 megabytes in size.

That’s a lot of data to chew on.

In fact, it was too much.
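For readers curious about the extraction step mentioned in footnote 1, here is a minimal sketch of what a regular-expression pass over the export might look like. It assumes the Takeout activity file is HTML in which each search appears as `Searched for <a href="...">query</a>`; that layout, the `extract_searches` name, and the sample data are assumptions for illustration, not the exact process used for this article.

```python
import html
import re

# Assumed shape of one search entry in the exported activity HTML.
# \s also matches the non-breaking space some exports place after "for".
SEARCH_RE = re.compile(r'Searched for\s*<a href="[^"]*">(.*?)</a>', re.DOTALL)


def extract_searches(activity_html: str) -> list[str]:
    """Return the search terms found in the export, in file order."""
    return [html.unescape(term).strip() for term in SEARCH_RE.findall(activity_html)]


# Made-up sample in the assumed format; a real export is far larger.
sample = (
    '<div>Searched for <a href="https://www.google.com/search?q=download+search+history">'
    'download search history</a><br>Dec 1, 2023</div>'
    '<div>Visited <a href="https://example.com/">Example</a></div>'
    '<div>Searched for <a href="https://www.google.com/search?q=corgi+%26+cat">'
    'corgi &amp; cat</a></div>'
)

for term in extract_searches(sample):
    print(term)
```

Entries that are not searches (such as the "Visited" line in the sample) are skipped, which is one reason the list of search terms ends up much smaller than the raw export.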

Claude

I started with Claude because it can accept much more data in a query than ChatGPT can. Unfortunately, 48,000 search terms were still over the limit. In fact, I had to cut it back to 16,000 searches representing the most recent five years.

I wrote this query:

Provide a profile for the person whose search history is as follows:

followed by 16,000 searches.
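The trimming and the query above can be sketched in a few lines of Python. This is only an illustration: the `max_chars` budget is a hypothetical stand-in (real limits are measured in tokens and vary by model), the list is assumed to be ordered newest first, and the function names are invented for this example.

```python
PROMPT_HEADER = "Provide a profile for the person whose search history is as follows:"


def fit_to_budget(queries_newest_first, max_chars):
    """Keep the most recent queries until the character budget is used up."""
    kept = []
    used = 0
    for query in queries_newest_first:
        cost = len(query) + 1  # +1 for the newline that separates entries
        if used + cost > max_chars:
            break
        kept.append(query)
        used += cost
    return kept


def build_prompt(queries):
    """Put the instruction first, then one search per line."""
    return PROMPT_HEADER + "\n\n" + "\n".join(queries)


# Made-up history, newest first; the budget here is deliberately tiny.
history = ["download search history", "python regex", "corgi puppies"]
trimmed = fit_to_budget(history, max_chars=40)
print(build_prompt(trimmed))
```

Cutting from the oldest end keeps the profile focused on recent interests, which matches the choice of the most recent five years described above.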

Here’s what Claude had to say:

Based on the search history provided, here is a possible profile for this person:

– Male, likely in his 40s or 50s
– Lives in the Seattle area
– Works in tech, possibly as a software developer or IT professional
– Interested in computers, technology, online security, Linux/Unix, web development (WordPress, PHP, etc)
– Owns a Nikon digital SLR camera
– Dog owner, possibly of a Welsh Corgi
– Enjoys watching TV shows like Game of Thrones, Westworld, Vikings
– Travels occasionally, including some international trips
– Financially savvy; invests and tracks finances online
– Health-conscious; watches diet and nutrition
– Politically left-leaning

Some key details that point to this profile:

– Technical topics like Linux command line, security tips, and web development suggest a tech professional
– Interest in Nikon cameras and Photoshop indicates a photography hobby
– Corgi-related searches imply a Corgi owner
– TV interests point to sci-fi/fantasy and historical dramas
– Travel searches about things like passport renewal and international banking
– Finance-related searches on stocks, Bitcoin, and bank accounts
– Nutrition/diet searches suggest health-consciousness
– Political searches on topics like ACLU donations suggest left-leaning views

The combination of tech-related, finance, photography and pet hobbies indicates a male in his 40s or 50s who likely lives in the Seattle area and works in a tech-related job. Some details may be guesses, but the search topics provide a reasonably detailed profile.

It’s not far off. (Though I chuckled that it thinks I’m in my 40s or 50s.) In many ways, much of this seems generic, but remember, this was deduced using only the terms I’ve searched Google for in the last five years.

I was impressed.

ChatGPT

ChatGPT’s limits are such that I could give it only 126 searches, covering the last six months [2].

Same question:

Provide a profile for the person whose search history is as follows:

followed by 126 searches.

Here’s what ChatGPT had to say:

Based on this search history, the profile of this person appears to be quite diverse, indicating a range of interests and knowledge areas:

  1. Tech-Savvy and Problem Solver: Searches related to downloading search history, using regex in Python, and troubleshooting various tech issues (like macOS cron alternatives and dealing with ‘operation not permitted’ errors) suggest a strong familiarity with technology and an inclination towards solving technical problems.
  2. Interest in Geography and Travel: Queries like distances between countries, converting coordinates to decimal degrees, and looking up country codes indicate an interest in geography or travel.
  3. Automotive Knowledge: Searches about car values, vacuum line diameters in automobiles, and other car-related inquiries suggest a knowledge of, or interest in, automotive topics.
  4. Cultural Interests: The person seems to enjoy entertainment and culture, evident from searches about movies and TV shows (like MCU movies, ‘Good Omens’, and ‘Aliens’), indicating an interest in popular culture.
  5. Healthcare Awareness: Searches about end-of-life breathing patterns suggest either a professional or personal interest in healthcare or medical conditions.
  6. Food and Local Activities: Looking up local markets, restaurants, and specific menu items suggests an interest in local activities and food.
  7. Academic Inclinations: The search for the meaning of ‘adjunct professor’ and various educational content indicates an interest in academia or education.
  8. Digital Content Creation and Management: Queries about converting and editing images, managing digital content through platforms like Google Drive, Dropbox, and OneDrive, and using software like DaVinci and Photoshop point towards a proficiency in digital content creation and management.
  9. General Curiosity and Diverse Interests: The wide array of topics, from combinatorics to ham radio antenna connectors, suggests a personality that is curious and enjoys learning about a wide range of subjects.

This person appears to be well-rounded with diverse interests spanning technology, entertainment, geography, automotive, health, and more, indicating a continuous pursuit of knowledge and understanding in various domains.

Once again, it feels generic, but based on only 126 queries, the profile is pretty amazing from my perspective.

“An inclination towards solving technical problems” — imagine that.

Yes, there were blatant errors. My search for vacuum line diameters in automobiles had nothing to do with cars and everything to do with covering the perches on our bird feeders. I guess the confusion is understandable. And I have no idea what the context was for “end-of-life breathing patterns” — while I remember searching for it, I have no memory of why. I suspect it was a passing interest rather than something contributing to “Healthcare Awareness”.

Why does this matter?

I know many people are concerned about the amount of data being collected about them. I think this is a good example of how an arbitrary collection of what could be seen as completely benign data can be used to build a profile on someone. That profile could be used by businesses — or hackers — to better target their efforts at you directly, for example.

I still believe that by and large, you and I just aren’t that interesting and don’t merit the CPU cycles to build such a profile. For example, I’m not concerned that Google has 17 years of my search history. It’s not that exciting, trust me.

But I know some people should be concerned because of the circumstances of their lives, and I know some people will be concerned no matter how uninteresting they might be.

And this isn’t really about search history. While that’s one way we expose information about ourselves, it’s certainly not the only one. Consider the posts you make on social media, the websites you visit without searching, and more. While I absolutely believe that you and I are simply not that interesting, it’s worth noting the different ways that information might be getting collected and aggregated, in case you happen to be one of the few who really are interesting.

It would be interesting (albeit more difficult) to run a similar analysis based on email or social media posts. That would tell a LOT about people.

Do this

The bottom line here is simply to remain aware of the amount of information you’re leaking just by doing whatever you do online. You can, of course, delete your search history if you’re so inclined, but in general it’s just not worth the effort.



Footnotes & References

1: Getting the search history in a usable form was a project involving Google Takeout and using some regular expression search-and-replace on the results.

2: If that seems small, it’s because I’ve been shifting to alternate search engines like DuckDuckGo and Brave.

14 comments on “I Gave AI My Search History and Here’s What It Told Me”

    • Obviously I don’t believe they’re actively doing anything like that, but for those that do believe, then yes, switching to more private alternatives is a step. Not a complete step, of course, but a step.

  1. Scary – not scary. I guess I knew they were doing this. I do like to think I am not interesting enough.

    I do like duckduck though. Like not being tracked or profiled. But I don’t always use their browser. Always use their search engine though.

    Thanks Leo and Mark for your back and forth in the comments. I like that you both inspire new insights on the columns Leo publishes. You both sometimes having different perspectives is great, it brings out more info for those of us who don’t know what it is we don’t know.

  2. I find that AI is mislabeled. Not intelligence but a fantastic tool. But as far as the information posted about our search history, definitely interesting. I do feel that certain entities could use this for a tailored search to utilize against us. Only the info needed to sell us something or show that we are a bad person. Like any technology, it can be used for good or evil, depending on your outlook. Well, it’s a great and interesting article for sure, but that’s my take on it. I don’t use ChatGPT or others.

  3. Here’s one way to make your search history less revealing: Start doing crossword puzzles. The actual things you look for will be lost among searches for capital cities, rivers, songs you’ve never heard of by performers you’ve never heard of and obscure movie credits :)

  4. Your number 2 footnote is very interesting.
    ‘..it’s because I’ve been shifting to alternate search engines like DuckDuckGo and Brave.’
    Why Leo, why are you doing this, please explain ?

  5. I’m not as impressed with AI, Leo, as I am with you. There seems little surprising in what AI listed for you that wouldn’t be clearly apparent in your searches. Same here. I didn’t notice anything that suggested investigating your search history revealed anything about you which was not already apparent. This was not a valid test and I’m surprised your comments were as friendly as they were.
    It shouldn’t be news that your search topics are that revealing with regard to your personal psyche or personal/business interests. Given the same information, I suspect most of us could have come up with the same set of conclusions.

  6. I’ve been using DuckDuckGo as my default browser for years. I have used LastPass and 1Password and have had no trouble with 99% of the sites loading the necessary info to log in. I do usually have to click on the first box, and then all the info fills automatically.

  7. As an open sourced tool or discrete app, AI is fine. It can be very useful and helpful as we perform various tasks. However, OpenAI is being integrated into most all aspects of the Windows OS, and that concerns me because it leads me to the question “How much more personal information can Microsoft collect about me using AI, and do they?”.

    I know I’m not particularly interesting, and I’m not normally paranoid, although I am very skeptical about everything I see online. Based on recent trends, I’m becoming much more skeptical about new technology in general, and Microsoft’s objectives regarding their integration of AI into most all aspects of the Windows OS. If Microsoft gives me the ability to control where or if AI is integrated into Windows, and what information (if any) is collected and/or retained (provided I have control over it), I’m good with that.

    The next major release of Windows will help me decide whether I continue to be a Windows user. At present, Windows appears to be becoming very cloud/AI-centric. If they take that trend to the extreme, I’ll drop Windows for good and switch to GNU/Linux. I currently dual-boot Windows 11 with Solus Linux, and I like using Solus a lot. For me, it’s a great adjunct to Windows 11, and is my choice of OS, should I decide to stop using Windows altogether. I hope my concerns are unfounded, but I’m becoming increasingly distrustful of Microsoft, and that, above all, will guide me in making my decision about continuing to use Windows.

    I suppose I’ll see what comes next, sometime later this year.

    Ernie (Oldster)

