Your Browsing History is not Your Own

Were you using a web browser 10 years ago? What about 1 year ago? I know I was.
But you know what? I couldn't tell you the specifics of what I was doing on the internet. Could you tell me what sites you visited 728 days ago? Probably not.
I find this situation quite appalling.
Web browsers record what sites you visit so that you can revisit them in the future, right? Indeed, they do have this functionality. However, after a certain number of months your history gets deleted.
Months, that's right. In the case of Chrome it's 3 months.
This is why your browsing history isn't a true history: it's incomplete.

Why should you care?

Well, it's not that you should care; it's that I do care. If you have zero curiosity about your past browsing history, you can stop reading right now. The rest of this post probably won't interest you.
The reason you might care is that you value having access to your own data.
💡
NOTE This is not a rant about privacy and how your data is being collected and used to sell ads. While also an interesting topic, that's not the focus of this post. This post is concerned with actually using the data you generate.
Every time you use a browser you generate A LOT of data, but for the most part this data isn't accessible.
Have you ever wanted to recall something you read online a few months ago? If so, we're in the same boat. In fact, that little situation is what led me to discover that Chrome only stores 3 months of browsing history!

The elephant in the room

The obvious objection to this talk of unlimited and persistent browsing history is that you don't want to record every site you visit. Totally fair. That's what private browsing is for. Chrome and Firefox literally don't record* what you do in a private tab, so everything I'm talking about here would not break the guarantees of private browsing.

What can you do?

As far as I know, not much. There is a Chrome extension that will store your history in a separate database and thus keep it around for you. That's a great start, but it has a few shortcomings:
  • Chrome only. If you use more than one browser you have incomplete data.
  • Single-device only. If you use more than one computer you have fragmented data.
  • Trust. That extension requires full access to all your browsing history, and you have to trust that the developer is not doing nefarious things with your data.
    • I'm not implying they are, I'm just pointing out that having to trust a third party is a weak point of any service that deals with sensitive data. Even if you do trust the developer, their account could become compromised and thereby compromise all your browsing history.
    • This problem would be somewhat mitigated if the project were open source, but as far as I can tell the code is not available for review.

An alternative

At long last I decided to take matters into my own hands. I'm building a tool to easily and continuously back up your browsing history while making it easy to search.
Here's what I'm thinking:
  • Continuous history backups for Chrome and Firefox so you never lose history.
    • Do many people use any of the other browsers? This is not a dig, I honestly don't know anyone not using one or both of these. I'm sure the number is > 0 but is it significant? 🤷
  • Search / Dashboard UI to give high-level stats, a simple search box, and of course the ability to run SQL against your history.
  • Optional cross-device syncing.
    • This would either be through a third-party service (Dropbox probably) or I'd have to create a back-end. Not sure what would make most sense yet. I'm still in the brainstorming phase.
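To make the backup and SQL ideas above a bit more concrete, here is a minimal sketch of what one backup pass for Chrome could look like. It relies on the fact that Chrome keeps its history in a SQLite file (named History) with `urls` and `visits` tables, and that `visit_time` is stored as microseconds since 1601-01-01 UTC. Everything else is an assumption made for illustration: the function names, the archive file, and the archive's `visits` schema are hypothetical, not the design of the actual tool.

```python
import shutil
import sqlite3
import tempfile
from datetime import datetime, timedelta, timezone
from pathlib import Path

# Chrome stores visit_time as microseconds since 1601-01-01 UTC.
CHROME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def chrome_time_to_iso(microseconds: int) -> str:
    """Convert a Chrome timestamp to a fixed-width ISO 8601 string in UTC."""
    moment = CHROME_EPOCH + timedelta(microseconds=microseconds)
    return moment.isoformat(timespec="microseconds")


def backup_chrome_history(history_db: Path, archive_db: Path) -> int:
    """Merge visits from a Chrome History file into an archive database.

    Returns the number of visits that were new to the archive.
    """
    # Chrome may hold a lock on its database while running,
    # so read from a temporary copy instead of the live file.
    with tempfile.TemporaryDirectory() as tmp:
        snapshot = Path(tmp) / "History"
        shutil.copy2(history_db, snapshot)
        src = sqlite3.connect(snapshot)
        try:
            rows = src.execute(
                "SELECT urls.url, urls.title, visits.visit_time "
                "FROM visits JOIN urls ON urls.id = visits.url"
            ).fetchall()
        finally:
            src.close()

    archive = sqlite3.connect(archive_db)
    try:
        # Hypothetical archive schema; the UNIQUE constraint makes
        # repeated backups idempotent.
        archive.execute(
            "CREATE TABLE IF NOT EXISTS visits ("
            " browser TEXT NOT NULL, url TEXT NOT NULL, title TEXT,"
            " visited_at TEXT NOT NULL,"
            " UNIQUE (browser, url, visited_at))"
        )
        before = archive.total_changes
        archive.executemany(
            "INSERT OR IGNORE INTO visits (browser, url, title, visited_at) "
            "VALUES ('chrome', ?, ?, ?)",
            [(url, title, chrome_time_to_iso(ts)) for url, title, ts in rows],
        )
        archive.commit()
        return archive.total_changes - before
    finally:
        archive.close()
```

Run on a schedule (say, daily), something like `backup_chrome_history(Path.home() / ".config/google-chrome/Default/History", Path("archive.db"))` would keep accumulating visits long after Chrome has pruned them; that path is the usual Linux location for the default profile and will differ on other systems. Because the archive is plain SQLite, "run SQL against your history" then comes for free, for example `SELECT url, title FROM visits WHERE visited_at LIKE '2020-01-%' ORDER BY visited_at`. Firefox could be handled the same way from its places.sqlite file, with a different query and timestamp conversion.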
💡
NOTE There's no mention of mobile browsers here because I'm honestly not sure how to pull that data. On iOS, for example, could I create an app that simply exfiltrates browsing history? If so, where would I send the data? In this case it seems I'd definitely need a server and possibly need to add some mobile UI to make Apple allow it through the App Store.

A broader vision

Browsing history is just the tip of the iceberg. I'd also like to instantly be able to search over many things for which ample data exists but is not readily accessible:
  • Podcasts I've listened to
  • Audiobooks I've listened to
  • Songs I've listened to
  • Ebooks I've read
  • Highlights in ebooks I've read
  • YouTube videos I've watched
  • etc...
We're all generating so much data every day and yet we have very limited access to it. I'd like to change that.
No more wondering "What was that interesting video I watched earlier this year?" and not being able to find an answer.

Interested?

If having access to your browsing history sounds at all interesting to you then message me on Twitter: @ian_sinn. I'd be happy to share.