The cost of a thing is the amount of what I will call life which is required to be exchanged for it, immediately or in the long run.
~ Henry David Thoreau, Walden; or, Life in the Woods
As some of you may know, a while ago I made my own (pretty minimal and open source) analytics service called Mochi, mostly because I was curious about how such a thing might be implemented.1 Lately, I've been chipping away at it, fixing some bugs in how we collect "hits" (page views) and modernizing the style a bit. While making these changes, I also spent some time checking how we filter out "bot visits" and realized that, right now, it's really hard to know algorithmically whether a visitor is a human or a bot.
Now, I don't care much about "views" per se (nor do I think you should; it's not healthy); I care more about "referrers". Knowing who links to my site lets me read their stuff and maybe even start a conversation! Being quoted by others is also one of the most flattering things there is :) Still, I do think the problem of knowing whether a visitor is human is a really interesting one, especially because there doesn't seem to be any obvious solution!
Bots are getting more sophisticated every day, and we recently crossed a threshold that makes distinguishing them especially hard: it now takes minimal setup for someone to have an LLM pilot an actual web browser and act on (or scrape) the sites it visits. This is actually the best approach if whoever runs the bot wants to avoid detection because, from the point of view of a web server (e.g., Mochi), the requests look like they come from a legitimate browser and user.
I've been thinking about an interesting approach the community has come up with to dissuade bots from interacting with their sites. It's an automated check (so there's no need for the user to solve any impossible puzzles); it just requires "time." Specifically, I'm thinking about "proof of work" captchas. In simple terms, proof of work (PoW) requires the client computer to do some heavy-ish computation locally and then prove it to the server. Checking the proof is cheap for the server, but the proof itself can't be faked (quantum algorithms notwithstanding, but that's another topic), so the client browser MUST have done the required computation and expended a non-trivial amount of effort and, most importantly, time.
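To make the idea a bit more concrete, here's a minimal sketch of one common PoW scheme, in the style of hashcash. It assumes the server hands out a random challenge and the client has to find a number (a nonce) such that the SHA-256 hash of the challenge plus the nonce starts with at least `difficulty` zero bits. The function names, the 8-byte nonce encoding, and the choice of SHA-256 are all illustrative assumptions, not necessarily how any particular captcha (or Guestbooks) does it.

```python
import hashlib
import os


def leading_zero_bits(digest: bytes) -> int:
    """Count how many leading bits of the digest are zero."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        # bit_length() gives the position of the highest set bit in this byte,
        # so 8 - bit_length() is the number of zero bits before it.
        bits += 8 - byte.bit_length()
        break
    return bits


def solve(challenge: bytes, difficulty: int) -> int:
    """Client side: brute-force nonces until one meets the difficulty.

    Each attempt succeeds with probability 2**-difficulty, so this takes
    about 2**difficulty hashes on average.
    """
    nonce = 0
    while True:
        digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
        if leading_zero_bits(digest) >= difficulty:
            return nonce
        nonce += 1


def verify(challenge: bytes, nonce: int, difficulty: int) -> bool:
    """Server side: a single hash is enough to check the client's work."""
    digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
    return leading_zero_bits(digest) >= difficulty


if __name__ == "__main__":
    challenge = os.urandom(16)  # fresh random challenge issued by the server
    nonce = solve(challenge, difficulty=16)
    print(nonce, verify(challenge, nonce, difficulty=16))
```

The asymmetry is the whole point: `solve` needs about 2^difficulty hashes on average, while `verify` needs exactly one. A real deployment would likely also want the server to remember which challenges it issued and reject reused ones, otherwise a bot could solve a challenge once and replay the answer.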
I realize that for many readers this may sound abstract, so let me try to ground it:
- Your computer has a limited amount of computational power.
- A single proof in this context is easy to do if you don't have much else going on in parallel (in other words, you're not overloading your CPU) or don't care that much about speed (the PoW process might take anywhere from five seconds to a couple of minutes, or more, depending on the required difficulty, which is configurable).
- However, these bots usually care a lot about speed, about doing as many "X" per minute as they can; this means they may have multiple parallel browsers going.
- If they do, they will suffer the compounded effect of running those extra browsers plus the PoW calculations, potentially slowing down the whole system.
- But regardless of whether they do, it might be infeasible for the bot to wait for the PoW to finish. Tens of seconds might not be much to a real user, but for a bot meant to do things at scale, it adds up quickly!
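To put rough numbers on the list above, here's a back-of-the-envelope sketch. It assumes a hashcash-style puzzle where every hash attempt succeeds with probability 2^-d (d being the difficulty in bits), so a proof takes about 2^d attempts on average. The rate of one million hashes per second is a made-up figure for illustration; the code treats it as the machine's total hash rate and assumes parallel browsers split it evenly.

```python
def expected_hashes(difficulty_bits: int) -> int:
    # Each attempt succeeds with probability 2**-difficulty_bits,
    # so the expected number of attempts is 2**difficulty_bits.
    return 2 ** difficulty_bits


def seconds_per_proof(difficulty_bits: int, hashes_per_second: float,
                      parallel_sessions: int = 1) -> float:
    # Assumption: parallel browsers share the machine's total hash rate evenly.
    share = hashes_per_second / parallel_sessions
    return expected_hashes(difficulty_bits) / share


def proofs_per_minute(difficulty_bits: int, hashes_per_second: float) -> float:
    # Expected machine-wide throughput: opening more browsers does not add
    # hash rate, it only splits the same budget more ways.
    return 60 * hashes_per_second / expected_hashes(difficulty_bits)


if __name__ == "__main__":
    rate = 1_000_000  # hypothetical total hash rate for one machine
    print(round(seconds_per_proof(22, rate), 1))      # 4.2
    print(round(seconds_per_proof(22, rate, 10), 1))  # 41.9
    print(round(proofs_per_minute(22, rate), 1))      # 14.3
```

Under those assumptions, one browser waits about 4 seconds per proof, ten parallel browsers each wait about 42 seconds, and the machine as a whole averages roughly 14 proofs per minute no matter how many browsers it opens. Raising the difficulty by one bit doubles the waits and halves the throughput.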
However, PoW would be a terrible choice for an analytics site. On one hand, it would mean doing hidden computation on pretty much every visit, which is not a nice thing to do (though one could use "cookies" to track the user and "vet" them for a given amount of time); on the other hand, it might not be fast enough to capture meaningful visits, since (remember) the PoW could take about a minute or more in some cases, and a visitor may well leave before it finishes. One could configure it to be faster (easier), but if it's too fast, it becomes pointless because bots won't have to wait an appreciable amount of time. It could work, but my ethical compass prevents me from pursuing this (tracking users is bad, as is burning up energy "just because").
[aside]
While PoW is a bad fit for Mochi, I thought it might actually be a fun (optional) addition to Guestbooks! I haven't received many "spam reports," but imagine you could add a "verify you're human" checkbox that visitors need to click before submitting a message. It might not be much, but it could deter automated spammers from posting at scale (though why someone would want to spam guestbooks is beyond my understanding). I've already pushed an initial implementation, which will be live by the time this post is published :)
To enable it, go to Guestbook settings and tick the "Enable Proof of Work challenge" checkbox. Note that you will need to re-copy your embedding code if you're using the JS embed option, as there's a new HTML element that holds the verification status. Also, there are a couple of new CSS classes that you can add (though it should mostly just work by inheriting the overall styles from your site).
If you do try it out, let me know how it goes!
[/end aside]
Going back to the question of how to know whether a visitor is human... I don't really have a solution. The community still has some innovating to do before that's possible. In the meantime, though, I thought one could partially address the issue by adding kudos/likes/upvotes/toasts to the bottom of posts!
(As part of this post, I also implemented Kudos tracking in Mochi, like the button at the bottom of this post.)
I've never really liked the idea of adding "kudos" to my site, mostly because I'm afraid I will start caring about whether people "like" or "don't like" what I write. It could even be that no one is really visiting my site at all, and all the visitor numbers are actually bots! (Which is perhaps more likely than I would like to admit.) But I thought that we could use the kudos buttons not so much as a "like" (which is a concept we've mostly inherited from traditional social media) but more as a way to say "a human read this."
I don't know about you, but whenever I read a blog that has a "kudos" button, I always click it, regardless of whether I like the post. For me, it's a way to say "I see you" or "I appreciate what you're doing" more than "this is cool" (though often it is cool; everything is cool depending on the mindset you read it with). As things stand, traditional analytics just can't cut it, and we need another mechanism to signal "humanness."
I'm not sure if I'll leave mine as a plain "clap" icon or if I'll actually add the text "click to say a human read this" or some other stuff. For now, I'll probably leave it as a plain emoji :) I like how clean it looks.
...
As I write this, there's a (quite insistent) voice in the back of my mind screaming "YOU SHOULDN'T CARE ABOUT THIS". And yeah, it's probably right; I shouldn't care. I should just write, and write for myself, not for others. But the tricky bit is that another part of me does care. It cares about being seen.
This is bringing up a whole lotta questions about "why do I do this". If I find that no one reads my stuff, will I be disappointed? And if I'm disappointed, does that mean I'm really writing just to "fish for attention," or is there more to it? Could I just be writing for myself because I enjoy it? Or maybe a bit of both?
Well, I think this post is already long enough as it is, so I'll leave those questions for a future one :)