I’ve had my email address in a `mailto:` link in plaintext on my then-web-site, now-blog, since the early 2000s, and spam is no real problem. There are a few spam messages in my spam mailbox per day.
Perhaps my provider’s just great at filtering spam - but I kind of doubt it’s better than the major players (for years I’ve used Zoho for email - and it’s ‘okay’ enough that it’s not worth switching).
You know what's funny: LLMs are as good at detecting spam as they are at generating it. I've got an automation that scores incoming emails, and it's getting better and better each day (also more expensive, haha).
I doubt it. Most of the signals spam filters use these days are reputation based. You have to build up your domain and IP reputation for a long time first.
> You have to build up your domain and IP reputation for a long time first.
Or buy/rent domains/IPs that have good reputations, as there are services that specialize in just bringing up the reputation for stuff so they can sell it once "good". Same exists for user accounts for various platforms like reddit and so on.
I have a hypothesis that email scrapers don't parse HTML at all. I suspect they search the raw bytestring for @ characters and take whatever's on either side of it. That probably gets them as many addresses as they can realistically use at a fraction of the cost, given how expensive HTML parsing can be.
(Similarly, I'm sure most links can be found by searching the bytestring for "href" and taking what's to the right of it.)
This would explain why HTML entities are so effective.
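A minimal sketch of that hypothesis, assuming the scraper is nothing more than a regex run over the raw response bytes (the pattern and the helper names below are illustrative, not taken from any real harvester). It shows why entity encoding defeats such a scanner, and why one `html.unescape` call is enough to undo it:

```python
import html
import re

# Deliberately naive pattern, standing in for a scraper that never parses HTML.
NAIVE_EMAIL = re.compile(rb"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def entity_encode(text: str) -> str:
    """Encode every character as a decimal HTML entity, e.g. '@' -> '&#64;'."""
    return "".join(f"&#{ord(ch)};" for ch in text)


def naive_scan(page: bytes) -> list[bytes]:
    """Search the raw bytes for anything shaped like an email address."""
    return NAIVE_EMAIL.findall(page)


address = "user@example.com"
plain_page = f'<a href="mailto:{address}">mail me</a>'.encode()
encoded_page = f'<a href="mailto:{entity_encode(address)}">mail me</a>'.encode()

# The raw-byte scan finds the plaintext address but not the entity-encoded one.
found_plain = naive_scan(plain_page)
found_encoded = naive_scan(encoded_page)

# A scraper that decodes entities first recovers the address again.
found_decoded = naive_scan(html.unescape(encoded_page.decode()).encode())
```

The encoded page contains no literal `@` byte at all, so any scanner keyed on that character comes up empty unless it decodes entities first.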
On the other hand, surely the TLS handshake is far more expensive than HTML parsing? Maybe it's to avoid parser failure modes that consume a lot of resources?
It really varies. You are correct that most modern ones search the byte string for @ characters, but there are probably hundreds of different methods out there in black-hat marketing circles to scrape emails.
One trick is having a tarpit email address on your website. It is hidden using CSS so no real visitor sees it, but it is visible in the source. If your mail server receives mail for that address, you can just block the sending IP for 24h.
This sounds like bad advice and would result in blocking google and other major ESPs.
I occasionally get spam from people who took the time to create gmail accounts. Based on this advice, the honey pot email address would get spam from a Gmail account and your script would block Gmail servers.
Some time ago i was wondering if the common "me at foobar dot com" you still see a lot of people do actually helps at all, especially now with LLMs, so i searched for some common "obfuscation" techniques and found this site (not the 2026 update, but the previous - it was a few months ago). Then i wrote a simple LLM query with a bunch of examples from the site[0] (the tool is just a frontend for a commandline program that uses llama.cpp and Mistral Small 3.1 in Q4_K_M quantization since it loads relatively fast and is fine for simple prompts). AFAICT it could reveal anything that wasn't relying on CSS tricks or JavaScript.
Like others mentioned, though, personally i haven't been bothered by email harvesting for years now since spam filters seem to do a decent job. I have my email posted in plaintext here (which i bet is harvested very often) and in various other places and the occasional spam i get is eclipsed by "spam" from services i've actually signed up for (coughlinkedincough).
IMO a better approach would be individualized addresses.
Imagine someone visiting your blog who wants to e-mail you can burn some CPU cycles to "earn" an address that hasn't been given out to anybody else, e.g. user+TOKEN@example.com, where it is algorithmically-unlikely for them to be able to guess a different TOKEN that will work. Then if abuse occurs, you can just retire that one address. (In a non-interactive context, like a paper ad, you could just generate one yourself.)
Naturally, this would be best with an e-mail client that is aware of the scheme, and with a mail-service that has some API for generating new addresses, such as if you want to cold e-mail somebody and use a new from/return address.
Some years ago I had the fanciful idea of doing it with a phone-app, where it manages creating new addresses as-needed, disabling them, and keeping notes about who you gave them to.
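One way the "earn an address" idea could be sketched, under stated assumptions: a hashcash-style proof of work (leading zero bits of SHA-256) and an HMAC-signed token, so the mail server can validate a token without storing every address it has issued. `SECRET_KEY`, `DIFFICULTY_BITS`, the token layout, and all function names are hypothetical, and replay of a solved challenge is not handled here.

```python
import hashlib
import hmac
import secrets

SECRET_KEY = b"replace-with-a-real-secret"  # hypothetical server-side secret
DIFFICULTY_BITS = 12                        # kept small so the demo runs quickly
revoked: set[str] = set()                   # tokens retired after abuse


def check_work(challenge: str, nonce: int, bits: int = DIFFICULTY_BITS) -> bool:
    """True if SHA-256(challenge:nonce) starts with `bits` zero bits."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return int.from_bytes(digest, "big") >> (256 - bits) == 0


def solve_challenge(challenge: str, bits: int = DIFFICULTY_BITS) -> int:
    """Visitor side: burn CPU cycles until a valid nonce turns up."""
    nonce = 0
    while not check_work(challenge, nonce, bits):
        nonce += 1
    return nonce


def _tag(serial: str) -> str:
    """Truncated HMAC binding a serial to the server secret."""
    return hmac.new(SECRET_KEY, serial.encode(), hashlib.sha256).hexdigest()[:12]


def issue_address(challenge: str, nonce: int,
                  user: str = "user", domain: str = "example.com") -> str:
    """Server side: hand out user+TOKEN@domain only after valid work."""
    if not check_work(challenge, nonce):
        raise ValueError("proof of work not satisfied")
    serial = secrets.token_hex(4)  # 8 hex characters
    return f"{user}+{serial}{_tag(serial)}@{domain}"


def is_deliverable(address: str) -> bool:
    """Mail-server side: accept only well-formed, unrevoked tokens."""
    local, _, _domain = address.partition("@")
    _user, plus, token = local.partition("+")
    if not plus or len(token) != 20 or token in revoked:
        return False
    serial, tag = token[:8], token[8:]
    return hmac.compare_digest(tag, _tag(serial))
```

Retiring an abused address is then just `revoked.add(token)`. Guessing a different working TOKEN means guessing a 12-hex-digit HMAC tag without knowing the secret.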
They left off the HTML CGI form. Generate the email on the web page, and the server sends the email after performing some basic sanity checks and anti-spam on the form and web server itself, such as solving some CSS puzzle or winning a game of DOOM.
When I wrote my own brainf*ck interpreter (in C) at the start of the year I was really struggling to find a use for the language. Eventually I had the idea to obfuscate emails on my websites with the language.
Basically each email gets written as a brainf*ck program and stored in a "data-" attribute. The html only includes a more primitively obfuscated statement "Must enable Javascript to see e-mail." by default which then gets replaced by another brainf*ck interpreter (in JS) with the output of the brainf*ck code. Since we only output ASCII we can reduce the size of the brainf*ck code by always adding 32 to each value it outputs. The Javascript is loaded from what seemingly looks like a 3rd party domain. There we filter based on heuristics and check if the "referer" matches before sending out the actual interpreter code.
Of course all this would not help if a scraper properly runs things through Javascript too.
Recently I read you soon will be able to run DOOM via CSS, so certainly it should be possible to have a brainf*ck interpreter in CSS? That would be the next step… just to get rid of the Javascript, but then I'm okay with all the downsides of using Javascript just for the e-mail obfuscation.
Anyway… I also regularly (at least once a year) rotate those public contact addresses.
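For illustration, here is a rough Python sketch of the "+32" trick described above. The commenter's real interpreters are in C and JS, so this is not their code: the naive `encode` generator and both function names are assumptions, and `,` input is left out since the programs only print.

```python
OFFSET = 32  # printable ASCII starts at 32, so programs store (code - 32)


def encode(text: str) -> str:
    """Emit a naive brainf*ck program that prints `text` under the +32 convention."""
    program = []
    current = 0
    for ch in text:
        target = ord(ch) - OFFSET
        if not 0 <= target < 256:
            raise ValueError(f"character {ch!r} is outside the supported range")
        delta = target - current
        step = "+" if delta > 0 else "-"
        program.append(step * abs(delta) + ".")
        current = target
    return "".join(program)


def run(program: str, offset: int = OFFSET) -> str:
    """Interpret brainf*ck (no ',' input), adding `offset` to every output value."""
    # Pre-compute the position of each bracket's partner.
    jumps = {}
    stack = []
    for i, op in enumerate(program):
        if op == "[":
            stack.append(i)
        elif op == "]":
            j = stack.pop()
            jumps[i], jumps[j] = j, i

    tape = [0] * 30000
    ptr = pc = 0
    out = []
    while pc < len(program):
        op = program[pc]
        if op == "+":
            tape[ptr] = (tape[ptr] + 1) % 256
        elif op == "-":
            tape[ptr] = (tape[ptr] - 1) % 256
        elif op == ">":
            ptr += 1
        elif op == "<":
            ptr -= 1
        elif op == ".":
            out.append(chr(tape[ptr] + offset))
        elif op == "[" and tape[ptr] == 0:
            pc = jumps[pc]  # skip the loop body
        elif op == "]" and tape[ptr] != 0:
            pc = jumps[pc]  # jump back to the loop start
        pc += 1
    return "".join(out)
```

With the offset, the first character costs 32 fewer `+` instructions than it would otherwise; a hand-written program using loops would be shorter still than what `encode` produces.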
Contact details: [any mailbox] [at] [the domain name of this web site]. Please don’t ask me to give interviews, sign books, appear on podcasts, attend conferences or conventions, or provide feedback or endorsements for works of fiction, scientific theories, or slabs of text disgorged by chatbots.
Of course, the technical term for that setup is 'catch all', you can set this up with your email provider. You can send your email to "ghywertelling@gregegan.net", for example.
Yes, people using “email” for “email address” in contexts where it could also mean “email message”, which “email” more frequently means, is really annoying.
I use a very simple encryption plus some padding (fluff in the article), but the email address gets updated by JS. This requires JS plus evaluating the resulting DOM. If you don't evaluate JS, the address will be something like "please@activate.javascript". Or you could use "potus@whitehouse.gov", in which case clueless scrapers end up spamming the US government.
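The comment doesn't say which cipher is used, so the following is only a guess at the shape of such a scheme: a single-byte XOR plus random padding, base64-encoded for embedding in the page. `obfuscate`, `reveal`, and the parameter names are made up for this sketch, and in the setup described the decoding side would run as JS in the browser.

```python
import base64
import secrets

PLACEHOLDER = "please@activate.javascript"  # what non-JS scrapers would see


def obfuscate(address: str, key: int, pad: int = 4) -> str:
    """XOR each byte with a one-byte key, wrap it in random padding, base64-encode."""
    body = bytes(b ^ key for b in address.encode("ascii"))
    padded = secrets.token_bytes(pad) + body + secrets.token_bytes(pad)
    return base64.b64encode(padded).decode("ascii")


def reveal(blob: str, key: int, pad: int = 4) -> str:
    """Inverse of obfuscate: strip the padding, then undo the XOR."""
    raw = base64.b64decode(blob)
    return bytes(b ^ key for b in raw[pad:len(raw) - pad]).decode("ascii")
```

This is obfuscation rather than security: anyone who reads the page's script can recover the key, which matches the point that it only stops scrapers that don't evaluate JS.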
WTH, a 302 into a "mailto:" (search for "HTTP redirect" in the featured article) opens up my e-mail client without clicking a mailto link!? This seems wrong.
Some browsers ask whether to open the email client in that case. I don’t see it as significantly different from a redirected download link that would open a program based on the mime type or file ending. Or from a redirect to another URL pattern associated with an app, like for example how YouTube links may open in the YouTube app.
> HTML entities are often decoded automatically by server-side libraries, which means that even the most basic harvesters can get your email addresses without any special effort. This technique should be worthless—and, yet, it still stops most harvesters.
Anecdotal, but I’ve used HTML entities on a public static website for a long time using an href tag with mailto, and yet I’ve not seen any spam.
I guess any spammer who uses some level of GenAI to process and extract email addresses would have a lot more success against all the methods listed in this article.
Same. I have a normal mailto link on a Google-indexed page (a top hit with the right search terms) with a dedicated email address for over a decade, and rarely ever received spam for it. This is after DNSBL filtering.
I use SVG where I created a text object in Affinity Designer and converted it to curves so the SVG doesn't have text any more, just vectors for the glyphs of it. Seems to work pretty well at keeping spammers at bay.
I'm sorry, but that is not how email addresses are spammed in bulk.
The data source is the enormous data breaches that are more and more frequent.
There is more incentive to collect more information on someone you already know something about than to spam an address you don't even know is valid.
The spam can also be much more effective, as it presents itself with personal information about the target.
I'm not denying that it happens.
I'm saying that it's not the classic way to spam people nowadays.
It's obvious to any non-native English speaker: when you get spam in English, it is because they took the email address from the web. When it's in your native language, it's usually from a data breach.
I'm vastly more spammed by the latter. I can confirm it with unique email addresses of the "+" form (but not with the + character).
Also, when I'm spammed in English, it's for Web3 crypto stuff, and when it's from a data breach, it's a phishing attempt.
I’ve run a small thingy last year, on its own domain, with a (project-specific) email in plaintext on the homepage. I’ve got a fair bit of spam to that address.
But yeah, I’d say most junk mail is coming to (1) an address leaked from one Russian bank (!) I used, (2) the address listed in public business databases (I have a company in Estonia).
If you're only passing the address in private to some service, you can just use [some-string-unique-to-that-service]@yourdomain.com. Or, more classically, plus addressing to do the same. Then you just block that recipient.
That solution doesn't apply to the use case in the article.
Surely spammers just turn `me+leaked/sold@mail.com` into `me@mail.com` as well as `me+apple@mail.com`, `me+softbank@mail.com` etc. The cost of stripping any `+postfix` must be about zero even at volume.
Some people block all mail to non-plus-addressed emails on that inbox, so a plus address is required to be received at all. You could say then spammers will just add a random one, but they wouldn't be getting bounces and would have to guess as much. Still, even stripping the +'ed part is beyond what most of them even bother to do. That dropoff plus normal spam filters works well enough.
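Both sides of this exchange fit in a few lines. A rough sketch, with hypothetical function names: `strip_plus_tag` is the near-zero-cost normalization a spammer could apply, and `accept` is the recipient-side policy of refusing mail that lacks a tag or uses a retired one.

```python
def strip_plus_tag(address: str) -> str:
    """Spammer side: turn me+anything@mail.com into me@mail.com."""
    local, _, domain = address.rpartition("@")
    base, _, _tag = local.partition("+")
    return f"{base}@{domain}"


def accept(recipient: str, retired_tags: set[str]) -> bool:
    """Recipient side: require a plus tag, and reject tags that were retired."""
    local, _, _domain = recipient.rpartition("@")
    _base, plus, tag = local.partition("+")
    return bool(plus) and bool(tag) and tag not in retired_tags
```

Under this policy a stripped address is exactly what gets rejected, so the spammer's normalization backfires; they would have to guess a live tag instead.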
honestly at this point i question whether obfuscation even matters. I run a few sites with plain mailto links, spam volume is maybe 3-5/day and the provider catches all of it.
the actual vector that causes problems isn't harvesting from your website - it's your address showing up in a breached database or a client's compromised outlook leaking your whole contact list. had that happen twice last year, no amount of CSS tricks on the website would have helped
if you need a contact form anyway for UX reasons just use one, it solves the obfuscation thing as a side effect
Did it ever matter? My gmail address had been in the open for 22 years. I have more problems with people sharing the same first name and using my email for registrations than the spam.
You are replying to an AI bot. Notice how every comment has the same structure, and has likely been prompted to share a piece of their "life" to make the comments seem more believable
This is such a waste of effort. Your E-mail address is not and can't be a secret. It will get into spammer databases eventually, no matter what you do. You will spend a lot of effort doing all these fancy tricks, and eventually you will get spam anyway.
Also, a note to those who make fancy "me+someservice@somedomain.com" addresses: make really sure you are in control and these work. Some services (including mine) will need to E-mail you one day, for example to tell you that your account will be deleted because of inactivity. If you don't receive that E-mail because of your fancy spam defenses, your account will be deleted. I've seen people hurt themselves like this and it makes me sad.
On a constructive note: what works very well is spam filtering using LLMs. We have AI to help us with this problem today. I wrote an LLM despammer tool which processes my inbox via IMAP using a local LLM (for privacy reasons). I see >97% accuracy in my benchmarks on my (very difficult) testing corpus. It's nearly perfect in real life usage. I've tested many local models in the 4-32B range and the top practical choice is gpt-oss:20b (GGUF, I run it from LM Studio, MLX quantizations are worse) — not only does it perform very well, but it's also really fast.
Plus-addressing is built in to most email services. There's no 'fancy' set up to break; it just works. That is, there's no way me@gmail.com works but me+someservice@gmail.com doesn't, unless you explicitly configure it not to work. Similarly for custom domains on most services.
If you use a catch-all on a domain, i.e. someservice@somedomain.com, I guess in theory that might break. But it seems about as likely as messing up the overall domain setup.
Also, my account on your service is likely much more disposable to me than my email address/domain. Anything I care about, I'd back up. Not just assume some random website is going to preserve it for me forever.
The techniques in the article right now have had around 95%-100% success at avoiding spam and take about 5 min. to implement. Your approach of putting an LLM in front of your inbox gives 97% accuracy, may have false positives (so you may not receive that account deletion email after all), requires running inference and, I assume, would take at least an hour to set up.
Also, the two can be complementary, anyways, so I am not sure what your point is.
Plus tags annoy signup forms more than they slow spam crawlers. If you're spending this much effort on obfuscation, run a sane mail filter and save the weird tricks for the sites that insist on emailing you later, because some apps treat a plus alias as invalid and then you get to debug their broken account recovery.
But I like this review of techniques, even the simplest ones are very effective, that surprised me.
I never got SpamAssassin working very well, but since moving my email hosting to Apple (from my own server), spam has not been a problem.
However, LLMs are quite good at generating spam and I think soon will evade most filters.
Yes, that is indeed the point of those; "build up reputation -> sell/rent -> someone uses it to burn reputation -> rinse and repeat".
[0] https://i.imgur.com/ytYkyQW.png
/edit
And you can combine both approaches: XOR'ing the code first for good measure. :)
I have no idea how to decipher this obfuscation.
Edit: that’s not to deny that big data leaks are a serious problem
Just wait until one of these companies demands an email from the registered email address of your account!