The fact that ZIP files include the catalog/directory at the end is such nostalgia fever. Back in the day it meant that if you naïvely downloaded the file, a partial download would be totally useless. Fortunately, in the early 2000s, we got HTTP's Range and a bunch of zip-aware downloaders that would fetch the catalog first so that you could preview a zip you were downloading and even extract part of a file! Good times. Well, not as good as now, but amusing to think of today.
> ... a partial download would be totally useless ...
no, not totally. The directory at the end of the archive points backwards to local headers, which in turn include all the necessary information, e.g. the compressed size inside the archive, compression method, the filename and even a checksum.
If the archive isn't some recursive/polyglot nonsense as in the article, it's essentially just a tightly packed list of compressed blobs, each with a neat, local header in front (that even includes a magic number!), the directory at the end is really just for quick access.
If your extraction program supports it (or you are sufficiently motivated to cobble together a small C program with zlib....), you can salvage what you have by linearly scanning and extracting the archive, somewhat like a fancy tarball.
Partial zip shouldn't be totally useless and a good unzip tool should be able to repair such partial downloads. In addition to catalog at end zip also have local headers before each file entry. So unless you are dealing with maliciously crafted zip file or zip file combined with something else, parsing it from start should produce identical result. Some zip parsers even default to sequential parsing behavior.
This redundant information has lead to multiple vulnerabilities over the years. As having redundant information means that a maliciously crafted zip file with conflicting headers can have 2 different interpretations when processed by 2 different parsers.
Debian's `unzip` utility, which is based off of Info-ZIP but with a number of patches, errors out on overlapping files, though not before making a 21 MB file named `0` - presumably the only non-overlapping file.
unzip zbsm.zip
Archive: zbsm.zip
inflating: 0
error: invalid zip file with overlapped components (possible zip bomb)
Yep, these kinds of format shenanigans are increasingly rejected for security reasons. Not zip bombs specifically, but to prevent parser mismatch vulnerabilities (i.e. two parser implementations decompressing the same zip file to different contents, without reporting an error).
Someone shared a link to that site in a conversation earlier this year on HN. For a long time now, I've had a gzip bomb sitting on my server that I provide to people that make a certain categories of malicious calls, such as attempts to log in to wordpress, on a site not using wordpress. That post got me thinking about alternative types of bombs, particularly as newer compression standards have become ubiquitous, and supported in browsers and http clients.
Unfortunately, as best as I can see, malicious actors are all using clients that only accept gzip, rather than brotli'd contents, and I'm the only one to have ever triggered the bomb when I was doing the initial setup!
In one of my previous jobs, I got laid off in the most condescending way, only to be asked days later by my former boss to send her some documents. If only I knew about this then...
Violations of the Computer Fraud and Abuse Act (CFAA) can be either misdemeanors or felonies. It's definitely broad enough that doing so could get you in serious trouble if pursued.
Decompression is equivalent to executing code for a specialized virtual machine. It should be possible to automate this process of finding "small" programs that generate "large" outputs. Could even be an interesting AI benchmark.
My guess is this is a subset of the halting problem (does this program accept data with non-halting decompression), and is therefore beautifully undecidable. You are free to leave zip/tgz/whatever fork bombs as little mines for live-off-the-land advanced persistent threats in your filesystems.
Okay, so I know back in the day you could choke scanning software (ie email attachment scanners) by throwing a zip bomb into them. I believe the software has gotten smarter these days so it won’t simply crash when that happens - but how is this done; How does one detect a zip bomb?
The detection maintains a list of covered spans of the zip files
so far, where the central directory to the end of the file and any
bytes preceding the first entry at zip file offset zero are
considered covered initially. Then as each entry is decompressed
or tested, it is considered covered. When a new entry is about to
be processed, its initial offset is checked to see if it is
contained by a covered span. If so, the zip file is rejected as
invalid.
So effectively it seems as though it just keeps track of which parts of the zip file have already been 'used', and if a new entry in the zip file starts in a 'used' section then it fails.
For any compression algorithm in general, you keep track of A = {uncompressed bytes processed} and B = {compressed bytes processed} while decompressing, and bail out when either of the following occur:
Is it possible to implement something similar but with a protocol that supports compression?
Can we have a zip bomb but with a compressed http response that gets decompressed on the client? There are many protocols that support compression in some way.
There was https://idiallo.com/blog/zipbomb-protection earlier this year. It sends highly compressed output of /dev/zero. No overlapping files or recursively compressed payloads.
no, not totally. The directory at the end of the archive points backwards to local headers, which in turn include all the necessary information, e.g. the compressed size inside the archive, compression method, the filename and even a checksum.
If the archive isn't some recursive/polyglot nonsense as in the article, it's essentially just a tightly packed list of compressed blobs, each with a neat, local header in front (that even includes a magic number!), the directory at the end is really just for quick access.
If your extraction program supports it (or you are sufficiently motivated to cobble together a small C program with zlib....), you can salvage what you have by linearly scanning and extracting the archive, somewhat like a fancy tarball.
This redundant information has lead to multiple vulnerabilities over the years. As having redundant information means that a maliciously crafted zip file with conflicting headers can have 2 different interpretations when processed by 2 different parsers.
https://sources.debian.org/patches/unzip/6.0-29/23-cve-2019-...
Like bomb the CPU time instead of memory.
Someone shared a link to that site in a conversation earlier this year on HN. For a long time now, I've had a gzip bomb sitting on my server that I provide to people that make a certain categories of malicious calls, such as attempts to log in to wordpress, on a site not using wordpress. That post got me thinking about alternative types of bombs, particularly as newer compression standards have become ubiquitous, and supported in browsers and http clients.
I spent some time experimenting with brotli as a compression bomb to serve to malicious actors: https://paulgraydon.co.uk/posts/2025-07-28-compression-bomb/
Unfortunately, as best as I can see, malicious actors are all using clients that only accept gzip, rather than brotli'd contents, and I'm the only one to have ever triggered the bomb when I was doing the initial setup!
But you probably don't want to be investigated for either.
https://sources.debian.org/patches/unzip/6.0-29/23-cve-2019-...
So effectively it seems as though it just keeps track of which parts of the zip file have already been 'used', and if a new entry in the zip file starts in a 'used' section then it fails.1. A exceeds some unreasonable threshold
2. A/B exceeds some unreasonable threshold