Static Allocation with Zig

(nickmonad.blog)

86 points | by todsacerdoti 2 hours ago

12 comments

  • netik 0 minutes ago
    Didn’t we solve this already with slab allocators in memcached? The major problem with fixed allocation like this is memory fragmentation over time, which you then have to reinvent GC to deal with.
  • MatthiasPortzel 27 minutes ago
    One key thing to understand about TigerBeetle is that it's a file-system-backed database. Static allocation means they limit the number of resources in memory at once (number of connections, number of records that can be returned from a single query, etc). One of the points is that these things are limited in practice anyways (MySQL and Postgres have a simultaneous connection limit, applications should implement pagination). Thinking about and specifying these limits up front is better than having operations time out or OOM. On the other hand, TigerBeetle does not impose any limit on the amount of data that can be stored in the database.

    => https://tigerbeetle.com/blog/2022-10-12-a-database-without-d...

    It's always bad to use O(N) memory if you don't have to. With a FS-backed database, you don't have to. (Whether you're using static allocation or not. I work on a Ruby web-app, and we avoid loading N records into memory at once, using fixed-sized batches instead.) Doing allocation up front is just a very nice way of ensuring you've thought about those limits, and making sure you don't slip up, and avoiding the runtime cost of allocations.

    This is totally different from OP's situation, where they're implementing an in-memory database. This means that 1) they've had to impose a limit on the number of kv-pairs they store, and 2) they're paying the cost for all kv-pairs at startup. This is only acceptable if you know you have a fixed upper bound on the number of kv-pairs to store.

    • levkk 10 minutes ago
      That makes sense. For example, your redis instance will have fixed RAM, so might as well pre-allocate it at boot and avoid fragmentation.

      Memcached works similarly (slabs of fixed size), except they are not pre-allocated.

      If you're sharing hardware with multiple services, e.g. web, database, cache, the kind of performance this is targeting isn't a priority.

  • leumassuehtam 2 hours ago
    > All memory must be statically allocated at startup. No memory may be dynamically allocated (or freed and reallocated) after initialization. This avoids unpredictable behavior that can significantly affect performance, and avoids use-after-free. As a second-order effect, it is our experience that this also makes for more efficient, simpler designs that are more performant and easier to maintain and reason about, compared to designs that do not consider all possible memory usage patterns upfront as part of the design.

    (TigerStyle)

    It's baffling that a technique known for 30+ years in the industry has been repackaged into "tiger style" or whatever guru-esque thing this is.

    • mitchellh 1 hour ago
      Snide and condescending (or at best: dismissive) comments like this help no one and can at the extremes stereotype an entire group in a bad light.

      I think the more constructive reality is discussing why techniques that are common in some industries such as gaming or embedded systems have had difficulty being adopted more broadly, and celebrating that this idea which is good in many contexts is now spreading more broadly! Or, sharing some others that other industries might be missing out on (and again, asking critically why they aren't present).

      Ideas in general require marketing to spread; that's literally what marketing is in the positive sense (in the negative, it's all sorts of slime!). If a coding standard used by a company is the marketing this idea needs to live and grow, then hell yeah, "tiger style" it is! Such is humanity.

      • astrobe_ 14 minutes ago
        Because garbage-collected languages are easier to teach and to use. So the low-level, low-resource or high-performance stuff is left to a handful of specialists - or "insects" according to Heinlein. Speaking of old things, this reminds me of one of Asimov's short stories, where someone who rediscovers mental arithmetic is believed to be a genius.
      • brabel 40 minutes ago
        > had difficulty being adopted more broadly

        Most applications don’t need to bother the user with things like how much memory they think will be needed upfront. They just allocate as much as they need, when they need it. Most applications today are probably servers that change all the time. You would not know upfront how much memory you’d need, as that would keep changing on every release! Static allocation may work in a few domains, but it certainly doesn’t work in most.

        • AlotOfReading 15 minutes ago
          It's best to think of it as an entire spectrum from "statically allocate everything with compile time parameters" to "directly call the system allocator for every new bit of memory". It's just a helpful way to separate the concerns of memory allocation from memory usage.

          What this article is talking about isn't all the way at the static end (compile-time allocation); it has the additional freedom that you can decide allocation sizes based on runtime parameters. That frees the rest of the application from needing to worry about managing memory allocations.

          We can imagine taking another step and only allocating at the start of a connection/request, so the rest of the server code doesn't need to deal with managing memory everywhere. This is more popularly known as region allocation. If you've ever worked with Apache or Nginx, this is what they do ("pools").

          So on and so forth down into the leaf functions of your application. Your allocator is already doing this internally to help you out, but it doesn't have any knowledge of what your code looks like to optimize its patterns. Your application's performance (and maintainability) will usually benefit from doing it yourself, as much as you reasonably can.
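
          A toy sketch of that per-request region idea in Zig (handleRequest and everything in it is made up for illustration, not taken from the article):

            const std = @import("std");

            // Hypothetical request handler: everything it allocates comes out of
            // one arena, so nothing inside it ever has to free individually.
            fn handleRequest(gpa: std.mem.Allocator, body: []const u8) !void {
                var arena = std.heap.ArenaAllocator.init(gpa);
                defer arena.deinit(); // the whole region is released in one shot
                const alloc = arena.allocator();

                // Any number of allocations, none with a matching free.
                const upper = try alloc.alloc(u8, body.len);
                for (body, 0..) |c, i| upper[i] = std.ascii.toUpper(c);
                std.debug.print("{s}\n", .{upper});
            }

            pub fn main() !void {
                var gpa_state = std.heap.GeneralPurposeAllocator(.{}){};
                defer _ = gpa_state.deinit();
                try handleRequest(gpa_state.allocator(), "hello");
            }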

      • pjmlp 55 minutes ago
        It was a common practice in 8 and 16 bit home computing.
    • jandrewrogers 1 hour ago
      Static allocation has been around for a long time but few people consider it even in contexts where it makes a lot of sense. I’ve designed a few database engines that used pure static allocation and developers often chafe at this model because it seems easier to delegate allocation (which really just obscures the complexity).

      Allocation aside, many optimizations require knowing precisely how close to instantaneous resource limits the software actually is, so it is good practice for performance engineering generally.

      Hardly anyone does it (look at most open source implementations) so promoting it can’t hurt.

    • publicdebates 34 minutes ago
      > a technique known for 30+ years in the industry has been

      Knowledge sharing with next generations is one of those very tricky things.

      For one thing, how would I know where to find this? What book? What teacher? There are so many books; must I read all of them? What if my coworkers aren't aware of it, how can they share it with me?

      Also, as an old saying goes, if you're good at something, never do it for free. This isn't exactly a trade secret, but how many people blog about every advanced technique and trick they know? I blogged about how to create real C function pointers from Lua closures, as a way to advertise my product, but that could very well have been kept a trade secret (and probably should have been, as I still got 0 sales from that blog post). Why would anyone want to share this "tiger style" knowledge with newer generations for no personal benefit? Aren't they incentivized to use it secretly, or maybe write it in a book, or blog about it for advertising?

    • codys 2 hours ago
      It seems it's just part of a doc on code style at TigerBeetle, similar to the various "Google Style Guide" documents. These rarely contain anything new, but document what a particular project or organization does with respect to code style.
    • addaon 2 hours ago
      Yep. Those of us who write embedded code for a living call this “writing code.”
      • nickmonad 2 hours ago
        Author here! That's totally fair. I did learn this is a common technique in the embedded world and I had a whole section in the original draft about how it's not a super well-known technique in the typical "backend web server" world, but I wanted to keep the length of the post down so I cut that out. I think there's a lot we can learn from embedded code, especially around performance.
        • titzer 7 minutes ago
          Back in 2005, Virgil I, which targeted MCUs like the AVR, had static initialization and would generate a C program with all of the heap statically allocated, which was then compiled into the binary. C programmers for AVR are used to just declaring globals, but Virgil allowed arbitrary code to run which just initialized a heap.

          Virgil II and III inherited this. It's a standard part of a Virgil program that its components and top-level initializers run at compile time, and the resulting heap is then optimized and serialized into the binary. It doesn't require passing allocators around, it's just part of how the language works.

    • testdelacc1 2 hours ago
      It is known, but not widely used outside of embedded programming. The fact that they’re using it while writing a database when they didn’t need to makes people sit up and notice. So why did they make this conscious choice?

      It’s tempting to cut people down to size, but I don’t think it’s warranted here. I think TigerBeetle have created something remarkable and their approach to programming is how they’ve created it.

  • Zambyte 2 hours ago
    I'm doing pretty much this exact pattern with NATS right now instead of Redis. Cool to see other people following similar strategies.

    The fact that the Zig ecosystem follows the pattern set by the standard library to pass the Allocator interface around makes it super easy to write idiomatic code, and then decide on your allocation strategies at your call site. People have of course been doing this for decades in other languages, but it's not trivial to leverage existing ecosystems like libc while following this pattern, and your callees usually need to know something about the allocation strategy being used (even if only to avoid standard functions that do not follow that allocation strategy).
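
    As a rough illustration (my own toy example, nothing to do with the kv codebase): the callee is written against the generic interface, and the call site picks the strategy, here a fixed stack buffer so nothing touches the heap at all.

      const std = @import("std");

      // The callee only sees the generic Allocator interface.
      fn joinWords(allocator: std.mem.Allocator, words: []const []const u8) ![]u8 {
          return std.mem.join(allocator, " ", words);
      }

      pub fn main() !void {
          // The call site decides the strategy: a fixed, stack-resident buffer.
          var buf: [256]u8 = undefined;
          var fba = std.heap.FixedBufferAllocator.init(&buf);

          const words = [_][]const u8{ "static", "allocation" };
          const joined = try joinWords(fba.allocator(), &words);
          std.debug.print("{s}\n", .{joined});
      }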

    • nickmonad 1 hour ago
      I have a few cases in this (proof of concept) codebase that require knowledge about allocation strategy, even in Zig, but that's on me and the design at this point. Something I wanted to touch on more in the post was the attempt to make the components of the system work with any kind of allocation strategy. I see a common thing in Zig projects today where something like `gpa: std.mem.Allocator` or even `arena: std.mem.Allocator` is used to signal intent, even though the allocator interface is generic.
  • ivanjermakov 2 hours ago
    Related recent post from a TigerBeetle developer: https://matklad.github.io/2025/12/23/static-allocation-compi...
  • d3ckard 2 hours ago
    Personally I believe static allocation has pretty huge consequences for theoretical computer science.

    It’s the only kind of program that can be actually reasoned about. Also, not exactly Turing complete in the classic sense.

    Makes my little finitist heart get warm and fuzzy.

    • mikepurvis 1 hour ago
      I'm not an academic, but all those ByteArray linked lists have me feeling like this is less "static allocation" and more "I re-implemented a site-specific allocator and all that that implies".

      Also it's giving me flashbacks to LwIP, which was a nightmare to debug when it would exhaust its preallocated buffer structures.

      • kevin_thibedeau 22 minutes ago
        This is still a more dependable approach under resource constraints. Fragmentation is eliminated and you can monitor the pools for worst-case usage. The only other risk here versus true static allocation is a memory leak, which can be guarded against with suitable modern language design.

        LwIP's buffers get passed around across interrupt handler boundaries, in and out of various queues. That's what makes it hard to reason about. The allocation strategy is still sound when you can't risk using a heap.

      • d3ckard 56 minutes ago
        Personally, I see dynamic allocation more and more as a premature optimization and a historical wart.

        We used to have very little memory, so we developed many tricks to handle it.

        Now we have all the memory we need, but tricks remained. They are now more harmful than helpful.

        Interestingly, embedded programming has a reputation for stability and AFAIK game development is also more and more about avoiding dynamic allocation.

        • mikepurvis 24 minutes ago
          Also not a game dev, but my understanding there is that there's a lot of in-memory objects whose lifetimes are tied to specific game-time entities, like a frame, an NPC, the units of the octree/BSP corresponding to where the player is, etc.

          Under these conditions, you do need a fair bit of dynamism, but the deallocations can generally be in big batches rather than piecemeal, so it's a good fit for slab-type systems.
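
          A rough sketch of that batch pattern in Zig, as I understand it (the frame loop and Particle type are invented for illustration):

            const std = @import("std");

            const Particle = struct { x: f32, y: f32 };

            pub fn main() !void {
                var arena = std.heap.ArenaAllocator.init(std.heap.page_allocator);
                defer arena.deinit();

                // One "frame" = one batch of allocations, all thrown away together.
                var frame: usize = 0;
                while (frame < 3) : (frame += 1) {
                    const alloc = arena.allocator();
                    const particles = try alloc.alloc(Particle, 1000);
                    particles[0] = .{ .x = 1.0, .y = 2.0 };
                    // ... simulate, render ...

                    // Drop the whole frame's allocations at once, keeping the
                    // backing memory around for the next frame.
                    _ = arena.reset(.retain_capacity);
                }
            }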

    • kibwen 52 minutes ago
      > It’s the only kind of program that can be actually reasoned about.

      Theoretically infinite memory isn't really the problem with reasoning about Turing-complete programs. In practice, the inability to guarantee that any program will halt still applies to any system with enough memory to do anything more than serve as an interesting toy.

      I mean, I think this should be self-evident: our computers already do have finite memory. Giving a program slightly less memory to work with doesn't really change anything; you're still probably giving that statically-allocated program more memory than entire machines had in the 80s, and it's not like the limitations of computers in the 80s made us any better at reasoning about programs in general.

      • d3ckard 16 minutes ago
        Yes, but allocations generate an ever-increasing combinatorial space of possible failure modes.

        Static allocation requires you to explicitly handle overflows, but by centralizing them, you probably don't need as many handlers.

        Technically, all of this can happen in a language with dynamic allocation as well. It's just that you can't force the behavior.

    • IshKebab 16 minutes ago
      > It’s the only kind of program that can be actually reasoned about.

      What do you mean? There are loads of formal reasoning tools that use dynamic allocation, e.g. Lean.

    • dnautics 1 hour ago
      i think you mean "exactly not Turing complete"
      • d3ckard 1 hour ago
        Nice correction :)

        It’s actually quite tricky though. The allocation still happens, and it isn't inherently limited, so you could plausibly argue both ways.

        • chongli 1 hour ago
          I’m confused. How is a program that uses static allocation not Turing complete?
          • bmacho 14 minutes ago
            It just isn't. For example, it has finitely many states, so it either cycles or stops. A real Turing machine can decide whether your program stops on a given input or not, just by running it long enough. It's really simple and well-behaved, while Turing machines are really complex and behave weirdly.
          • skybrian 1 hour ago
            A Turing machine has an unlimited tape. You can’t emulate it with a fixed amount of memory.

            It’s mostly a theoretical issue, though, because all real computer systems have limits. It’s just that in languages that assume unlimited memory, the limits aren’t written down. It’s not “part of the language.”

            • dnautics 55 minutes ago
              If we get REALLY nitpicky, Zig currently (but not in the future) allows unbounded function recursion, which "theoretically" assumes unlimited stack size, so it's potentially "still technically theoretically Turing complete". For now.
          • e12e 22 minutes ago
            It's not, if it can do IO to network/disk...?
          • jerf 27 minutes ago
            Technically, your computer is not Turing Complete because it does not have access to infinite memory. Technically, once all the input has been given to a program, that program is a finite state automaton.

            That "once all the input has been given to the program" is doing a bit of heavy lifting since we have a number of programs where we have either unbounded input, or input modulated by the output itself (e.g., when a human plays a game their inputs are affected by previous outputs, which is the point after all), or other such things. But you can model all programs as their initial contents and all inputs they will ever receive, in principle if not in fact, and then your program is really just a finite state automaton.

            Static allocation helps make it more clear, but technically all computers are bounded by their resources anyhow, so it really doesn't change anything. No program is Turing complete.

            The reason why we don't think of them this way is severalfold, but probably the most important is that the toolkit you get with finite state automata doesn't apply well in our real universe to real programs. The fact that mathematically, all programs can in fact be proved to halt or not in finite time by simply running them until they either halt or a full state of the system is repeated is not particularly relevant to beings like us who lack access to the requisite exponential space and time resources necessary to run that algorithm for real. The tools that come from modeling our systems as Turing Complete are much more practically relevant to our lives. There's also the fact that if your program never runs out of RAM, never reaches for more memory and gets told "no", it is indistinguishable from running on a system that has infinite RAM.

            Technically, nothing in this universe is Turing Complete. We have an informal habit of referring to things that "would be Turing Complete if extended in some reasonably obvious manner to be infinitely large" as simply being Turing Complete even though they aren't. If you really, really push that definition, the "reasonably obvious manner" can spark disagreements, but generally all those disagreements involve things so exponentially large as to be practically irrelevant anyhow and just be philosophy in the end. For example, you can't just load a modern CPU up with more and more RAM, eventually you would get to the point where there simply isn't enough state in the CPU to address more RAM, not even if you hook together all the registers in the entire CPU and all of its cache and everything else it has... but such an amount of RAM is so inconceivably larger than our universe that it isn't going to mean anything practical in this universe. You then get into non-"obvious" ways you might extend it from there, like indirect referencing through other arbitrarily large values in RAM, but it is already well past the point where it has any real-world meaning.

  • loeg 1 hour ago
    We do a lazier version of this with a service at work. All of the large buffers and caches are statically (runtime-configured) sized, but various internal data structures assumed to be approximately de minimis can use the standard allocator to add items without worrying about it.
  • kristoff_it 1 hour ago
    A coincidental counterpart to this project is my zero allocation Redis client. If kv supports RESPv3 then it should work without issue :^)

    https://github.com/kristoff-it/zig-okredis

  • dale-cooper 1 hour ago
    Maybe I'm missing something, but two thoughts:

    1. Doesn't the overcommit feature lessen the benefits of this? Your initial allocation works but you can still run out of memory at runtime.

    2. For a KV store, you'd still be at risk of application-level use-after-free bugs, since you need to keep track of which parts of your statically allocated memory are in use, right?

    • nickmonad 1 hour ago
      Author here! Overcommit is definitely a thing to watch out for. I believe TigerBeetle calls this out in their documentation. I think you'd have to explicitly disable it on Linux.

      For the second question, yes, we have to keep track of what's in use. The keys and values are allocated via a memory pool that uses a free-list to keep track of what's available. When a request to add a key/value pair comes in, we first check if we have space (i.e. available buffers) in both the key pool and value pool. Once those are marked as "reserved", the free-list kind of forgets about them until the buffer is released back into the pool. Hopefully that helps!
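
      If it helps, here's a stripped-down sketch of the general free-list idea (just the shape of it, not the actual kv code):

        const std = @import("std");

        // Toy fixed-capacity pool: acquire hands out a slot or reports that the
        // pool is exhausted; release puts the slot back on the free-list.
        fn Pool(comptime T: type, comptime capacity: usize) type {
            return struct {
                const Self = @This();

                items: [capacity]T = undefined,
                next_free: [capacity]?usize = undefined,
                head: ?usize = null,

                fn init(self: *Self) void {
                    // Chain every slot onto the free-list.
                    for (0..capacity) |i| {
                        self.next_free[i] = if (i + 1 < capacity) i + 1 else null;
                    }
                    self.head = 0;
                }

                // Returns null when the pool is full, so "out of space" is an
                // explicit case to handle rather than a surprise OOM.
                fn acquire(self: *Self) ?*T {
                    const i = self.head orelse return null;
                    self.head = self.next_free[i];
                    return &self.items[i];
                }

                fn release(self: *Self, item: *T) void {
                    const i = (@intFromPtr(item) - @intFromPtr(&self.items[0])) / @sizeOf(T);
                    self.next_free[i] = self.head;
                    self.head = i;
                }
            };
        }

        pub fn main() void {
            var keys: Pool([32]u8, 4) = .{};
            keys.init();

            const slot = keys.acquire() orelse return; // real code would surface an error here
            slot.* = [_]u8{'k'} ** 32;
            keys.release(slot);
            std.debug.print("acquired and released one key slot\n", .{});
        }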

    • justincormack 1 hour ago
      You can work around overcommit by writing a byte to every allocated page at allocation time, so that it has to be actually allocated.
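
      Something along these lines in Zig, roughly (the 4 KiB page size is an assumption; query the platform's real page size in practice):

        const std = @import("std");

        pub fn main() !void {
            const size = 1 << 30; // e.g. a 1 GiB up-front allocation
            const buf = try std.heap.page_allocator.alloc(u8, size);

            // Touch one byte in every page so the kernel has to back the whole
            // range with real memory now, rather than lazily on first use.
            const page_size: usize = 4096; // assumed
            var i: usize = 0;
            while (i < buf.len) : (i += page_size) {
                buf[i] = 0;
            }
        }
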
      • dnautics 50 minutes ago
        out of curiosity, does that generally mean that the (Linux) OOM killer can't get you? IIRC the OOM killer is only triggered on a new page request, and only the requesting process is eligible for the murder?
        • chris- 16 minutes ago
          No, it does not. The OOM killer acts (mostly) on the OOM score, and no process is exempt, regardless of whether or not it allocates new memory. It may help you write correct programs in certain situations, though: e.g. if your program is running in a defined context such as a cgroup, you don't allocate beyond your cgroup limits, and the system is configured sanely, you can handle allocation problems more easily.
  • lazy-lambda 45 minutes ago
    No mention of AI - such a good post.
  • nromiun 1 hour ago
    > All memory must be statically allocated at startup.

    But why? If you do that you are just taking memory away from other processes. Is there any significant speed improvement over just dynamic allocation?

    • AnimalMuppet 1 hour ago
      1. On modern OSes, you probably aren't "taking it away from other processes" until you actually use it. Statically allocated but untouched memory is probably just an entry in a page table somewhere.

      2. Speed improvement? No. The improvement is in your ability to reason about memory usage, and about time usage. Dynamic allocations add a very much non-deterministic amount of time to whatever you're doing.

      • Ericson2314 1 hour ago
        If you use it and stop using it, the OS cannot reclaim the pages, because it doesn't know that you've stopped. At best, it can offload the memory to disk, but this wastes disk space, and also time for pointless writes.
        • norir 1 hour ago
          This is true; whether it matters is context dependent. In an embedded program, this may be irrelevant, since your program is the only thing running, so there is no resource contention or need to swap. In a multi-tenant setting, you could use arenas in the same way as a single static allocation and release the arena upon completion. I agree that allocating a huge amount of memory for a long-running program on a multi-tenant OS is a bad idea in general, but it could be OK if, for example, you are running a single application like a database on the server, in which case you are back to embedded programming, only the embedding target is a database on a beefy general-purpose computer.
          • Ericson2314 1 minute ago
            Yes it is context dependent, but the parent comment was acting as if it was just better. I wanted to correct that.
      • kibwen 41 minutes ago
        > On modern OSes, you probably aren't "taking it away from other processes" until you actually use it.

        But if you're assuming that overcommit is what will save you from wasting memory in this way, then that sabotages the whole idea of using this scheme in order to avoid potential allocation errors.

      • jeffreygoesto 39 minutes ago
        Using this as well in embedded. The whole point is to commit and lock the pages after allocation, so you don't experience what you correctly describe. You want a single checkpoint after which you can simply stop worrying about OOM.
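
        Roughly like this in Zig, for example (mlock declared by hand here rather than assuming a particular std wrapper; needs libc, e.g. `zig build-exe lock.zig -lc`):

          const std = @import("std");

          // mlock(2) takes `const void *` in C; declared with a byte pointer here
          // for convenience.
          extern "c" fn mlock(addr: [*]const u8, len: usize) c_int;

          pub fn main() !void {
              const buf = try std.heap.page_allocator.alloc(u8, 64 * 1024 * 1024);

              // Commit and pin the whole region: the kernel must back it with RAM
              // now, and it can never be swapped out later.
              if (mlock(buf.ptr, buf.len) != 0) return error.MlockFailed;
          }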