Memory-efficient enum arrays in Zig




In general I'm maybe somewhat disappointed that pattern matching in Rust can't be expressed more as a type system trait-type thing (hand waving here) that any struct could conform to, rather than an explicit first class hardcoded object type with its own storage structuring, etc?

I may not be expressing this cogently, but. I've implemented both an AST (as in this article) and an opcode/bytecode interpreter recently, and I feel that in large part Rust's enums are not ideal for either:

I never had the memory usage concerns about the AST that this author did, but I wanted to do things like: add a line number / column attribute to every statement node. I had a `Stmt` enum, and I had a choice: put line # and column on every enum possibility (boilerplate and ugly), or put the enum in a new Stmt struct that contains both the original enum and the line/col attributes -- which involved a bunch of refactoring and also didn't feel elegant. I feel there must be a more elegant type system way that my "Stmt" struct could have been reworked, that Rust isn't offering me.

And re: the opcode stuff,I am fairly sure that a pattern-matched Rust enum is not an ideal encoding for a VM opcode interpreter performance wise. But the language really does want to point you in this direction, and the facilities for de-structuring patterns are very seductive. I haven't gotten to yak shaving this yet, but again I feel like there's likely some potential for improvements in the type system that could open things up so one could get pattern matching facilities while using one's own preferred lower level impl.

I dunno. Thoughts.


> pattern matching in Rust can't be expressed more as a type system trait-type thing (hand waving here) that any struct could conform to, rather than an explicit first class hardcoded object type with its own storage structuring, etc?

Do you think you can provide a more concrete example? First thing that comes to mind for me is some kind of structural typing [0], but I'm not confident I'm interpreting your comment correctly.



Other people who are better with these things could probably conceptualize it better than me.

What I'd like to be able to do is declare a trait that says "this thing can yield the following patterns, can destructure in this pattern, and these are the functions you'd call to extract references to those values"

And then be able to use match on it as usual.

Putting it another way, I wish an algebraic data type "enum" was something any struct could "be", with the type system providing the means to express that that's the case, and the programmer being able to provide the "how" and with existing Rust enums merely being the default way of doing this.


Haskell and F# have this feature. In F# it is called active patterns. In Haskell the feature is pattern synonyms (maybe combined with view patterns) and is used to e.g. pattern match on the prefix or suffix of a sequence-like structure, or to match a tree as a an element and left/right node without the caller seeing a bunch of other tree metadata.


Interesting idea. I feel a bit conflicted about it, but it's probably something I'd have to think about.

I suppose it's possible to somewhat approximate this in existing Rust by creating a trait with a function that returns a destructure-able type, performing the transformations that you would have implemented in your hypothetical trait, then pattern match over that. Not sure if that would be "good enough".


Several languages have what I think you are asking for. Have a look at Scala's extractors or F#'s active views.


you mean f# active patterns?


Sorry, yes.


Ah yes, many moons ago (like... almost 15 years) I worked in Scala and I recall extractors, or at least the discussion of them.


An old trick for bytecode interpreters is to use indirect jumps for going to the next opcode’s implementation (gcc had a ‘computed goto’ extension that was used for this. For Rust I guess you would want function pointers plus something to force TCE?) and putting those indirect jumps at the prelude of every opcode implementation. This meant that the indirect-jump-predictor (which cpus have because of oop) would have separate models for the ends of different opcodes and so would be more likely to predict correctly (maybe you have a hard time predicting a the next instruction but it is much more likely that a test is followed by a branch).

I think other tricks (e.g. storing top of stack in registers for a stack machine) matter more though, and I don’t know if the trick described above is still relevant.


For Zig this is planned via labeled continue syntax inside switch expressions:

As mentioned in the proposal, this would even allow implementation of Duff's Device in Zig :-o


Yeah I've spent time on this topic before when working in C++, but haven't invested much energy in thinking about how one would do it efficiently in Rust yet. Luckily performance is not my #1 concern, at least not yet. And maybe if I got down that road more I'd just invest in going JIT with cranelift or similar.


Just to add some jargon for folks who want to do more research into this:

The technique of having the interpreter dispatch duplicated in the postlude of each handler function is called "threading". If your dispatch uses an indirect jump, it's called "indirect threading". The indirect-jump-predictor is known as branch target predictor (BTB).

[deleted by user]

I do with proc macros could evolve to be able to query the compiler for information (which would necessitate careful thought to adding stages to compiling). I've wanted to know "does this struct impl this trait" or "give me a list of all concretely implemented traits" or various things like this that would be very, very useful in a proc macro.


IIRC the compiler runs plugins in two stages. The first stage gets the AST before any typechecking and it can make modifications to the AST; macros and some clippy lints run in this stage. The second stage runs after typechecking so it gets type information, but it cannot make any modifications; other clippy lints run in this stage.


I think the example code is bugged

field_map[idx] = svec.len - 1;

Would be wrong when svec contains size already at not the last svec entry




I only partially understand this article but this seems incredibly relevant to me since I'm planning on writing a spreadsheet engine in rust. Cell values require me to have something like (this very poorly written POC code, sorry):

    pub enum Expr {
        Binary(Box<Expr>, char, Box<Expr>),
        Function(String, Vec<Expr>), 
I'll keep reading and studying! If anyone has more resources to throw at me, I'm all ears

Hey, I've actually build a spreadsheet engine in Rust. Not open sourced, but could give you some pointers. You are going to hit lots of performance issues before you can have some gains using the methods of the article. The single most difficult problem is the evaluation strategy.


Hey, thanks for the offer. I have been thinking about evaluation and the extent of my strategy so far has been to consider it as a dependency graph, which led me to this interesting article:

Also it sounds like the right next step for me is to read more on incremental computation. Someone on reddit just pointed me to adapton:

Actually implementing anything with Rust is on my TODO list (I'm spending some time on the UI prototype right now)

I will give a couple things a try and then, if you don't mind, I can ping you via email. I was checking out your website and loved the read on the parser -- not to mention A MAZE (played all the way to level 4!)


(just jumping in here) You may want to take a look at which describes things on the spreadsheety and not-very-spreadsheety ends of the spectrum (moreso the latter). If you follow some of the links there you should find some good descriptions of various implementation strategies, e.g. Naiad style timely dataflow or incremental-style pseudotime.


thank you so very much! this is invaluable


Wow! Glad you had a look around. Yes, please send me an email whenever you feel like.

And yes, you are going to need some sort of "support graph" and topological sorting. But this is way more complicated than it looks like. Most functions in a spreadsheet don't know it's dependencies until you are actually computing them, like `LOOKUP` or `IF`. Some functions like `RANDBETWEEN` or `NOW` change their value every time. In some modern spreadsheets some functions like `SORT` or `UNIQUE` spill their values to some other cells. Calculating dependencies correctly in all those situations can be challenging.

There is not a lot of literature on spreadsheet implementation and reading code of existing open source implementations is hard. And exception is Peter Sesoft: and

My recommendation would be to first set the scope. Then code manually the lexer and parser and drawn those two in tests. Excel formulas can be quite complicated! There are lots of fancy things you can do there like use arena allocators and other optimizations. I would try to avoid that on a first go.

Here is something that I have found extremely convenient. Excel saves the cell formula AND the resulting value. Meaning that to create a test case you just need to create an Excel workbook (and a xlsx parser, of course).

Doing the UI for a spreadsheet is an equally challenging problem.

Good luck!!!!

Oh, and remember this thread from a couple of days ago. Might be inspiring:


Sounds like a fun project. If it's intended for ordinary users, one thing you should expect them to do is add content to the 4 extreme corners of a sheet and see if the spreadsheet engine falls over as a result. If you allow 1 million by 1 million cells, and you store a "null" for every cell not filled in, then the engine will run out of memory. That means you might consider storing the cell content in a sparse manner. One way to do that is with a hash map implementation such as "hashbrown". Just a thought.

(To clarify why I'm bringing this up, the points in this article are low level details that you probably don't need to think about if you start with a hash map and thereby avoid running into memory constraints initially.)


Thanks, great point indeed. I am looking into this

The way I think about it -- rather naively, I suppose -- is that I care more about the references cells make to each other than the actual grid of cells displayed on a table. The latter feels more like a "view" of the data than an actual data structure?

This also seems to align with the relative priority of (sorted from highest to lowest): figuring out the order of evaluation, calculating those evaluations, and finally displaying the results of the evaluation


The problem being highlighted is that if you have widely disparate sizes of the variants and/or a lot of these in an array, your hurting performance because there’s a lot of space being wasted on padding.

A common technique that came from gaming is that if you have an array of structs (AoS) (eg array<struct { health: int, ammo: Ammo, … }> to break it up into a struct of arrays (SoA): struct Humans { healths: Vec<int>, ammo: Vec<ammo>, … }. Then the ith index into those vecs is the ith Human in the AoS layout (parallel vecs here are for example, but it’s not optimally efficient because the bookkeeping of length and capacity on a per field basis is wasteful as that info is duplicate).

This is basically a similar idea except for enums and doing it automatically in a way that Rust can’t. It’s possible the problem is a bit overrated in terms of how big of a problem this is. Also, in Rust you may be able to wrap the third party type in your own and then the derive macro approach probably can work.

Re your spreadsheet, just keep in mind of this as a possible optimization and you have to first figure out if you’re building for speed (ie you’ve done this a few times and you know this is probably a bottleneck) or experience (ie build for simplicity and understanding)


Thank you -- definitely building it for speed which is why this felt topical. I really appreciate your explanation, it's super clear.


On the subject of AoS, Zig has a cool data structure for that:


More or less OT.

I keep hearing about both Zig and D as interesting C-like low level languages with better satefies/ergonomics. I wonder if someone who's familiar with both would like to discuss the main differences...?


D is a GC-based C++/Java-like OOP language. People used to call it "a C++ done right", but it was never simple and didn't have any clear edges against its competitors. TBH, one simply should not build on top of C++ - it's a beast that no one can comprehend, and it only keeps growing.

In case of Zig, I think it's a very good modern C alternative. Zig can do what C can do, can interoperate w/ C (e.g. importing C headers), and offers new features that are useful (comptime, slices, error type). It's like Go, but without GC and a large runtime.


Oh many I was loving your description of Zig up until when you said "it's like Go"...


I haven't heard of Zig referred to as like Go except in the concept of trying to replace C.

(Rust is more aiming for C++)


Yeah, it may sound weird, but there were days when Go was considered a C replacement - which turned out to be quite false. There’s a dialect - TinyGo - sorta carries the old vibe(?), but the upstream never picked up the idea.


Well WalterBright could have answered that. But since HN has started taking a piss out of WalterBright replying in every single Zig ( or Rust ) threads he stopped commenting on most if not all Programming languages topics. Sigh.


There are productive ways to comment on threads without being a shill for your own thing.


Zig is C-like. D is C++-like. Zig has first-class compile-time metaprogramming. D is a kitchen sink language; it has many more features than even C++ does. Zig is minimalistic and strictly imperative rather than multi-paradigm. D has a cult following but hasn't really taken off. Zig is a relative newcomer trying to prove itself.


i think the term "D has a cult following" is a bit harsh D's community is small, that is it, nothing cult like, they dont over defend D most of the community seem to be happy using other languages etc ..

but i agree with D being a kitchen sink language, trying to be many things, but never being the absolute best at anything, which i think is why it never really took off

i am really hopeful for Zig, trying to be a more pragmatic C replacement


> i think the term "D has a cult following" is a bit harsh D's community is small, that is it, nothing cult like

I think "cult following" is used here like you would for movies, it’s not actually cult-like, it’s more that it’s a rather small, close-knit community.


> Zig is C-like

Only in the sense that the language is simple and small, but C pays for that by being inexpressive (you can find families of C++ programs for which an equivalent C program would require exponentially more code). Zig, on the other hand, is small and simple while being just as expressive as C++. It achieves that through a remarkable observation: many primitive features can be avoided with a powerful enough compile-time computation mechanism with introspection.

Zig is so revolutionary in its design that it's hard to compare it to anything else. Or, put another way, any comparison is sure to miss out on some very foundational piece. So yes, in some sense Zig is like C (simple, low-level), but it's as expressive as C++. In another sense, Zig is like Scheme (minimalistic elegance), and yet it feels familiar and it does not rely on macros for its core power. In other ways it's like Go (easy to learn, easy cross-compilation), but it gives full control and is more flexible.

At this point, Zig is sui generis.


> Zig is so revolutionary in its design that it's hard to compare it to anything else.

I mean, Zig has an interesting combination of functionality, but I wouldn't say that the individual pieces are revolutionary. Who knows though, the whole could be more revolutionary than the sum of it's parts...


Zig's take on comptime is quite new (which is the main point of the article being discussed). Error handling and async/await as well.


Zig's take on comptime is slightly different from C++ and D, but that doesn't make it revolutionary. It has first class types, which allows you to pass them to a function along with normal function parameters, instead of having a separate syntax for it like C++ and D. But it's not really new.


That's a little like saying that smartphones weren't really new because we had mobile phones and calculators and cameras and video cameras and handheld games before.

Zig's comptime is revolutionary and really new in that it replaces many other separate features and is the basis of Zig's minimalism. Figuring out how to discard features is much more difficult than adding them, and simplicity is something that's impossible to add once it's lost. The message is not the existence of a rather general compile time execution, but that with a particular design of such a mechanism you can do away with other things.

[deleted by user]

To say something truly original you don't have to invent new words.


It's the only mainstream language I know where structs and modules are the same thing.


Yeah, I had written something like "Zig is pretty much designed around the comptime system" at first, should've probably kept it :D


The unapologetic embrace of compile-time metaprogramming in a clean model drives most of my interest in Zig. I use metaprogramming heavily in C++ to dramatically reduce code size and increase type safety. Contrary to popular assumption, in C++20 metaprogramming is relatively concise, tidy, and maintainable, unlike the eldritch horror that was metaprogramming in C++11. But I don't think anyone would describe it as elegant or designed. I haven't had time yet to try porting C++ metaprogramming to Zig in order to see what is possible. Most other systems-y languages have a much weaker metaprogramming story than C++.

Many developers underestimate the importance of metaprogramming in systems languages for ensuring quality and security. Defects scale proportional to the lines of code. Strong metaprogramming facilities can reduce the lines of code required several-fold.


C++11 gave us variadics which opened a new metaprogramming world. For true horror try cons-based typelists in C++98.


I agree with most of this, but I would say that D was probably designed to be an alternative to C++ or Java. Zig shares many of C's design philosophies, though I think D's syntax is a little closer to C's than Zig's is to C's.

D also has a betterC mode that removes some of those "kitchen sink" features, but I couldn't say how popular it's use is. Many people who use D like some of those features. Zig's more minimalistic approach to language features is closer to C's approach.

Zig likely has been influenced by D (among other languages), but I wouldn't be surprised if D's new importC functionality has been influenced by Zig.

I don't use D (or C or C++) at work, but learning D and using it at home has made me a much better programmer in the languages I use for work. Maybe I would have learned those lessons with C or C++, but I guess I took more easily to D than them.


> I agree with most of this, but I would say that D was probably designed to be an alternative to C++ or Java

You're agreeing with me then! D was pretty explicitly created due to WalterBright's (incidentally a prolific 60k+-karma HN user!), frustration with C++ and presumably the problems that stem from its requirement to maintain almost a 100% C compatibility. Bright is the founder of Digital Mars, a small company developing C and C++ (and D) compilers, so he has first-hand experience. Since 2007 D has been co-developed by Andrei Alexandrescu, a well-known C++ expert and one of the pioneers of template metaprogramming.

[deleted by user]

Having written a few toy compilers for C, to me D is the most elegant of the C families of languages.

For example, consider the versatility of the '.' operator. It does pointer dereference, static field access and class/namespace member/method access. In all those cases I want to access "a thing from this other thing". Having 3 operators in a language (., ->, ::) is easier for a compiler writer but it puts more cognitive load on the developer.

Now consider type extensions. As a language designer you could think: "if a developer wishes to add methods to a type Vector that exists in a third party library, then in their project, the developer would use some special syntax" like maybe:

    class Vector { void json() { ... } }
and maybe the developer also needs to have it even repeated across a header file and an implementation, but... why? It's an idea, but it's not a law of the Universe. There is a better way.

Instead, the compiler can help you out. Just declare a regular function anywhere like so:

    void json(Vector v) { ... }
which is not special in anyway, and then be free to invoke it as:


Why if it isn't our familiar dot operator! It also does type extensions via some syntax sugar? What an elegant solution.

This is just one feature of D, and it's filled with features. Though I don't really use D in my day to day work, it's wonderful to see how the authors thought about hard problems in language design and implemented elegant solutions. I think D deserves more recognition for the things it absolutely hit out of the park as a general purpose programming language.


Wait, the thing the article calls "array of variant arrays (or AoVA)" is not the canonical way to decompose tagged unions? Seriously? [A×B×C] ≃ [A]×[B]×[C], that's the most obvious thing, isn't it?

Edit: Actually, never mind, as the comment points out, the correct equation should be [A+B+C] ≃ [A]×[B]×[C] and it's not obvious at all whether it's actually correct.


Things like that can be done by specific collections (e.g. most ECS implementations store components per "archetype").

But from language perspective the limiting factor is that you're always allowed to take a reference to any instance of an enum. This means the tag must be stored in every instance. If you allow mutable references to enums, you must have enough space for writing any variant there through the reference.


Don't you mean something like [A+B+C] ≃ [A]×[B]×[C]


Of course not: you are mistaking sum types and product types.

Even if we assume they are product types, [A×B×C] ≃ [A]×[B]×[C] is still not correct. The former doesn't allow the number of As and Bs and Cs to differ: the latter does. So the latter strictly speaking admits more values than the former.

Since both of them allow infinite number of values, to make a size comparison we will use generating functions. A list of A (or [A] in your notation) allows an empty list (one possible value), a one-element list (as many values as A itself), a two-element list… which becomes 1+A+A^2+…=1/(1-A). The beautiful thing about calling them sum types or products types is that you can manipulate it just by summing or multiplying respectively.

So the number of values for [AxBxC] is identified by 1/(1-ABC). For [A]*[B]*[C] it's 1/(1-A)/(1-B)/(1-B) which simplifies to 1/(1-A-B-C+AB+AC+BC-ABC). Now it becomes obvious† this form admits more values and you can in a sense quantify how many more!

†: Okay perhaps it's only obvious if I also include a Venn diagram but diagramming is beyond what I can do on HN.


Yeah, and if you carefully expand 1+(A+B+C)+(A+B+C)^2+(A+B+C)^3+..., there'll be strictly more summands than in (1+A+A^2+A^3+...)×(1+B+B^2+B^3+...)×(1+C+C^2+C^3+...) due to more flexible ordering of elements, so they're not isomorphic.

I still think it's the most obvious thing that [A+B+C] at least maps surjectively onto [A]×[B]×[C] :)


It’s a section/retract, definitely not an isomorphism. You’ll need to carry around the index data, and any sort of array update will incur a crazy upfront cost.


Eh, I imagine it should be possible to maintain an out-of-line skip-list or an array of tombstones or something to deal with deletion/tag mutation (although in my experience with ASTs that's not that common of a requirement); and allocation can be done quite effectively with the help of the good old .bss section: just grab couple of millions of every type of node at the program's startup, virtual memory is almost free.

[deleted by user]

You can solve this problem if a few clear lines of GNU C. We declare two sister types, one of which is packed:

  typedef struct taggedunion {
    unsigned char tag;
    union tu {
      uint64_t   u64;
      uint32_t   u32;
      uint8_t    u8;
    } u;
  } taggedunion_t;

  typedef struct __attribute__((packed)) packed_taggedunion {
    unsigned char tag;
    union tu u;
  } packed_taggedunion_t;
Then accesses to our dynamic array will convert between the packed stored representation and the unpacked one in a straightforward way:

  taggedunion_t array_get(packed_taggedunion_t *a, size_t i)
    return (taggedunion_t){ a[i].tag, a[i].u };

For any language to be labeled as a "better c" language, it must solve this problem, having better enums is a requirement

It's actually one of the reason I initially gave Zig a try, not Comptime, it was Tagged Union


But it's C. The idea here is to have higher-level, type-safe languages do this transformation automatically.

[deleted by user]
[deleted by user]

Your packed union eliminates pure padding, yes, but it still costs 9 bytes to store a u8 value and misaligns accesses for wider elements. These are both significant misses on "solving the problem." You also need to copy each element on access, which could be expensive for larger plain-old-data types, or especially for non-POD types.


But this still takes 9 bytes (at least) to store a tagged uint8_t.


The article's approach requires 8 bytes, plus a separate array of 1 byte tags.

The article reduces it to 4 bytes, plus an array of 1 byte tags, by dropping the requirement for a 64 bit member.

If we have to grow the array, we have to do two reallocations.

Caching is worsened: the tag and content are going to be in separate cache lines.


The final approach in the article is an array for each unique object size plus tagged indices/pointers. This would take one byte per uint8_t and doesn't suffer from the problems you mention, though it does have others. If memory pressure is your main problem it's a big win.


This problem space feels like some variation of a packing problem [1]...

It would be nice to start with the end result (the human friendly structure) and generate data structure recommendations to reduce memory waste, respect alignment rules, and improve spatial locality of reference.




I was checking out gameboy emulators recently written in C++, Rust, and Zig that have seen recent development.

I couldn't get the Zig projects to build, which is understandable since the language seems to have had a lot of breaking changes in its build system and the language in general (which is why it comes with a caveat not to use it in production yet).

But given the sheer number of comments I've read over the years recommending Rust as a replacement for C++, I was more surprised to see that I couldn't really find a single Rust project that had correct timings (the frame rates were incorrect and audio distorted) and worked with most ROMs I tried.

Meanwhile, the C++ projects I found built without issue or hassle, had correct timings, ran all the games I've tried, had lots of features, etc.

What gives? Is emulation just not a good use case for Rust?


Building large-scale C++ projects has become less of an issue because of CMake slowly becoming the default build tool.


An alternative opinion (that I happen to hold) is that the unbelievable pain and frustration of dealing with CMake has been at least one of the forces that have led to the Cambrian explosion of new programming languages and build systems in recent-ish years.


While CMake is pretty annoying the problem from my experience are people using it incorrectly and doing stupid stuff that you can't change if you're using the project as a library.

If every library would be properly "packaged" using CMake would be far less annoying or frustrating. Just recently we had the case where a 3rd party library author of a sizeable c++ library set -Werror in his library which wreaked havoc.

[deleted by user]

>Meanwhile, the C++ projects I found built without issue or hassle, had correct timings, ran all the games I've tried, had lots of features, etc. What gives?

Aside the fact of the C++ projects being from seasoned C++ developers (which exist 3 decades now in huge quantities), not people starting with Rust as their first "native" language and using an emulators as a toy project? C++ exists for 3 decades more on top of Rust's entire life of 8 years since v1.0. And given the over time adoption distribution, most Rust devs are like 1-4 years old Rust users max.

Or the fact that those C++ are probably much more mature, and with more manyears in them compared to the Rust ones, some even being major community projects?


> What gives?

There are still more good C++ programmers than good Rust programmers.



Many people are still learning Rust and an emulator is probably a good way to get your hands dirty.

I'm doing something similar with a compiler project and my code is gradually getting better and more idiomatic.

But I'm still doing a lot of stupid things and getting the hang of it.


it's not at all about "quality of programmer" and you should never reduce extremely vague arguments to that. it's literally about time for quality program to be made an completed.

those C++ emulators have often been in development for 3-5x as long as a similar Rust one.


I recently found to be a very complete implementation


I did see that one, but it doesn't seem to build on Mac.


Related: for all the flak GC gets, C# was used to produce two amazing emulators:




ryujinx doesn't use the GC for their sub-systems, they use a mix of manual memory allocations throught their allocators or RC

For such project to succeed, you have to be able to optimize your memory allocation strategy

If you just rely on the GC, you'll have bad surprises (why does my game stutter during gameplay?!), they already have multiple due to their double dip on the JIT, nothing prevents the JIT compiler to compile during your gameplay, causing massive stutters, just like how shaders compilation is an issue at runtime


Personally I think C# is a great language to get stuff done. The big problem it has is the portability (especially when it comes to using it with a GUI).


GC often gets more flak than is deserved.

Often such flak ignores the differences between throughput and latency.

For long, lived processes you’ll end up writing some kind of garbage collection system.


Fwiw, ryujinx uses a jit on the hot path, and most (but not all!) bizhawk cores are not actually c#.


Ryujinx uses Microsoft's JIT .NET technology [1] which is best available for C# codebases. Nothing new here since it needs to translate Switch's ARM instructions to x86 machine code somehow.



C++ has an ages old tried and tested ecosystem for game stuff which probably makes a lot of rendering and audio tasks easier. It could take Rust years to catch up even supposing the language is inherently better.


As somebody who has written the same gameboy emulator in C++, Rust, and Zig (as well as C, Go, Nim, PHP, and Python) - I have yet to find a place where language affected emulation correctness.

Gameboy audio is kind of a pain in the ass (at least compared to CPU, which is fairly easy, and GPU, which is easy to get "good enough” if you don’t care about things like palette colours being swapped mid-scanline) - and some languages take more or less code to do the same thing (eg languages which allow one block of memory to be interpreted in several different ways concurrently will make the “interpret audio RAM as a bunch of registers” code much shorter with less copying) - but in my case at least, each one of my implementations actually has the same audio distortions, presumably because I’m misreading some part of the hardware spec :P

(Also yes, the zig version is currently failing because every time I look at it the build system has had breaking changes...)


Oh that's so cool! Thanks for sharing. I skimmed a few of these and I'll definitely take a closer look later. I enjoyed your observations on each language as well.

From a reader standpoint, I enjoy reading emulators written in C the most. It sucks when C is missing features for complex tasks, but emulation seems to fit nicely into what it can do without black magic fuckery. Of course as someone not super experienced with C, writing it feels like an endless sea of "oh god why is this segfaulting".

Rust OTOH bothers me to look at because there are so many sigils and language doodads for functionality that seems like it should be really straightforward but whatever, I'm sure that's just because I've barely used it and can't understand it. I've been learning C++ recently because learning materials are usually tailored to it for stuff I'm interested in (vulkan currently), while ignoring a constant nagging feeling that I should stpo being curmudgeon and take a closer look at Rust.

I've tried Zig but am not looking too hard atm for many of the reasons you mentioned in your repository, I'm hoping the language will stabilize at some point and that LLMs will help with some of the repetition/verbosity.


Odin would very likely be a great choice for the stuff that you're trying to do. It's a very straight forward language and has a "vendor" ( part of the libraries shipped with the compiler that has lots of bindings for gamedev-related things, i.e. OpenGL, Vulkan, DirectX, glfw, stb libraries, some audio libraries, etc.

On top of that it has swizzling in the language, some automatic array programming features and a standard set of modern niceties like good tagged union support, custom allocators that are standardized, and so on. It's a nice language for pretty much any use case when you want good fundamentals and straight forward solutions without much magic but for gamedev I don't think there's a C/C++ alternative that is as well suited as Odin overall.

If you want a brief (but enough to get me excited about trying Odin out) overview of the language you can find one here:

P.S. I've written a few posts on HN about what I like about Odin in more detail so I won't reiterate those things exactly here, but one of them is recent so you can find it in my comment history.


Argh, Odin looks so close to perfect for me. The vendor libraries look fantastic. But I'm working up to deploying on Android and using OpenXR. There's support and/or examples of that for C++ and Rust, and I could probably get Zig working either by cross compilation or by exporting to C. But I don't see any documentation or examples for cross compilation for Odin, and that use case seems to fall just outside the range of its built-in batteries.

Do you think it's doable?


I wouldn't know about Android, really, but I do know that there is some discussion right now about how some consoles are off limits due to not being allowed to ship non-C/C++ code to them and this will be addressed by compiling Odin to a subset of C in the future if I remember correctly.

I would probably inquire on the Odin discord server about this particular thing as it's very likely someone has already stumbled upon it:


wrt Zig, according to some it's a great fit for emulators, so maybe that's why there are a few :^)


...probably the fact most of the C/C++ project you'd find in emulation exist longer than Rust itself ?

I'd imagine most of them in Rust were written for fun rather than trying to best the established ones.


I'm currently writing an emulator in Rust (for Apple 2). For some aspects I'm better than my C++ counterparts.... BUT those counterparts have years and years of little refinements that make a huge difference on the fact that some games run or don't. There's no question about that.

Besides Rust is just fine. I use a bit of threads, a bit of OpenGl and none of these is a problem. I'd venture to say that the rust compiler is real good and that allows me to code things without worrying about optimisation. Older emulator have started years ago and they often had to optimize by hand which leads to not so readable code; just your usual code legacy story.

Finally, the cargo tool is a modern way to build the code and I've been able to port my code to Linux, windows, MacOs and raspberry pi without a single code modification (besides expected stuff like audio frequency, file system, etc being different).

The Rust crowds are just too new to have produced good emulators. Give us time :-)


I'm also noodling on an emulator, shooting for cycle accuracy (to the point where you'd be able to run vapor lock demos). Mine passes the 6502 functional tests but not yet the Tom Harte suite. To add to the difficulty, I want to run it on an RP2040 and bit-bang the video. Is yours up in an open source repo? Mine isn't (yet) but I'm open to joining forces.

I started mine about 6 years ago, put it on the shelf, and am getting back to it. Overall I think Rust is working well for it (and for pico DVI out), but I can also see the advantages of Zig for something this low-level and specialized.


133Mhz, that's 133 cycles per instruction for a 6502 (assuming you emulate only the 6502). Seems doable but then there's the BCD stuff and that's rather tough emulate (you'll definitely need a lot of cycles to reproduce that)...

But that sounds real cool. Would you share you project ?


It's a bit tricky, but my decimal adc is 19 lines of code, all of which is fairly straightforward, mostly bit fiddling stuff that I expect will compile to a few instructions each.

I need to go through an open source releasing process but I'll get that going. Thanks for your interest!

[deleted by user]

Are you serious, a single specific software you like doesnt exist in a specific language and you blame the language for that? You see how that makes no sense right?


That's not the situation. The software does exist and doesn't work. "Is this a because of the language?" seems a reasonable question.


It's not a reasonable question at all. My first reaction was: "maybe the authors did it as a toy project". If nobody is paying them to do a professional and bit-perfect GameBoy emulator then there should be no expectations that the software is going to be good.


That's an insightful reply, and an important point to consider. There's been a number of great answers in this thread, and the general consensus seems to be "the language is a great fit, but the projects haven't had time to mature."

These seem (to me) like reasonable responses to a reasonable question.


> What gives? Is emulation just not a good use case for Rust?

You fell for Rust PR (which is understandable, considering that both this site and Reddit are full of loud Rust fans/influencers). The language provides memory safety and some fancy ML-like syntax and that is it. It won't implement good emulator for you or make you a better SW developer.


The most accurate IBM PC emulator is in rust.


I think its a perfectly fine use case but we're not yet at the point where that community has reached a high enough level of rust adoption that there are a significant number of mature projects, even for something as relatively simple to emulate as a gameboy.


Emulation seems fine for Rust. I don't think there's a good answer to this anecdote. One option is that when I learned C++ I built an emulator to learn how to write an emulator, whereas people writing emulators in Rust are perhaps often trying to learn Rust. Maybe that leads to a difference.

It's basically impossible to say though, we'd need to do a full survey of the C++ and Rust emulator ecosystem and look at which bugs are most prevalent, control for authors experience and skill, etc...

edit: I'll also note that when I built an emulator there were guides for it in C++. I don't know what materials exist in Rust but building an emulator is actually a project that I would consider C++ for (if for learning purposes) just because it's so well documented how to build one using the language.


> What gives?

My guess based on no evidence whatsoever: The Rust community seems to have lots of low effort enthusiasts - people whose primary motivation is "use Rust" rather than "solve a problem". People whose motivation is to solve a problem (in this case build an emulator so they can play a game or whatever) tend to be a lot more oriented towards the end-result, rather than what tool they're using. I'm pretty confident this isn't an issue with the language.


I think it's more "learn rust" than "use rust", and nowadays a lot of partly written learning projects are published on GitHub. A lot of the similar "learn C++" projects happened earlier and either stayed private or are quite good by now.


Most of the developers I know that are learning Rust do not come from a systems language background. They have to learn the domain as well as the language, which is a high bar. Learning to write systems-like code well takes a lot of time and experience.


Timings and audio handling sound like logic things and not something Rust's advantages over C++ would inherently support better.

If I had to hazard a guess, it's that the C++ projects are just more mature and have seen more "in-the-wild combat" to get to the state they're in now. Other factors that may contribute (in no particular order):

  - existing emulators are Good Enough and polishing a new one is less "urgent" than other problems in the current zeitgeist;
  - hobby project vs. no-longer-a-hobby project status; and
  - emulator projects in C++ are of similar quality are just as prevalent, but are significantly older and instead of being discoverable on GitHub are abandoned on Sourceforge or personal websites.

That would be my guess as well. Emulation, interpretation of decades-old binary formats, and similar problems are hard to get right on the first try, they often need years of testing and breaking until they are right. A project that has been around for 20 years will always have an advantage in this space over a newcomer project.


Funny how being on GitHub makes you discoverable, while maintaining your own domain name does not…

Actually no, that’s not funny at all.


You should add Nim to the list! There's a few implementations, though I'm not sure how complete they are. As others point out they're often used as ways to learn a language.

Fun writeup: Improving RosettaBoy performance:


I don’t think Rust is inherently bad to write emulators. For example one of the best cycle-accurate IBM XT emulators is written in Rust:


Most people developing GameBoy emulators these days do it as a toy project, not a serious effort to create an ultra-accurate general purpose emulator. It's like writing a raytracer or a Pong clone.

The best GameBoy emulators like Gambatte predate Rust by almost a decade and are often written in C++. Since GameBoy emulation is pretty much a solved problem there's no strong motivation to innovate in the scene.

I've written several emulators in Rust, I'd argue that it's very well suited to this exercise, but it's also not an area where Rust shines particularly. There's no complex memory management in emulators usually since most memory buffers are just statically allocated constant-size arrays that mimic the physical RAM on the console.


So the question is why a project created in a language that has been around since before the devices being emulated are more mature than ones in a language that is 8 years old?

Sorry to be so snarky. It probably depends on people's time working on it and their familiarity with async processes. This is an inherent problem with any media like that. With sound, it's primarily that it can't come before the image but can lag behind the image. Stuff like this.

I don't think any programming language is going help with that because it's how we perceive things, not just how correct your code is.


> But given the sheer number of comments I've read over the years recommending Rust as a replacement for C++,

> What gives?

This is The Hype Wave, where something is new and exciting so it's recommended over the things of the past without regard to whether it's as stable/full featured/correct. People are excited about Rust so Rust has been jammed into just about everywhere to varying degrees of success/correctness. This is the same as Javascript/Node and Go over the last decade-ish. Zig and Mojo are probably next.


Which Rust emulators? Which C++ emulators?

Perhaps the C++ emulators were serious, long-standing projects, and the Rust emulators were just hobby projects.

Most hobbyists only care about getting a game on the screen.


There's another strategy that has good storage efficiency and keeps the ability to iterate over elements. vec #1 is a list of tags, vec #2 keeps the byte offset of each element, and vec #3 (not really a vec) is the packed variant data pointed to by vec #2. This:

- is half the number of vecs of the author's final solution (6 vs 3)

- wastes no bytes for padding (unless needed for alignment)

- lets you iterate in a cache-friendly way since the data is laid out in order in memory regardless of type

- even lets you access items by index in O(1)

Generally it has the same performance characteristics as a Vec<T>, but for heterogeneous data.


But if you need to mutate such collection, you’ll end up writing a memory allocator (to handle removals, changes to larger variant, and to deal with fragmentation).

[deleted by user]

Storing byte offset inline, great idea. Thanks for mentioning that.

Just note one disadvantage: if the byte offset is stored in memory, you have a data dependence on the traversal. So even though it's cache-friendly, it may cause serious memory stalls in the processor's pipeline.


One trick for this is to leverage the branch predictor to guess where the next item will be (I first saw this trick for linked list traversal as in practice linked lists are often laid out linearly). Something like

  // or guess based on the size of the current item
  let guessed_next_offset = current_offset + (current_offset - previous_offset);
  let actual_next_offset = byte_offsets[++i];
  previous_offset = current_offset;
  current_offset = guessed_next_offset;
  if(current_offset != actual_next_offset)
    // need to make sure the compiler doesn’t ‘optimise out’ this if and always run the below line
    current_offset = actual_next_offset;
  // ...
In the ordinary case, where the guessed offset is correct, the cpu predicts the branch is not taken, and no data dependency of ... on actual_next_offset is introduced. If the branch is mispredicted then that speculatively executed work with the wrong current_offset is dropped. This is a bit slow but in the case where this happens all the time, the branch predictor just gives you the slightly bad perf of the traversal with the data dependency (computing the guess will be negligible) except you pay if the guess was actually right.

I'd love for someone to benchmark these variants to tease out their actual characteristics (particularly for x86).


Offsets can add a significant amount of space relative to small T, if their size is not optimized (e.g., 64-bit size_t and uint8_t T). So as long as we are careful about offset size this seems like a reasonable approach.


Indeed part of this is pointed out in the video about Carbon linked in the OP: juxtaposition gives you a ‘free’ pointer going from each node of your data structure and for some tasks you don’t really need more than that.


I still have yet to understand if Zig actually provides memory safety in the same manner Rust does or not, it seems like its another C like language without memory safety? Or did I happen to miss that bit?


Zig provides tools to avoid memory related bugs, but it doesn't enforce it via compile time checked RAII/burrow checking etc.

Nullability (and error handling) is compile time checked though.


To me the biggest win in Rust is just that, the memory related bugs are nearly non-existent. Throw in a group of people working on something and this makes the project move just so much faster in both review time and time hunting bugs.


IMHO the real big win w/ Rust is not so just memory safety from (some) leaks or segfaults but the fact that the borrow checker rules apply also to concurrency related scenarios and so race conditions and deadlocks become much more difficult to accidentally produce.

Developers can generally be taught to handle memory ownership issues fairly well, esp when helped out with RAII smart pointers, etc.

But in my experience when you throw concurrency in, even very high quality engineers tend to stumble and accidentally introduce race conditions.

In this respect Rust has something over the garbage collected languages, too. Because Java, Go, JS, C#, etc will effectively shield you from "use after free" but they will absolutely not stop you from passing an unlocked, etc. reference to another thread and screwing yourself up royally; and Rust will.

[deleted by user]

Zig gives buffer overflow safety and null pointer safety but not use after free, double free, or stack pointer escape safety.

Zig also gives you memory exhaustion safety which rust does not.


What do you mean by memory exhaustion safety?


Is there a memory exhaustion scenario where Rust does something other than panic?


You can use the fallible allocation APIs such as `Vec::try_reserve`


Rust-the-language doesn't do any dynamic allocation, so it doesn't exhaust memory at all. The Rust stdlib currently guarantees an abort on OOM, but in the future this will be changed to a panic (which you can always configure to be an abort if you want). Both of these behaviors are memory-safe (it's unclear what "memory exhaustion safety" is referring to).


Also to clarify: there are platforms (like linux) which de facto have nonfailable allocation, if you run out of memory the system will oom kill your process, and neither zig nor rust will save you from that.

I believe In practice the most common place where failable OOM is a big deal is in embedded systems programming


No language can save you from OOMs, but because Zig pervasively uses explicit memory allocation, it makes it easy to greatly mitigate OOM risks by front-loading all the fallibility:

1. Calculate and allocate all your required memory immediately upon process startup (including memory from OS resources like sockets)

2. Call mlockall to prevent the pages from being swapped out

3. Write "-1000" to /proc/self/oom_score_adj to avoid the OOM killer

4. Use your preallocated memory as a pool for all further allocations

With the above approach, the application has full control over how to handle application-level OOMs (e.g. applying backpressure to connecting clients, or shrinking non-critical caches) once it is past the start-up stage.


> memory exhaustion safety

Sure this more like DDOS mitigation, so it memory related safety, not memory safety.


>Zig gives buffer overflow safety and null pointer safety but not use after free, double free

It can provide the latter two through the use of the `GeneralPurposeAllocator`, which tracks UAF and double free.

Stack pointer escape safety is being actively researched (there are a few tracking issues, see [1]). I'm personally interested in this, I've written a decent-ish amount of Zig for toy/personal projects, and anecdotally stack pointer escape is the only type of memory error that has bitten me more than once (though to be fair, one of these cases was calling into a C API, so even Rust wouldn't have helped).

More broadly, the ability to catch all forms of UB in Debug/safety builds is an accepted proposal [2], though whether or not it will be possible in practice is a different (and interesting!) question.




The way `GeneralPurposeAllocator` works is kinda scary though, since it may result in whole memory pages used only for very small allocations, effectively multiplying the memory usage of the program. Also kinda goes against the memory exhaustion safety, since now you're more likely to exhaust memory.


An allocator that does heap quarantining at the page level is not a "general purpose allocator". It is a debug tool.


Yeah, I don't think it's right to say Zig doesn't have use-after-free and double-free safety. If you want that, you can just use a general purpose allocator. Note that you can't forget to choose an allocator since they are explicit.

Is this somehow harder than, say, choosing not to use "unsafe" in Rust?

Maybe all that is missing is a linter to help enforce whatever memory-management policy you've decided on. That's not really needed for small, coherent teams, but would be important for using Zig in larger projects with multiple teams and/or disparate individual contributors. (Perhaps such a thing exists and I just don't know about it.)

You might also be able to use an arena allocator where free is a no-op. That has different tradeoffs, but is also safe for use-after-free and double-free.

As you say, stack escape is the main thing where Zig doesn't have a good memory-safety story yet (at least not that I've heard). I guess there are a few others that concern me when I see them on a list, though I haven't hit them in real life.


C has use-after-free and double-free safety if you patch out free. Not really a solution, is it?


What do you think the difference is between “patching out free” and allocators as a first-class feature? I’ll bet you can think of a few rather significant ones if you try.


Zig has custom allocators as a first-class feature. It does not have memory safety as a first-class feature.


Static analysis at the IR level would be awesome. It could catch use-undefined, stack escape, and probably uaf/df as well so you don't have to lean on gpa's (potentially expensive) tricks. Plus there are times when you maybe don't want to use page allocator.

As an aside. I'm not certain I understand how double free is memory unsafe (in the sense of "causing vulnerabilities")


A double free breaks invariants for the memory allocator. For example, the freed memory may have been handed out to someone else and if you free it again, it will be marked as unused even though that code relies on their stuff being available. One very classic way of exploiting a double-free is a sequence like this happens:

1. Some code allocates memory.

2. The code frees the memory, but keeps a stale reference to it around. It is marked as unused by the allocator.

3. Some other code allocates memory. Maybe it's reading the password file off of disk. The allocator has some unused memory lying around so it hands it out–but it turns out that this is actually just a reuse of the buffer from earlier. It is now marked as "in use" again by the allocator.

4. The code from earlier has a bug and frees the allocation again. This means that the allocation is now marked as "unused".

5. Another allocation request hands out this memory again. Maybe it's a buffer for user input? Well, it's been scribbled all over with other data now.

6. Someone asks to authenticate and the password checking code gets called. It has the password right here to check against…oh, wait, that memory got freed out from under it and overwritten with some attacker-controlled content!


> I'm not certain I understand how double free is memory unsafe (in the sense of "causing vulnerabilities")

Perhaps there are some allocators where doing that hits UB. UB in memory allocation is probably always a memory safety issue. I would say if your code accepts any allocators where double-free could be UB then you've got a safety issue.


> I still have yet to understand if Zig actually provides memory safety in the same manner Rust does or not

It doesn't.


So its C with nicer macros and maybe some more sensible ways of dealing with arrays and such?

For me that's not enough to move the needle.


It's more than that in that it has a sane syntax and module and build system.

I'm a Rust person by both day job and hobby, but I can see the niche Zig is in as being quite useful in particular for embedded work.

One thing it definitely has over Rust is way better support for explicit per-structure allocator/allocation management. The allocator_api stuff in Rust has just been sitting unstable for years at this point, while Zig shipped with support for this immediately. Which is key for work in all sorts of systems level contexts.

Probably the language I want is somewhere between the two.

[deleted by user]

I mean I really have a hard time going backwards on the whole idea of lifetimes and send/sync type traits which force escape hatches if you want to do something idiotic. The key piece here is, any project of significant size and usage, someone will try to do something stupid. Rust makes the stupid so obvious with unsafe blocks and annotations, that reviewing code from some random person on the internet is made easier. The tooling made the task of reviewing code mostly limited to the code being changed rather than the whole project.

In C/C++ land you need to consider not only the code being changed but also the entire code base and how that code is used. It's not enough to look at the diff. You need to understand the context.

Not to say that isn't also true with Rust to some degree, but the degree is usually in terms of logic and less "is some pointer/reference going to cause a race/segfault/buffer xflow?"

The langauge itself doesn't define allocation, Box is in the stdlib and this allows for nostd libraries to deal with things as I guess most of us might expect I'd think. It would be cool to allow for a global allocator though to enable std on more platforms with better semantics, no disagreement there.


It doesn't, but it puts a RED BANNER in every place where memory is allocated, and you need to explicitly pass an allocator of your choosing whenever memory is allocated. It also contains no hidden allocations or function calls. So it forces you to think about your memory allocation patterns, and also includes some tools to catch memory leaks [1].



That doesn't really help.


Just as your comment :-p


If you didn't find "thinking about memory allocations doesn't help much with memory safety" useful I don't really have much more to share.

[deleted by user]

I found this article to be clarifying:


In effect this answers my question pretty well, the answer seems to be a resounding "No there's no memory safety" at least in the practical or realistic sense that results in few to no memory lifetime errors.




Rust’s type system ended up being Turing-complete anyway. It seems like it fell into the same trap as C++ templates: it’s an accidental programming language in a programming language, but with way worse syntax.

The lesson for language designers may be that every useful type system is doomed be Turing complete, so you may embrace it from the start, instead of trying to make it declarative.


Fascinating! Now I'm curious, what causes this? Where does Turing completeness come from?

Polymorphism? Inference? Higher-kinded types? (Probably not that last one, I don't think Rust has them.)

Put another way, what would it take a for a type system to NOT become Turing complete?


All you need is a way to branch and a way to loop, so anything with conditional types and recursive types will be Turing Complete


Turing completeness doesn't have to come from a single feature, it often comes from a combination of features.

e.g. the interaction between subtyping (e.g. inheritance) and generics (with variance) is tricky:

It can be highly nontrivial to tell if a language actually has a Turing complete type system: the 2007 Kennedy&Pierce paper made it likely that java was turing complete; but it took until 2016 until it was finally proven that to be turing complete (

> What would it take a for a type system to NOT become Turing complete?

An analysis of all possible interactions between all features in the type system, building a formal proof that the type system is not turing complete. This is not really realistic for the style of complex generic type systems that programmers are now used to, it would need to be a vastly simpler language.


Surely they knew that it would become a programming language in its own right. That's the fate of all such statically typed languages which give you type-level power but that aren't dependently typed. (Of course dependently typed languages might end up with other languages like its own tactics language.)


I don't think anyone has ever been surprised to find that a type system (with inference) is turing complete.


it's easier to make something Turing-complete than not... most interesting example is probably rule 110.


The point is that it’ll end up that way, so just embrace it from the start.


the lisp way. the problem is it results in software which is difficult to reason about due to all the compile-time execution, even if you have state-of-the-art compile-time debuggers (if any language has them, it is lisp).


I haven’t so far found any zig code terribly hard to reason about, and it’s important to know that compile time type crafting (currently) doesn’t support decls. Although you can madness your way around that, theres idiomatic ways to manage this.

Comptime is definitely very powerful, and even the top people using zig frown upon “magic”.

But to be completely honest, “magic” shit just happens all the time if you enable it. Rust macro abuse to get “magic” comes to mind. If it happens in rust, it’ll certainly happen in zig.


That something can be Turing-complete in the worst case doesn’t mean that you hit those worst-cases on a frequent, or even occasional basis.


> instead of trying to make it declarative

Wait, no. A language being Turing complete and declarative are completely independent things.


That's the point!


> Rust’s type system ended up being Turing-complete anyway. It seems like it fell into the same trap as C++ templates

I hear it is a common trap.


author here. That's a really good observation kornel. Rust's philosophy of avoiding post-monomorphization errors at all costs is usually sung in high praises (followed by a ridicule of C++ template errors) - but it comes at a cost. And when you stray away from the declarative way and start plastering const generic bounds everywhere you really feel the pain.

Anyways, I think engaging in this topic as a community is super important. Cause that's the only way to push PLs forward and explore this massive space.


This is sort of the approach that I've considered. Type systems are just predefined code/behavior baked into a language, but there's technically nothing stopping you from arbitrarily extending the compiler like Lisps do with macros to support a type system that's written in the language itself.


I did this! Singeli is an Elixir-like compile-time language where types and variables are first-class values, on top of C-ish semantics that are meant to be more like "portable assembly". It's designed for high-performance code where you have to be able to control the specific instructions, but the overall strategy could be useful in other domains too.

And a podcast on it came out Friday:


I listened to that, it was good! I was surprised there is an entire podcast about array programming. Looking forward to future episodes, and also to trying AoC with BQN again in a few months.

[deleted by user]

Genuine question:

How do type system that are so abstract and complex to become turing complete help?

I definitely find it extremely helpful to be able to write containers of any type (like std::vector<T>), that saves a ton of code duplication, but beyond that what more is needed and why? What we're competing with here is: just write a function that operates on the data types you want and gets the job done.

You're writing code for the CPU that has to actually do something, on actual known types.

What programming task is simplified by having an "any" type or a type system that allows you to write pong-played-turn-by-turn-using-compiler-error-messages? It's cool that you can have an "any" type just like universal sets in set theory, but what real-life programming scenario does this simplify (you can already write containers that can contain anything you want without using such as thing as an "any" type)?

At least not the kind of programming tasks I do, but admittely I think fairly low level and prefer my types to have exact known amounts of bits, known signed integer convention and endianness so I can efficiently use shifts and get the bits I need, preferably with as little undefined behavior as possible.

Asked differently: If one were to design a programming language that only has basic types (primitives, structs/classes, ...) and templates to allow functions/classes to operate on any type (but not more than that; substitute template type with the actual type, compile this, nothing more), what feature will users of the language be missing and complain about?


> Asked differently: If one were to design a programming language that only has basic types (primitives, structs/classes, ...) and templates to allow functions/classes to operate on any type (but not more than that; substitute template type with the actual type, compile this, nothing more), what feature will users of the language be missing and complain about?

This is a really good question. There are some languages that work as you describe: SML and some others in that family. There are generic functions and types, but the type parameters are basically just placeholders. You can't do anything with a value whose type is a type parameter, except store it and pass it to things that also take type parameters.

That gives you enough to write a nice reusable vector type. But it doesn't let you easily write a nice reusable hash table. You can, but users have to pass in an explicit hash function that accepts the key type every time they create a hash table.

It might be nice if a type itself could indicate whether it's hashable and, if so, what it's hash function is. Then, if you create a hash table with that key type, it automatically uses the hash function defined by that type.

Now you need some sort of constraints or type bounds in your generics. That's what traits in Rust and bounds in Java and C# give you. (The literature calls it "bounded quantification".) It's a big jump in complexity. But it does mean that now you can call functions/methods on arguments/receivers whose type is a type parameter, and those calls can be type checked.

Bounds are themselves types, so what kinds of types can you use in bounds? Can they be generic? If so, what kinds of type arguments are allowed? Can you use type parameters from the surrounding type?

For example, which of these are OK and which aren't (using Java-ish syntax):

    class A<T extends Foo> {}       // 1.
    class A<T extends Bar<Foo>> {}  // 2.
    class B<T extends B<Foo>> {}    // 3.
    class B<T extends Bar<T>> {}    // 4.
    class B<T extends B<T>> {}      // 5.
Any kind of bounded quantification will give you 1-3. What about 4 and 5? This is called "F-bounded quantification". Why would you want such a thing?

Collections with fast look-up are important, which is why we extended our generics to enable us to write nice reusable hash tables. But some data types aren't easily hashed but can be easily ordered. A sorted collection is faster than an unsorted one.

How would we write a generic sorted collection? We could require you to always explicitly pass in an ordering function for any given element type, but it would be nice if the element type itself could supply is order function.

You could define a "Comparable" interface that a type can implement to support comparing an instance against another object of some type, like:

    interface Comparable<T> {
      int compareTo(T other);
And then implement it on your type, like:

    class Color implements Comparable<Color> {
      int r, g, b;

      int compareTo(Color other) => ...
In our sorted collection, elements all have the same type, so the bound that we need looks like:

    class SortedCollection<T extends Comparable<T>> { ... }
Notice that we have "T" inside the bound. That's F-bounded quantification.

Using type parameters inside a bound isn't the only place recursive types like this show up. Let's say you wanted to make a generic type comparable. You'd do something like:

    class Pair<T> implements Comparable<Pair<T>> {
      T a, b;

      int compareTo(Pair<T> other) => ...
Now here, the implements clause is using not just the type parameter of the enclosing type, but the entire type.

We had a couple of fairly modest goals:

* Be able to create reusable hash tables where the hash function is inferred from the key type.

* Be able to create reusable sorted collections where the comparison function is inferred from the element type.

And in order to get there, we needed generics, bounds, and even F-bounded quantification.

Adding even a little more usefulness to our collection types will quickly have us reaching for variance annotations, associated types, and even more exotic stuff.


> It might be nice if a type itself could indicate whether it's hashable and, if so, what it's hash function is. Then, if you create a hash table with that key type, it automatically uses the hash function defined by that type.

> Now you need some sort of constraints or type bounds in your generics. That's what traits in Rust and bounds in Java and C# give you.

Isn't having a function "Hash", called in your template, that takes your type as argument (and give compiler error if the function doesn't exist for this type, as a consequence of substituting in your type) sufficient for this?

In other words, duck typing


Yes, C++'s duck typing approach is one solution to this.

It takes the solution out of the type system, which keeps the type system simpler.

But it effectively turns your compiler into an interpreter, and an interpreter which may fail.

One way to think of C++'s notoriously huge, incomprehensible template compile time errors is that they are effectively stack traces of the template expansion interpreter running at compile time. When you see one of those errors, you have to figure out which chain of compile-time execution led to it.

Everything that's frustrating about dynamically typed errors that makes users reach for static types is exactly true of C++'s template system as well. (And, conversely, everything that's powerful and simple about dynamic types is true of C++'s template system.)

It's actually even worse in C++ because of SFINAE. The "interpreter" running at compile time in C++ doesn't just abort on the first error. It's like an interpreter for a dynamically typed language that also supports overloading. Any time a function has an error, the interpreter backtracks and tries the next overload. If all overloads fail, then it keeps unwinding.

So what you get isn't just a call stack on a template expansion error, it's a call tree of every place it tried to get to that failed.


This was a fantastic write up. Very interesting, thanks so much!


It's about when you want to enable optimisations or conveniences depending on the parameterized type.

For example, in the std::vector<T> type, if T supports being moved, you want to use that when growing your vector for performance. If T doesn't support being moved, you will have to copy it instead.

Boom: you've ended up with template metaprogramming.


> You're writing code for the CPU that has to actually do something, on actual known types.

(Partial) specialization, which is a feature used to get templates to do actual something on actual types is what principally allows templates to be turing complete.


Basically design in dependent types into the language right from the beginning (like Iris) - such that types are first class and can be computed and manipulated like any other language construct.


> Rust’s type system ended up being Turing-complete anyway. It seems like it fell into the same trap as C++ templates: it’s an accidental programming language in a programming language, but with way worse syntax.

Most anything in software can easily end up Turing-complete though. It isn't that high of a bar unfortunately. I think most mature type and template systems end up as Turing-complete. To single out Rust for this is to ignore that this is just standard fare for mature typing systems.

TypeScript's types are Turing complete:

Python's type hints are Turing complete:

Java generics are Turing complete:


"It isn't that high of a bar unfortunately." --> "It isn't that high of a bar."


> Most anything in software can easily end up Turing-complete though.

not really, if you actually pay attention when youre designing the type system:


My experience with Turing-completeness in any custom DSL or similar is that you will generally get Turing-complete systems unless you work really really hard to avoid them. And it only takes one wrong move by anything anywhere in the system, and you'll end up with Turing-completeness.

Instead of trying to avoid Turing-completeness, just expect it, embrace it and instead deal with its consequences to contain the fallout.


> Instead of trying to avoid Turing-completeness, just expect it, embrace it and instead deal with its consequences to contain the fallout.

this seems like such a cop out. its like saying "oh failure is a given, so don't even try to succeed". at least currently, Go generics are NOT Turing complete. the generics were designed in part specifically to avoid that. so just because Rust (and others) failed, doesn't mean its impossible.


I don't think most people care at all that their type system is turing complete. It basically never comes up.


Agreed. Adding to this point, why should I care? Am I missing anything obvious on why you don't want it?


Not really. In theory it means your compiler might never terminate... that does not usually happen.


> this seems like such a cop out. its like saying "oh failure is a given, so don't even try to succeed".

It is a pragmatic cop-out yes. I found it is easier to make progress by assuming that you'll get to Turing Complete rather than investing in the time to avoid it. I found that the downsides of Turing Completeness are usually overhyped or primarily theoretical.

> so just because Rust (and others) failed, doesn't mean its impossible.

It definitely is not impossible. But I don't know if it is worth it.

In the end I want fast compile times, type safety, easy to maintain code and good error messages. I do not care if the type system is Turing Complete.


> In the end I want fast compile times, type safety, easy to maintain code and good error messages. I do not care if the type system is Turing Complete.

I think the problem is some languages with Rust go too far with generics, which probably triggers the turing complete. for example, this is valid Rust code:

    let mut handles = Vec::new();
    for i in 0..10 {
        let handle = do_work(i);
but you have to follow the code all the way to "do_work" before you ever find the type of anything. Go does not allow this. you need to either declare a concrete type:

    var handles []int
or a explicit generic type:

    type slice[T any] []T
    var handles slice[int]
I think Rust is lose lose, because the underlying type implementation is Turing complete, and I would argue the code is actually less readable because of the overuse of type inference.

> I think Rust is lose lose, because the underlying type implementation is Turing complete

I am not a Rust coder so I can't really comment.

Currently, I do TypeScript, which also has a Turing complete type system, and I love it. Of course all things in moderation. Even though I could make a completely obtuse type design for projects, I try to not write code that others can not understand.


"if you actually pay attention" or, in other words, if you produce novel research into the area that's worthy of publishing, and noting that this is only just the start and has its own limitations:

> The cost is that of requiring a whole program analysis and disallowing programs that would result in infinite instantiations (Section 5.3). Clearly, this is the beginning of the story, not the end.

[deleted by user]

The point is that Zig basically circumvents issues surrounding that, by introducing "comptime", where you can just write regular Zig to achieve the same things.

The article showcases a nice example of having this very direct power.

But it really comes up more often than you think as soon as you actually have it.

It's easy in Zig to allocate precisely based on computed values and then have the sizes as part of your types etc. It all falls out of some simple ideas and it's all just regular Zig code.

"Types in Zig are values of the type type" from:

So instead of making it hard to write incorrect programs, Zig makes it easy to write correct programs.


The problem is that it ends up being dynamically-typed, like C++ templates. See:

> So instead of making it hard to write incorrect programs, Zig makes it easy to write correct programs.

Well maybe in theory, but the current state of Zig is that it makes it hard to write programs no matter how correct because the compiler keeps crashing ¯\_(ツ)_/¯


What are you talking about? The tagged versions are usually quite stable and also plan bug fix follow ups.

If you follow master, you’ll occasionally run in to crashes, which is true of any developing language. If you don’t want that, follow tagged versions.


>instead of making it hard to write incorrect programs, Zig makes it easy to write correct programs.

Or May be rephrasing it ( To avoid the word "instead" which may anger Rust supporters );

Rust Makes it hard to write incorrect programs, Zig makes it easy to write correct programs.

I think this single sentence captures the philosophical difference between Rust and Zig. And of course there is no right or wrong in philosophy.


I fully agree. It’s an interesting trade off that is worth thinking about.


Here is a really good link that shows the power of Zig's comptime:


I am certainly sometimes envious of comptime and what it makes practical, but it’s worth noting that it results in dynamically-typed generics, whereas Rust goes for statically-typed generics, which is in keeping with its goals. There are some significant general maintainability improvements in statically-typed generics; when you use dynamic generics, subtle changes in one place can cause obscure compile errors in a completely different and seemingly unrelated place (commonly called post-monomorphisation errors); this doesn’t happen with static generics.

So… I’m not sold on your wording that it’s circumventing issues, as it’s choosing a different set of trade-offs. In shedding types-are-a-language-of-their-own, you also shed confidence about what’s a breaking change, and make call sites more fragile. Decide for yourself whether it’s worth it.


What do you mean by dynamically-typed generics and statically typed generics here? I've looked up "post-monomorphization errors" and found some things about assertions about generic types failing because of the choices made by the user who passed in types or constants that do not work with the generic code. It seems like Zig libraries have the option of generating the errors at the right place if they place their assertions in the function that returns the type to the user, but they also have the option of generating the errors in the wrong place if they place their assertions in methods of the type.

> So… I’m not sold on your wording that it’s circumventing issues, as it’s choosing a different set of trade-offs. In shedding types-are-a-language-of-their-own, you also shed confidence about what’s a breaking change, and make call sites more fragile. Decide for yourself whether it’s worth it.

Client code can just look at all the members of all the structs so there's not really much hope for enforcing that changes cannot break any client code using compiler-adjacent tooling.


It’s easiest to see the distinction in otherwise-similar systems, so I’ll choose Rust generics (statically-typed) and C++ templates (dynamically-typed).

In Rust, the generic constraints (the traits that the type must satisfy) are a contract, part of the signature. The caller must satisfy them, and then the callee knows nothing else about the type it has received. Therefore, changes inside the body of the generic method will never† cause any code that uses the function to stop compiling.

In C++, templates don’t have that, so you have to seek knowledge of what conditions your type must satisfy some other way, and it’s easy to accidentally depend on additional details (since it’s not statically checked), so that changes in the template that you thought were harmless actually break someone else’s code somewhere else that uses your template in ways you didn’t expect. has a decent example, though it’s from 2014 and refers to Zero and One traits that were removed from the standard library before Rust 1.0, and the compiler messages would be better today as well.


† In practice there’s at least one way of leaking details, so post-monomorphisation errors that aren’t compiler bugs can actually happen, though it’s very rare: if you return an `impl Trait`, the body leaks whether it implements auto traits like Send.


I'm still in the honeymoon phase with this language and learning, but I agree it's a trade off.

For example your LSP isn't going to help you as much while you edit code.

However being able to express arbitrary compile time constraints and pre-compute stuff without having to go through a code generation tool is really powerful. You can actually use all the knowledge you have ahead of time as long as you can express it in Zig.

So far it seems like Zig is carving out a very strong niche for itself.


is there anything fundamental about using the same language at compile time to generate fully static typing that is boiled away? I don't know, but it doesn't seem so?


Can you show how you'd implement a Turing machine in Rust types?

I am not seeing it, so am likely overlooking that.


Agreed. The strong agree is that Rust's type system---and many other languages' type systems---is a parallel language to its runtime language. Parameterized types are like functions; instantiating a type parameter is like a function call; `impl ... where` does pattern matching. You get a pure functional language based on pattern matching and implemented with memoization. These type systems are often Turing complete on purpose, but yet it's surprisingly hard to get an infinite loop. Like, both Rust's and Java's type systems are Turing complete, yet I've never hit an infinite compilation loop in either by accident.

For more on type systems as programming languages: plus its links at the top.

What I would like to see is a programming language where the runtime language and the comptime language are the same, or nearly the same, and where the comptime language is type safe. Zig isn't this: its `comptime` language is essentially dynamically typed. In Zig, if a function `f` takes a `comptime` argument `t: Type`, it can call `.print()` on it, and the compiler will just assume that that's fine. If you call `f`, you had better read the docs because the type system won't tell you that `t` needs to have an `print()` method. If you're calling `f` directly, that's pretty straightforward, but the trouble comes when you actually call `h`, and `h` looks at the value of some string and based on that decides to call `g`, and `g` calls `f`, and the docs for `h` weren't entirely clear, so now you're seeing an error in `h`, which you didn't even call. Instead, if `f` is going to call `.print()` on a `t`, then its argument `t` can't just be a `Type`, the compiler should check that it's a `Type with method .print()->String`. This requirement would then flow to `g` and `h`, so the type signature for `h` is guaranteed to tell you that `t` is required to have an `print()` method.

For more on merging runtime and comptime languages in a type safe way, see 1ML:

EDIT: Deleting my criticism of C++ templates lest it distract from the more substantial things I had to say above.


> What I would like to see is a programming language where the runtime language and the comptime language are the same, or nearly the same, and where the comptime language is type safe

I have aproximately 0 knowledge of it, but I think TemplateHaskell should do that.


Nope. Haskell's type system guarantees that code you construct in Template Haskell is well-formed, but the code you construct is only type checked once, at the end. So if you have a function that constructs code, and the code it constructs has a type error in it, you won't find out unless you call the function. Just like Zig.


> What I would like to see is a programming language where the runtime language and the comptime language are the same, or nearly the same, and where the comptime language is type safe.

MetaOCaml might the closest one.


> and where the comptime language is type safe

You need dependent types in order to do this, which means doing away with Turing-completeness. (Moreover, the principle 'Type is of type Type' as found in Zig comptime leads to type-unsafety. So you need to replace that with some notion of universes.)


Well, you can have dependent types but limit their expressiveness. Rust, for example, has dependent (comptime) types, but they're very extremely limited:

    fn foo<const N: usize>() -> [f32; N]
I imagine having arbitrary comptime code, but more limited use of comptime values in type position.

Also, do you know the exact issue with "Type is of type Type"? I know that can lead to _non-termination_, and non-termination completely breaks proof assistants. For example, you can prove `False` with:

    fn make_false() -> False { return make_false(); }
But if you're not building a proof assistant, a function like `make_false()` is fine. Does it lead to any additional problems?

I think there's still value in pushing people towards something declarative. For example, a static type system might have an `any` type for things that aren't expressible statically but that doesn't mean type system designers should embrace dynamic typing from the start instead of trying to make it static. Similarly, it makes sense to have panics/exceptions in a language as an escape hatch for errors where the application can't reasonably do anything gracefully, but it's (arguably) a bad experience to take that to the extreme and panic for all errors.

I think there's value in accounting for the possibility that there will be edge cases that preclude hard-and-fast rules like "purely static" or "purely declarative", but I dislike the philosophy of projecting the 1% use case (e.g., "dynamic" or "turing complete") onto the 99% use case (where e.g., static and declarative would be ideal). I like when languages design for the 99% case and allow for escape hatches for the remaining 1% with the understanding that these escape hatches are intended to be used judiciously.

To put it differently, embrace that there may be escape hatches in the initial design, but prefer to think of them as "escape hatches" with the entailed understanding that they should be rarely used.


100 times this. C++ is progressively making more and more of the language and stdlib available at compile time (constexpr, consteval, constinit). Once this is accomplished, there will be two very complicated Turing-complete programming languages available at compile time, one that is convenient to program (C++) and one that is not (C++ templates).

Zig jumps straight to the finish line by making the main language available at compile time out of the gate, and by using it as the "generics language", Zig's generics are both simpler and more powerful.


How does this AoVA data structure work in practice? Don't you lose index-based access, in the sense that arithmetic on indices is potentially no longer meaningful in the array sense? Iteration will not preserve insertion order.

I think TLV (tag-length-value; length can be implied by the tag) is more commonly used in this context due to its better caching properties, and it least it provides meaningful forward iteration. See getdents/inotify/Netlink messaging.


> How does this AoVA data structure work in practice? Don't you lose index-based access, in the sense that arithmetic on indices is potentially no longer meaningful in the array sense? Iteration will not preserve insertion order.

The caption to figure 4 suggests that the AoVA pattern is not a good fit if you need to retain the full order of inserted elements:

> Compared to the SoA layout from before, we have a partial order instead of a total order. So upon insertion, we get back a tagged index that holds both the enum tag and the index in the particular variant array.

So by my reading, ordered access is considered out of scope here. (I suppose you could recover ordered iteration by storing a global index in each element, but that still wouldn't help with ordered random access, and it would likely lead to some really branchy code.)


Yeah I imagine an array write that changes the type of an index would be insanely expensive…


In practice, changing tag type is rarely done.

In any case it wouldn't be particularly expensive; you'd have to make a new member of the destination type which is as expensive as an append. The "hole" that remains in the original vector can be filled in constant time by taking the last member of it and moving it there, then shrinking its size by 1.


I'm sure there are classes of problem where that's not unusual. Even in the stated example of an AST, there are use-cases for code-fixers (eg lint fixers) that operate on the AST.


Don’t those things usually work on concrete syntax trees? And shouldn’t the solution be to use the right data structure for the job (using language feature to make this easy) rather than pushing one system (e.g. the compiler) to be worse for the sake of another tool (the linter)?


AoVA lacking its own total ordering on indices might be a problem for some use cases, but not for the one that is suggested here: AST nodes.

In these cases, the arrays can be just one component (could think of it as an arena) of a heap-ish structure. [1]

The cost is that your indices now need to be two dimensional (tag_idx, va_for_tag_idx). But the number of tags is known at compile time and you can optimize storage by packing so that tag_idx is the upper 4-5 bits and va_for_tag_idx uses the rest.

See: [1]


> upon insertion, we get back a tagged index that holds both the enum tag and the index in the particular variant array.

These are essentially pointers. If you want to iterate, you store the pointers in an array in the order you want to use. It’s the same thing a program would do if it allocated memory from a heap.

Storing things based on their size is also done by garbage collectors and general-purpose allocators. They might get some efficiency from knowing all possible object sizes, though. Also, like an arena, they could gain some efficiency from having a simpler way of deallocating.


> In it he describes how a parsed clang AST regularly consumes 50x more memory than the original source code!

I guess that seems like a lot but the context I'm missing is - how good could it get? I mean, if you want to preserve source locations of each token and you need to encode enough information in the AST to properly recover, what's the ideal growth factor over the original? 1.5x? 15x?


As a point of comparison, I think simdjson tape is only ~3x larger than the original document. Much of this could be saved if one did things like stuffing numbers into only one tape slot or not copying strings that have no escape sequences in them (instead referencing their location in the original document).

The largest overhead you could achieve is apparently 8x for a document that is mostly the characters "[]" or also 8x for a document that is mostly the characters "0,"


There's a ton more context that needs to be held in a clang AST node compared to a JSON doc. Just to mention a few: file source, token range, expr/stmt/decl statements and their relation, macro expansion related info, and tons of diag to be able to walk back to the original source token, and padding added by the compiler in those data structures.


If you can achieve a memory savings of 30%, say, that would be pretty big news. If you can only achieve a 30% memory savings by doing things to the compiler that make it harder to work on the compiler in the future, then it's probably not worthwhile. But if you can achieve an 80% memory savings by doing violence to the compiler, that might be worth the trouble.

As for what's an ideal source->AST expansion factor for a language that has a user-friendly compiler that's also compiler dev-friendly, that's hard to say. Clearly 50x is workable.

In TFA the 50x expansion factor is used as a motivator for automating a particular type of optimization. It'd be very interesting to see a Rust vec-of-enums that automatically deconstructs enum values into tags and opaque values that it could store in a struct-of-arrays like TFA does in Zig. The places to bury unsafe use into for this would not be many.


Maybe I'm just dumb but isn't the endgame of vec of enums going to be an ECS?


And the end game of ecs is a columnar rdbms

[deleted by user]

Source code is surprisingly compact.

As for one data point on how much better it can be, here is Zig's own parser, parsing Zig's own parser:

    # Source bytes:       139 KiB
    # Tokens:             24646 (120 KiB)
    # AST Nodes:          10998 (140 KiB)
Each token is 5 bytes which is pretty minimal (1 byte tag + 4 byte file offset). AST Nodes are also encoded compactly and non-uniformly, in this case coming out to roughly 13 bytes each. Despite these minimal encodings, the parse tree comes out to almost 2x the size of the source file in this case. 2x is a lot better than 50x though.

Source: zig ast-check -t lib/std/zig/Parse.zig | head -n7


> 2x is a lot better than 50x though.

the 50x metric was using something like this AST analysis mechanism (and not a system metric like resident set size)?


I think you should just watch the linked talk. It’s excellent. Though they don’t put exact numbers on it iirc (perhaps too early to be confident – they might be small due to missing data they don’t yet realise they’ll need).


As an aside and unrelated to the content of the article ... but the form of the article, with footnotes in the margin, is really useful.


Thanks! It's inspired by this


Another quick question for you, what are you using to make those hand-drawn looking diagrams? They look amazing.


Thanks for sharing this! Looks fantastic, I'll definitely incorporate this into my own blog once it's finished.


If you’re interested in the general concept of sidenotes and labelling figures in the margin, here’s my take on it for further inspiration:, with exhibiting figures.

Do search around for other sidenote implementations, too. I like the markup and presentation I end up with, but you’ll also find other approaches to mobile support in particular, involving things like checkbox hacks to toggle visibility, or just flat-out requiring JavaScript.


Generally speaking: sidenotes are excellent, footnotes are fine, endnotes are terrible. On the web, as a typically-not-paged medium, “footnotes” practically always actually means endnotes.

[deleted by user]