A Modest Compiler for JavaScript

In the last year or so, I’ve been working on mcjs, a Modest Compiler for JavaScript: a toy (hence the modesty) implementation of a JavaScript VM, including both an interpreter and a tracing JIT compiler, with an internal design loosely inspired by LuaJIT. It’s still far from complete, but it’s a very enjoyable learning activity nonetheless.

This blog exists because this little side project has now accrued enough “history” that it seemed like a good idea to write down and collect some notes, if only to help myself remember why I made one choice or another. And if some reader somewhere finds it useful or interesting, then it’ll be all the more worth it :)

(Many thanks to Vincenzo for reviewing this post prior to publication.)

Brief history of its predecessor: mclua

(You can skip this one, no hard feelings.)

I started mcjs as my second attempt at writing a compiler. In 2021, after temporarily changing my living arrangements in response to the COVID situation in Italy, I started devoting part of the proverbial “nights and weekends” to learning how compilers really work, a topic that had long sparked a strong but unsatisfied curiosity in me. It was also a way to practice my software development skills while the “curfew” (sorry, couldn’t find a page in English) kept us all from spending our nights in pubs as we otherwise would have.

There is plenty of good reading material online, ranging from academic papers to blog posts to tweets. With this sort of DIY curriculum, I managed to cover a bunch of topics, including various intermediate representations (SSA, A-normal, CPS), basic optimizations such as constant folding and dead code elimination, register allocation, and more.

This was all put into practice by working on mclua, a Modest Compiler for the Lua programming language. The intention, if you can believe the naïvety of it, was to purposely go against the grain and write an Ahead-of-Time compiler instead of the classic interpreter, pretending that I didn’t know any better (I do, I promise), and then try to learn as much as possible from this “artificially misguided” endeavor. To top it off, the compiler itself was also written in Lua.

I can say now that the experience was a lot of fun while it lasted. Lua is an unusual choice for writing a compiler, but if performance is of no concern, one can take advantage of its “dynamicness” to build some pretty comfy abstractions. There is also a certain satisfaction that comes from working with the most bare-bones of tools: no debuggers, no IDE/IntelliSense, just the standalone ‘lua’ interpreter and a ton of unit tests. I can join the chorus of those praising Lua’s minimalism, “obviousness”, and flexibility.

In the end, though, the misguided-ness of this attempt caught up with me and I decided to stop working on mclua. By then, I had already learned a ton about compilers and had already felt the magic of seeing my own compiler crunch an apparently complicated function into a handful of sharp, specific IR instructions. Meanwhile, the relative dearth of real-world standalone Lua code to test my compiler on was starting to feel more and more limiting.

So, after some time, I decided to pivot to JavaScript and JIT compilers, which brings me to mcjs.

So what is mcjs, really?

I wanted to have more fun with compilers and programming language implementations than I was already having, so I started mcjs as yet another side project.

This time, I wanted to do things more like “the grownups”: mcjs is a bytecode interpreter with a JIT tier. Although the source language is no longer Lua, I tried to take some lessons from LuaJIT, whose balance between runtime performance and implementation simplicity I really admire.
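For readers who haven’t seen one up close, here is a minimal sketch of the general shape of a “bytecode interpreter with a JIT tier”. It is not mcjs code: the `Op` enum, the `Vm` struct, `sum_program` and the `HOT_THRESHOLD` value are all hypothetical, made up for illustration. The interpreter dispatches one instruction at a time and counts how often each loop back-edge (a jump to an earlier instruction) is taken. Once a loop crosses the threshold it is marked as hot, which is roughly the point where a LuaJIT-style tracing JIT would start recording a trace of the loop body. Here the “JIT” side is only a stub that remembers the loop header.

```rust
use std::collections::HashMap;

// Hypothetical hotness threshold: how many times a back-edge must be taken
// before the loop it closes is considered worth compiling.
const HOT_THRESHOLD: u32 = 50;

// A tiny, made-up register-based instruction set.
#[derive(Debug, Clone, Copy)]
enum Op {
    LoadConst { dst: usize, value: i64 },
    AddImm { dst: usize, imm: i64 },
    Add { dst: usize, a: usize, b: usize },
    JumpIfLess { a: usize, b: usize, target: usize },
    Halt,
}

struct Vm {
    regs: Vec<i64>,
    // Back-edge counters, keyed by the loop header's instruction index.
    loop_counters: HashMap<usize, u32>,
    // Loop headers that became hot; a real tracing JIT would start recording here.
    hot_loops: Vec<usize>,
}

impl Vm {
    fn new(num_regs: usize) -> Self {
        Vm { regs: vec![0; num_regs], loop_counters: HashMap::new(), hot_loops: Vec::new() }
    }

    // The plain interpreter loop: fetch, dispatch, advance the program counter.
    fn run(&mut self, code: &[Op]) {
        let mut pc = 0;
        loop {
            match code[pc] {
                Op::LoadConst { dst, value } => { self.regs[dst] = value; pc += 1; }
                Op::AddImm { dst, imm } => { self.regs[dst] += imm; pc += 1; }
                Op::Add { dst, a, b } => { self.regs[dst] = self.regs[a] + self.regs[b]; pc += 1; }
                Op::JumpIfLess { a, b, target } => {
                    if self.regs[a] < self.regs[b] {
                        // A jump backwards closes a loop: count it.
                        if target <= pc {
                            self.on_back_edge(target);
                        }
                        pc = target;
                    } else {
                        pc += 1;
                    }
                }
                Op::Halt => return,
            }
        }
    }

    fn on_back_edge(&mut self, header: usize) {
        let count = self.loop_counters.entry(header).or_insert(0);
        *count += 1;
        if *count == HOT_THRESHOLD {
            // Stub for "hand this loop over to the trace recorder".
            self.hot_loops.push(header);
        }
    }
}

// Bytecode for: sum = 0; i = 0; do { i += 1; sum += i } while (i < n)
fn sum_program(n: i64) -> Vec<Op> {
    vec![
        Op::LoadConst { dst: 0, value: 0 },       // r0: running sum
        Op::LoadConst { dst: 1, value: 0 },       // r1: loop counter i
        Op::LoadConst { dst: 2, value: n },       // r2: upper bound n
        Op::AddImm { dst: 1, imm: 1 },            // pc 3, loop header: i += 1
        Op::Add { dst: 0, a: 0, b: 1 },           // sum += i
        Op::JumpIfLess { a: 1, b: 2, target: 3 }, // back-edge while i < n
        Op::Halt,
    ]
}

fn main() {
    let mut vm = Vm::new(3);
    vm.run(&sum_program(100));
    println!("sum = {}, hot loops = {:?}", vm.regs[0], vm.hot_loops);
}
```

With `n = 100` the back-edge is taken 99 times, so the loop starting at instruction 3 crosses the threshold of 50 and gets reported as hot, while the sum comes out as 5050. In a real VM the interesting part starts exactly where this sketch stops: recording the instructions executed on the next iteration and compiling them into machine code.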

Here is what I want mcjs to be and not to be:

Compatibility. It should implement enough JavaScript and Node.js APIs to run a non-trivial library or application. It won’t implement all of Node.js or run an actual system. That’s fine!
- To avoid scope creep, I’m just going to ignore some of the more advanced features of the language, or heavily simplify them. For example, generators and async/await will probably not make it; I will only implement a simplified variant of ES modules, while skipping CommonJS entirely (or maybe I’ll change my mind and do it the other way around!).
Performance. It should be no more than 10x slower than V8/Node.js running in interpreter-only mode. In other words, any performance that is not obviously the result of a bug will be fine.
- Still, the JIT should be measurably faster than the interpreter.
- I don’t think that the interpreter will ever be as fast as LuaJIT’s, which is hand-coded in Assembly with hand-picked registers by a master of the craft. I’m fine with a dumb portable interpreter. I might refine the bytecode design and/or add an intermediate compilation tier based on “quick and dirty” copy-and-patch at some point.
Learning. This is the real goal: knowing how the language-runtime sausage is made. Of particular interest to me are optimization techniques, JIT compilation techniques and efficient bytecode design. I’m going into this without a detailed understanding of what I’ll need to study. I have a plan laid out in broad strokes in my head; I’ll just see what’s needed and study it when the time comes.

Finally, creating any sort of real open source project and community is out of scope. I wouldn’t even know where to start, honestly. Still, if you have any comment or critique, or want to discuss any of the topics presented here, or want to fork the code for whatever reason, I’d love to hear about it!

Did it have to be JavaScript?

I didn’t pick JavaScript after thinking very long and hard about it. It seemed cool, and that was it. But there were some things on my mind that tipped the scale:

There is a world-economy-moving amount of JavaScript code readily available.
- In particular, it’s useful to have many “real” packages that are designed to run in a standalone process (in Node.js), accomplish some generic task, and are simple enough to test my compiler on without implementing too many “system” APIs. In contrast, most of the simple-enough open source Lua code I could get my hands on seems to have been made to extend a larger application, and it relies on that application’s extension APIs to be run and tested.
- The prospect of one day running simple but “real” programs or libraries is good for motivation!
There are multiple production-quality parsers ready to use, multiple implementations you can compare yours to (and they even write fantastic blog posts), and a lot of high quality documentation.
I kinda like using it, especially with all the features from ES6 and later.
JavaScript has an official spec. I still think it’s not as good as using an existing implementation as a test oracle, but I appreciate being able to look up a detail and read a clear, black-and-white definition of how things ought to work.

There are also plenty of drawbacks: JavaScript is a much more complicated language, notoriously plagued by weird, unintuitive semantics in even its most basic parts. Still, the point here is having fun and learning, and I’m going to do plenty of both with JS.

Alright, show me what you got

The code for mcjs is currently hosted on GitHub. It’s not particularly refined or polished for public consumption. What you see is what you get, really. It’s written in Rust, so building and running it on a basic level should be as easy as running cargo build and cargo test.

Some work is already done:

A good chunk of the syntax is supported already. All the basic expressions, statements and declarations are in, alongside an extremely simplified version of ES modules (import/export).
The interpreter is capable of running the JSON5 library to a non-trivial degree (although there is still a lot to be fixed). A copy of it is included in the source tree as a test asset.
The debugger has most of the basics (setting/clearing a breakpoint, stepping to the next instruction, continue, restart, viewing details of the values and objects on the stack/heap).
I started some work on the JIT compiler, but I decided to “shelve” it in favor of going further with the interpreter first. This is because the JIT’s design and implementation are by nature very tightly coupled with the interpreter’s internals. I realized that the interpreter is still far from complete and likely to change in the near future, so I would risk reworking the JIT completely many times before having the whole VM work fine. So, it’ll just wait.

But much more is needed in order to reach the goal. I think the most important tasks are:

Complete the implementation of at least the basic parts of the language’s semantics. Objects have to work just right.
Garbage collector. Right now it just leaks everything!
Increase the coverage of the standard library.
Fix, fix, fix bugs.
Finally, resume work on the JIT!

Considering the above, my current strategy is to:

work on the interpreter until it is able to run more and more of the ‘language’ section of test262, the official ECMAScript test suite;
keep going until the interpreter passes most tests (modulo exclusions that I will justify on a case-by-case basis).

The above was the primary motivation for finishing up the “ESM-lite” work and the debugger. This being a “nights and weekends” sort of deal, I haven’t been able to reach a satisfying level of productivity, but I’d say the situation is not too dire yet. For the same reason, there is a non-zero chance that I’ll just rewrite the entire roadmap or change my goals more or less radically. My feeling is that I’ll stick with it, though.

So, this is it. I will write more posts about the “interesting bits” of the code and the challenges I face along the way. I’m aiming for one post every one to two weeks. Looking forward to it!
