[ONGOING] Extending LLVM's Kaleidoscope: Code-gen edition
8888-08-08
Prologue
Hey everyone, how's everyone doing? I've graduated from Berkeley and have been back at my parents' place for a while now, getting ready to start at Igalia :) I'm still doing good hahhaa :)
I always thought getting back to OC after being far far away in Berkeley wonderland meant that everything's gonna change for me: I won't have to do homework anymore :) I'll go to bed early, bla bla bla. But here I am writing this blog at 2 AM hahhaha.
Anyways, let's talk business hahah :) This article documents what I've learned about codegen in LLVM through my compiler. Originally the codegen emitted LLVM IR directly, but I've since retrofitted it to MLIR. The coverage ranges from basic stack variables, addition, subtraction, and so on, up to strings, structs, and garbage collection.
As is tradition, here are three songs for you by Zedd: Papercut, Addicted To A Memory, and Done With Love. All three songs showcase extremely well the emotional depth of a person in and out of love.
I hope you enjoy the songs (and the blog post as well!) :)
Introduction
Codegen in LLVM is an extremely well-thought-out process. For a simple stack-centric codegen strategy, framework users can avoid constructing SSA form by hand: alloca together with the mem2reg pass allows extremely fast assembly generation even for naive stack allocations.
In this article, I'll report on the codegen process of sammine-lang, an extension of the Kaleidoscope tutorial. Besides code generation for scalar values, functions, and control flow, the article also touches on codegen for aggregate data types such as structs :) I hope everyone enjoys :)
The picture below demonstrates sammine's ability to generate code for fibonacci, a classic mathematical problem :) [TODO: show that sammine can indeed code fibonacci and output assembly for it]
Aggregate datatypes are also generated by sammine: [TODO: show that sammine can indeed code record]
Codegening
Alloca(tion of the stack) and mem2reg
alloca is an LLVM instruction that allocates memory on the stack. Users can then use load and store to move values in and out of said memory location. A problem with this approach is that, since the values live on the stack and not in registers, operations will be slower than usual. But LLVM does offer the mem2reg pass, which promotes memory references to register references.
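To make this concrete, here's a hypothetical example (the function square and the IR below are my own sketch, not sammine output). A naive frontend can emit every local through a stack slot:

```llvm
define i64 @square(i64 %x) {
entry:
  %x.addr = alloca i64          ; stack slot for x
  store i64 %x, ptr %x.addr     ; spill the argument
  %v = load i64, ptr %x.addr    ; read it back
  %sq = mul i64 %v, %v
  ret i64 %sq
}
```

Running mem2reg (e.g. via opt -passes=mem2reg) rewrites the slot into pure SSA:

```llvm
define i64 @square(i64 %x) {
entry:
  %sq = mul i64 %x, %x
  ret i64 %sq
}
```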
The LLVM tutorial uses mem2reg as one of its main components in the chapter on mutable variables, and the LLVM performance tips also mention said pass in more detail:
An alloca instruction can be used to represent a function scoped stack slot, but can also represent dynamic frame expansion.
When representing function scoped variables or locations, placing alloca instructions at the beginning of the entry block should be preferred.
In particular, place them before any call instructions. Call instructions might get inlined and replaced with multiple basic blocks.
The end result is that a following alloca instruction would no longer be in the entry basic block afterward.
The SROA (Scalar Replacement Of Aggregates) and Mem2Reg passes only attempt to eliminate alloca instructions that are in the entry basic block. Given SSA is the canonical form expected by much of the optimizer; if allocas can not be eliminated by Mem2Reg or SROA, the optimizer is likely to be less effective than it could be.

Variables
Simple variable codegen (think i64, f64, etc.) happens with three operations:
- creation: on a variable definition, done with alloca, plus a map that keeps track of it
- modification: on the binary op =, this stores to the alloca's address
- read: this loads from the alloca's address
In MLIR, when we emit a variable definition, we emit the alloca for said variable in the entry block of the function the variable lives in. Then we set allocas[var_name] = var_alloca.
Suppose we have let a = 5.5, where 5.5 is a floating-point expression. After we have codegen'd the ConstantFloatOp(5.5), we use mlir::StoreOp to store the float op's mlir::Value into the alloca.
Modification works the same way as the let a = 5.5 case: we're just storing a new value into a variable's alloca, obtained via the allocas map.
For the read case, it's a bit more interesting :) In sammine, since everything's a value, we just load the value at said alloca and return it.
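I'm not showing sammine's actual dialect output here, but as a rough sketch using the standard arith and memref dialects, the three operations for let a = 5.5 (plus a later read of a) would look something like:

```mlir
%a = memref.alloca() : memref<f64>     // creation: slot emitted in the entry block
%c = arith.constant 5.5 : f64
memref.store %c, %a[] : memref<f64>    // modification: store into the slot
%v = memref.load %a[] : memref<f64>    // read: load the current value
```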
I hope this makes sense :)
Control flow
[TODO: discuss how alloca makes this easier.]
Control flow is a bit fun. We still follow the LLVM Kaleidoscope structure here. Perhaps pseudocode can help illuminate the structure better. One thing to add: since mlir::Block doesn't use phi nodes, the addArgument here essentially replaces phi :)
In this setting, we're assuming the type checker has already run and recorded whether the if expr's type is unit() or not.
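Before the pseudocode, here's a hedged sketch of what the lowered control flow looks like; the max function and the use of the standard arith/cf dialects are my own illustration, not sammine's actual output. The block argument on ^merge does the job LLVM's phi would do:

```mlir
func.func @max(%a: f64, %b: f64) -> f64 {
  %cond = arith.cmpf ogt, %a, %b : f64
  cf.cond_br %cond, ^then, ^else
^then:
  cf.br ^merge(%a : f64)   // pass the then-value to the merge block
^else:
  cf.br ^merge(%b : f64)   // pass the else-value to the merge block
^merge(%res: f64):         // block argument instead of a phi node
  return %res : f64
}
```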
emitIfExpr(ast):
    cond = emitExpr(ast.bool_expr)
    hasResult = (inferred type of if_expr is not unit())
    thenBlock, elseBlock = two new blocks in current region
    mergeBlock = new detached block
    if hasResult: mergeBlock.addArgument(convertType(ast.type))
    emit condBranch(cond -> thenBlock, else -> elseBlock)
    for each (block, body) in [(thenBlock, ast.then), (elseBlock, ast.else)]:
        set insertion point to block
        val = emitBlock(body)
        if block not already terminated:
            branch to mergeBlock (passing val if hasResult)
    if either branch is non-terminated:
        attach mergeBlock to region
        set insertion point to mergeBlock
        return mergeBlock.arg[0] if hasResult, else null
    else: // both branches terminated (e.g. return)
        delete mergeBlock
        return null

Functions
Functions, I think, are pretty simple. We do two passes over the function definitions.
In the first pass, we declare each function by creating an mlir::FuncOp in the module; in the second pass, we fill in the bodies.
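A hedged pseudocode sketch of the two passes (helper names like codegenModule and functionMap are made up for illustration):

```
codegenModule(defs):
    // pass 1: declare every function first, so calls can resolve in any order
    for each funcDef in defs:
        functionMap[funcDef.name] =
            create mlir::FuncOp(funcDef.name, convertType(funcDef.signature))
    // pass 2: emit bodies; forward references already have prototypes
    for each funcDef in defs:
        set insertion point to entry block of functionMap[funcDef.name]
        emit allocas for the parameters, then emitBlock(funcDef.body)
```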
Extern (or reuse)
Let's now talk about the lowering of printing.
Arrays and bounds checking :)
Closures and 2 different ways to get them working
TODO: Talk about the general structure
End-to-End Example
Generics
Initially I was unsure where to put this part, but I guess I can mention generics in both sections.
In short, when we generate code in the codegen phase, there won't be any generics anymore; they would've been dissolved away by the type checker. For a generic type T in a function f, the typechecker sets the is_generic() boolean to true. Then, when the type checker sees a call expr of f with a concrete type like i32, it literally creates another AST clone of the function f, called f.i32 (this is called monomorphization).
When the codegen stage comes in, if a function's AST is generic (per the is_generic boolean), we won't generate code for it. But if it's monomorphized (non-generic), we go ahead and generate code for it normally :)
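Putting the two stages together, here's a hedged pseudocode sketch (names like instantiate are mine for illustration, not sammine's actual API):

```
// type-checking time: a call f(x) where x : i32 and f is generic over T
instantiate(f, i32):
    if "f.i32" not already in the module:
        clone = deep copy of f's AST
        substitute T -> i32 everywhere in clone
        clone.name = "f.i32"; clone.is_generic = false
        add clone to the module
    rewrite the call to target "f.i32"

// codegen time
codegenFunction(ast):
    if ast.is_generic: return        // skip the generic template itself
    emit mlir::FuncOp for ast as usual
```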
I'll have to reserve talking about the type checking aspect of generics here since the topic of the article is about codegen :) But if you'd like to support me in pumping out articles faster, maybe you can buy me a redbull? :)
Turning on optimizations
Epilogue
Remarks
The article, as you can tell, is directed towards new grads and/or beginners in LLVM. If you've benefited from the article, or if you'd like to support my writing these blogs, please consider getting me a red bull :)
I also want to extend my thanks to all the developers who contributed to Kaleidoscope. Without them, the journey to generate code, as well as the creation of this article, would be tenfold harder. I realize that we all stand on the shoulders of previous generations and of giants, and I would like to pay my tribute to them.
Claude
You'll realize that in this edition of extending Kaleidoscope, a lot of work has been done. This is due in no small part to Claude, allowing me to loo