Jasmine Tang

My blog

[ONGOING] LLVM: Extending Kaleidoscope: Code-gen edition

Edit:

Hi everyone! My friend Julius is looking for full-time compiler roles and/or internship opportunities for 2025/2026.

Julius is an undergrad at Western Governors University specializing in computer architecture and compilers.

He's contributed to LLVM and is knowledgeable about compiler architecture; as a result, he's well-versed in C, C++, Python, LLVM, and assembly.

His impressive open source projects, which have 200+ stars combined, range from developing OS kernels to writing hypervisors for AMD-V.

His resume is here. I hope everybody can give him a referral or reach out to him on his LinkedIn.

2025-06-27

Prologue

Hey everyone, how's everyone doing? I've been back at my parents' place for a while now, getting ready to start at Igalia :) I'm still doing good hahaha :) Getting back to OC after being far, far away in Berkeley wonderland means that I have to rebuild and get back in touch with my old support network. I guess it just takes a while to get used to. [TODO: talk about health and personal issues persist, nanny]

This article documents what I've learned about the basics of LLVM's code generation process, limited to lowering from AST to LLVM IR only :) More experience and time are needed before I can venture further out and write more sophisticated articles. Oh well, maybe after I've been at Igalia for a while hahaha :)

As is tradition, here are three songs for you by Zedd: Papercut, Addicted To A Memory, and Done With Love. All three songs showcase extremely well the emotional depth of a person in and out of love.

I hope you enjoy the songs :)

Introduction

Codegen in LLVM is an extremely well-thought-out process. For a simple stack-centric codegen process, framework users don't have to construct SSA form themselves: the front end gives every variable a stack slot with alloca and accesses it through load and store, and the mem2reg pass then promotes those slots to SSA registers, so even naive stack allocations still lower to efficient assembly.

In this article, I'll report on the codegen process of sammine-lang, an extension of the Kaleidoscope tutorial. Besides code generation for scalar values, functions and control flow, the article also touches on codegen for aggregate data types such as structs :) I hope everyone enjoys :)

The picture below demonstrates sammine's ability to generate code for Fibonacci, a classic mathematical problem :) [TODO: show that sammine can indeed code fibonacci and output assembly for it]

Aggregate data types can also be generated with sammine: [TODO: show that sammine can indeed code record]

A few brief words on building tools and ast-ir inspection

A very, very smart CS person in my compiler class at UC Berkeley once told me something along the lines of "use/build your tool very well" and I feel like I want to share it in this blog. The context in which he told me might have been different, but the principle remains the same: whether it's the compiler, a terminal tool, or your IDE, it really pays off to learn your tool so well that it boosts your productivity into a different dimension.

After having to run cmake --build build -j on the compiler project 50+ times a day, I built brt.nvim to shorten the dev cycle; now I press three keystrokes to build, run and test the compiler project (hence the name BRT :)) :)

The same applies to developing sammine itself :) There was a lot of friction around different features. [TODO: elaborate]

Alloca(tion of the stack) and mem2reg

todo: talk about allocation on the stack; first draw out an example of a global var in LLVM IR
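As a rough sketch of what promotion does to alloca'd variables, here is a toy pass over a made-up mini-IR rather than real LLVM. The `Inst` struct, `promoteSingleBlock`, and `promoteExample` are hypothetical names for illustration only. The sketch assumes a single basic block where every load is preceded by a store to the same slot; the real mem2reg pass also handles multiple blocks by inserting phi nodes, which this toy does not attempt.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical mini-IR: one struct per instruction, single basic block only.
struct Inst {
  std::string op;   // "alloca", "store", "load", or an arithmetic op such as "add"
  std::string dst;  // result name (alloca/load/arithmetic) or target slot (store)
  std::string a, b; // operands; a store keeps its value in `a`, a load keeps its slot in `a`
};

// Toy promotion: forward each stored value to later loads of the same slot,
// then drop the alloca/store/load instructions that are no longer needed.
std::vector<Inst> promoteSingleBlock(const std::vector<Inst> &in) {
  std::map<std::string, std::string> slotValue; // slot -> value currently stored in it
  std::map<std::string, std::string> alias;     // load result -> value it stands for
  auto resolve = [&](const std::string &v) -> std::string {
    auto it = alias.find(v);
    return it == alias.end() ? v : it->second;
  };

  std::vector<Inst> out;
  for (const Inst &i : in) {
    if (i.op == "alloca") continue; // the stack slot disappears entirely
    if (i.op == "store") {
      slotValue[i.dst] = resolve(i.a);
      continue;
    }
    if (i.op == "load") {
      assert(slotValue.count(i.a) && "load before any store to this slot");
      alias[i.dst] = slotValue[i.a];
      continue;
    }
    // Any other instruction survives, with its operands rewritten.
    out.push_back({i.op, i.dst, resolve(i.a), resolve(i.b)});
  }
  return out;
}

// Mini-IR for: x = 1; x = x + 41; x * 2
std::vector<Inst> promoteExample() {
  std::vector<Inst> in = {
      {"alloca", "%x.addr", "", ""},
      {"store", "%x.addr", "1", ""},
      {"load", "%t0", "%x.addr", ""},
      {"add", "%t1", "%t0", "41"},
      {"store", "%x.addr", "%t1", ""},
      {"load", "%t2", "%x.addr", ""},
      {"mul", "%t3", "%t2", "2"},
  };
  return promoteSingleBlock(in);
}
```

In this example the seven input instructions shrink to two: the add uses the constant 1 directly, and the mul uses the add's result, with no memory traffic left.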

Variables

discuss three modes:

  • creation: in the var def and the map that keeps track of it; this is done using alloca
  • modification: in the binary op =; this stores to the address of the alloca
  • read: this loads from the address of the alloca.
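The three modes above can be sketched roughly as follows. This is a toy that appends textual LLVM IR to a vector instead of calling the real IRBuilder; `ToyEmitter`, its methods, and the `.addr` naming are assumptions made for illustration, not sammine's actual code, and everything is hard-coded to i32.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical toy emitter: collects textual LLVM IR, one instruction per entry.
struct ToyEmitter {
  std::vector<std::string> lines;           // emitted instructions, in order
  std::map<std::string, std::string> slots; // variable name -> address of its alloca
  int nextTemp = 0;

  std::string fresh() { return "%t" + std::to_string(nextTemp++); }

  // Creation: reserve a stack slot, initialize it, and remember it in the map.
  void define(const std::string &name, const std::string &init) {
    std::string addr = "%" + name + ".addr";
    lines.push_back(addr + " = alloca i32");
    lines.push_back("store i32 " + init + ", ptr " + addr);
    slots[name] = addr;
  }

  // Modification: store the new value to the address of the alloca.
  void assign(const std::string &name, const std::string &value) {
    assert(slots.count(name) && "assignment to an undefined variable");
    lines.push_back("store i32 " + value + ", ptr " + slots[name]);
  }

  // Read: load from the address of the alloca into a fresh temporary.
  std::string read(const std::string &name) {
    assert(slots.count(name) && "read of an undefined variable");
    std::string tmp = fresh();
    lines.push_back(tmp + " = load i32, ptr " + slots[name]);
    return tmp;
  }
};

// Emits IR for: let x = 1; x = x + 41;
std::vector<std::string> emitExample() {
  ToyEmitter e;
  e.define("x", "1");
  std::string old = e.read("x");
  std::string sum = e.fresh();
  e.lines.push_back(sum + " = add i32 " + old + ", 41");
  e.assign("x", sum);
  return e.lines;
}
```

Note that the map never stores a value, only an address. Every use of the variable goes through a load or a store, which is exactly the shape that mem2reg knows how to clean up later.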

todo: talk about this in relation to the walk and visitor pattern

Control flow

discuss how alloca makes this easier.
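One way to see why alloca helps here: when both branches of an if/else assign to the same variable, each branch just stores to the same stack slot and the merge block loads from it, so the front end never has to build a phi node by hand. Below is a minimal sketch that emits textual IR for `if (c) x = thenVal else x = elseVal; return x`. The `Block` struct, `emitIfElse`, `render`, and the function name `@pick` are all hypothetical, chosen only for this illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical basic block: a label plus its instructions as text.
struct Block {
  std::string label;
  std::vector<std::string> insts;
};

// Emits the blocks for: if (cond) x = thenVal else x = elseVal; return x
std::vector<Block> emitIfElse(const std::string &cond,
                              const std::string &thenVal,
                              const std::string &elseVal) {
  assert(!cond.empty() && "condition must name an i1 value");
  const std::string slot = "%x.addr";
  Block entry{"entry", {slot + " = alloca i32",
                        "br i1 " + cond + ", label %then, label %else"}};
  // Both branches write to the same slot; neither knows about the other.
  Block thenB{"then", {"store i32 " + thenVal + ", ptr " + slot,
                       "br label %merge"}};
  Block elseB{"else", {"store i32 " + elseVal + ", ptr " + slot,
                       "br label %merge"}};
  // The merge block simply reads the slot; no phi node is needed.
  Block merge{"merge", {"%x.val = load i32, ptr " + slot,
                        "ret i32 %x.val"}};
  return {entry, thenB, elseB, merge};
}

// Renders the blocks as the body of a function taking one i1 argument.
std::string render(const std::vector<Block> &blocks) {
  std::string ir = "define i32 @pick(i1 %c) {\n";
  for (const Block &b : blocks) {
    ir += b.label + ":\n";
    for (const std::string &inst : b.insts) ir += "  " + inst + "\n";
  }
  return ir + "}\n";
}
```

Calling `render(emitIfElse("%c", "1", "2"))` yields a four-block function with no phi in it. Running mem2reg over IR of this shape is what would introduce the phi in the merge block.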

Functions and Extern (via PrototypeAST)

todo: talk about a caveat that we need to allocate "alloca" addresses at the start of the function. todo: relate to how strong the visitor pattern is
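The caveat can be sketched like this, in the spirit of the `CreateEntryBlockAlloca` helper from the Kaleidoscope tutorial. The version below is a toy over plain strings, not the real LLVM API: `ToyFunction`, `createEntryBlockAlloca`, and `emitLoopLocal` are hypothetical names, and the type is hard-coded to i32.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical function with an entry block and one later block (a loop body).
struct ToyFunction {
  std::vector<std::string> entry;    // instructions of the entry block
  std::vector<std::string> loopBody; // instructions of a block executed many times
};

// Always places the alloca at the top of the entry block, no matter which
// block the code generator is currently emitting into.
std::string createEntryBlockAlloca(ToyFunction &fn, const std::string &name) {
  assert(!name.empty() && "variable needs a name");
  std::string addr = "%" + name + ".addr";
  fn.entry.insert(fn.entry.begin(), addr + " = alloca i32");
  return addr;
}

// Simulates visiting a variable definition while emitting the loop body.
ToyFunction emitLoopLocal() {
  ToyFunction fn;
  fn.entry.push_back("br label %loop"); // entry block already has its terminator
  // We are "inside" the loop now, but the slot still lands in the entry block.
  std::string addr = createEntryBlockAlloca(fn, "i");
  fn.loopBody.push_back("store i32 0, ptr " + addr);
  return fn;
}
```

Two reasons make this placement worthwhile: mem2reg only considers allocas in the entry block for promotion, and an alloca left inside a loop body would reserve fresh stack space on every iteration.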

Epilogue

The article, as you can tell, is directed towards new grads and/or beginners in LLVM. If you've benefited from the article or if you'd like to support my writing these blogs, please consider getting me a Red Bull :)

I also want to extend my thanks to all the developers who helped contribute to Kaleidoscope. Without them, the journey to generate code, as well as the creation of this article, would be tenfold harder. I realize that we all stand on the shoulders of previous generations and of giants, and I would like to pay my tribute to them.