[ONGOING] Extending LLVM's Kaleidoscope: Code-gen edition
8888-08-08
Prologue
Hey everyone, how's everyone doing? I've graduated from Berkeley and have been back at my parents' place for a while now, getting ready to start at Igalia :) I'm still doing good hahhaa :)
I always thought getting back to OC after being far far away in Berkeley wonderland meant that everything's gonna change for me: I won't have to do homework anymore :) I'll go to bed early, bla bla bla. But here I am writing this blog at 2 AM hahhaha.
Anyways, let's talk business hahah :) This article documents what I've learned about codegen in LLVM through my compiler. Originally the codegen emitted LLVM IR directly, but I've since retrofitted it to MLIR. The coverage ranges from basic stack variables, addition, subtraction, and so on, up to strings, structs, and garbage collection.
As is tradition, here are three songs for you by Zedd: Papercut, Addicted To A Memory, and Done With Love. All three songs showcase extremely well the emotional depth of a person in and out of love.
I hope you enjoy the songs (and the blog post as well!) :)
Introduction
Codegen in LLVM is an extremely well-thought-out process. For a simple stack-centric codegen strategy, framework users can avoid constructing SSA form by hand: alloca together with the mem2reg pass allows extremely fast assembly generation even for naive stack allocations.
In this article, I'll report on the codegen process of sammine-lang, an extension of the Kaleidoscope tutorial. Besides code generation for scalar values, functions, and control flow, the article also touches on codegen for aggregate data types such as structs :) I hope everyone enjoys :)
The picture below demonstrates sammine's ability to generate code for fibonacci, a classic mathematical problem :) [TODO: show that sammine can indeed code fibonacci and output assembly for it]
Aggregate datatypes are also generated by sammine: [TODO: show that sammine can indeed code record]
Codegening
Alloca(tion of the stack) and mem2reg
alloca is an LLVM instruction that allocates memory on the stack. Users can then use load and store to move values in and out of said memory location. A problem with this approach is that, since the values live on the stack and not in registers, operations will be slower than usual. But LLVM does offer the mem2reg pass, which promotes memory references to register references.
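To make this concrete, here's a hypothetical example (the function square and the IR below are my own sketch, not sammine output). A naive frontend can emit every local through a stack slot:

```llvm
define i64 @square(i64 %x) {
entry:
  %x.addr = alloca i64          ; stack slot for x
  store i64 %x, ptr %x.addr     ; spill the argument
  %v = load i64, ptr %x.addr    ; read it back
  %sq = mul i64 %v, %v
  ret i64 %sq
}
```

Running mem2reg (e.g. via opt -passes=mem2reg) rewrites the slot into pure SSA:

```llvm
define i64 @square(i64 %x) {
entry:
  %sq = mul i64 %x, %x
  ret i64 %sq
}
```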
The LLVM tutorial uses mem2reg as one of its main components in the chapter on mutable variables, and the LLVM performance tips also mention said pass in more detail:
An alloca instruction can be used to represent a function scoped stack slot, but can also represent dynamic frame expansion.
When representing function scoped variables or locations, placing alloca instructions at the beginning of the entry block should be preferred.
In particular, place them before any call instructions. Call instructions might get inlined and replaced with multiple basic blocks.
The end result is that a following alloca instruction would no longer be in the entry basic block afterward.
The SROA (Scalar Replacement Of Aggregates) and Mem2Reg passes only attempt to eliminate alloca instructions that are in the entry basic block. Given SSA is the canonical form expected by much of the optimizer; if allocas can not be eliminated by Mem2Reg or SROA, the optimizer is likely to be less effective than it could be.

Variables
Simple variable codegen (think i64, f64, etc.) happens with three operations:
- creation: on a variable definition, done with alloca, plus a map that keeps track of it
- modification: on the binary op =, this stores to the alloca's address
- read: this loads from the alloca's address
In MLIR, when we emit a variable definition, we emit the alloca for said variable in the entry block of the function the variable lives in. Then we set allocas[var_name] = var_alloca.
Suppose we have let a = 5.5, where 5.5 is a floating-point expression. After we have codegen'd the ConstantFloatOp(5.5), we use mlir::StoreOp to store the float op's mlir::Value into the alloca.
Modification works the same way as the let a = 5.5 case: we're just storing a new value into a variable's alloca, obtained via the allocas map.
For the read case, it's a bit more interesting :) In sammine, since everything's a value, we just load the value at said alloca and return it.
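I'm not showing sammine's actual dialect output here, but as a rough sketch using the standard arith and memref dialects, the three operations for let a = 5.5 (plus a later read of a) would look something like:

```mlir
%a = memref.alloca() : memref<f64>     // creation: slot emitted in the entry block
%c = arith.constant 5.5 : f64
memref.store %c, %a[] : memref<f64>    // modification: store into the slot
%v = memref.load %a[] : memref<f64>    // read: load the current value
```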
I hope this makes sense :)
Control flow
[TODO: discuss how alloca makes this easier.]
Control flow is a bit fun. We still follow the LLVM Kaleidoscope structure here. Perhaps pseudocode can help illuminate the structure better. One thing to add: since mlir::Block doesn't use phi nodes, the addArgument here essentially replaces phi :)
In this setting, we're assuming the type checker has already run and recorded whether the if expr's type is unit() or not.
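Before the pseudocode, here's a hedged sketch of what the lowered control flow looks like; the max function and the use of the standard arith/cf dialects are my own illustration, not sammine's actual output. The block argument on ^merge does the job LLVM's phi would do:

```mlir
func.func @max(%a: f64, %b: f64) -> f64 {
  %cond = arith.cmpf ogt, %a, %b : f64
  cf.cond_br %cond, ^then, ^else
^then:
  cf.br ^merge(%a : f64)   // pass the then-value to the merge block
^else:
  cf.br ^merge(%b : f64)   // pass the else-value to the merge block
^merge(%res: f64):         // block argument instead of a phi node
  return %res : f64
}
```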
emitIfExpr(ast):
    cond = emitExpr(ast.bool_expr)
    hasResult = (inferred type of if_expr is not unit())
    thenBlock, elseBlock = two new blocks in current region
    mergeBlock = new detached block
    if hasResult: mergeBlock.addArgument(convertType(ast.type))
    emit condBranch(cond -> thenBlock, else -> elseBlock)
    for each (block, body) in [(thenBlock, ast.then), (elseBlock, ast.else)]:
        set insertion point to block
        val = emitBlock(body)
        if block not already terminated:
            branch to mergeBlock (passing val if hasResult)
    if either branch is non-terminated:
        attach mergeBlock to region
        set insertion point to mergeBlock
        return mergeBlock.arg[0] if hasResult, else null
    else: // both branches terminated (e.g. return)
        delete mergeBlock
        return null

Functions
Functions, I think, are pretty simple. We do two passes over the function definitions.
In the first pass, we declare each function by creating an mlir::FuncOp in the module; in the second pass, we fill in the bodies.
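A hedged pseudocode sketch of the two passes (helper names like codegenModule and functionMap are made up for illustration):

```
codegenModule(defs):
    // pass 1: declare every function first, so calls can resolve in any order
    for each funcDef in defs:
        functionMap[funcDef.name] =
            create mlir::FuncOp(funcDef.name, convertType(funcDef.signature))
    // pass 2: emit bodies; forward references already have prototypes
    for each funcDef in defs:
        set insertion point to entry block of functionMap[funcDef.name]
        emit allocas for the parameters, then emitBlock(funcDef.body)
```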
Extern (or reuse)
Let's now talk about the lowering of printing.
Arrays and bounds checking :)
Closures and 2 different ways to get them working
TODO: Talk about the general structure
End-to-End Example
Generics
Initially I was unsure where to put this part, but I guess I can mention generics in both sections.
In short, when we generate code in the codegen phase, there won't be any generics anymore; they would've been dissolved away by the type checker. For a generic type T in a function f, the typechecker sets the is_generic() boolean to true. Then, when the type checker sees a call expr of f with a concrete type like i32, it literally creates another AST clone of the function f, called f.i32 (this is called monomorphization).
When the codegen stage comes in, if a function's AST is generic (per the is_generic boolean), we won't generate code for it. But if it's monomorphized (non-generic), we go ahead and generate code for it normally :)
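Putting the two stages together, here's a hedged pseudocode sketch (names like instantiate are mine for illustration, not sammine's actual API):

```
// type-checking time: a call f(x) where x : i32 and f is generic over T
instantiate(f, i32):
    if "f.i32" not already in the module:
        clone = deep copy of f's AST
        substitute T -> i32 everywhere in clone
        clone.name = "f.i32"; clone.is_generic = false
        add clone to the module
    rewrite the call to target "f.i32"

// codegen time
codegenFunction(ast):
    if ast.is_generic: return        // skip the generic template itself
    emit mlir::FuncOp for ast as usual
```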
I'll have to reserve talking about the type checking aspect of generics here since the topic of the article is about codegen :) But if you'd like to support me in pumping out articles faster, maybe you can buy me a redbull? :)
Turning on optimizations
Epilogue
Remarks
The article, as you can tell, is directed towards new grads and/or beginners in LLVM. If you've benefited from the article, or if you'd like to support my writing these blogs, please consider getting me a red bull :)
I also want to extend my thanks to all the developers who contributed to Kaleidoscope. Without them, the journey to generate code, as well as the creation of this article, would be tenfold harder. I realize that we all stand on the shoulders of previous generations and of giants, and I would like to pay my tribute to them.
Claude
You'll realize that in this edition of extending Kaleidoscope, a lot of work has been done. This is due in no small part to Claude, allowing me to loo