[ONGOING] What after Kaleidoscope: Error location
8888-08-08
Prologue
Hi everyone, wow, it's been a quick minute since I last updated my blog. How's everyone doing :) I'm still chugging away at life applying to compiler jobs as well as schooling at Cal. Hope everybody's doing great :)
The article is to new people who might be asking:
"Ok, I've done the llvm tutorial already. What's next?" More specifically, the article is about adding primitive support for error reporting to the toy compiler.
As is tradition :) I also wanna share some songs with you. The two songs are Valentine and Into my arms, both by COIN, one of my fav band.
I hope you enjoy :) (both the songs and the blog hahah)
Introduction
Compiler diagnostics is an important aspect in one's compiler. From erroring out on the programmer's mistake to providing warnings, the ability to accurately locate the location of the pesky problems to said programmer is proven to be very valuable.
The article then discusses a way to improve upon the llvm Kaleidoscope toy compiler in the diagnostic aspect to the user.
Tokens (and the lexer)
Let's start with the tokens and the lexer. When given the source code as a string (maybe via a file), the lexer scans from left to right on that string, each time updating the index on that input string and returning a token. This is the basis of our class Location.
class Location {
size_t internal_index = 0;
// todo, Jasmine please doooo thissss
// override ++ operator
// override -- operator
// override == operator
// override | (to combine multiple Location)
// int overload
};
Suppose our tokens are modeled as:
class Token {
public:
TokenType type;
std::string lexeme;
Location location;
Token(TokenType type, std::string lexeme, Location location)
: type(type), lexeme(std::move(lexeme)), location(location) {}
};
AST nodes (and the parser)
After we've produced our tokens, the parser takes in the stream of tokens and produces our AST. One of the question is: "If something went wrong, how do we notify programmers?"
The simple thing to do is to combine all the AST's token's location into the AST's location. Below shows a pseudo code for that
curr_tok_location = first token location of AST
For all tokens T that made up the AST:
curr_tok_location |= T.location
AST.Location = curr_tok_location
Reporting
Nice, now that I'm listening to a true crime show on Netflix, writing this part, I feel a little bit hype and scared hahah.
You'll probably wondering why we're having these location and haven't done anything yet.