Compiler Weekly: Truth
This week I took yet another look at improving the parser logic that I have been using. In the past I tried to use parser-combinators to build up my parsers with the idea that they might be smaller, simpler or better. This time I decided to simply write the parser from scratch.
The program for this week is a logic language. You can give it
facts or ask it
queries. Facts take the form of
SOMEONE is FACT and queries take the form of
who is QUERY?. The program keeps track of all the facts and performs the desired set operations to find the result. You can use
not as well as
() to build up complex queries.
The lexer is very simple. Using logos again we come up with a few tokens but nothing unexpected here:
The parser is much different than my previous attempts.
First off the AST is split into multiple enums rather than just a single one. I found this to be a cleaner way of representing the different levels of the program. At the top we have
Statement. A statement actually does something within the program. It changes state by either setting a fact or printing the results of a query. Below that we have
Expression. Expressions are used to build up queries and can be nested to build something more complex.
The parser itself takes the lexer and starts reading tokens. It is a recursive decent parser. That means that it starts at the top and looks at the next token that comes in. If its a
Token::Who then it calls the
query parser and goes down that path. If its a
Token::Identifier it call the
fact parser and goes down that path. Each path is very similar to the parser combinators we build before but rather than having a data structure that puts the parsers together, we are just using normal function calls.
One thing I would also like to point out is that I discovered the
? operator in Rust. It makes it super easy to use
Option in your code. Rather than having to use
is_ok you can just add a
? and if there was an error rust will return that error.
The final parser is about the same size as before. However, error handling is a lot better and its easy to debug and step through what is happening. Overall I am happy with this approach and will probably continue using it.
The runtime for this program is kind of simple but interesting none the less. It keeps a
HashMap of all the facts it knows. So if you tell it
rust is cool then it stores
cool -> ["rust"]. Then if you query it with something like
who is cool?, its just a quick lookup into the table. More complex queries take the form of set operations. Union, intersect and inverse are used to create
Finally the REPL is very simple as well. It reads the input line by line, if there is an error then it prints it, if not it runs the parsed statement in the engine and prints any results.
Here is some output of running the program:
After this week I am pretty happy with the technology stack that I am using for the compiler frontends now. Logos plus a hand built parser seems to be ideal for me. So next week I will probably look into writing some compiler backends and see if I can get some native code compilation working using LLVM.26/10/2021