Compiler Weekly: Truth
This week I took yet another look at improving the parser logic that I have been using. In the past I tried to use parser-combinators to build up my parsers with the idea that they might be smaller, simpler or better. This time I decided to simply write the parser from scratch.
The program for this week is a logic language. You can give it facts
or ask it queries
. Facts take the form of SOMEONE is FACT
and queries take the form of who is QUERY?
. The program keeps track of all the facts and performs the desired set operations to find the result. You can use and
, or
and not
as well as ()
to build up complex queries.
Lexer
The lexer is very simple. Using logos again we come up with a few tokens but nothing unexpected here:
|
Parser
The parser is much different than my previous attempts.
First off the AST is split into multiple enums rather than just a single one. I found this to be a cleaner way of representing the different levels of the program. At the top we have Statement
. A statement actually does something within the program. It changes state by either setting a fact or printing the results of a query. Below that we have Expression
. Expressions are used to build up queries and can be nested to build something more complex.
The parser itself takes the lexer and starts reading tokens. It is a recursive decent parser. That means that it starts at the top and looks at the next token that comes in. If its a Token::Who
then it calls the query
parser and goes down that path. If its a Token::Identifier
it call the fact
parser and goes down that path. Each path is very similar to the parser combinators we build before but rather than having a data structure that puts the parsers together, we are just using normal function calls.
One thing I would also like to point out is that I discovered the ?
operator in Rust. It makes it super easy to use Result
and Option
in your code. Rather than having to use unwrap
or is_ok
you can just add a ?
and if there was an error rust will return that error.
|
The final parser is about the same size as before. However, error handling is a lot better and its easy to debug and step through what is happening. Overall I am happy with this approach and will probably continue using it.
Runtime
The runtime for this program is kind of simple but interesting none the less. It keeps a HashMap
of all the facts it knows. So if you tell it rust is cool
then it stores cool -> ["rust"]
. Then if you query it with something like who is cool?
, its just a quick lookup into the table. More complex queries take the form of set operations. Union, intersect and inverse are used to create and
, or
and not
.
|
Finally the REPL is very simple as well. It reads the input line by line, if there is an error then it prints it, if not it runs the parsed statement in the engine and prints any results.
|
Here is some output of running the program:
|
Conclusion
After this week I am pretty happy with the technology stack that I am using for the compiler frontends now. Logos plus a hand built parser seems to be ideal for me. So next week I will probably look into writing some compiler backends and see if I can get some native code compilation working using LLVM.
26/10/2021