Compiler Weekly: LLVM Backend
All of my compiler projects so far have focused on the frontend. I've spent a lot of time on parsing languages, but the runtime is usually very small. Now it's time to take a closer look at the backend and build up some new tools.
I chose to work with LLVM because it has some great properties. I wanted a tool that could create native binaries; in my opinion, there is a huge advantage to having native code over using an interpreter or VM. I also know that LLVM has a well-supported, well-known API. It's a great way to start building compilers that you know will work, because other people have done it.
This week, my project will use LLVM to produce hello world binaries. There won't be much flexibility, but it will show the basics needed to set up your own LLVM compiler.
I set out on this project with the following goals:
- The compiler should be 100% self-contained.
- The compiler should not rely on another compiler.
- The compiler should have minimal runtime dependencies.
In summary, running it should be the only step needed to produce a runnable binary. I refuse to run the results of my compiler through `clang`. I will allow using `binutils` because that's what other compilers do, so calling `ld` is fine.
I started by looking at LLVM bindings for Rust. The two big options were `llvm-sys` and `inkwell`. I chose `inkwell` because it's a simple wrapper around `llvm-sys` that makes things slightly easier to use. I might have to use `llvm-sys` directly for future projects, depending on how much tweaking to LLVM I end up doing.
The first goal is getting it to produce LLVM IR that will print hello world. I built a function that borrows the LLVM `Context` and builds up a function called `main`.
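A minimal sketch of what that function can look like with `inkwell` (the module name, variable names, and exact method signatures are assumptions; some builder methods return `Result` in newer inkwell versions):

```rust
use inkwell::context::Context;
use inkwell::module::Module;
use inkwell::AddressSpace;

// Sketch: build a module whose `main` calls libc's `puts`.
fn build_main<'ctx>(context: &'ctx Context) -> Module<'ctx> {
    let module = context.create_module("hello");
    let builder = context.create_builder();
    let i32_type = context.i32_type();

    // declare i32 @puts(i8*)
    let i8_ptr_type = context.i8_type().ptr_type(AddressSpace::default());
    let puts_type = i32_type.fn_type(&[i8_ptr_type.into()], false);
    let puts_fn = module.add_function("puts", puts_type, None);

    // define i32 @main()
    let main_type = i32_type.fn_type(&[], false);
    let main_fn = module.add_function("main", main_type, None);
    let entry = context.append_basic_block(main_fn, "entry");
    builder.position_at_end(entry);

    // Store "hello world!" as a global string and pass its address to puts.
    let hello = builder.build_global_string_ptr("hello world!", "hello_str");
    builder.build_call(puts_fn, &[hello.as_pointer_value().into()], "call_puts");
    builder.build_return(Some(&i32_type.const_int(0, false)));

    module
}
```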
The `main` function passes a reference to the string `"hello world!"` to a call to `puts`, which prints the text to the screen. That is the minimum we need for a program. Now the real challenge is turning that LLVM module into a working binary.
The next stage is to create a target machine that can compile that IR into an object file. This is where LLVM will do its optimization passes as well.
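A sketch of that step with inkwell's target API (the CPU string, relocation model, and optimization level here are assumptions; pick whatever fits your target):

```rust
use inkwell::module::Module;
use inkwell::targets::{
    CodeModel, FileType, InitializationConfig, RelocMode, Target, TargetMachine,
};
use inkwell::OptimizationLevel;

// Sketch: lower an LLVM module to a native object file in memory.
fn compile_to_object(module: &Module) -> Vec<u8> {
    // Register the x86 backend; Target::initialize_all() works too.
    Target::initialize_x86(&InitializationConfig::default());

    let triple = TargetMachine::get_default_triple();
    let target = Target::from_triple(&triple).expect("unknown target triple");
    let machine = target
        .create_target_machine(
            &triple,
            "generic",                  // CPU (assumption: generic x86_64)
            "",                         // extra CPU features
            OptimizationLevel::Default, // optimization passes run during codegen
            RelocMode::PIC,
            CodeModel::Default,
        )
        .expect("failed to create target machine");

    machine
        .write_to_memory_buffer(module, FileType::Object)
        .expect("codegen failed")
        .as_slice()
        .to_vec()
}
```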
Finally, to pull it all together, we need a way to link this object file with other libraries. Since our code uses `main` and `puts`, we will need to link `libc`. Finding the flags needed to do this was not straightforward, but there is a trick that I discovered.
Let's say we have an assembly file called `a.s` that contains roughly the same code we are generating. Run `clang -v a.s`; the `-v` flag tells clang to run in verbose mode, which prints out all of the arguments it passes to `ld` under the hood. Using that technique, I was able to build the following linker tool:
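A sketch of such a linker tool using only the standard library. The crt object paths and dynamic-linker path below are typical for x86_64 Linux but are assumptions; copy the real ones from your own `clang -v` output:

```rust
use std::path::Path;
use std::process::Command;

// Hypothetical helper: builds the `ld` invocation for linking one object
// file against libc. The hard-coded paths are placeholders for whatever
// `clang -v` reports on your system.
fn link_command(object: &Path, output: &Path) -> Command {
    let mut cmd = Command::new("ld");
    cmd.arg("-o")
        .arg(output)
        .arg("--dynamic-linker")
        .arg("/lib64/ld-linux-x86-64.so.2")
        .arg("/usr/lib/x86_64-linux-gnu/crt1.o") // startup code that calls main
        .arg("/usr/lib/x86_64-linux-gnu/crti.o")
        .arg(object)
        .arg("-lc") // link libc so puts resolves
        .arg("/usr/lib/x86_64-linux-gnu/crtn.o");
    cmd
}

fn main() {
    // Print the full invocation instead of running it.
    let cmd = link_command(Path::new("app.o"), Path::new("a.out"));
    println!("{:?}", cmd);
}
```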
You can see that there are a lot of additional flags that we need to pass to `ld`. The process works as follows:
- Create a temporary object file with the result of our LLVM compilation
- Create a new temporary file to store the results
- Get all of the flags required to link our object file with `libc`
- Run the linker and read the output
Once we have that working we can put it all together:
Here we create the LLVM context and use it to create our application module. Then we build up the target machine and use it to compile our module into an x86_64 object file. After creating the object file, we link it using `ld` to produce a runnable binary. Finally, we write that binary to a file called `a.out` that the user can run. In the end, the results look like this:
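Assuming the compiler binary is called `compiler` (a hypothetical name), a session looks roughly like this:

```shell
$ ./compiler    # hypothetical binary name; writes a.out to the current directory
$ ./a.out
hello world!
```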
All of that work to create a `hello world` binary! Now we have all the pieces in place to build full compilers that can parse complex grammars and produce native binaries. In the coming weeks we can use these building blocks to create anything we want.