In an earlier post, I talked about GCC (the GNU Compiler Collection), and work that I did to make the internals of GCC more robust for GCC 5.

gnu logo

This post is about something more user-visible: as of GCC 5, GCC's code-generation backend can now be built as a shared library.

When might you want to generate machine code? The primary reason is for speed: anything that parses an input format and repeatedly acts on it, such as language interpreter, or a regular expression engine, might benefit from doing some work up-front to translate the input (or some part of it) directly into machine code.

The new library in GCC 5 is named libgccjit, since it can be used to implement Just-In-Time compilation within a language interpreter. Although I primarily had JIT-compilation in mind when writing it, the library can also write the code it generates out to disk as an executable, allowing you to build more conventional ahead-of-time compilers.

As the author, I may be biased, but I believe libgccjit makes it relatively easy to build a compiler, so, as a challenge, let's see if we can write a compiler in one blog post...

To make it easy, let's construct a compiler for an esoteric programming language that we shall refer to as “brainf”.

Here's what a simple .b script looks like:

[
  Emit the uppercase alphabet
]

cell 0 = 26
++++++++++++++++++++++++++

cell 1 = 65
>+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++<

while cell#0 != 0
[
 >
 .      emit cell#1
 +      increment cell@1
 <-     decrement cell@0
]

This example makes use of whitespace and comments for legibility, but could have been written as:

++++++++++++++++++++++++++
>+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++<
[>.+<-]

Clearly it's not a particularly useful language, except for providing compiler-writers with something that's easy to parse.

brainf scripts operate on an array of bytes, with a notional data pointer within the array. The operations are:

Character Meaning
> idx += 1
< idx -= 1
+ data[idx] += 1
- data[idx] -= 1
. output (data[idx])
, data[idx] = input ()
[ loop until data[idx] == 0
] end of loop
Anything else ignored

The libgccjit API is currently accessible in various ways:

To keep things as simple as possible, we'll write our compiler in Python.

We use the “gccjit” Python module to call into libgccjit, to populate a gccjit.Context, building a function named “func” in gccjit's internal representation. We then compile the gccjit.Context, either to an in-process machine code function, or to an executable on disk.

class Paren:
    def __init__(self, b_test, b_body, b_after):
        self.b_test = b_test
        self.b_body = b_body
        self.b_after = b_after

class CompileError(Exception):
    def __init__(self, compiler, msg):
        self.filename = compiler.filename
        self.line = compiler.line
        self.column = compiler.column
        self.msg = msg

    def __str__(self):
        return ("%s:%i:%i: %s"
                % (self.filename, self.line, self.column, self.msg))

class Compiler:
    def __init__(self):
        self.ctxt = gccjit.Context()
        if 1:
            self.ctxt.set_int_option(gccjit.IntOption.OPTIMIZATION_LEVEL,
                                     3);
            self.ctxt.set_bool_option(gccjit.BoolOption.DUMP_INITIAL_GIMPLE,
                                      0);
            self.ctxt.set_bool_option(gccjit.BoolOption.DUMP_GENERATED_CODE,
                                      0);
            self.ctxt.set_bool_option(gccjit.BoolOption.DEBUGINFO,
                                      1);
            self.ctxt.set_bool_option(gccjit.BoolOption.DUMP_EVERYTHING,
                                      0);
            self.ctxt.set_bool_option(gccjit.BoolOption.KEEP_INTERMEDIATES,
                                      0);
        self.void_type = self.ctxt.get_type(gccjit.TypeKind.VOID)
        self.int_type = self.ctxt.get_type(gccjit.TypeKind.INT)
        self.byte_type = self.ctxt.get_type(gccjit.TypeKind.UNSIGNED_CHAR)
        self.array_type = self.ctxt.new_array_type(self.byte_type,
                                                   30000)
        self.func_getchar = (
            self.ctxt.new_function(gccjit.FunctionKind.IMPORTED,
                                   self.int_type,
                                   b"getchar", []))
        self.func_putchar = (
            self.ctxt.new_function(gccjit.FunctionKind.IMPORTED,
                                   self.void_type,
                                   b"putchar",
                                   [self.ctxt.new_param(self.int_type,
                                                        b"c")]))
        self.func = self.ctxt.new_function(gccjit.FunctionKind.EXPORTED,
                                           self.void_type, b'func', [])
        self.curblock = self.func.new_block(b"initial")
        self.int_zero = self.ctxt.zero(self.int_type)
        self.int_one = self.ctxt.one(self.int_type)
        self.byte_zero = self.ctxt.zero(self.byte_type)
        self.byte_one = self.ctxt.one(self.byte_type)
        self.data_cells = self.ctxt.new_global(gccjit.GlobalKind.INTERNAL,
                                               self.array_type,
                                               b"data_cells")
        self.idx = self.func.new_local(self.int_type,
                                       b"idx")

        self.open_parens = []

        self.curblock.add_comment(b"idx = 0;")
        self.curblock.add_assignment(self.idx,
                                     self.int_zero)

    def get_current_data(self, loc):
        """Get 'data_cells[idx]' as an lvalue. """
        return self.ctxt.new_array_access(self.data_cells,
                                          self.idx,
                                          loc)


    def current_data_is_zero(self, loc):
        """Get 'data_cells[idx] == 0' as a boolean rvalue."""
        return self.ctxt.new_comparison(gccjit.Comparison.EQ,
                                        self.get_current_data(loc),
                                        self.byte_zero,
                                        loc)

    def compile_char(self, ch):
        """Compile one bf character."""
        loc = self.ctxt.new_location(self.filename,
                                     self.line,
                                     self.column)

        # Turn this on to trace execution, by injecting putchar()
        # of each source char.
        if 0:
            arg = self.ctxt.new_rvalue_from_int (self.int_type,
                                                 ch)
            call = self.ctxt.new_call (self.func_putchar,
                                       [arg],
                                       loc)
            self.curblock.add_eval (call, loc)

        if ch == '>':
            self.curblock.add_comment(b"'>': idx += 1;", loc)
            self.curblock.add_assignment_op(self.idx,
                                            gccjit.BinaryOp.PLUS,
                                            self.int_one,
                                            loc)
        elif ch == '<':
            self.curblock.add_comment(b"'<': idx -= 1;", loc)
            self.curblock.add_assignment_op(self.idx,
                                            gccjit.BinaryOp.MINUS,
                                            self.int_one,
                                            loc)
        elif ch == '+':
            self.curblock.add_comment(b"'+': data[idx] += 1;", loc)
            self.curblock.add_assignment_op(self.get_current_data (loc),
                                            gccjit.BinaryOp.PLUS,
                                            self.byte_one,
                                            loc)
        elif ch == '-':
            self.curblock.add_comment(b"'-': data[idx] -= 1;", loc)
            self.curblock.add_assignment_op(self.get_current_data(loc),
                                            gccjit.BinaryOp.MINUS,
                                            self.byte_one,
                                            loc)
        elif ch == '.':
            arg = self.ctxt.new_cast(self.get_current_data(loc),
                                     self.int_type,
                                     loc)
            call = self.ctxt.new_call(self.func_putchar,
                                      [arg],
                                      loc)
            self.curblock.add_comment(b"'.': putchar ((int)data[idx]);",
                                      loc)
            self.curblock.add_eval(call, loc)
        elif ch == ',':
            call = self.ctxt.new_call(self.func_getchar, [], loc)
            self.curblock.add_comment(b"',': data[idx] = (unsigned char)getchar ();",
                                      loc)
            self.curblock.add_assignment(self.get_current_data(loc),
                                         self.ctxt.new_cast(call,
                                                            self.byte_type,
                                                            loc),
                                         loc)
        elif ch == '[':
            loop_test = self.func.new_block()
            on_zero = self.func.new_block()
            on_non_zero = self.func.new_block()

            self.curblock.end_with_jump(loop_test, loc)

            loop_test.add_comment(b"'['", loc)
            loop_test.end_with_conditional(self.current_data_is_zero(loc),
                                           on_zero,
                                           on_non_zero,
                                           loc)
            self.open_parens.append(Paren(loop_test, on_non_zero, on_zero))
            self.curblock = on_non_zero;
        elif ch == ']':
            self.curblock.add_comment(b"']'", loc)
            if not self.open_parens:
                raise CompileError(self, "mismatching parens")
            paren = self.open_parens.pop()
            self.curblock.end_with_jump(paren.b_test)
            self.curblock = paren.b_after
        elif ch == 'n':
            self.line +=1;
            self.column = 0;

        if ch != 'n':
            self.column += 1


    def parse_into_ctxt(self, filename):
        """
        Parse the given .bf file into the gccjit.Context, containing a
        single function "func".
        """
        self.filename = filename;
        self.line = 1
        self.column = 0
        with open(filename) as f_in:
            for ch in f_in.read():
                self.compile_char(ch)
        self.curblock.end_with_void_return()

    # Compiling to an executable

    def compile_to_file(self, output_path):
        # Wrap "func" up in a "main" function
        mainfunc, argv, argv = gccjit.make_main(self.ctxt)
        block = mainfunc.new_block()
        block.add_eval(self.ctxt.new_call(self.func, []))
        block.end_with_return(self.int_zero)
        self.ctxt.compile_to_file(gccjit.OutputKind.EXECUTABLE,
                                  output_path)

    # Running the generated code in-process

    def run(self):
        import ctypes
        result = self.ctxt.compile()
        py_func_type = ctypes.CFUNCTYPE(None)
        py_func = py_func_type(result.get_code(b'func'))
        py_func()

# Entrypoint

def main(argv):
    from optparse import OptionParser
    parser = OptionParser()
    parser.add_option("-o", "--output", dest="outputfile",
                      help="compile to FILE", metavar="FILE")
    (options, args) = parser.parse_args()
    if len(args) != 1:
        raise ValueError('No input file')
    inputfile = args[0]
    c = Compiler()
    c.parse_into_ctxt(inputfile)
    if options.outputfile:
        c.compile_to_file(options.outputfile)
    else:
        c.run()

if __name__ == '__main__':
    try:
        main(sys.argv)
    except Exception as exc:
        print(exc)
        sys.exit(1)

The above script is a brainf-to-machine-code compiler.

We can use it to compile a .b script and run it in-process. This is a similar to how a language interpreter or virtual machine might convert frequently-executed functions into machine code:

$ python bf.py 
     emit-alphabet.b
ABCDEFGHIJKLMNOPQRSTUVWXYZ

Success!

As well, as this Just-In-Time compilation use-case, we can also use it to compile .bf files into machine code executables:

$ python bf.py 
     emit-alphabet.b 
     -o a.out

which can then be run independently:

$ ./a.out
ABCDEFGHIJKLMNOPQRSTUVWXYZ

and we can inspect them using standard tools.

$ file a.out
a.out: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.32, not stripped

For example, we can look at a disassembly:

$ objdump -d a.out |less

which shows that libgccjit has managed to optimize the function somewhat (for example, the runs of 26 and 65 increment operations have become integer constants 0x1a and 0x41):

0000000000400620 <func>:
  400620:       80 3d 39 0a 20 00 00    cmpb   $0x0,0x200a39(%rip)        # 601060 <data_cells>
  400627:       74 07                   je     400630 <func+0x10>
  400629:       eb fe                   jmp    400629 <func+0x9>
  40062b:       0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
  400630:       48 83 ec 08             sub    $0x8,%rsp
  400634:       0f b6 05 26 0a 20 00    movzbl 0x200a26(%rip),%eax        # 601061 <data_cells+0x1>
  40063b:       c6 05 1e 0a 20 00 1a    movb   $0x1a,0x200a1e(%rip)        # 601060 <data_cells>
  400642:       8d 78 41                lea    0x41(%rax),%edi
  400645:       40 88 3d 15 0a 20 00    mov    %dil,0x200a15(%rip)        # 601061 <data_cells+0x1>
  40064c:       0f 1f 40 00             nopl   0x0(%rax)
  400650:       40 0f b6 ff             movzbl %dil,%edi
  400654:       e8 87 fe ff ff          callq  4004e0 <putchar@plt>
  400659:       0f b6 05 01 0a 20 00    movzbl 0x200a01(%rip),%eax        # 601061 <data_cells+0x1>
  400660:       80 2d f9 09 20 00 01    subb   $0x1,0x2009f9(%rip)        # 601060 <data_cells>
  400667:       8d 78 01                lea    0x1(%rax),%edi
  40066a:       40 88 3d f0 09 20 00    mov    %dil,0x2009f0(%rip)        # 601061 <data_cells+0x1>
  400671:       75 dd                   jne    400650 <func+0x30>
  400673:       48 83 c4 08             add    $0x8,%rsp
  400677:       c3                      retq
  400678:       0f 1f 84 00 00 00 00    nopl   0x0(%rax,%rax,1)
  40067f:       00

0000000000400680 <main>:
  400680:       48 83 ec 08             sub    $0x8,%rsp
  400684:       e8 97 ff ff ff          callq  400620 <func>
  400689:       31 c0                   xor    %eax,%eax
  40068b:       48 83 c4 08             add    $0x8,%rsp
  40068f:       c3                      retq

We also set up debugging information (via gccjit.Context.new_location and gccjit.BoolOption.DEBUGINFO), so it's possible to use gdb to singlestep through the generated binary and inspect the internal state idx and data_cells:

(gdb) break main
Breakpoint 1 at 0x400790
(gdb) run
Starting program: a.out

Breakpoint 1, 0x0000000000400790 in main (argc=1, argv=0x7fffffffe448)
(gdb) stepi
0x0000000000400797 in main (argc=1, argv=0x7fffffffe448)
(gdb) stepi
0x00000000004007a0 in main (argc=1, argv=0x7fffffffe448)
(gdb) stepi
9     >+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++<
(gdb) list
4
5     cell 0 = 26
6     ++++++++++++++++++++++++++
7
8     cell 1 = 65
9     >+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++<
10
11    while cell#0 != 0
12    [
13     >
(gdb) n
6     ++++++++++++++++++++++++++
(gdb) n
9     >+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++<
(gdb) p idx
$1 = 1
(gdb) p data_cells
$2 = "32", '00' <repeats 29998 times>
(gdb) p data_cells[0]
$3 = 26 '32'
(gdb) p data_cells[1]
$4 = 0 '00'
(gdb) list
4
5     cell 0 = 26
6     ++++++++++++++++++++++++++
7
8     cell 1 = 65
9     >+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++<
10
11    while cell#0 != 0
12    [
13     >

Let's try another brainf script (by Erik Bosman):

$ time python examples/bf.py mandelbrot.b
AAAAAAAAAAAAAAAABBBBBBBBBBBBBBBCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCDDDDDDDDDEGFFEEEEDDDDDDCCCCCCCCCBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
AAAAAAAAAAAAAAABBBBBBBBBBBBBCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCDDDDDDDDDDEEEFGIIGFFEEEDDDDDDDDCCCCCCCCCBBBBBBBBBBBBBBBBBBBBBBBBBB
AAAAAAAAAAAAABBBBBBBBBBBBCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCDDDDDDDDDDDDEEEEFFFI KHGGGHGEDDDDDDDDDCCCCCCCCCBBBBBBBBBBBBBBBBBBBBBBB
AAAAAAAAAAAABBBBBBBBBBCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCDDDDDDDDDDDDDDEEEEEFFGHIMTKLZOGFEEDDDDDDDDDCCCCCCCCCBBBBBBBBBBBBBBBBBBBBB
AAAAAAAAAAABBBBBBBBBCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCDDDDDDDDDDDDDDEEEEEEFGGHHIKPPKIHGFFEEEDDDDDDDDDCCCCCCCCCCBBBBBBBBBBBBBBBBBB
AAAAAAAAAABBBBBBBBCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCDDDDDDDDDDDDDDDEEEEEEFFGHIJKS  X KHHGFEEEEEDDDDDDDDDCCCCCCCCCCBBBBBBBBBBBBBBBB
AAAAAAAAABBBBBBBCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCDDDDDDDDDDDDDDDEEEEEEFFGQPUVOTY   ZQL[MHFEEEEEEEDDDDDDDCCCCCCCCCCCBBBBBBBBBBBBBB
AAAAAAAABBBBBBCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCDDDDDDDDDDDDDDDEEEEEFFFFFGGHJLZ         UKHGFFEEEEEEEEDDDDDCCCCCCCCCCCCBBBBBBBBBBBB
AAAAAAABBBBBCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCDDDDDDDDDDDDDDEEEEFFFFFFGGGGHIKP           KHHGGFFFFEEEEEEDDDDDCCCCCCCCCCCBBBBBBBBBBB
AAAAAAABBBBCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCDDDDDDDDDDDDEEEEEFGGHIIHHHHHIIIJKMR        VMKJIHHHGFFFFFFGSGEDDDDCCCCCCCCCCCCBBBBBBBBB
AAAAAABBBCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCDDDDDDDDDDDEEEEEEFFGHK   MKJIJO  N R  X      YUSR PLV LHHHGGHIOJGFEDDDCCCCCCCCCCCCBBBBBBBB
AAAAABBBCCCCCCCCCCCCCCCCCCCCCCCCCCCCCDDDDDDDDEEEEEEEEEFFFFGH O    TN S                       NKJKR LLQMNHEEDDDCCCCCCCCCCCCBBBBBBB
AAAAABBCCCCCCCCCCCCCCCCCCCCCCCCCCCDDDDDDEEEEEEEEEEEEFFFFFGHHIN                                 Q     UMWGEEEDDDCCCCCCCCCCCCBBBBBB
AAAABBCCCCCCCCCCCCCCCCCCCCCCCCCDDDDEEEEEEEEEEEEEEEFFFFFFGHIJKLOT                                     [JGFFEEEDDCCCCCCCCCCCCCBBBBB
AAAABCCCCCCCCCCCCCCCCCCCCCCDDDDEEEEEEEEEEEEEEEEFFFFFFGGHYV RQU                                     QMJHGGFEEEDDDCCCCCCCCCCCCCBBBB
AAABCCCCCCCCCCCCCCCCCDDDDDDDEEFJIHFFFFFFFFFFFFFFGGGGGGHIJN                                            JHHGFEEDDDDCCCCCCCCCCCCCBBB
AAABCCCCCCCCCCCDDDDDDDDDDEEEEFFHLKHHGGGGHHMJHGGGGGGHHHIKRR                                           UQ L HFEDDDDCCCCCCCCCCCCCCBB
AABCCCCCCCCDDDDDDDDDDDEEEEEEFFFHKQMRKNJIJLVS JJKIIIIIIJLR                                               YNHFEDDDDDCCCCCCCCCCCCCBB
AABCCCCCDDDDDDDDDDDDEEEEEEEFFGGHIJKOU  O O   PR LLJJJKL                                                OIHFFEDDDDDCCCCCCCCCCCCCCB
AACCCDDDDDDDDDDDDDEEEEEEEEEFGGGHIJMR              RMLMN                                                 NTFEEDDDDDDCCCCCCCCCCCCCB
AACCDDDDDDDDDDDDEEEEEEEEEFGGGHHKONSZ                QPR                                                NJGFEEDDDDDDCCCCCCCCCCCCCC
ABCDDDDDDDDDDDEEEEEFFFFFGIPJIIJKMQ                   VX                                                 HFFEEDDDDDDCCCCCCCCCCCCCC
ACDDDDDDDDDDEFFFFFFFGGGGHIKZOOPPS                                                                      HGFEEEDDDDDDCCCCCCCCCCCCCC
ADEEEEFFFGHIGGGGGGHHHHIJJLNY                                                                        TJHGFFEEEDDDDDDDCCCCCCCCCCCCC
A                                                                                                 PLJHGGFFEEEDDDDDDDCCCCCCCCCCCCC
ADEEEEFFFGHIGGGGGGHHHHIJJLNY                                                                        TJHGFFEEEDDDDDDDCCCCCCCCCCCCC
ACDDDDDDDDDDEFFFFFFFGGGGHIKZOOPPS                                                                      HGFEEEDDDDDDCCCCCCCCCCCCCC
ABCDDDDDDDDDDDEEEEEFFFFFGIPJIIJKMQ                   VX                                                 HFFEEDDDDDDCCCCCCCCCCCCCC
AACCDDDDDDDDDDDDEEEEEEEEEFGGGHHKONSZ                QPR                                                NJGFEEDDDDDDCCCCCCCCCCCCCC
AACCCDDDDDDDDDDDDDEEEEEEEEEFGGGHIJMR              RMLMN                                                 NTFEEDDDDDDCCCCCCCCCCCCCB
AABCCCCCDDDDDDDDDDDDEEEEEEEFFGGHIJKOU  O O   PR LLJJJKL                                                OIHFFEDDDDDCCCCCCCCCCCCCCB
AABCCCCCCCCDDDDDDDDDDDEEEEEEFFFHKQMRKNJIJLVS JJKIIIIIIJLR                                               YNHFEDDDDDCCCCCCCCCCCCCBB
AAABCCCCCCCCCCCDDDDDDDDDDEEEEFFHLKHHGGGGHHMJHGGGGGGHHHIKRR                                           UQ L HFEDDDDCCCCCCCCCCCCCCBB
AAABCCCCCCCCCCCCCCCCCDDDDDDDEEFJIHFFFFFFFFFFFFFFGGGGGGHIJN                                            JHHGFEEDDDDCCCCCCCCCCCCCBBB
AAAABCCCCCCCCCCCCCCCCCCCCCCDDDDEEEEEEEEEEEEEEEEFFFFFFGGHYV RQU                                     QMJHGGFEEEDDDCCCCCCCCCCCCCBBBB
AAAABBCCCCCCCCCCCCCCCCCCCCCCCCCDDDDEEEEEEEEEEEEEEEFFFFFFGHIJKLOT                                     [JGFFEEEDDCCCCCCCCCCCCCBBBBB
AAAAABBCCCCCCCCCCCCCCCCCCCCCCCCCCCDDDDDDEEEEEEEEEEEEFFFFFGHHIN                                 Q     UMWGEEEDDDCCCCCCCCCCCCBBBBBB
AAAAABBBCCCCCCCCCCCCCCCCCCCCCCCCCCCCCDDDDDDDDEEEEEEEEEFFFFGH O    TN S                       NKJKR LLQMNHEEDDDCCCCCCCCCCCCBBBBBBB
AAAAAABBBCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCDDDDDDDDDDDEEEEEEFFGHK   MKJIJO  N R  X      YUSR PLV LHHHGGHIOJGFEDDDCCCCCCCCCCCCBBBBBBBB
AAAAAAABBBBCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCDDDDDDDDDDDDEEEEEFGGHIIHHHHHIIIJKMR        VMKJIHHHGFFFFFFGSGEDDDDCCCCCCCCCCCCBBBBBBBBB
AAAAAAABBBBBCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCDDDDDDDDDDDDDDEEEEFFFFFFGGGGHIKP           KHHGGFFFFEEEEEEDDDDDCCCCCCCCCCCBBBBBBBBBBB
AAAAAAAABBBBBBCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCDDDDDDDDDDDDDDDEEEEEFFFFFGGHJLZ         UKHGFFEEEEEEEEDDDDDCCCCCCCCCCCCBBBBBBBBBBBB
AAAAAAAAABBBBBBBCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCDDDDDDDDDDDDDDDEEEEEEFFGQPUVOTY   ZQL[MHFEEEEEEEDDDDDDDCCCCCCCCCCCBBBBBBBBBBBBBB
AAAAAAAAAABBBBBBBBCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCDDDDDDDDDDDDDDDEEEEEEFFGHIJKS  X KHHGFEEEEEDDDDDDDDDCCCCCCCCCCBBBBBBBBBBBBBBBB
AAAAAAAAAAABBBBBBBBBCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCDDDDDDDDDDDDDDEEEEEEFGGHHIKPPKIHGFFEEEDDDDDDDDDCCCCCCCCCCBBBBBBBBBBBBBBBBBB
AAAAAAAAAAAABBBBBBBBBBCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCDDDDDDDDDDDDDDEEEEEFFGHIMTKLZOGFEEDDDDDDDDDCCCCCCCCCBBBBBBBBBBBBBBBBBBBBB
AAAAAAAAAAAAABBBBBBBBBBBBCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCDDDDDDDDDDDDEEEEFFFI KHGGGHGEDDDDDDDDDCCCCCCCCCBBBBBBBBBBBBBBBBBBBBBBB
AAAAAAAAAAAAAAABBBBBBBBBBBBBCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCDDDDDDDDDDEEEFGIIGFFEEEDDDDDDDDCCCCCCCCCBBBBBBBBBBBBBBBBBBBBBBBBBB

real  0m10.639s
user  0m10.458s
sys   0m0.150s

Compiling it to an a.out and measuring timings shows that most of this time is spent optimizing it to machine code, with the resulting code taking about a second and a half to to run:

$ time python examples/bf.py mandelbrot.bf -o a.out

real  0m9.419s
user  0m9.223s
sys   0m0.166s

$ time (./a.out > /dev/null)

real  0m1.470s
user  0m1.465s
sys   0m0.003s

So there you have it, with libgccjit we were able to build an optimizing compiler, with debugger support, in 230 lines of Python.

Trying it out

GCC 5 and libgccjit will be available in the forthcoming Fedora 22 release.

The Python bindings can be downloaded from https://pypi.org/project/gccjit/

The code above can be seen at
https://github.com/davidmalcolm/pygccjit/blob/master/examples/bf.py

Last updated: February 23, 2024