Before we start, here are the previous parts: Part 1 and Part 2.
Commits since last time
My last article was published on October 11th, so let's check the commits since then:
$ git log --after=2023-09-11 | grep -i "\[clang\]\[Interp\]" | wc -l
717
$ git log --after=2023-09-11 | grep -i "\[clang\]\[bytecode\]" | wc -l
57
The second count is needed because I renamed the directory the constant interpreter lives in from Interp/ to ByteCode/, and the new convention is to tag commits with [clang][bytecode].
The 774 commits in the last year are quite a step up from the 308 commits of the previous year. We agreed to switch to a post-commit review system for the new constant interpreter, so I can be more productive without clogging up review queues upstream.
Detailed changes
This year, Yihan Wang contributed several commits, including support for binary and unary operations on vectors.
The full list of changes includes:
Support for variadic functions.
Support for more builtin functions: __builtin_parity, __builtin_clrsb, __builtin_bitreverse, __builtin_classify_type, __builtin_expect, __builtin_rotate{right,left}, __builtin_ffs, __builtin_addressof, __builtin_move (etc.), __builtin_eh_return_data_regno, __builtin_launder, various overflow and carry builtins, __builtin_clz, __builtin_ctz, __builtin_bswap, __builtin_vectorelements, __builtin_is_aligned, __builtin_align_up, __builtin_align_down, __builtin_os_log_format_buffer_size, __builtin_convertvector, __builtin_shufflevector, __builtin_sycl_unique_stable_name, __noop, etc.
Support for _Complex types, including multiplication, division, and support for integral complex types.
Support for C++23 [[assume]].
Support for __datasizeof and vec_step.
Nullability attributes on function parameters.
Support for vectors: initializing from various expression types, unary operators, comparison operators.
Support for functions with explicit instance parameters.
Support for unions, including changing the active member.
Support for member pointers.
Support for statement expressions.
Improved support for virtual base classes.
Dynamic memory allocation, i.e., new and delete as well as __builtin_operator_new and __builtin_operator_delete.
Support for virtual function calls with covariant return types.
Support for Objective-C blocks.
Not listed are tons of correctness fixes, improvements to diagnostic output, and support for lots of new expression and statement types.
Wider testing
The bytecode interpreter has its own set of tests located in test/AST/ByteCode/. Those tests usually run the same file with the current interpreter and the new bytecode interpreter and compare their outputs. Ideally, they output the same diagnostics and accept or reject the same constant expressions.
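As a rough sketch of what such a test can look like (this is not a file from the repository, and the functions add and factorial are made up), the same source is run twice through lit: once with -fexperimental-new-constant-interpreter to select the bytecode interpreter, and once without it. Both runs use -verify, so they must agree on the diagnostics, which in this case means none at all.

```cpp
// RUN: %clang_cc1 -std=c++20 -fexperimental-new-constant-interpreter -verify %s
// RUN: %clang_cc1 -std=c++20 -verify %s

// expected-no-diagnostics

// Hypothetical functions: both interpreters have to accept these
// constant expressions and produce the same results.
constexpr int add(int a, int b) { return a + b; }
static_assert(add(1, 2) == 3, "");

constexpr int factorial(int n) { return n <= 1 ? 1 : n * factorial(n - 1); }
static_assert(factorial(5) == 120, "");
```

Tests that check rejected expressions work the same way, except that the expected errors and notes are written next to the offending lines instead of expected-no-diagnostics.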
In early 2024, I started to additionally run the entire clang test suite (i.e. ninja check-clang) with the bytecode interpreter. I'm tracking the results over time here. See Figure 1.
This results in much better test coverage. Not only do I have a better overview of the overall state of the bytecode interpreter, but I also run every commit I make through the entire clang test suite first and check for unexpected regressions.
Unfortunately, some of the test files (especially test/SemaCXX/constant-expression-cxx{11,14, ...}) are rather big and test a ton of things, but they still only count as one test case in the graph above. For example, test/SemaCXX/constant-expression-cxx11.cpp currently fails and generates 143 errors. Fixing these bigger files is a lot of work and progresses only slowly.
Next steps
Apart from the known bugs, there are a few bigger things still missing in the bytecode interpreter:
Support for fixed point types
Support for __builtin_bit_cast. I have an old PR for this, which unfortunately went nowhere since it was requested to support bit casts of bit fields as well. This is something the current interpreter does not support. I plan on implementing this by copying the contents of a Pointer to a byte buffer, and using the same code path to implement __builtin_memcmp, etc.
Support for array fillers. When allocating large arrays with mostly or all default-constructed elements, it is useful to only allocate the elements that are actually used.
Support for typeid pointers
Almost all points in the list above are a lot of work, and some probably require bigger refactorings of the internals of the bytecode interpreter.
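The byte-buffer idea for __builtin_bit_cast can be illustrated with a runtime analogy. The sketch below is not the interpreter's code and does not use its Pointer class. The helpers bitCastViaBytes and compareBytes are hypothetical, and they only show how a single "copy the object into a byte buffer" step can serve both a bit cast and a memcmp-style comparison. It assumes C++17 and trivially copyable types without padding bytes.

```cpp
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical helper: reinterpret the bytes of 'from' as a value of type To
// by going through an intermediate byte buffer.
template <typename To, typename From>
To bitCastViaBytes(const From &from) {
  static_assert(sizeof(To) == sizeof(From), "sizes must match");
  static_assert(std::is_trivially_copyable_v<To> &&
                    std::is_trivially_copyable_v<From>,
                "both types must be trivially copyable");
  static_assert(std::is_default_constructible_v<To>,
                "To must be default constructible in this sketch");

  unsigned char buffer[sizeof(From)];
  std::memcpy(buffer, &from, sizeof(From)); // Step 1: object -> bytes.

  To to;
  std::memcpy(&to, buffer, sizeof(To)); // Step 2: bytes -> new object.
  return to;
}

// Hypothetical helper: a memcmp-like comparison that reuses the same
// "object -> bytes" step. Only meaningful for types without padding.
template <typename T>
int compareBytes(const T &a, const T &b) {
  static_assert(std::is_trivially_copyable_v<T>,
                "type must be trivially copyable");
  unsigned char bufferA[sizeof(T)];
  unsigned char bufferB[sizeof(T)];
  std::memcpy(bufferA, &a, sizeof(T));
  std::memcpy(bufferB, &b, sizeof(T));
  return std::memcmp(bufferA, bufferB, sizeof(T));
}
```

On a platform with 32-bit IEEE 754 floats, bitCastViaBytes<std::uint32_t>(1.0f) yields 0x3f800000, and compareBytes(5, 5) returns 0. Bit fields are the hard part mentioned above: they do not occupy whole bytes, so a plain byte copy like this one is not enough for them.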