troubleshooting red hat process automation manager featured image

LLVM is a collection of compiler toolchain technologies that underpins many modern programming languages, including Rust, Julia, and Swift. The LLVM community is welcoming to new contributors and a great place to get into compiler development.

This article is a fairly detailed guide for LLVM contributions. Generally speaking, it's fine to just put up a patch, and somebody will be happy to guide you through the more idiosyncratic parts of the LLVM contribution process. However, a lot of the early patch review feedback tends to be the same, and this article should help you avoid some of the common issues.

Setting up the development environment

To build LLVM, you will need a recent C++ compiler and CMake. To run the tests, you will additionally need Python. While not strictly necessary, it is a good idea to also install Ccache, Ninja, and LLD to reduce compile times.

After cloning the llvm-project repository, you need to run CMake to set up your build configuration. While there is extensive documentation of available CMake options, the configuration I recommend for most purposes looks something like this:

cmake -GNinja -Bbuild -Hllvm \
    -DLLVM_ENABLE_PROJECTS="clang" \
    -DLLVM_TARGETS_TO_BUILD="all" \
    -DCMAKE_BUILD_TYPE=Release \
    -DLLVM_ENABLE_ASSERTIONS=true \
    -DLLVM_CCACHE_BUILD=true \
    -DLLVM_USE_LINKER=lld

The two most important options are setting the build type to Release and enabling assertions. I would recommend against using a debug build to develop LLVM, because this requires a lot of memory to link and a lot of disk space to store, and will make running tests slower. In most cases, you will get better mileage out of using debug messages (available through the -debug flag) than running LLVM under GDB.

The LLVM build is very resource intensive: If you are on a low-powered machine with few cores, a full build may well take hours. There are two easy ways to reduce build times. The first is to leave the LLVM_ENABLE_PROJECTS option empty (or omit it entirely). While building Clang is useful to investigate issues reported as C/C++ code, it is not necessary to work on LLVM itself.

The second way to reduce build time is to set LLVM_TARGET_TO_BUILD=X86 (or whatever your native target is) to reduce the number of codegen backends being built. The main risk of doing this is that parts of the test suite will be skipped. This is usually fine when working on target-independent passes like InstCombine.

Once LLVM is configured, some useful commands for building and testing are:

# Build LLVM
ninja -Cbuild

# Run all LLVM tests
ninja -Cbuild check-llvm

# Run tests in a specific directory.
# -v will print additional information for failures.
build/bin/llvm-lit -v llvm/test/Transforms/InstCombine

Installing the built LLVM is generally not necessary for development; you can directly use the binaries inside the build/bin directory.

Finding an issue to resolve

The LLVM project consists of a number of subprojects, including LLVM proper, the Clang compiler, the LLD linker, the libc++ standard library, and many others. Even within LLVM proper, there are different areas. The primary separation is between the middle-end optimizer working on the LLVM intermediate representation (IR), and the back-end, which translates this IR into machine code.

Most of my work is in the middle-end optimizer, so I will focus on that in this article. Even within one area, there are many different ways to contribute—for example, one of the most useful tasks is to reduce miscompilation reports to a minimal reproducer and determine the root cause of the miscompile. Test case reduction is an art in itself though, and what I want to focus on here are code contributions.

What you want to work on ultimately depends on your interests. However, I think that one of the best entry points for new contributors are InstCombine issues, because these can often be solved by adding a simple missing peephole transform. At the same time, the InstCombine pass has some of the most stringent review requirements in the LLVM project, and as such is a good place to learn how things are done "by the book".

The good first issue label provides a wider selection of beginner-friendly issues from different areas, including Clang, Flang, and various tools like clang-tidy or clang-format.

As an example for this article, I will use the transform implemented in D124763, motivated by a strict-provenance optimization issue in Rust. (Generally, Rust I-slow issues are a great place to find missing LLVM transforms as well.)

Proving the transform correct

The basic transform we are interested in is folding A + (B & ~A) into A | (B & ~A), which can then be folded to just A | B. The left hand side of the addition only has bits from A, while the right hand side only has bits from ~A, so there is no overlap. This means that there is no "carry" and the addition becomes the same as a bitwise or (or bitwise xor, but this is less useful).

We can gain more confidence in the correctness of the transform by using the alive2 proof assistant, which is also available online. All you need to do is specify a source and target function as LLVM IR, and it will verify whether the transform is correct (online):

define i8 @src(i8 %a, i8 %b) {
  %a.not = xor i8 %a, -1
  %and = and i8 %b, %a.not
  %add = add i8 %a, %and
  ret i8 %add
}

define i8 @tgt(i8 %a, i8 %b) {
  %a.not = xor i8 %a, -1
  %and = and i8 %b, %a.not
  %add = or i8 %a, %and
  ret i8 %add
}

This particular proof is unusually simple. Let's look at another proof from a recent InstCombine change (online):

define i8 @src(i8 %x, i8 %y, i8 %z) {
  %xz = shl nsw i8 %x, %z
  %yz = shl nsw nuw i8 %y, %z
  %d = sdiv i8 %xz, %yz
  ret i8 %d
}

define i8 @tgt(i8 %x, i8 %y, i8 %z) {
  %d = sdiv i8 %x, %y
  ret i8 %d
}

This fold transforms (x << z) sdiv (y << z) into x sdiv y, removing a common shift factor from the signed division. What is interesting about this proof is the presence of the nsw/nuw flags on the left shifts, which indicate that no signed/unsigned overflow/wrapping occurs. You can try removing each of these flags in the online version, and see that a counter-example will be presented in each case, showing why the resulting transform would be incorrect.

The necessity of the nsw flag is intuitive: If the shift has signed overflow, then we might not actually have a common (signed) factor to remove. If we drop the nuw flag from the second shift, then we get the following counter-example (online):

ERROR: Source is more defined than target

Example:
i8 %x = #x80 (128, -128)
i8 %y = #xff (255, -1)
i8 %z = #x04 (4)

Source:
i8 %xz = poison
i8 %yz = #xf0 (240, -16)
i8 %d = poison

Target:
i8 %d = UB triggered!

If %x is -128 (INT_MIN) and %y is -1, then we will convert this into sdiv i8 -128, -1, which is immediate undefined behavior according to the LLVM IR specification. So what the nuw flag actually ensures is that %y is not -1 (or else %z is zero, in which case the input program is already undefined). As such, an alternative correct transform would be to drop the nuw flag, and add a precondition instead (online):

declare void @llvm.assume(i1)

define i8 @src(i8 %x, i8 %y, i8 %z) {
  %precond = icmp ne i8 %y, -1
  call void @llvm.assume(i1 %precond)

  %xz = shl nsw i8 %x, %z
  %yz = shl nsw i8 %y, %z
  %d = sdiv i8 %xz, %yz
  ret i8 %d
}

define i8 @tgt(i8 %x, i8 %y, i8 %z) {
  %d = sdiv i8 %x, %y
  ret i8 %d
}

This also shows a common pattern in alive proofs: Preconditions can be expressed using calls to the llvm.assume intrinsic.

While alive2 is a very important tool for ensuring the correctness of LLVM transforms, it is worth noting that it can produce false negatives (that is, it sometimes claims that an incorrect transform is correct). This usually happens in the context of loop optimizations, and is generally not a problem for InstCombine transforms.

Now that we have convinced ourselves that our transform is correct, let's move towards the implementation stage.

Writing tests

LLVM uses a somewhat unusual process where tests are generally written and committed before a functional change is made. This way, the functional patch only contains diffs of the generated IR or assembly.

Committing the tests ahead of time has one big advantage: It clearly shows what the result prior to the change was. This may not always be obvious, because input IR might be changed into a substantially different form by other transforms, as we will soon see. It actually happens surprisingly often that people try to submit a patch with tests that are already fully optimized without the patch.

For our transform, let's add one test to the llvm/test/Transforms/InstCombine/add.ll file:

; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
; RUN: opt < %s -passes=instcombine -S | FileCheck %s

; … a bunch of existing tests omitted here …

define i8 @add_and_xor(i8 %x, i8 %y) {
; CHECK-LABEL: @add_and_xor(
; CHECK-NEXT: [[XOR:%.*]] = xor i8 [[X:%.*]], -1
; CHECK-NEXT: [[AND:%.*]] = and i8 [[XOR]], [[Y:%.*]]
; CHECK-NEXT: [[ADD:%.*]] = add i8 [[AND]], [[X]]
; CHECK-NEXT: ret i8 [[ADD]]
;
  %xor = xor i8 %x, -1
  %and = and i8 %xor, %y
  %add = add i8 %and, %x
  ret i8 %add
}

While LLVM also has unit tests, the preferred testing approach is text-based, where the result of running optimizations (here via opt -S -passes=instcombine) is then verified using FileCheck. Test execution is driven by lit, which essentially just executes RUN lines in the test file with certain substitutions. For example, %s is replaced by the path of the test file.

FileCheck matches the transform result against the specified CHECK lines. These can use placeholders of the form [[XOR:%.*]] and [[XOR]], where the former will match a regular expression %.*, and the latter will be replaced by the matched text. In the above example, this makes the test robust against changes to variable names.

Thankfully, you don't have to write these CHECK lines by hand; the update_test_checks.py script is used to generate them:

llvm/utils/update_test_checks.py --opt-bin build/bin/opt \
    llvm/test/Transforms/InstCombine/add.ll

Unless the test has very unusual requirements, check lines should always be generated, without manual adjustments. This makes it easy to regenerate them in the future and makes the test pre-commit workflow much more pleasant, because you only need to rerun the script.

There are a number of flags that can be used for certain kinds of tests—the --function-signature and --check-attributes flags for tests checking inter-procedural attribute inference, for example. There are also multiple different scripts that can be used depending on the type of test:

  • update_test_checks.py: Used for middle-end LLVM IR optimization tests.
  • update_llc_test_checks.py: Used for back-end assembly tests.
  • update_mir_test_checks.py: Used for back-end machine IR (MIR) tests.
  • update_analyze_test_checks.py: Used for analysis tests, such as instruction cost models.
  • update_cc_test_checks.py: Used for Clang code generation tests.

Returning to the tests for our transform, you will note that in the preceding example, the input IR and the output IR look the same (without our patch). That's what a good test should look like.

Of course, we need more tests than this! First, we need to check for commuted patterns. The first test checked (~x & y) + x, but (y & ~x) + x, x + (~x & y) and x + (y & ~x) should all be supported as well, because both + and & are commutative operations. As such, we have four "commuted" patterns, each of which needs a test. Let's cover one of them as an example:

define i8 @add_and_xor_commuted1(i8 %x, i8 %y) {
; CHECK-LABEL: @add_and_xor_commuted1(
; CHECK-NEXT: [[XOR:%.*]] = xor i8 [[X:%.*]], -1
; CHECK-NEXT: [[AND:%.*]] = and i8 [[XOR]], [[Y:%.*]]
; CHECK-NEXT: [[ADD:%.*]] = add i8 [[AND]], [[X]]
; CHECK-NEXT: ret i8 [[ADD]]
;
  %xor = xor i8 %x, -1
  %and = and i8 %y, %xor
  %add = add i8 %and, %x
  ret i8 %add
}

The attentive reader will note that in this case the input IR does not match the output IR. In fact, even though we wrote a test case for (y & ~x) + x, it was actually converted into a test for (~x & y) + x, so it effectively becomes the same as our first test! This is part of the reason why we pre-commit tests, as such issues are hard to catch otherwise.

The reason why this happens is complexity-based canonicalization. A function argument has lower complexity than an instruction, and as such is moved to the right of commutative instructions. This is done to make the canonical IR more uniform (e.g., it implies that constant operands will always be on the right-hand side), but works against us when it comes to writing tests. Thankfully, there is a simple trick to avoid this:

define i8 @add_and_xor_commuted1(i8 %x, i8 %_y) {
; CHECK-LABEL: @add_and_xor_commuted1(
; CHECK-NEXT: [[Y:%.*]] = udiv i8 42, [[_Y:%.*]]
; CHECK-NEXT: [[XOR:%.*]] = xor i8 [[X:%.*]], -1
; CHECK-NEXT: [[AND:%.*]] = and i8 [[Y]], [[XOR]]
; CHECK-NEXT: [[ADD:%.*]] = add i8 [[AND]], [[X]]
; CHECK-NEXT: ret i8 [[ADD]]
;
  %y = udiv i8 42, %_y ; thwart complexity-based canonicalization
  %xor = xor i8 %x, -1
  %and = and i8 %y, %xor
  %add = add i8 %and, %x
  ret i8 %add
}

By replacing %y with a high-complexity division instruction, we avoid the operand reordering. This pattern is so common in InstCombine tests that it has its own term of art: thwarting.

The next thing to test are multi-use patterns:

declare void @use(i8)

define i8 @add_and_xor_extra_use(i8 %x, i8 %y) {
; CHECK-LABEL: @add_and_xor_extra_use(
; CHECK-NEXT: [[XOR:%.*]] = xor i8 [[X:%.*]], -1
; CHECK-NEXT: call void @use(i8 [[XOR]])
; CHECK-NEXT: [[AND:%.*]] = and i8 [[XOR]], [[Y:%.*]]
; CHECK-NEXT: call void @use(i8 [[AND]])
; CHECK-NEXT: [[ADD:%.*]] = add i8 [[AND]], [[X]]
; CHECK-NEXT: ret i8 [[ADD]]
;
  %xor = xor i8 %x, -1
  call void @use(i8 %xor)
  %and = and i8 %xor, %y
  call void @use(i8 %and)
  %add = add i8 %and, %x
  ret i8 %add
}

The rule of thumb for InstCombine is that transforms should not increase the number of instructions. There are exceptions to this rule (for example, replacing a udiv/sdiv with a sequence of cheap instructions is fine), but most transforms satisfy this condition.

In our case, we replace a single add instruction with a single or instruction, which can never increase the instruction count. As such, the transform should be performed independently of the number of uses, and we only need a single test that has an additional use of every instruction involved in the pattern.

If our transform instead replaced two old instructions with two new instructions, we would have to impose a one-use restriction, to ensure that both of the old instructions can be deleted. Otherwise, we might end up keeping both one of the old instructions, and inserting two new ones, which is likely not profitable.

Folds are generally expected to handle vectors as well, so we should test that:

define <2 x i8> @add_and_xor_vec(<2 x i8> %x, <2 x i8> %y) {
; CHECK-LABEL: @add_and_xor_vec(
; CHECK-NEXT:    [[XOR:%.*]] = xor <2 x i8> [[X:%.*]], <i8 -1, i8 -1>
; CHECK-NEXT:    [[AND:%.*]] = and <2 x i8> [[XOR]], [[Y:%.*]]
; CHECK-NEXT:    [[ADD:%.*]] = add <2 x i8> [[AND]], [[X]]
; CHECK-NEXT:    ret <2 x i8> [[ADD]]
;
  %xor = xor <2 x i8> %x, <i8 -1, i8 -1>
  %and = and <2 x i8> %xor, %y
  %add = add <2 x i8> %and, %x
  ret <2 x i8> %add
}

The above example uses <i8 -1, i8 -1>, which is a so-called splat vector where all elements are the same. Such cases are particularly easy to handle, and usually are covered by LLVM's pattern matching infrastructure. A minor variation on this are splat vectors where some elements are poison:

define <2 x i8> @add_and_xor_vec_poison(<2 x i8> %x, <2 x i8> %y) {
; CHECK-LABEL: @add_and_xor_vec_poison(
; CHECK-NEXT:    [[XOR:%.*]] = xor <2 x i8> [[X:%.*]], <i8 -1, i8 poison>
; CHECK-NEXT:    [[AND:%.*]] = and <2 x i8> [[XOR]], [[Y:%.*]]
; CHECK-NEXT:    [[ADD:%.*]] = add <2 x i8> [[AND]], [[X]]
; CHECK-NEXT:    ret <2 x i8> [[ADD]]
;
  %xor = xor <2 x i8> %x, <i8 -1, i8 poison>
  %and = and <2 x i8> %xor, %y
  %add = add <2 x i8> %and, %x
  ret <2 x i8> %add
}

Such poison values may be introduced by demanded elements analysis, which determines that certain lanes of the vector are unused. Splat values with poison elements are often, but not always, handled automatically as well.

Finally, there is the general case of non-splat vectors like <i8 -1, i8 -2>. Handling these wouldn't make sense in this context (because we require a bitwise not, which is a xor by -1), but other transforms could support them. However, this usually requires going out of your way to support such vectors, and there's generally no expectation that people actually do so. It's still good to add tests with a FIXME comment though.

Commuted, multi-use and vector tests are standard test categories that apply to many transforms. Additionally, one would want to cover various edge cases, which tend to be more specific. For example, a transform that inspects bit-widths is worth testing with i1 types, as well as "weird" types like i17. Transforms of loads and stores should test non-byte-sized types, as well as scalable vectors, etc.

For our example transform, an edge case worth testing is what happens when %y is replaced by a constant:

define i8 @add_and_xor_const(i8 %x) {
; CHECK-LABEL: @add_and_xor_const(
; CHECK-NEXT:    [[XOR:%.*]] = and i8 [[X:%.*]], 42
; CHECK-NEXT:    [[AND:%.*]] = xor i8 [[XOR]], 42
; CHECK-NEXT:    [[ADD:%.*]] = add i8 [[AND]], [[X]]
; CHECK-NEXT:    ret i8 [[ADD]]
;
  %xor = xor i8 %x, -1
  %and = and i8 %xor, 42
  %add = add i8 %and, %x
  ret i8 %add
}

It's good that we tested this, because once again the input IR does not match the output IR. A different InstCombine fold canonicalizes ~X & C to (X & C) ^ C, because this is a more favorable representation for certain other transforms.

This is bad news for us, because it means we have to handle an additional pattern. We should first use alive2 to make sure that it is correct as well (online):

define i8 @src(i8 %x, i8 %y) {
  %and = and i8 %x, %y
  %xor = xor i8 %and, %y
  %add = add i8 %x, %xor
  ret i8 %add
}

define i8 @tgt(i8 %x, i8 %y) {
  %and = and i8 %x, %y
  %xor = xor i8 %and, %y
  %add = or i8 %x, %xor
  ret i8 %add
}

And then we get to repeat all our testing work for this new pattern. This time there are three commutative operations involved, so we need eight commuted tests, plus multi-use tests, vector tests, and anything else we can think of. I'll refrain from covering these in detail here.

After writing so many tests, surely we are done now? Unfortunately, we made a very common mistake here and missed the most important type of test: Negative tests. People tend to focus on testing cases where the transform should trigger, and fail to test cases where it should not. Generally speaking, we need one negative test for every condition in the transform, such that each negative test makes exactly one condition fail.

For our fold, we should make sure that the xor constant is correct (so it is actually a bitwise not):

define i8 @add_and_xor_wrong_const(i8 %x, i8 %y) {
; CHECK-LABEL: @add_and_xor_wrong_const(
; CHECK-NEXT: [[XOR:%.*]] = xor i8 [[X:%.*]], -2
; CHECK-NEXT: [[AND:%.*]] = and i8 [[XOR]], [[Y:%.*]]
; CHECK-NEXT: [[ADD:%.*]] = add i8 [[AND]], [[X]]
; CHECK-NEXT: ret i8 [[ADD]]
;
  %xor = xor i8 %x, -2
  %and = and i8 %xor, %y
  %add = add i8 %and, %x
  ret i8 %add
}

And we should make sure that the add and xor use the same value:

define i8 @add_and_xor_wrong_op(i8 %x, i8 %y, i8 %z) {
; CHECK-LABEL: @add_and_xor_wrong_op(
; CHECK-NEXT: [[XOR:%.*]] = xor i8 [[Z:%.*]], -1
; CHECK-NEXT: [[AND:%.*]] = and i8 [[XOR]], [[Y:%.*]]
; CHECK-NEXT: [[ADD:%.*]] = add i8 [[AND]], [[X:%.*]]
; CHECK-NEXT: ret i8 [[ADD]]
;
  %xor = xor i8 %z, -1
  %and = and i8 %xor, %y
  %add = add i8 %and, %x
  ret i8 %add
}

We could also add tests where we use the wrong kind of instruction, but those are often not added—we tend to trust our pattern matching that far.

Implementing the transform

Now that we're done writing the tests, implementing the transform itself is downright simple. In the llvm::haveNoCommonBitsSet() function in ValueTracking.cpp, we add the following code:

// X op (Y & ~X)
if (match(RHS, m_c_And(m_Not(m_Specific(LHS)), m_Value())) ||
    match(LHS, m_c_And(m_Not(m_Specific(RHS)), m_Value())))
  return true;

This uses the PatternMatch infrastructure, which provides a concise way to match instruction trees. Here, we use m_c_And, where the c is for commutative, so it implicitly handles the case with swapped operands. m_Not is a helper to match the xor -1 pattern, and will also handle things like splat vectors with poison values. m_Specific requires matching a specific value, while m_Value (without arguments) is a placeholder that accepts anything.

The second pattern is slightly more complicated, but similar in structure:

// X op ((X & Y) ^ Y) -- this is the canonical form of the previous pattern
// for constant Y.
Value *Y;
if (match(RHS,
          m_c_Xor(m_c_And(m_Specific(LHS), m_Value(Y)), m_Deferred(Y))) ||
    match(LHS, m_c_Xor(m_c_And(m_Specific(RHS), m_Value(Y)), m_Deferred(Y))))
  return true;

New here is m_Value(Y) with an argument, which will store the matched value into the Y variable. m_Deferred(Y) is similar to m_Specific(), but can be used when the value is captured in the same match as it is used. If we used m_Specific() here, we would try to use the original (uninitialized) value of Y.

Interestingly, while we want to ultimately influence InstCombine behavior, the code change ended up in ValueTracking. The actual InstCombine fold that will ultimately make use of it are these two lines in the visitAdd() function of InstCombineAddSub.cpp:

// A+B -> A|B iff A and B have no bits set in common.
if (haveNoCommonBitsSet(LHS, RHS, DL, &AC, &I, &DT))
  return BinaryOperator::CreateOr(LHS, RHS);

As you can see, this is a generic fold from add to or if operands don't have common bits. Our fold is just a very special case of this.

This is a good chance to discuss the general layering of LLVM's peephole optimizations. While InstCombine is the main entry point for these, it will call into other layers. The most important participants are:

  • ValueTracking: Provides common analysis code. It can answer predicates like "is this value known non-zero?" or "do these values have no common bits?" and determine facts like "bit N of this value is always 1".
  • ConstantFolding: Folds instructions with only constant operands into other constants.
  • InstSimplify: Folds instructions into existing values or constants, but never creates new instructions. For example, X | 0 can be folded to X.
  • InstCombine: Performs folds that need to modify or create new instructions.

The first three are considered analysis passes and are used by many non-InstCombine transformation passes as well. As such, folds implemented there will also benefit other components.

There is a lot more we could talk about here, but it's not really possible to more than scratch the surface. This is the point where you dive into the codebase and try to figure out how to best implement a specific change.

After implementing the transform, we need to rerun update_test_checks.py on our test, and make sure that all tests folded (or did not fold) as expected. We should also run all the other InstCombine tests, as it is fairly common that some other tests may be affected:

build/bin/llvm-lit llvm/test/Transforms/InstCombine

If there are affected tests, they can be updated using update_test_checks.py as well. To be extra sure, we can also run all tests using ninja -Cbuild check-llvm, but for people on lower-end hardware, we may as well make use of pre-commit CI instead.

Posting the patch for review

At this point, you should have two commits. The first one contains the new tests with baseline check lines, and the second one contains the transform with updated check lines. Contributors with commit access can directly commit the tests, as test changes generally do not require review.

For everyone else, it is necessary to create a stacked review on LLVM's Phabricator instance, which also has some dedicated documentation (there are plans to move to GitHub pull requests in the future, but this has not happened as of this writing). While it is possible to use the arc command line tool to interact with Phabricator, people often struggle with it, and I personally prefer to use the web interface instead.

To use the web interface, we first need to create patch files:

git show -U99999 HEAD^ > patch_tests
git show -U99999 > patch_transform

The -U99999 ensures that the patch has full context, which means that it will include the complete file being changed, not just the parts directly surrounding the diff.

The patches can then be uploaded by creating a new differential revision (just paste the patch contents, and otherwise stick to the defaults). Pick a meaningful patch title and summary. For our running examples, the first patch could look like this:

  • Title: [InstCombine] Add tests for A | (B & ~A) -> A fold (NFC)
  • Summary: Tests for an upcoming A | (B & ~A) -> A fold.
  • Reviewers: (see below)

And the second could look like this:

  • Title: [ValueTracking] Determine that A and B & ~A have no common bits
  • Summary: This patch teaches ValueTracking that there are no common bits in the following two cases:
    • A and B & ~A (link to alive2 proof)
    • A and (A & B) ^ B (link to second alive2 proof)
    The second case is necessary, because this is considered to be the canonical form when B is constant. The goal here is to fold A + (B & ~A) to A | (B & ~A), which then folds to just A | B.

    Depends on DNNNNNN (put the ID of the first patch here).
  • Reviewers: (see below)

There are a few bits worth highlighting here:

  • There should be a [Category] tag at the start of the title. Usually, you can just use the name of the file you're modifying. For example, changes to InstCombine are usually tagged with [InstCombine].
  • Patches for non-functional changes like test additions are usually tagged with NFC somewhere in the title.
  • If you have any alive2 proofs, include them in the patch summary.
  • You can use "Depends on DNNNNNN" to create a stacked patch. It's also possible to do this by adding a "child revision" after the fact.

We're now missing only one bit: Reviewers. In LLVM, the patch submitter is responsible for picking out appropriate reviewers. While someone might come along based on the patch title (which makes the category tag so important), you will get the best mileage by specifying appropriate reviewers in the first place.

While LLVM has a CODE_OWNERS.txt file that specifies code owners for different areas, the unfortunate truth is that this file tends to be both outdated and incomplete. A better way to find reviewers is to look at the Git history of the file you're modifying, and add some people who either recently committed to it, or reviewed recently committed differential revisions.

For InstCombine, the primary reviewer to add would be spatel, but you'll find a few other candidates based on history as well (such as me, nikic).

With the patches submitted, it's time to wait for a review. For simple changes like these, someone will usually get around to it quickly. Should you not get a response within a week, send a "ping" comment, and keep sending it once a week. Waiting weeks for a review would be rather unusual for InstCombine, but can happen if you submit changes to an area that nobody has really worked on in a while. Just keep pinging.

Finally, once the patches have been approved, the reviewer will usually assume that you already have commit access, and will let you commit the changes yourself. If this is not the case, then you should follow up with a comment like "I don't have commit access, can you please land this for me? Please use 'Your Name <your@email>' for the commit". The last bit is important because Phabricator loses the authorship information from the patch, and the committer will have to add it back.

If you plan to do any kind of regular contribution to LLVM, it is a good idea to request commit access. The bar for this is very low, so feel free to request this sooner rather than later. The pre-commit workflow for tests is much more convenient if you don't have to create stacked reviews.

Finally, a word on CI: Patches on Phabricator are run through a "pre-merge" test run. Especially if you don't run the full test suite locally, the results may be helpful. Unfortunately, these test runs are somewhat flaky, so if you see failures that don't have an obvious relation to your patch, then you can usually ignore them.

Once the patch is committed, it will be run on a much wider range of "buildbots", which run tests on many different architectures and in many different configurations. These are also rather flaky, so the same applies: If you get a buildbot failure email that doesn't look related to your patch, don't worry about it. If it turns out to be your fault after all, a buildbot owner will let you know.

And that's it!

The LLVM contribution process has some unusual aspects that may be unfamiliar from other open-source projects. Part of it is the use of Phabricator rather than GitHub for reviews, but most of the differences center around a strong focus on correctness, starting with correctness proofs, over the pre-commit workflow for tests, and ending with the often very large ratio between test and code changes.

I hope that this article will be helpful for people who want to get into LLVM development, though I want to reiterate that it's not necessary or expected that you get everything "right" the first time around, and people will be happy to help if you run into trouble. The beginners category on Discourse, as well as the Discord chat, are good places to ask questions.

Last updated: December 21, 2022