Many traditional optimizations in the compiler work from a top-down approach, which starts at the beginning of the program and works toward the bottom. This allows the optimization to see the definition of something before any uses of it, which simplifies most evaluations. It’s also the natural way we process things. In this article, we'll look at a different approach and a new project called Ranger, which attempts to turn this problem upside down.
Let’s first look at the Value Range Propagation (VRP) optimization pass in GCC. We process statements to determine what kinds of ranges we can calculate and perform some optimizations based on those results:
func (unsigned b) { unsigned A; if (B < 10) { A = B + 10; if (A > 20) dead_call (); } }
First, the statement if (B < 10)
is examined. Without knowing anything else about B, we know that, if the condition is true, B must be in the range [0..9] for all uses of B in the then block.
Next, we process the A = B + 10
statement. We know the range for B, so it is now easy to determine that A must have the range of B + 10, which evaluates to [10..19].
When the next statement is considered, if (A > 20)
, we have calculated that A is in the range [10..19] and thus the comparison of (A > 20) must always be false. This allows the pass to remove the call to dead_call()
, knowing it can never be called.
When the top-down approach fails
One problem with this top-down approach is that the order in which things are encountered can affect the results. Consider, for instance, if our code snippet were rearranged:
A = B + 10; if (B < 10) { if (A > 20) dead_call (); }
This program is functionally identical; but, with the top-down approach, we now see the A = B + 10
statement earlier, which is before we know anything useful about the range of B. This means we learn little of use about the range of A at this assignment.
Next, we learn that t range for B in the then block is [0..9], just like last time.
When we finally see the expression if (A > 20)
, we don't know enough about the range of A this time to be sure it is safe to eliminate the call to dead_call()
. Moving the assignment statement earlier in the program has caused us to miss this optimization.
This is one reason why an optimization can sometimes optimize your code and other times not; the order in which statements are seen can affect the results.
Other issues
Another issue with the top-down approach is that we have to calculate a range for everything we see because we don't know if we will need it later or not. In the second example, we never actually needed the range for B, so it was a waste of time to calculate and carry it around. Such is frequently the case.
It can also be prohibitive to always calculate and carry all possible information. To help alleviate this burden, we develop "heuristics," which are basically educated guesses at what might be important. We calculate only the data we think we will need, and if we get it right, all is good! If we guess wrong, well, that is another missed optimization. Bug reports sometimes result in juggling heuristics to try to fix the problem, and sometimes cause another bug report to surface, because now we missed something else!
Reformulating the problem
Over the past 3 years, we have been working on a new compiler project at Red Hat called Ranger. It is an attempt to turn this problem upside down for VRP, and instead of using heuristics or always calculating ranges, we do it on demand. In this approach, we defer any calculations until we actually need a range and teach the compiler how to look back through the statements feeding the expression to come up with the answer. This way we only do work that is required to get the values we need, and nothing extra.
Let's look at our first example again:
if (B < 10) { A = B + 10; if (A > 20) dead_call (); }
We know the first conditional if (B < 10)
could be removed if B has a range of less than 10, so a query is made of what the range of B is. We can't determine anything about B, so we do nothing with it.
Eventually, we look at if (A > 20)
, and:
- Query the range of A.
- Then, Ranger looks at the definition of A and sees it is defined as
B + 10
. - It continues by querying the range of B at that specific point.
- Ranger then looks back and discovers that this code is gated by
if (B < 10)
. - That means we know the range of B is [0..9].
- We return to the definition of A and, as in the first example, determine that A has a range of [10..19].
- Finally, we return back to
if (A > 20)
, reporting the range [10..19] and again determine the call can be removed.
Eliminating ordering issues
An advantage Ranger has is that it doesn't matter how we got to the if (A > 20)
statement. Ranger will still come up with the same answer because it looks back through the instructions that it needs and does not need, depending on the order they are encountered.
Now, let's revisit the second sample:
a = b + 10; if (b < 10) { if (a > 20) dead_call (); }
We'll jump straight to the if (A > 20)
statement again, and:
- Ask the Ranger for the range of A.
- With the ability to look backward, it finds
A = B + 10
. - Along the way back to that defining statement, it also sees this block is gated by
if (B < 10)
, so it also knows B = [0..9] on this path. - We are only asking about the specific value of A at this point in the program, so we can evaluate A as "B + 10", using this gated value of B. This allows us to use B as [0..9] and effectively re-evaluate A in this block.
- We again calculate that A has a range of [10..19].
- This allows us to remove the call to
dead_call()
, just like before.
Moving to production
This simple example shows how we are attempting to remove the need for the top-down analysis order, which helps eliminate the need for heuristics and should result in more consistent optimization results.
Much of the research that has gone into this project has been to control the performance of the on-demand analysis, so that it is not more expensive than the much simpler top-down approach. The Ranger only does work that is actually needed, so we also see some significant time savings in optimization passes that don't need very many ranges. We hope to extend this approach in the future to other optimizations.
This work is live in a current GCC development branch and is now capable of building an entire Fedora distribution. We plan to integrate it with mainstream GCC in the next release, GCC 11.
If you are interested in more details, you can see documentation about the Ranger project, its approach, and our results on the GCC wiki.
Last updated: July 1, 2020