Skip to main content
Redhat Developers  Logo
  • Products

    Featured

    • Red Hat Enterprise Linux
      Red Hat Enterprise Linux Icon
    • Red Hat OpenShift AI
      Red Hat OpenShift AI
    • Red Hat Enterprise Linux AI
      Linux icon inside of a brain
    • Image mode for Red Hat Enterprise Linux
      RHEL image mode
    • Red Hat OpenShift
      Openshift icon
    • Red Hat Ansible Automation Platform
      Ansible icon
    • Red Hat Developer Hub
      Developer Hub
    • View All Red Hat Products
    • Linux

      • Red Hat Enterprise Linux
      • Image mode for Red Hat Enterprise Linux
      • Red Hat Universal Base Images (UBI)
    • Java runtimes & frameworks

      • JBoss Enterprise Application Platform
      • Red Hat build of OpenJDK
    • Kubernetes

      • Red Hat OpenShift
      • Microsoft Azure Red Hat OpenShift
      • Red Hat OpenShift Virtualization
      • Red Hat OpenShift Lightspeed
    • Integration & App Connectivity

      • Red Hat Build of Apache Camel
      • Red Hat Service Interconnect
      • Red Hat Connectivity Link
    • AI/ML

      • Red Hat OpenShift AI
      • Red Hat Enterprise Linux AI
    • Automation

      • Red Hat Ansible Automation Platform
      • Red Hat Ansible Lightspeed
    • Developer tools

      • Red Hat Trusted Software Supply Chain
      • Podman Desktop
      • Red Hat OpenShift Dev Spaces
    • Developer Sandbox

      Developer Sandbox
      Try Red Hat products and technologies without setup or configuration fees for 30 days with this shared Openshift and Kubernetes cluster.
    • Try at no cost
  • Technologies

    Featured

    • AI/ML
      AI/ML Icon
    • Linux
      Linux Icon
    • Kubernetes
      Cloud icon
    • Automation
      Automation Icon showing arrows moving in a circle around a gear
    • View All Technologies
    • Programming Languages & Frameworks

      • Java
      • Python
      • JavaScript
    • System Design & Architecture

      • Red Hat architecture and design patterns
      • Microservices
      • Event-Driven Architecture
      • Databases
    • Developer Productivity

      • Developer productivity
      • Developer Tools
      • GitOps
    • Secure Development & Architectures

      • Security
      • Secure coding
    • Platform Engineering

      • DevOps
      • DevSecOps
      • Ansible automation for applications and services
    • Automated Data Processing

      • AI/ML
      • Data Science
      • Apache Kafka on Kubernetes
      • View All Technologies
    • Start exploring in the Developer Sandbox for free

      sandbox graphic
      Try Red Hat's products and technologies without setup or configuration.
    • Try at no cost
  • Learn

    Featured

    • Kubernetes & Cloud Native
      Openshift icon
    • Linux
      Rhel icon
    • Automation
      Ansible cloud icon
    • Java
      Java icon
    • AI/ML
      AI/ML Icon
    • View All Learning Resources

    E-Books

    • GitOps Cookbook
    • Podman in Action
    • Kubernetes Operators
    • The Path to GitOps
    • View All E-books

    Cheat Sheets

    • Linux Commands
    • Bash Commands
    • Git
    • systemd Commands
    • View All Cheat Sheets

    Documentation

    • API Catalog
    • Product Documentation
    • Legacy Documentation
    • Red Hat Learning

      Learning image
      Boost your technical skills to expert-level with the help of interactive lessons offered by various Red Hat Learning programs.
    • Explore Red Hat Learning
  • Developer Sandbox

    Developer Sandbox

    • Access Red Hat’s products and technologies without setup or configuration, and start developing quicker than ever before with our new, no-cost sandbox environments.
    • Explore Developer Sandbox

    Featured Developer Sandbox activities

    • Get started with your Developer Sandbox
    • OpenShift virtualization and application modernization using the Developer Sandbox
    • Explore all Developer Sandbox activities

    Ready to start developing apps?

    • Try at no cost
  • Blog
  • Events
  • Videos

Optimize loops with long variables in Java

August 25, 2022
Roland Westrelin
Related topics:
CompilersJava
Related products:
Red Hat build of OpenJDKRed Hat build of QuarkusRed Hat JBoss Enterprise Application Platform

Share:

    The just-in-time (JIT) compiler in OpenJDK improves Java performance through a number of optimizations, particularly in loops. Until recently, many optimizations worked only when the loop index was an int variable. This article shows how the HotSpot virtual machine was upgraded to add the same optimizations for long variables. The article covers particularly out-of-bounds checking (also called range checks).

    Why optimization was added for long variables

    One of the important promises of Java, along with many other modern languages, is to catch out-of-bounds errors, such as when you mistakenly end a loop at array.length instead of at array.length-1. The HotSpot virtual machine eliminates range checks when possible as a performance optimization. As discussed in a previous article, the JIT compiler enables range checks when the compiler is not sure that an index will stay within the bounds of an array.

    The JIT compiler implements range checks as follows:

    for (int i = start; i < stop; i += stride)) {
      if (scale * i + offset >=u array.length) { // range check
        deoptimize();
      }
      // access to element scale * i + offset of array
    }

    In the previous code, >=u is an unsigned comparison, and the deoptimize() function causes execution of the thread to continue in the interpreter where the out of bound exception is thrown.

    Until recent versions of OpenJDK, optimization of range checks worked only if the loop variable i was an int. For a long, range checks would always be performed. Many other optimizations were also unavailable.

    The reason for restricting loop optimization to int indexes was that they were considered the only ones common enough to deserve special treatment. One reason is that loops commonly iterate over arrays. Because the size of a Java array is a 32-bit integer, an int is the natural choice for the loop variable.

    Usages are evolving, though. The Panama Project offers developers a better way to get access to off-heap memory areas. Offsets within memory are 64 bits, so loops that iterate over memory with that API tend to use a long loop variable.

    The lack of proper optimizations for long counted loops initially was such a pain point for the Panama Project that it implemented workarounds in the library code to detect cases where the offsets fit into a 32-bit integer. In such cases, the project included special code that the JIT could better optimize. But once the improvements covered in this article were rolled out, those workarounds became unnecessary and could be removed. The end result is simplified library code with better overall performance.

    The loop shape that needs to be optimized is as follows:

    for (long i = start; i < stop; i += stride)) {
      if (scale * i + offset >=u length) { // range check
        deoptimize();
      }
      // access to memory at offset scale * i + offset
    }

    The discussion in this article assumes a loop that goes upward (that is, an index that increases), along with a positive scale. The project has generalized the optimization to accommodate a decreasing loop and a negative scale as well, but we won't include those cases because they would overcomplicate the discussion.

    Recognizing and optimizing long counted loops

    Teaching the compiler to recognize the new loop shape with a long loop variable instead of an int loop variable is fairly straightforward, but is not sufficient to enable existing optimizations such as range check elimination, loop unrolling, and vectorization. Those optimization passes must be adapted to operate on a long counted loop as well. Doing so is not a simple matter of including long variables, because some of the optimizations have to deal with integer overflow. They were protected from overflow for int variables by promoting some 32-bit integer values to 64-bit integers. Supporting these same optimizations on 64-bit integers would require promotion to the next larger integer type (probably 128 bits), for which the compiler lacks support.

    Is there a way to upgrade the existing optimizations for a long without having to rewrite these optimizations and incur a high risk of introducing bugs?

    The solution we went with started by transforming the long counted loop into a nested loop with an int counted inner loop. The previous example was transformed roughly to:

    for (long i = start; i < stop;) {
      int j;
      for (j = 0; j < min(stop - i, max_int); j += (int)stride) {
        if (scale * (j + i) + offset >=u length) { // range check
          deoptimize();
        }
        // access to memory at offset scale * (j+i) + offset
      }
      i += j;
    }

    Here, max_int represents the largest signed 32-bit integer. The loop variable for the inner loop is also a 32-bit integer. That inner loop has the shape of a counted loop, so it's subject to existing counted loop optimizations such as unrolling. Adding an extra loop has some overhead, but if the original loop executes for a large number of iterations, most of the time should be spent in the inner loop and the cost of setting it up should be negligible.

    Note that this transformation is valid only for stride values that fit into a 32-bit integer, and pays off mostly if stride is a relatively small 32-bit integer. (Otherwise, the inner loop executes only a small number of iterations and the overhead of nested loops is bigger.)

    This extra transformation has the benefit of enabling several existing loop optimizations to long loop counters. But one important optimization still isn't triggered by the transformed loop: Eliminating the range check. Indeed, the range check in the previous loop nest is still expressed in terms of 64-bit integers: j+i and length. The next section discusses how we make sure the range check has a proper shape so that the compiler recognizes it and enables range check optimization.

    A new API point for range checks

    A prerequisite for range check elimination is to make sure the loop has a shape that the compiler can properly optimize. In particular, the range check in the loop has to follow a canonical shape like the following, with an unsigned comparison and a deoptimization if the range check fails:

    if (scale * i + offset >=u length) {
      deoptimize();
    }

    The array accesses in Java include a range check, so the compiler is free to generate the pattern that works best for optimization. In the Panama Project's memory access API, the range check is not built into the language. So the check would have to be performed explicitly by the API in code similar to:

    long o = scale * i + offset;
    if (o >= length || o < 0) {
      throw new SomeException();
    }

    The JIT compiler would then have to recognize this code pattern as a range check, which would be complicated to perform reliably. After all, there would be more than one way to write this logic.

    We have a more robust solution, which involves extending the core Java libraries with a new API point for range checks:

    long o = java.util.Objects.checkIndex(scale * i + offset, length);

    The checkIndex() call already exists for int arguments.

    We have made the JIT compiler aware of the new API function. The compiler can use its own implementation for checkIndex() as long as its behavior appears identical to the Java implementation to outside callers.

    The technique of replacing a standard implementation of a function with one provided by the compiler is called a compiler intrinsic. That implementation can be carefully crafted to allow optimizations.

    The new checkIndex() function and the corresponding underlying intrinsic are available as of JDK 16. Note that the new function is not restricted to the memory access API. It offers a reliable way of performing range checks that can be optimized well by the virtual machine, making the function a valuable addition to the core libraries for all developers.

    Optimizing long-range checks

    So far, we have discussed a transformation of the loop so that it becomes suitable for existing optimizations. In the same spirit, this section discusses how to transform range checks in order to trigger existing range check optimizations for long indexes. We need to reshape the nested loop to:

    for (long i = start; i < stop;) {
      int j;
      int scale' = ..;
      int offset' = ..;
      int length' = ..;
      for (j = 0; j < min(stop - i, max_int); j += (int)stride) {
        if (scale' * j + offset' >=u length') {
          deoptimize();
        }
        // access to memory at offset scale * (j+i) + offset
      }
      i += j;
    }

    The scale', offset', and length' variables end with a prime symbol (') to denote a variable that is derived from or related to another variable. These variables are 32-bit integers that are invariant in the inner loop. Because the range check is expressed as a 32-bit comparison that operates on the loop variable of the inner loop, which is itself a loop with a 32-bit index, existing optimizations are triggered.

    Assuming for instance that loop predication optimizes this loop nest, the result would be roughly:

    for (long i = start; i < stop;) {
      int j;
      int scale' = ..;
      int offset' = ..;
      int length' = ..;
      if (scale' * 0 + offset' >=u length') {
        deoptimize();
      }
      if (scale' * jmax + offset' >=u length') {
        deoptimize();
      }
      for (j = 0; j < min(stop - i, max_int); j += (int)stride) {
        // access to memory at offset scale * (j+i) + offset
      }
      i += j;
    }

    jmax is the largest value j takes in the inner loop for a particular iteration i.

    Assuming, as we did before, that the inner loop runs for a large number of iterations, the range check becomes essentially free.

    scale', offset', and length' have to be derivatives of scale, offset, length, and i, the variables of the initial range check. They are all 64-bit integers. Let's see how to compute them.

    scale' can be set to scale, but only if it fits in a 32-bit integer. Otherwise, there's no way to transform the range check. Another tricky issue here is that scale' * j could overflow the int range. One easy way around that problem is to adjust the inner loop's bounds so that overflow never happens, such as:

    for (long i = start; i < stop;) {
      int j;
      int scale' = scale;
      int offset' = ..;
      int length' = ..;
      for (j = 0; j < min(stop - i, max_int / scale); j += (int)stride) {
        if (scale' * j + offset' >=u length') {
          deoptimize();
        }
        // access to memory at offset scale * (j+i) + offset
      }
      i += j;
    }

    If scale turns out to be a relatively large 32-bit integer, the number of iterations of the inner loop is small and this transformation is unlikely to pay off. The Panama Project expects a small scale. Typically, this variable is the size of some basic data type.

    Let's note the range of values of j for some iteration i of the outer loop as[0, jmax]. Then scale * (i + j) + offset is in the range[scale * i + offset, scale * (i + jmax) + offset] (remember that this discussion assumes a positive scale). Let's call that interval [range_min, range_max]. It's then possible to express offset' and length' in terms of range_min and range_max. In the simplest case (range_min >= 0, no overflow), we set:

    • offset' = 0
    • length' = max(range_min, min(length, range_max+1)) - range_min

    Let's see why that works as expected—that is, why the transformed range check (expressed with j) succeeds whenever the initial range check (expressed with i) succeeds, and fails when the initial check fails.

    We know that j falls within [0, jmax]. Then range has the value scale * (i + j) + offset and range' has the value scale * j + offset'. Using variables we defined earlier, range is in [range_min, range_max] and range = range' + range_min. What happens for various values of length?

    When length is greater than range_max

    In this case, the range check always succeeds. length' is range_max+1 - range_min. The range check becomes:

    range' <u range_max+1 - range_min

    This expression is always true because range is within [range_min, range_max] and so range' is within [0, range_max - range_min].

    When length is less than range_min

    In this case, the range check always fails. length' is 0. The range check becomes range'<u 0, which is always false (no unsigned value can be negative).

    When length is somewhere in [range_min, range_max]

    If length is in [range_min, range_max], the range check succeeds sometimes and fails other times. length' is length - range_min. The range check becomes:

    range' <u length - range_min

    which is equivalent to:

    0 <= range' < length - range_min
    

    or

    range_min <= range' + range_min < length
    

    or

    range_min <= range < length

    Because we assumed range_min to be positive, this is the same as range <u length, the range check before transformation.

    length' must be a 32-bit integer, but is computed in terms of 64-bit values. Note, however, that range_max - range_min fits in a 32-bit integer, because loop bounds require scale * j to fit in a 32-bit integer. That, in turn, guarantees that length' can safely be stored as a 32-bit value.

    Extensive optimizations accommodate more Java loops

    This article covered recent optimizations in the OpenJDK HotSpot virtual machine that support loops with long loop variables, with a particular focus on range checks operating on it. I gave an overview of the code patterns that developers can expect to be properly optimized. The article also demonstrated how new APIs require special virtual machine support and how the entire Java platform evolves to meet changing usages. Finally, I showed how reshaping a loop with a long index enables a range of optimizations.

    The cases we didn't cover in this article—loops going downward and a negative scale—are discussed in a lengthy comment added to the JDK change request, augmented by this follow-up.

    Last updated: February 11, 2024

    Related Posts

    • Range check elimination in loops in OpenJDK's HotSpot JVM

    • Runtime profiling in OpenJDK's HotSpot JVM

    • How the JIT compiler boosts Java performance in OpenJDK

    Recent Posts

    • How to run a fraud detection AI model on RHEL CVMs

    • How we use software provenance at Red Hat

    • Alternatives to creating bootc images from scratch

    • How to update OpenStack Services on OpenShift

    • How to integrate vLLM inference into your macOS and iOS apps

    What’s up next?

    The microservice architectural approach is more than just about technology: It reaches into the foundation of your organization to allow you to build truly scalable, adaptive, complex systems that help a business adapt to rapidly changing competitive markets. In Microservices for Java Developers, you'll get a hands-on introduction to frameworks and containers through a handful of familiar patterns.

    Get the e-book
    Red Hat Developers logo LinkedIn YouTube Twitter Facebook

    Products

    • Red Hat Enterprise Linux
    • Red Hat OpenShift
    • Red Hat Ansible Automation Platform

    Build

    • Developer Sandbox
    • Developer Tools
    • Interactive Tutorials
    • API Catalog

    Quicklinks

    • Learning Resources
    • E-books
    • Cheat Sheets
    • Blog
    • Events
    • Newsletter

    Communicate

    • About us
    • Contact sales
    • Find a partner
    • Report a website issue
    • Site Status Dashboard
    • Report a security problem

    RED HAT DEVELOPER

    Build here. Go anywhere.

    We serve the builders. The problem solvers who create careers with code.

    Join us if you’re a developer, software engineer, web designer, front-end designer, UX designer, computer scientist, architect, tester, product manager, project manager or team lead.

    Sign me up

    Red Hat legal and privacy links

    • About Red Hat
    • Jobs
    • Events
    • Locations
    • Contact Red Hat
    • Red Hat Blog
    • Inclusion at Red Hat
    • Cool Stuff Store
    • Red Hat Summit

    Red Hat legal and privacy links

    • Privacy statement
    • Terms of use
    • All policies and guidelines
    • Digital accessibility

    Report a website issue