In this article, we will explore three different issues that can arise when writing Java applications and compromise data confidentiality, integrity, or availability. Managed runtimes offer a myriad of benefits for writing secure software, yet they introduce a layer of abstraction that makes gaps less evident at first sight. To characterize each issue, we will look into its nature, analyze simplified examples, and discuss how to avoid it.
Integral overflow and underflow
Primitive integral types in the Java language represent 8, 16, 32, and 64-bit signed two's-complement values, as defined in section 4.2 Primitive Types and Values of the Java Language Specification (ed. 23). This definition allows numeric ranges of -128 to 127 for a byte, of -32768 to 32767 for a short, of -2147483648 to 2147483647 for an int, and of -9223372036854775808 to 9223372036854775807 for a long.
The javac compiler prevents the assignment of integral values to fields, variables, or parameters if an implicit type conversion might truncate data. As an example, trying to compile the statement byte a = 128 returns an "incompatible types" error because the value 128 is not in the byte's range. Explicit type casting can be used to force these assignments, even at the risk of truncation and data loss. The statement byte a = (byte)128 is legal but ends up assigning the value -128 to the variable. To fit a value into a narrower type range, its uppermost bits are discarded and the remainder is interpreted as a signed two's-complement value. In the last statement, the uppermost 24 bits (all zeros) of the int value 128 are discarded and the binary remainder 10000000 is interpreted as the lowest 8-bit negative number.
The arithmetic computation of integral values in Java may exceed the type range of the result on either the lower or upper bound. These cases are known as underflow and overflow respectively, but we will generalize the concept with the latter term. Overflows are not flagged by the javac compiler as erroneous, even if they could be anticipated at compile time. On the contrary, their behavior is well-defined by the language specification. Intuitively, we can think of the lowest and highest ends of a given range as connected, such that subtracting 1 from the lowest results in the highest, and adding 1 to the highest results in the lowest. At the implementation level, signed two's-complement values are computed using standard base-2 arithmetic, with the uppermost (carry) bits discarded so that the result fits into its type.
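A minimal sketch of this wrap-around behavior (the class and values are illustrative):
public class OverflowDemo {
    public static void main(String[] args) {
        System.out.println(Integer.MAX_VALUE + 1); // -2147483648: wraps to the lowest int
        System.out.println(Integer.MIN_VALUE - 1); // 2147483647: wraps to the highest int
        System.out.println(Integer.MAX_VALUE + Integer.MAX_VALUE); // -2: carry bits beyond 32 are discarded
    }
}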
Overflows are allowed for efficiency. Most variables in a program take values within sub-ranges of their type, and arithmetic operations generally fit into the widest operand type. The most frequent cases should not be penalized by doubling the memory space required for results, taking more CPU cycles to handle types wider than the architecture's register size, or adding overhead to manage variable-length types. Moreover, the cost of preventing or detecting overflows would be incurred irrespective of their occurrence, as the compiler cannot precisely determine the range of a result in all cases and needs to make conservative estimations. For hash value calculation, overflows are even desirable. Developers are responsible for analyzing each case and deciding which types and checks are appropriate.
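To illustrate that last point, the following sketch (an illustrative helper, not from the original text) accumulates a String::hashCode-style hash, where intermediate overflows are expected and only the final 32-bit pattern matters:
static int hash(byte[] data) {
    int h = 0;
    for (byte b : data) {
        h = 31 * h + b; // may overflow repeatedly; wrap-around is part of the function
    }
    return h;
}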
Addition, subtraction, multiplication, and division in Java yield values of either int or long type, depending on the widest operand type. If both operand types are narrower than int (i.e., they are byte or short), the result type is int. Once the result type is decided, operands may need to be implicitly widened. Widening is always safe in the sense that it preserves the numerical sign and value.
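A short sketch of these rules in action (values are illustrative):
byte a = 100;
byte b = 100;
// byte c = a + b;               // does not compile: a and b are widened to int, so the result is an int
byte c = (byte) (a + b);         // compiles, but 200 is truncated to -56
int d = a * 1_000_000;           // int result; may overflow silently for larger operands
long e = a + 1_000_000_000_000L; // a is widened to long; the result type is long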
The following table shows result types and implicit widening for arithmetical operations between integral types.
a OP b | byte b | short b | int b | long b |
byte a | int (widening of a and b to int) | int (widening of a and b to int) | int (widening of a to int) | long (widening of a to long) |
short a | int (widening of a and b to int) | int (widening of a and b to int) | int (widening of a to int) | long (widening of a to long) |
int a | int (widening of b to int) | int (widening of b to int) | int | long (widening of a to long) |
long a | long (widening of b to long) | long (widening of b to long) | long (widening of b to long) | long |
Applying these rules to simple binary expressions is straightforward. Complexity increases as expressions become larger and involve multiple intermediate computations that are subject to overflows.
Let's look at an example with a seemingly handled overflow:
long MAX_CAPACITY = 1024L;
...
int hdrLen = Integer.max(0, ...);
int bodyLen = Integer.max(0, ...);
long available = MAX_CAPACITY - (hdrLen + bodyLen);
if (available < 0L) {
throw new Exception("No space available.");
}
The declaration of MAX_CAPACITY and available as long reveals the programmer's intent to use a type range that prevents overflows. The subtraction to determine available occurs in the long range and the second operand (hdrLen + bodyLen) is widened. However, the intermediate subexpression hdrLen + bodyLen is computed first in the int range and may overflow. By the time the subtrahend is widened, data loss could have occurred and be irreversible. Either applying the distributive property with respect to addition or casting hdrLen to long would prevent the overflow:
long available = MAX_CAPACITY - hdrLen - bodyLen;
or:
long available = MAX_CAPACITY - ((long)hdrLen + bodyLen);
To wrap up the previous example, we will discuss how to squeeze types further and improve memory efficiency while maintaining safety. We only recommend this technique when the benefits of avoiding casts and saving memory outweigh the cost in code legibility.
Based on the observation that MAX_CAPACITY is greater than 0, we know that subtracting either hdrLen or bodyLen (known to be non-negative numbers) cannot underflow the int range. Subtracting both can potentially underflow the int range, but not in a way that wraps around to a value lower than or equal to MAX_CAPACITY. In the worst case, both hdrLen and bodyLen are Integer.MAX_VALUE and the result is MAX_CAPACITY + 2. The reason for this is that the positive sub-range size of any integral type in Java is one below half of the range size. Thus, the addition of two positive numbers cannot overflow the size of the full range. The previous example can be re-written with narrower types and without casts as follows:
int MAX_CAPACITY = 1024;
...
int hdrLen = Integer.max(0, ...);
int bodyLen = Integer.max(0, ...);
int available = MAX_CAPACITY - hdrLen - bodyLen;
if (available < 0 || available > MAX_CAPACITY) {
throw new Exception("No space available.");
}
To the naked eye, we are trading an extra comparison for the removal of casts and the memory savings. However, a just-in-time (JIT) compiler can easily replace the two signed comparisons with a single unsigned equivalent. This is what the code generated by C2 on x86-64 looks like, with 32-bit registers and without instructions for sign-extended widening:
mov $0x400,%eax ; MAX_CAPACITY = 1024
... ;
sub %esi,%eax ; MAX_CAPACITY - hdrLen
sub %edx,%eax ; MAX_CAPACITY - hdrLen - bodyLen
cmp $0x401,%eax ; MAX_CAPACITY - hdrLen - bodyLen vs. 1025
jae 0x00007fd054614eee ; if above or equal (unsigned), jump to
; throw an exception
Java APIs are available for cases in which we need to either detect or prevent integral overflows and the extra overhead is affordable. With the Math::xxxExact family of methods (e.g., Math::addExact), an ArithmeticException is thrown if an overflow occurs. With the java.math.BigInteger class, integral values of arbitrary length (constrained only by heap memory) can be computed without data loss.
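A brief sketch of both options (class and variable names are illustrative):
import java.math.BigInteger;

public class ExactArithmeticDemo {
    public static void main(String[] args) {
        // Detect: Math::addExact throws instead of silently wrapping around.
        try {
            int sum = Math.addExact(Integer.MAX_VALUE, 1);
            System.out.println(sum);
        } catch (ArithmeticException e) {
            System.out.println("integer overflow detected");
        }

        // Prevent: BigInteger values are constrained only by heap memory.
        BigInteger big = BigInteger.valueOf(Long.MAX_VALUE)
                                   .add(BigInteger.valueOf(Long.MAX_VALUE));
        System.out.println(big); // 18446744073709551614
    }
}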
Resource allocation caused by untrusted input
Applications frequently need to process untrusted inputs in one way or another. Bounding the resources committed during the verification process is just as critical as sanitizing the inputs themselves. A few decades ago, zip bombs were known for triggering excessive computation and storage consumption out of small files, a type of threat that is still current in different forms. We will discuss strategies to minimize the leverage offered to an attacker seeking to achieve more with less and cause a denial-of-service disruption.
Input verification takes CPU cycles and memory depending on its complexity and size. The list of resources involved in input verification might extend to file descriptors, locks, handles to external devices, and other services from the operating system. The key principle from a defensive security perspective is to commit resources gradually, optimizing the cost-benefit ratio and transferring as much of the burden as possible to the attacker's side. Checks that discard most of the malicious input early at a low cost are generally preferred.
A particular case that is worth discussing is when large inputs need to be held entirely in memory for verification. Upfront memory allocation could be risky and should be avoided when possible. A chunk-by-chunk allocation strategy might be preferable, as it forces the attacker to send the data over the network and bear part of the load. Checks that limit the maximum input size, or that abort the verification process and free up resources under certain conditions, can be combined for resilience.
Let's think of an application that implements the server side of a protocol that handles requests from untrusted clients over the network. It is common for this type of protocol to define structures of fixed and variable length. The size of the variable component is typically specified in a header field at a known offset and parsed first by a receiver endpoint. Checking this size against a maximum before allocating memory to hold the variable component is a good start, but might not be enough. An attacker that plays by the rules can promise a variable component of the maximum allowed size. Instead of sending the data, the attacker would open a new connection and repeat the behavioral pattern. After a while, the application ends up with multiple open connections, waiting for data that never arrives and having wasted a large portion of its memory.
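The following sketch combines the size check with incremental, chunk-by-chunk allocation; the framing, constants, and class names are hypothetical and only meant to illustrate the approach:
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;

class BoundedBodyReader {
    static final int MAX_BODY_LEN = 1 << 20; // protocol maximum, hypothetical value
    static final int CHUNK_LEN = 8 * 1024;   // allocate incrementally rather than upfront

    static byte[] readBody(DataInputStream in) throws IOException {
        int bodyLen = in.readInt();          // length declared by the untrusted client
        if (bodyLen < 0 || bodyLen > MAX_BODY_LEN) {
            throw new IOException("Declared body length out of bounds: " + bodyLen);
        }
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        byte[] chunk = new byte[CHUNK_LEN];
        int remaining = bodyLen;
        while (remaining > 0) {
            int read = in.read(chunk, 0, Math.min(CHUNK_LEN, remaining));
            if (read < 0) {
                throw new IOException("Connection closed before the declared length was sent");
            }
            body.write(chunk, 0, read);      // memory grows only as data actually arrives
            remaining -= read;
        }
        return body.toByteArray();
    }
}
A socket read timeout and a cap on concurrent connections per client would complement this sketch, in line with the cost analysis that follows.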
We can run a cost analysis for this example and see in detail what resources each actor commits.
Resource | Application (server) | Attacker (client) | Balance |
Network traffic | Receives the message header, a short fixed-length stream. | Sends the message header, a short fixed-length stream. | Same cost. No leverage for the attacker. |
Sockets | Maintains a full TCP socket state per connection, service provided by the operating system. | Sends packets over a raw socket. Does not need to maintain a full TCP socket state. | The attacker has an advantage. The application has to limit the number of connections established from the same client. |
Threads | A well-designed application will multiplex operating system threads to handle multiple requests. | Can send requests from a single thread or parallelize for higher throughput. | No leverage for the attacker. |
CPU | I/O mostly, not CPU intensive in principle. This resource should be watched as more checks over untrusted inputs are added (e.g. hash calculation). | I/O mostly, not CPU intensive. | No leverage for the attacker in principle. Balance might lean towards the attacker under some conditions. |
Memory | Allocates internal structures to handle each request and variable-length buffers. | No memory allocation per request, stateless as connections opened are immediately abandoned. | The attacker has an advantage. The application has to limit memory allocation. |
The balance shown in the cost analysis suggests that the application has to limit the number of connections from a single client and avoid upfront memory allocation, while keeping an eye on CPU consumption. If a successive connection from the same client is identified, any previous connection could be aborted and its resources freed up. Memory allocation should be done incrementally as data is received. A more comprehensive analysis of the proposed scenario makes it clear that the protocol design leaves the application highly exposed and at risk: authentication is missing, whether at the TLS or the application layer. With authentication comes a stronger firewall that hardens the defense and enables both traceability and accountability.
Defensive security should be conceived as a multi-layer architecture. Authentication certainly increases the difficulty for attackers, but highly resilient services should still plan for compromised credentials, clients with different levels of authorization privileges, unintentional anomalous behavior, and other contingencies. Keeping resources in check, verifying data integrity, ensuring consistent program state transitions, generating informative security logs, and flagging unexpected behavior for automatic or manual intervention are all worth considering during the design and implementation of a protocol or an application.
Time-dependent execution paths in the light of the interpreter and JIT compilers
Sophisticated attacks can exploit time differences between execution paths to reveal secrets or compromise data integrity. With a sufficiently large number of probes, any noise at the network or endpoint level can be filtered out to obtain a high-precision clock capable of exposing even subtle differences. Writing time-constant code is not an easy task. Java developers should not only master time-resistant algorithms but also delve into language semantics and JIT-compiled code to ensure this property.
As a general principle, execution time should not depend, directly or indirectly, on any secret or derived secret. This proves to be difficult because programmers are trained to optimize code, use conditionals to shortcut paths, and improve efficiency. The same is true for compilers and the code transformations typically aimed at squeezing every bit of CPU performance. Language semantics is the only limit for a compiler in this regard.
To highlight the importance of understanding language semantics, we will refer to the two types of AND Boolean operators in the Java language. The operator & will unconditionally evaluate the expressions on both sides, whereas && will short-circuit the evaluation if the expression on the left is false. While the full-circuit variant might still be subject to time variations depending on the context, the short-circuit counterpart will make execution time dependent on the data from the get-go.
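A minimal sketch of the difference (checkA and checkB are hypothetical validation steps):
static boolean checkA(byte[] input) { return input.length >= 4; }     // hypothetical check
static boolean checkB(byte[] input) { return input.length % 2 == 0; } // hypothetical check

static boolean fullEvaluation(byte[] input) {
    return checkA(input) & checkB(input);   // both operands are always evaluated
}

static boolean shortCircuit(byte[] input) {
    return checkA(input) && checkB(input);  // checkB is skipped whenever checkA is false
}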
Let's now think of a routine that receives a secret cookie from a client and loads a session object. A viable structure to hold the data required for this task would be a hash table, where cookies are keys and session objects are values. When entries are added or looked up, hash collisions must be handled. In case of a collision, valid cookies in the map are compared against the client's cookie. If we imagine an adversary with sufficient capabilities (i.e., able to send unlimited cookies and measure response times with high precision), the first piece of information that might be leaked is whether the cookie sent has a hash that collides with an existing entry in the table. The non-collision path is expected to be faster as fewer comparisons are made.
Whether leaking a hash is a problem depends on the key (cookie) entropy. Within a small key space, it is possible to run an exhaustive search or conduct a rainbow-table type of attack to find the existing cookie that matches the guessed hash. The routine that compares keys upon collision could be a more appealing target, though. In a byte-by-byte comparison, it is critical not to bail out immediately after detecting a mismatch. An early stop reduces the entropy to 256 values per cookie byte, which can be guessed incrementally with multiple tries. While it may be less efficient, guaranteeing a time-constant evaluation requires comparing every byte, irrespective of any already-confirmed result. An existing API for this type of comparison is available in java.security.MessageDigest::isEqual.
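A sketch of such a comparison, next to the existing API (assuming the lengths being compared are public information):
import java.security.MessageDigest;

public class ConstantTimeCompare {
    // Every byte is visited and folded into diff; there is no data-dependent early exit.
    static boolean isEqual(byte[] a, byte[] b) {
        if (a.length != b.length) {
            return false;
        }
        int diff = 0;
        for (int i = 0; i < a.length; i++) {
            diff |= a[i] ^ b[i];
        }
        return diff == 0;
    }

    public static void main(String[] args) {
        byte[] stored = {1, 2, 3, 4};
        byte[] received = {1, 2, 3, 5};
        System.out.println(isEqual(stored, received));               // false
        System.out.println(MessageDigest.isEqual(stored, received)); // false, existing API
    }
}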
For the final part of this article, we will introduce a capture-the-flag (CTF) type of security puzzle and analyze how the Java interpreter and JIT compiler may affect execution times. The Attacker's goal in this challenge is to implement Attacker::guessSecret and return the secret in the last call, made from Judge::verdict. To their advantage, the Attacker is allowed to invoke any API and persist state within its class. See below:
public class SecurityPuzzle {
static volatile boolean stop = false;
public static void main(String[] args) {
int secret = new java.security.SecureRandom().nextInt(0, 10);
Defender.defendSecret(secret);
Judge.verdict(secret);
}
}
class Judge {
static void notify(boolean successfulAttack) {}
static void verdict(int secret) {
System.out.println((Attacker.guessSecret() == secret ?
"Attacker" : "Defender") + " WINS!!\nSecret was " + secret);
}
}
class Defender {
static void defendSecret(int secret) {
while (!SecurityPuzzle.stop)
Judge.notify(Attacker.guessSecret() == secret);
}
}
class Attacker {
static int guessSecret() {
SecurityPuzzle.stop = true;
return -1;
}
}
A random secret between 0 and 9 is generated at the beginning of SecurityPuzzle::main. This secret is first passed to Defender::defendSecret. While the stop signal is off, the Defender challenges the Attacker by repeatedly invoking Attacker::guessSecret and comparing returned values against the secret. After each comparison, the Defender communicates to the Judge whether the Attacker's guess was successful or failed.
The default Attacker::guessSecret implementation turns the stop signal on and returns -1 right at the first call. The Defender leaves its loop and Judge::verdict is then called. The Judge asks the Attacker for their final guess, compares the returned value with the secret, and decides who wins. The default Attacker returns -1 again and fails. We can reason from the previous sequence of events that the Attacker should only turn the stop signal on after guessing the secret. In the meantime, the Attacker can return values that are not necessarily right but might reveal something about the secret.
Before we begin exploring exploitation alternatives for the Attacker, it is worth mentioning that the underlying threat model in this puzzle is not realistic in modern Java, after the deprecation and upcoming removal of the Security Manager. Untrusted code should never run alongside trusted code. However, the timing-attack concept can be extrapolated to real scenarios, hence its relevance here. Timing attacks are an instance of covert channels that can be used to leak data, violating security policies and compromising data confidentiality.
We will start our analysis in Defender::defendSecret. There is nothing time-dependent in the Java code at first sight: two integral numbers (the secret and the Attacker's guess) are compared to produce a boolean value that is later passed in a method invocation. Looking at the Defender::defendSecret bytecode might offer some clues:
static void defendSecret(int);
Code:
0: getstatic #7 // Field SecurityPuzzle.stop:Z
3: ifne 24
6: invokestatic #13 // Method Attacker.guessSecret:()I
9: iload_0
10: if_icmpne 17 // Attacker.guessSecret() vs. secret
13: iconst_1 // Attacker.guessSecret() is equal to secret
14: goto 18
17: iconst_0 // Attacker.guessSecret() is not equal to secret
18: invokestatic #19 // Method Judge.notify:(Z)V
21: goto 0
24: return
The execution path when Attacker::guessSecret returns the correct secret is slightly longer than otherwise: the goto bytecode at index 14 makes the difference. In practical terms, this difference means that the interpreter has to execute a few more machine instructions and will take more time. While the time difference seems negligible for a single execution, it accumulates over multiple tries. Based on this observation, an attacker can return sequential values within the valid range (0 to 9) and measure each with System::nanoTime. The slowest path at the end of multiple tries will correspond to the secret. Even if paths were equally long, time differences might still be noticeable due to processor instruction cache misses or branch prediction mechanisms.
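A bare skeleton of that measurement bookkeeping is sketched below (names and probe handling are illustrative); it only attributes the time between consecutive calls to the previously returned guess, while deciding when enough samples were collected, picking the slowest guess, and turning the stop signal on remain part of the exercise mentioned later:
class Attacker {
    static final int RANGE = 10;                   // valid guesses are 0 to 9
    static final long[] elapsed = new long[RANGE]; // accumulated time per guess
    static final long[] samples = new long[RANGE]; // number of measurements per guess
    static int lastGuess = -1;
    static long lastCall;

    static int guessSecret() {
        long now = System.nanoTime();
        if (lastGuess >= 0) {
            // The time since the previous call covers the Defender's loop iteration
            // that compared lastGuess against the secret.
            elapsed[lastGuess] += now - lastCall;
            samples[lastGuess]++;
        }
        int guess = (lastGuess + 1) % RANGE;       // cycle through the valid range
        lastGuess = guess;
        lastCall = System.nanoTime();
        return guess;
    }
}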
After a few thousand executions, the main loop in Defender::defendSecret is compiled by the C2 JIT compiler. The time difference between paths does not get any better, though. An attacker exploiting this alternative will intentionally provide invalid guesses (e.g., -1) while the interpreter runs Defender::defendSecret, influencing the profiling data so that the x86-64 code produced by the compilation is as follows:
0x7f92e448e060:
... ;
cmp %ebx,%ebp ; Attacker::guessSecret() vs. secret
jne 0x7f92e448e060 ; If different, jump back to loop start
mov $0xffffff45,%esi ; If equal (unlikely), continue here
mov %ebx,0x4(%rsp) ;
nop ;
callq 0x7f92e400f320 ; Call Uncommon trap (interpreter)
Because of the profiling data induced by the attacker, the case in which the secret is equal to the guess has never occurred and is deemed so unlikely that it is worth it for the compiler to trim the branch. Besides saving compilation time, this technique enables more powerful data flow analysis and run-time optimization. A call to an uncommon trap replaces the pruned branch. If this trap is ever hit, the native stack frame is deoptimized and execution continues in the interpreter. As a result, invalid guesses exhibit a very low latency, the first hit to the secret a markedly high latency, and successive invalid guesses somewhere in between. The following latencies were measured in nanoseconds for guesses between 0 and 9 when the secret was 5:
__
guess 0: 14 |
guess 1: 14 |
guess 2: 14 |----> C2-compiled loop times
guess 3: 14 |
guess 4: 15 __|
__
guess 5: 30125 |----> Deoptimization
guess 6: 239 __|
__
guess 7: 50 |
guess 8: 51 |----> Interpreter times
guess 9: 49 __|
The implementations of the described attacks (interpreter path length and JIT deoptimization) are left as an exercise for the reader.
Limiting the number of probes that an attacker can make is certainly helpful as a defensive strategy, but not necessarily enough. More caution is advised for approaches that tamper with the clock's precision by introducing random delays, as the added noise can become statistically insignificant given a sufficient number of probes. Bit-level operations are generally safer than comparisons and conditional jumps when a secret is involved. For example, the following could be a resilient implementation of Defender::defendSecret:
class Defender {
static void defendSecret(int secret) {
java.util.List<long[]> countersList = new java.util.ArrayList<>();
long[] currentCounter = new long[2];
long currentTotal = 0L;
while (!SecurityPuzzle.stop) {
int cmp = Attacker.guessSecret() ^ secret;
byte cmpByte = 0;
for (int i = 0; i < 32; i++) {
cmpByte |= (cmp >> i ) & 0x1;
}
currentCounter[cmpByte] = currentCounter[cmpByte] + 1;
currentTotal++;
if (currentTotal == Long.MAX_VALUE) {
countersList.add(currentCounter);
currentCounter = new long[2];
currentTotal = 0L;
}
}
countersList.add(currentCounter);
for (long[] counters : countersList) {
while (counters[0] > 0) {
Judge.notify(true);
counters[0] = counters[0] - 1;
}
while (counters[1] > 0) {
Judge.notify(false);
counters[1] = counters[1] - 1;
}
}
}
}
In this implementation, the Attacker's guess is still compared against the secret, but through an XOR bit operation. The time this operation takes is independent of the secret value. If the guess is equal to the secret, the result is 0. Otherwise, it is a number different from 0. If we apply an OR operation to every bit in the result, we can simplify these two cases: cmpByte is 0 if the Attacker's guess and the secret are equal, and 1 otherwise. This operation is also time-constant because it depends on the secret size (32 bits) but not on its value. Counters can be increased to record each successful and failed attempt. Judge notification can be deferred to a later point, out of reach of the Attacker's measurement capacity.
To understand how this defensive proposal works, we will track the secret and anything derived from it throughout the C2-compiled loop in x86-64:
xor 0x8(%rsp),%r14d ; The secret in 0x8(%rsp) is XORed with the Attacker's
; guess held in %r14d. The result is stored in %r14d.
;
... ;
;
sarx %r13d,%r14d,%r11d ; The secret-derived value in %r14d is shifted %r13d
; bits to the right. The result is stored in %r11d.
;
and $0x1,%r11d ; The secret-derived value in %r11d is ANDed with 1.
; The result is stored in %r11d.
;
or %r11d,%ebx ; The secret-derived value in %r11d is ORed with the
; value in %ebx (originally 0). The result is stored
; in %ebx.
;
movsbl %bl,%ebx ; The secret-derived value in %bl (byte) is
; sign-extended to int and stored in %ebx.
;
... ; Instructions repeating the previous sequence as a
; result of the pre/main/post loop optimization are
; omitted for brevity.
;
; Instructions that follow were found earlier in the
; generated code but moved here for clarity.
;
cmp %r10d,%ebx ; The secret-derived value in %ebx is compared against
; the currentCounter array length. This range check
; will never fail.
;
... ;
;
incq 0x10(%r13,%rbx,8) ; The secret-derived value in %rbx is used as an
; index to write an increased counter into the
; currentCounter array.
;
As we can see from the secret taint analysis, there is no execution path dependent on the secret's value or any derivatives. The following times were measured in nanoseconds for guesses between 0 and 9 when the secret was 5:
guess 0: 31
guess 1: 29
guess 2: 31
guess 3: 30
guess 4: 30
guess 5: 30 ---> guess matches secret, but is indistinguishable from the rest
guess 6: 30
guess 7: 31
guess 8: 30
guess 9: 29
On a final note, we want to raise awareness about the risk of timing attacks and advise developers to thoroughly analyze their code, whether run by the interpreter or compiled into a native method. While some patterns are known to be problematic, a case-by-case study is ultimately required to ensure safety. java command-line arguments such as -XX:CompileCommand=print,class::method, in combination with the hsdis HotSpot plug-in, can offer valuable insight into the machine instructions executed when a method is compiled.