Production being affected by software issues is always an unwanted scenario. Diagnosing production issues, however, should never be an unplanned activity. Structured testing and QA efforts would ideally prevent any software bugs from entering production. So the dilemma is how to prepare for something unexpected in production that was not considered during the earlier testing and QA phases.
For troubleshooting production issues, the answer lies in adaptive tools that can dynamically diagnose selected functionality with minimal disruption to the rest of the system. Consider a recently introduced software component or function that, as a whole, is not performing consistently as expected, and whose existing instrumentation provides no insight into the possible root cause. Because restarting a complex application may be infeasible in production, or may make the issue disappear only temporarily, you need to be able to safely modify running code on the fly and avoid time-consuming delays while diagnosing the issue at hand.
In more concrete terms, you could start by adding timings for selected application methods and monitoring their error rates while the system is still running and exhibiting the issue. This lets you narrow down the problem area iteratively without affecting other parts of the application. Because these after-the-fact code changes would be made in production, you must be certain that they will not introduce additional, possibly more severe, problems. This means the changes need to be made with generic, proven tools, but applied in an application-specific manner.
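To make the idea concrete, here is roughly what timing and error-rate instrumentation for a single method looks like when written by hand. This is a minimal sketch; the `MethodStats` class and its method names are hypothetical. It records the call count, the number of exits via an exception, and the minimum, average, and maximum execution times, while letting exceptions propagate to the caller unchanged.

```java
import java.util.concurrent.Callable;

/** Hypothetical per-method statistics: calls, exception exits, min/avg/max time. */
public class MethodStats {
    private long calls;
    private long exceptionExits;
    private long minNanos = Long.MAX_VALUE;
    private long maxNanos;
    private long totalNanos;

    /** Runs the task, recording its duration and whether it exited via an exception. */
    public <T> T timed(Callable<T> task) throws Exception {
        long start = System.nanoTime();
        boolean failed = true;
        try {
            T result = task.call();
            failed = false;
            return result;
        } finally {
            // Runs on both normal and exceptional exit; the exception still propagates.
            record(System.nanoTime() - start, failed);
        }
    }

    private synchronized void record(long nanos, boolean failed) {
        calls++;
        if (failed) {
            exceptionExits++;
        }
        minNanos = Math.min(minNanos, nanos);
        maxNanos = Math.max(maxNanos, nanos);
        totalNanos += nanos;
    }

    public synchronized long calls() { return calls; }
    public synchronized long exceptionExits() { return exceptionExits; }
    public synchronized double errorRate() {
        return calls == 0 ? 0.0 : (double) exceptionExits / calls;
    }
    public synchronized long minNanos() { return calls == 0 ? 0L : minNanos; }
    public synchronized long maxNanos() { return maxNanos; }
    public synchronized double avgNanos() {
        return calls == 0 ? 0.0 : (double) totalNanos / calls;
    }

    public static void main(String[] args) throws Exception {
        MethodStats stats = new MethodStats();
        stats.timed(() -> "ok");
        try {
            stats.timed(() -> { throw new IllegalStateException("simulated failure"); });
        } catch (IllegalStateException expected) {
            // The exception reaches the caller as before; it is only counted.
        }
        System.out.printf("calls=%d errors=%d errorRate=%.2f avg=%.0fns%n",
                stats.calls(), stats.exceptionExits(), stats.errorRate(), stats.avgNanos());
    }
}
```

Editing code like this into a production build, then redeploying and restarting, is exactly the step the approach below avoids.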
The Java Instrumentation API allows modification of the bytecodes of methods on the Java Virtual Machine (JVM) at runtime. While technically this would make it possible to implement any wanted changes to an application to gain insight into its behavior, approaching an unclear production issue at a bytecode level would be the complete opposite of the high-level iterative approach described above.
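For comparison, here is what working directly with the Instrumentation API looks like. A `ClassFileTransformer` receives each class as a raw classfile `byte[]`, so any real change means parsing and rewriting bytecode yourself, typically with a bytecode library. The minimal sketch below only records class names and returns `null`, which tells the JVM to keep the original bytes. The `ObservingAgent` name and the `com/example/` prefix are hypothetical, and the class would need to be packaged in a jar whose manifest names it as `Agent-Class` before it could be attached.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical agent skeleton; it observes classes but changes nothing. */
public class ObservingAgent {

    /** Sees each class being loaded as raw classfile bytes. */
    public static class ObservingTransformer implements ClassFileTransformer {
        private final String internalPrefix; // internal name form, e.g. "com/example/"
        private final List<String> seen = new CopyOnWriteArrayList<>();

        public ObservingTransformer(String internalPrefix) {
            this.internalPrefix = internalPrefix;
        }

        @Override
        public byte[] transform(ClassLoader loader, String className,
                                Class<?> classBeingRedefined,
                                ProtectionDomain protectionDomain,
                                byte[] classfileBuffer) {
            if (className != null && className.startsWith(internalPrefix)) {
                seen.add(className);
                // A real diagnostic agent would parse and rewrite classfileBuffer here.
            }
            // Returning null means "no transformation": the JVM keeps the original bytes.
            return null;
        }

        public List<String> seen() {
            return seen;
        }
    }

    /** Invoked when the agent jar is attached to an already running JVM. */
    public static void agentmain(String agentArgs, Instrumentation inst) {
        // Without retransformation, only classes loaded after this point pass through.
        inst.addTransformer(new ObservingTransformer("com/example/"));
    }
}
```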
Byteman to the rescue
Byteman, unlike many other bytecode transformers, operates at the level of Java, not bytecode. You give Byteman one or more rules that specify the Java code you want to be executed and the location in methods where you want it to be injected. Byteman works out how to rewrite the bytecode so it behaves as if the original Java code included the source-level changes you requested. Byteman also does the needed type checking and type inference, which are an absolute must for the safety of transformations.
Below is a simple example of two Byteman rules that keep a tally of how often a method exits by throwing an exception and print that tally at intervals:
RULE Count exits via exceptions
CLASS com.example.SomeClass
METHOD someMethod
AT EXCEPTION EXIT
IF true
DO incrementCounter("exceptions");
ENDRULE

RULE Print exception exit count
CLASS com.example.SomeClass
METHOD someMethod
AT ENTRY
BIND exceptions = readCounter("exceptions");
IF exceptions % 10 == 0
DO trace("Exception exit count for someMethod - ");
   trace("" + new java.sql.Timestamp(System.currentTimeMillis()));
   traceln(": " + readCounter("exceptions"));
ENDRULE
The combination of CLASS, METHOD, and location (AT EXCEPTION EXIT, AT ENTRY, etc.) identifies where the Java code provided in the rule gets injected and executed. All expressions appearing in the BIND, IF, and DO clauses are plain old Java code. An IF expression must always be provided to determine when the DO actions get run. The calls to the built-in convenience methods incrementCounter, readCounter, trace, and traceln actually invoke corresponding methods of a Byteman-provided helper class (aptly named Helper). The first pair enables counting of events during execution, and the second pair simply wraps calls that print to System.out.

The IF expression is always type-checked to ensure it is boolean. By contrast, the type of the rule variable exceptions is inferred to be int (using the known signature of readCounter) and is then used to check type-correctness at points of use, such as the modulo (%) and String concatenation (+) operations.
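Conceptually, the two rules make someMethod behave as if its source had been edited as in the following sketch. Byteman does not generate or compile Java source like this; it rewrites the bytecode directly. The counter methods here are simplified, hypothetical stand-ins for the Helper built-ins (not Byteman's implementation), and the method's boolean parameter and body are placeholders.

```java
import java.sql.Timestamp;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class SomeClass {
    // Simplified, hypothetical stand-ins for the incrementCounter/readCounter built-ins.
    private static final Map<Object, AtomicInteger> COUNTERS = new ConcurrentHashMap<>();

    public static int incrementCounter(Object key) {
        return COUNTERS.computeIfAbsent(key, k -> new AtomicInteger()).incrementAndGet();
    }

    public static int readCounter(Object key) {
        AtomicInteger counter = COUNTERS.get(key);
        return counter == null ? 0 : counter.get();
    }

    public void someMethod(boolean fail) {
        // Injected AT ENTRY by "Print exception exit count": BIND, then IF, then DO.
        int exceptions = readCounter("exceptions");
        if (exceptions % 10 == 0) {
            System.out.print("Exception exit count for someMethod - ");
            System.out.print("" + new Timestamp(System.currentTimeMillis()));
            System.out.println(": " + readCounter("exceptions"));
        }
        try {
            // Original method body (placeholder): fails on request.
            if (fail) {
                throw new IllegalStateException("simulated failure");
            }
        } catch (Throwable t) {
            // Injected AT EXCEPTION EXIT by "Count exits via exceptions".
            incrementCounter("exceptions");
            throw t; // rethrown unchanged, so callers see the same behavior
        }
    }

    public static void main(String[] args) {
        SomeClass target = new SomeClass();
        for (int i = 0; i < 25; i++) {
            try {
                target.someMethod(i % 2 == 0);
            } catch (IllegalStateException expected) {
                // Counted by the injected code, then propagated as before.
            }
        }
        System.out.println("total exception exits: " + readCounter("exceptions"));
    }
}
```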
Using the Byteman helper scripts, the Java statements corresponding to these rules can be injected dynamically into a running Java application without affecting any part of the application other than this one method. The modifications can later be removed with a helper script as well. Even this simple step provides previously unavailable information from the running application and lets you see whether the method is executing as expected.
Obviously, there are a few downsides to this initial approach. Writing rules manually for several methods and for different kinds of tracing would be tedious and error-prone. Rules that track, for example, method execution times would be easier to write with custom Java classes and methods. Last, but not least, statistics should not be written to stdout but made available over the standard JMX interface so that commonly used monitoring tools can consume them.
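Publishing a statistic over JMX instead of printing it can be as small as a standard MBean: an interface whose name is the implementation class name plus `MBean`, with getters that become attributes. A minimal sketch; the class names and the `com.example.diagnostics` ObjectName domain are hypothetical and not taken from any particular tool.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicLong;
import javax.management.MBeanServer;
import javax.management.ObjectName;

/** Hypothetical example of publishing one statistic via a standard MBean. */
public class JmxCounterDemo {

    /** Management interface; the getter becomes the read-only attribute "ExceptionExitCount". */
    public interface ExceptionStatsMBean {
        long getExceptionExitCount();
    }

    /** Implementation whose class name plus "MBean" matches the interface name. */
    public static class ExceptionStats implements ExceptionStatsMBean {
        private final AtomicLong exceptionExits = new AtomicLong();

        public void increment() {
            exceptionExits.incrementAndGet();
        }

        @Override
        public long getExceptionExitCount() {
            return exceptionExits.get();
        }
    }

    /** Registers the stats on the platform MBean server under a hypothetical ObjectName. */
    public static ObjectName register(ExceptionStats stats, String className, String methodName)
            throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName name = new ObjectName("com.example.diagnostics:type=ExceptionStats"
                + ",class=" + ObjectName.quote(className)
                + ",method=" + ObjectName.quote(methodName));
        server.registerMBean(stats, name);
        return name;
    }

    public static void main(String[] args) throws Exception {
        ExceptionStats stats = new ExceptionStats();
        ObjectName name = register(stats, "com.example.SomeClass", "someMethod");
        stats.increment();
        Object value = ManagementFactory.getPlatformMBeanServer()
                .getAttribute(name, "ExceptionExitCount");
        System.out.println(name + " ExceptionExitCount = " + value);
    }
}
```

Any JMX-aware tool connected to the JVM can then read the attribute without the application writing anything to stdout.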
Byteman automation tool
To make diagnosing issues on the fly with Byteman easier and to address the above downsides of the manual approach, a Byteman automation tool was recently introduced.
The tool automates the generation of Byteman rules to provide statistics from unmodified Java applications for metrics such as the number of calls per method, the execution times of methods, the exception exit count per method, the number of instances of a class, and the instance lifetimes. Any set of these statistics can be enabled and disabled dynamically and then monitored with standard tools using JMX.
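At its core, generating such rules is string templating over a list of target methods. The following is a minimal sketch under stated assumptions, not the tool's actual code: it assumes one `Class#method` target per line and emits only an exception-exit counting rule, laid out like the ones the tool generates. The helper class name is the one that appears in the tool's generated rules; `RuleGenerator` and its methods are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of rule generation; not the automation tool's actual code. */
public class RuleGenerator {
    // Helper class name as it appears in the tool's generated rules.
    static final String HELPER = "org.jboss.byteman.automate.proftool.JMXHelper";

    /** Builds an exception-exit counting rule for one "Class#method" target. */
    public static String exceptionExitRule(String target) {
        int hash = target.indexOf('#');
        if (hash <= 0 || hash == target.length() - 1) {
            throw new IllegalArgumentException("Expected Class#method, got: " + target);
        }
        String cls = target.substring(0, hash);
        String method = target.substring(hash + 1);
        return String.join("\n",
                "RULE Exits via exceptions from method: " + cls + " - " + method,
                "CLASS " + cls,
                "METHOD " + method,
                "AT EXCEPTION EXIT",
                "HELPER " + HELPER,
                "COMPILE",
                "IF true",
                "DO incrementMethodExitExceptCount($CLASS, $METHOD);",
                "ENDRULE");
    }

    /** One rule per non-blank target line, separated by blank lines. */
    public static String generate(List<String> targetLines) {
        List<String> rules = new ArrayList<>();
        for (String line : targetLines) {
            String target = line.trim();
            if (!target.isEmpty()) {
                rules.add(exceptionExitRule(target));
            }
        }
        return String.join("\n\n", rules);
    }

    public static void main(String[] args) throws IOException {
        // Usage (assumed): java RuleGenerator targets.txt > rules.btm
        System.out.println(generate(Files.readAllLines(Paths.get(args[0]))));
    }
}
```

Supporting the other metrics would amount to more templates of the same shape, each calling a different helper method at a different AT location.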
The tool only requires defining the target methods in a simple text file and then generating the wanted rules with a selected set of command-line options. Below is an example that creates rules providing metrics for method execution times and the exception exit count for three different methods:
$ cat targets.txt
com.example.SomeClass#methodOne
com.example.SomeClass#methodTwo
com.example.SomeClass#methodThree
$ java \
    -jar ./target/proftool-1.0.jar \
    --input-file targets.txt \
    --register-class com.example.SomeClass \
    --register-method 'methodOne' \
    --call-exectimes-min \
    --call-exectimes-avg \
    --call-exectimes-max \
    --call-exit-except \
    --output-file rules.btm
$ wc -l rules.btm
108
$ tail -n 9 rules.btm
RULE Exits via exceptions from method: com.example.SomeClass - methodThree
CLASS com.example.SomeClass
METHOD methodThree
AT EXCEPTION EXIT
HELPER org.jboss.byteman.automate.proftool.JMXHelper
COMPILE
IF true
DO incrementMethodExitExceptCount($CLASS, $METHOD);
ENDRULE
The generated rule employs JMXHelper, a dedicated helper class that is part of the tool. The call to its instance method, incrementMethodExitExceptCount(), updates a class/method-specific counter and makes the value available via JMX. $CLASS and $METHOD are special rule variables provided by Byteman that identify the class and method the rule was injected into; in this case, they refer to com.example.SomeClass and methodThree when the injected code runs.
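The internals of JMXHelper are not covered here. As a hypothetical illustration of a class/method-specific counter, a helper method called with the values of $CLASS and $METHOD could keep one counter per pair, as in this sketch. `MethodCounters` is an invented name, the state is kept static so that every helper instance updates the same counters, and exposing the counts over JMX is left out.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical per-class/method counter helper; not the tool's JMXHelper. */
public class MethodCounters {
    // Static, so every helper instance updates the same counters.
    private static final Map<String, LongAdder> EXIT_EXCEPT_COUNTS = new ConcurrentHashMap<>();

    private static String key(String className, String methodName) {
        return className + "#" + methodName;
    }

    /** Would be called from injected code as incrementMethodExitExceptCount($CLASS, $METHOD). */
    public void incrementMethodExitExceptCount(String className, String methodName) {
        EXIT_EXCEPT_COUNTS
                .computeIfAbsent(key(className, methodName), k -> new LongAdder())
                .increment();
    }

    /** Current count for one class/method pair; 0 if it never exited via an exception. */
    public long exitExceptCount(String className, String methodName) {
        LongAdder adder = EXIT_EXCEPT_COUNTS.get(key(className, methodName));
        return adder == null ? 0L : adder.sum();
    }

    public static void main(String[] args) {
        new MethodCounters().incrementMethodExitExceptCount("com.example.SomeClass", "methodThree");
        new MethodCounters().incrementMethodExitExceptCount("com.example.SomeClass", "methodThree");
        // prints 2: both instances updated the same shared counter
        System.out.println(new MethodCounters().exitExceptCount("com.example.SomeClass", "methodThree"));
    }
}
```

A LongAdder keeps increments cheap when many threads hit the same instrumented method concurrently.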
Now we can use Byteman helper scripts to inject all the needed Java code into a running application on a JVM to start gathering these statistics:
$ bminstall <pid-of-jvm>
$ bmsubmit -s proftool-1.0.jar
$ bmsubmit -l rules.btm
(If JMX was not enabled on the JVM at startup, the repository provides a simple utility for enabling it on the fly.)
The above is all that is needed to see how long each of the three methods takes to execute (at a minimum, on average, and at a maximum) and how often each exits due to an exception. Tools like JConsole or Prometheus can then be used to analyze the situation and determine the next steps.
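JConsole is the interactive route. For a scripted check, a small JMX client can read the attributes of matching MBeans from the target JVM. A minimal sketch using only the standard javax.management API; the service URL, the port, and the ObjectName pattern you pass in are assumptions, since they depend on how JMX was enabled and on the names the helper registers.

```java
import java.util.Map;
import java.util.TreeMap;
import javax.management.MBeanAttributeInfo;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

/** Minimal JMX client sketch: dumps readable attributes of MBeans matching a pattern. */
public class JmxDump {

    /** Returns "objectName attributeName" -> value for every readable attribute. */
    public static Map<String, Object> dump(MBeanServerConnection conn, ObjectName pattern)
            throws Exception {
        Map<String, Object> result = new TreeMap<>();
        for (ObjectName name : conn.queryNames(pattern, null)) {
            for (MBeanAttributeInfo attr : conn.getMBeanInfo(name).getAttributes()) {
                if (!attr.isReadable()) {
                    continue;
                }
                Object value;
                try {
                    value = conn.getAttribute(name, attr.getName());
                } catch (Exception e) {
                    // Some attributes are optional or unsupported on a given JVM.
                    value = "<unavailable: " + e.getClass().getSimpleName() + ">";
                }
                result.put(name + " " + attr.getName(), value);
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: JmxDump <service-url> <object-name-pattern>");
            return;
        }
        // e.g. service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi (port is an assumption)
        JMXServiceURL url = new JMXServiceURL(args[0]);
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            dump(connector.getMBeanServerConnection(), new ObjectName(args[1]))
                    .forEach((key, value) -> System.out.println(key + " = " + value));
        }
    }
}
```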
In this article, we saw how the Byteman automation tool offers a quick way to add instrumentation to unmodified Java applications without even requiring a restart. This information may then be used as a basis for further troubleshooting and error correction.
If further details are needed, Byteman supports many other AT locations for injecting code, along with built-in helper methods for tracking and responding to application events using countdowns, flags, and timers. The Byteman Programmer's Guide and the other resources listed below describe these additional capabilities in full.
- Byteman homepage
- Byteman documentation
- Byteman automation tool
- Byteman Programmer's Guide
- Byteman automation tutorial
Other Byteman articles
- Using Byteman to find out why the TimeZone changed on a Java app server
- Enabling Byteman scripts with Red Hat JBoss Fuse and AMQ – Part 1
- Enabling Byteman scripts with Red Hat JBoss Fuse and AMQ – Part 2