Generally when the topic of Business Process Management (BPM) comes up we think of BPM software suites. There’s another side to BPM though, and that’s the practice of process management, which doesn’t require any software at all.
Traditionally the BPM practice has focused on continuous process improvement. There are various methodologies but it generally comes down to this:
- Collect metrics on the existing process
- Analyze those metrics
- Propose an optimization
- Simulate the optimization with the collected metrics
- Institute the validated optimization
- Do it all again
There’s nothing wrong with that. We’ve occasionally had good results with continuous improvement for processes that are core to a business. A good candidate, for example, would be a fee-for-service health insurance claim process --- it’s a process that’s been around for decades and will likely be around for decades to come. It’s also high volume, so even the smallest improvement can have a major impact.
Applying business process management, in practice
The unfortunate truth about applying the practice in this way is that it is very time-consuming, even with the best software. It can take at least weeks, if not months or years, to get anything meaningful done, and there's quite a bit of human lead time.
That may be acceptable for a core process, but for most other processes that lead time is just not acceptable. Even if a program or project is approved with estimates made by the most experienced practitioners, it will likely be considered a failure after the inevitable missed deadlines and cost overruns.
We need to provide obvious value much faster. By fast, I mean anywhere from hours for something critical to no more than a few weeks for something significant.
Unfortunately, many of the BPM software suites have become unwieldy behemoths that include a variety of sophisticated functions in a very complicated package. They are generally architected such that the workflow transaction (request) is under complete control of the process engine. This can’t work in a microservices architecture where the various services can send messages to one another in an ad hoc manner. We need to change our viewpoint from being an orchestrator of services to a monitor of milestones.
We can still apply BPM patterns like Service Level Agreement (SLA) management to processes that include microservices. The process diagrams will look basically the same. The difference is that instead of controlling the movement of a transaction through activities, we will be monitoring milestones as the transaction moves through services. I’ll call them “tracking processes.”
If the services or message bus can ping events to the process manager at certain times, we can create processes that wait for these signals before proceeding to the next milestone. In the case of SLA management, if the SLA deadline is approaching and we haven’t made progress toward a particular milestone, we can start a new workflow to deal with it. It could perhaps include a human task for someone to investigate the delay, or send a message to the client notifying them of the delay.
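To make the idea concrete, here is a minimal sketch of that kind of deadline watch in Vert.x. This isn't from a real system: the event bus addresses, the timeout, and the escalation message are all invented for illustration.

```java
import io.vertx.core.AbstractVerticle;

// Hypothetical sketch: watch for a milestone event and escalate if the
// SLA deadline passes first. Addresses and the payload are made up.
public class SlaWatchdogVerticle extends AbstractVerticle {

  private static final long SLA_MILLIS = 30 * 60 * 1000; // assumed 30-minute SLA

  @Override
  public void start() {
    // Arm a timer that fires if the milestone is not reached in time.
    long timerId = vertx.setTimer(SLA_MILLIS, id ->
        vertx.eventBus().send("workflow.escalate", "classification overdue"));

    // If the milestone event arrives first, cancel the escalation timer.
    vertx.eventBus().consumer("milestone.classified",
        msg -> vertx.cancelTimer(timerId));
  }
}
```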
Designing a workflow that tracks milestones rather than orchestrates activities
Back in the early days of “workflow”, we had control of the transaction --- usually a document --- from the start of the process to the end. As IT evolved into the service-oriented architecture (SOA) and enterprise service bus (ESB) era, we had a little less control, but for the most part the process engine orchestrated everything. There were frequent hand-offs to message queues, but normally the message would come back to the process engine, which would continue to orchestrate the process.
The microservices world is different.
Instead of having a process engine or an ESB controlling a small number of large services, we have many small services that can potentially send and receive messages or respond to events from any of the other services. It’s more like a web. One initiating message or event to a particular service could affect the exchange of many hundreds of messages between the microservices before the initial request is considered complete. That can make BPM practitioners a bit uneasy due to the loss of control.
We may not have control any longer but we still can have visibility into the process. We can still apply our usual patterns for SLA and exception management, and human and compensating workflows. This can be accomplished through what I call a “tracking” process.
I have a process running today that interacts with microservices written with Vert.x, a microservices framework.
Vert.x includes an Event Bus and a cluster manager, among other features. A Vert.x cluster is made up of one or more nodes. A microservice is packaged as a jar module that includes a number of what they call Verticles (British spelling, I guess). The verticles are deployed to any number of Vert.x nodes.
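If you haven't seen Vert.x code before, a minimal verticle looks something like this (the event bus address and reply here are invented for illustration):

```java
import io.vertx.core.AbstractVerticle;

// Minimal illustrative verticle: it registers a consumer on a
// (hypothetical) event bus address and replies when its work is done.
public class PersistOpportunityVerticle extends AbstractVerticle {

  @Override
  public void start() {
    vertx.eventBus().consumer("opportunity.persist", message -> {
      // ... persist message.body() to the local store ...
      message.reply("persisted"); // let the sender know we're done
    });
  }
}
```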
Once the verticles are deployed, the Event Bus manages the flow of messages and responses throughout the cluster. This all happens asynchronously, so there is no way for us to control that flow from the process manager. Yet we can still create a process in BPMN that looks like a traditional process. Here is an example:
This is a simplified version of a real process that’s been running for a couple of years on Vert.x. It receives business opportunities from an outside source. Once one is received, we need to save it locally. Then we run it through a machine learning classifier to see if it is the type of opportunity the client might be interested in. If it is, then a human needs to have a look at it. Otherwise, it is rejected.
We receive thousands of these every day. Due to the parallel nature of Vert.x, we are able to spawn many requests over the cluster and get this work done quickly. The persistence part is quite performant, so we don’t need many instances of that verticle in the Vert.x cluster. The classification part is slow and requires more resources, so we have many instances of that verticle across the cluster.
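In Vert.x, that scaling decision is just a deployment option. Here is a sketch, with hypothetical class names and instance counts:

```java
import io.vertx.core.DeploymentOptions;
import io.vertx.core.Vertx;

public class OpportunityDeployer {
  public static void main(String[] args) {
    Vertx vertx = Vertx.vertx();

    // Persistence is fast, so a single instance is plenty.
    vertx.deployVerticle("com.example.PersistOpportunityVerticle",
        new DeploymentOptions().setInstances(1));

    // Classification is slow and resource-hungry, so deploy many instances.
    vertx.deployVerticle("com.example.ClassifierVerticle",
        new DeploymentOptions().setInstances(16));
  }
}
```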
The process above looks like a traditional process, but in fact we are not in control of the transaction here. In each activity, we are sending a message using the Vert.x Event Bus and then waiting until an event happens at a future time. Once that event is received, we move on to the next activity, which does the same.
Unfortunately, the classification activity doesn’t always complete in a timely manner. In this example we added a boundary timer so that if the classification takes too long, we notify a user and then terminate the process. The activities that involve microservices in the main process are modeled as subprocesses. Here is an example of the Persist Opportunity subprocess.
The first activity is a custom work item handler I created for Vert.x. It will send a message to the Vert.x cluster using the Event Bus.
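The full handler isn't reproduced in this post, but a minimal version might look like the following sketch. It assumes a jBPM-style WorkItemHandler interface, and the work item parameter names are invented:

```java
import io.vertx.core.Vertx;
import org.kie.api.runtime.process.WorkItem;
import org.kie.api.runtime.process.WorkItemHandler;
import org.kie.api.runtime.process.WorkItemManager;

// Hypothetical sketch of the custom handler: it fires a message onto the
// Vert.x Event Bus and completes immediately; the catch signal that
// follows in the subprocess is what actually waits for the work to finish.
public class VertxSendWorkItemHandler implements WorkItemHandler {

  private final Vertx vertx;

  public VertxSendWorkItemHandler(Vertx vertx) {
    this.vertx = vertx;
  }

  @Override
  public void executeWorkItem(WorkItem workItem, WorkItemManager manager) {
    String address = (String) workItem.getParameter("address"); // assumed parameter
    Object payload = workItem.getParameter("payload");          // assumed parameter
    vertx.eventBus().send(address, payload);                    // fire and forget
    manager.completeWorkItem(workItem.getId(), null);
  }

  @Override
  public void abortWorkItem(WorkItem workItem, WorkItemManager manager) {
    // Nothing to clean up for a fire-and-forget send.
    manager.abortWorkItem(workItem.getId());
  }
}
```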
That message may cause a number of other services to be called within Vert.x. We don’t care about that; all we need to know is when it’s all finished. I created a customization for Vert.x so that the process manager is sent a signal when a particular Vert.x service completes. When that happens, the Catch Signal will be executed. At that point, control is returned to the calling process, which can move on to the next activity.
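The customization boils down to a bridge between the Event Bus and the process engine. Here is a sketch of what such a bridge could look like, assuming a jBPM KieSession and invented address and signal names:

```java
import io.vertx.core.AbstractVerticle;
import org.kie.api.runtime.KieSession;

// Hypothetical bridge verticle: when a Vert.x service reports completion,
// signal the waiting process instance so its catch signal can fire.
public class ProcessSignalBridge extends AbstractVerticle {

  private final KieSession kieSession;

  public ProcessSignalBridge(KieSession kieSession) {
    this.kieSession = kieSession;
  }

  @Override
  public void start() {
    vertx.eventBus().consumer("opportunity.persist.done", message -> {
      // The process instance id is assumed to travel in a message header.
      long pid = Long.parseLong(message.headers().get("processInstanceId"));
      kieSession.signalEvent("PersistComplete", message.body(), pid);
    });
  }
}
```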
So, there you go! We can model processes as we are accustomed to, even though we are not in control of the transaction as it moves through the various microservices. You can use these patterns to combine microservices-based activities with traditional ones, and apply the usual process management patterns to all of it.
Software suites, architecture, and process mining
Many BPM practitioners are used to utilizing a software suite that has some sort of process manager --- a component that has control of the transaction as it progresses through activities. The process is generally authored and visualized graphically in BPMN or BPEL. When applying BPM in the microservices world we don’t have that visibility or control.
A microservices architecture (MSA), more or less, forms a web where many services can call each other in an ad hoc manner. Such an architecture is rarely designed visually like we are used to in BPM. That will likely change as MSA tools and frameworks mature, but for now each service is relatively independent, and less attention is given to how the entire solution behaves as a whole.
Business processes that are realized with a traditional architecture or a microservice architecture can still benefit from the practice of process management. There still can be resource constraints, rework, SLA violations, lack of auditing, etc. The problem is that we can’t easily see and understand what is happening visually as we would with a traditional solution.
To solve this, we can apply a concept called "process mining". We can create the kind of process diagrams we are used to in BPM by collecting event logs from the MSA, then applying various algorithms to those events to discover a process diagram. The logs can be in any format; however, there is a standard called XES that can be used to represent the data needed to produce process diagrams.
Generally we need to know the resources that were involved in an activity, its start and stop time, and some kind of identifier that can be used to correlate related activities. The identifier is the hard part, since you likely won’t want to force microservice designers to accommodate this need. There are some creative ways to impute such an identifier from the proximity of execution times along with some other datum.
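To make that concrete, here is a minimal XES fragment (all values invented) showing those attributes: the activity name, the resource, a timestamp, and a correlating case identifier carried on the enclosing trace:

```xml
<!-- Minimal illustrative XES log: one trace (case) with one event. -->
<log xes.version="1.0" xmlns="http://www.xes-standard.org/">
  <trace>
    <!-- the correlation identifier shared by all events in this case -->
    <string key="concept:name" value="opportunity-42"/>
    <event>
      <string key="concept:name" value="Persist Opportunity"/>
      <string key="org:resource" value="persist-verticle-1"/>
      <string key="lifecycle:transition" value="complete"/>
      <date key="time:timestamp" value="2024-01-22T10:15:30.000+00:00"/>
    </event>
  </trace>
</log>
```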
Once the logs are accumulated, they can be transformed to XES format so that they can be imported into existing process mining tools for analysis. I’ve used two such tools: an open source tool called ProM and a commercial tool called Disco. ProM isn’t easy to learn, but once you do, it is quite powerful. It can produce a BPMN diagram that you can then import into your traditional BPM suite so that process simulation can be run against the transaction logs.
In doing this you may find that the solution could benefit from more instances of a particular microservice. You may see that there are many messages traveling through just a few services and perhaps they can be broken down more. You may also find that human resources are causing a backlog. Perhaps transactions that originate in Europe are being processed in the United States and could benefit from having a node in the cluster local to the originator.
This is all stuff that we traditionally do in process optimization. By applying process mining, we can now do the same with processes running over microservices.
Last updated: January 22, 2024