REST and microservices – breaking down the monolith step by asynchronous step
A few days ago I had a rant about the misuse and misunderstanding of REST (typically HTTP) for microservices.
To summarize, a few people/groups have been suggesting that you cannot do asynchronous interactions with HTTP, and that as a result of using HTTP you cannot break down a monolithic application into more agile microservices. The fact that most people refer to REST when they really mean HTTP is also a source of personal frustration, because by this stage experienced people in our industry really should know the difference. If you’re unsure of the difference then check out the restcookbook or even Roy Fielding’s PhD thesis (it’s quite a good read!).
However, I digress, so back to the rant: My goal is to point people in the right direction and make some recommendations, hence this followup post.
REST and HTTP
To start with, it’s definitely wrong to assume that if you are building microservices you must stick with HTTP (although as has been shown in the last decade, a RESTful approach can be beneficial when developing with a service-oriented architecture). Take a look at some of these older InfoQ articles for inspiration, or at what we’ve been doing with WildFly/EAP and other projects/products over the past 7 years or so.
HTTP is not the only option, and it has its drawbacks. Not least of these is its text-based nature (in HTTP/1.x), and as we’ve found over on the Narayana project its performance is still not really comparable with more mature approaches such as IIOP(!), even with binary HTTP/2.
It’s not just the performance nature of HTTP that may persuade you to look elsewhere. Traditional messaging products such as A-MQ, which support a range of patterns including brokered and broker-less messaging, also support messaging standards like AMQP or MQTT – making interoperability with heterogeneous systems possible.
I’m not suggesting microservices shouldn’t be developed with HTTP; however, when you’re developing with distributed systems you need to consider all aspects including but not limited to: reliability, performance, and coupling (as I mentioned in another article).
Don’t feel that you have to use HTTP, but likewise don’t feel you must stick with JMS (or even REST) if that’s what you’ve been using in the past. I know that using HTTP it’s relatively easy to test your service (spin up a browser), but check out Arquillian, for example, if you want to see ways of testing other approaches, including HTTP, without REST.
OK, so what about asynchronous HTTP? Is it impossible as some have stated? Of course it’s possible, and here’s where I give you some references to check out.
First let’s start with some of our well known projects that can be used to develop with the asynchronous message exchange pattern using HTTP: Vert.x and Undertow. Both are exceptionally popular projects with a range of customers having developed large scale applications with them.
Second? Well you don’t need to take my word for this; again go and check out a huge number of InfoQ articles on that exact topic, some of which date back almost a decade. Now if I had to recommend some books to read on the topic I’d definitely go with one from my old friend and ex-colleague Jim Webber or one from my old friend and current colleague Bill Burke (more on JAX-RS in a moment).
Whenever you’re working with HTTP it’s always important to understand the response codes that are available. We’re all pretty familiar with the 200, 403, and 404 response codes (and maybe some of the 3xx codes, e.g., 301 when the service moves), but HTTP has a thing or two to help with asynchronous interactions, too. Specifically 202, where the standard says:
The request has been accepted for processing, but the processing has not been completed. The request might or might not eventually be acted upon, as it might be disallowed when processing actually takes place. There is no facility for re-sending a status code from an asynchronous operation such as this. The 202 response is intentionally non-committal. Its purpose is to allow a server to accept a request for some other process (perhaps a batch-oriented process that is only run once per day) without requiring that the user agent’s connection to the server persist until the process is completed. The entity returned with this response SHOULD include an indication of the request’s current status and either a pointer to a status monitor or some estimate of when the user can expect the request to be fulfilled.
The bits to focus on are those which clearly mean “asynchronous processing”: the request is accepted before processing has completed, and the user agent’s connection does not have to persist until it does. Sure, it’s not one of those response codes you see much, and if it is used then there’s a good chance you’re not seeing it as it is probably masked by the browser. The point is, however, that HTTP supports asynchronous invocations, so as a developer you can most certainly make use of them.
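To make the 202 flow concrete, here is a minimal sketch in Python using only the standard library. It is not taken from any of the projects mentioned in this post. The `/jobs` path, the in-memory `JOBS` table, the `run_job` worker and the JSON field names are all assumptions made for illustration. A POST is answered with 202 straight away, the response points at a status monitor, and the work finishes later on another thread.

```python
import json
import threading
import time
import uuid
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical in-memory job table: job id -> {"state": ..., "result": ...}
JOBS = {}
JOBS_LOCK = threading.Lock()


def run_job(job_id, payload):
    """Stand-in for slow work; it runs after the 202 has been sent."""
    time.sleep(0.2)
    with JOBS_LOCK:
        JOBS[job_id] = {"state": "done", "result": payload.upper()}


class Handler(BaseHTTPRequestHandler):
    def _send_json(self, code, body, headers=None):
        data = json.dumps(body).encode("utf-8")
        self.send_response(code)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        for name, value in (headers or {}).items():
            self.send_header(name, value)
        self.end_headers()
        self.wfile.write(data)

    def do_POST(self):
        if self.path != "/jobs":
            self._send_json(404, {"error": "not found"})
            return
        length = int(self.headers.get("Content-Length", 0))
        payload = self.rfile.read(length).decode("utf-8")
        job_id = uuid.uuid4().hex
        with JOBS_LOCK:
            JOBS[job_id] = {"state": "pending", "result": None}
        threading.Thread(target=run_job, args=(job_id, payload), daemon=True).start()
        monitor = f"/jobs/{job_id}"
        # 202: accepted but not completed; the body points at a status monitor.
        self._send_json(202, {"state": "pending", "monitor": monitor}, {"Location": monitor})

    def do_GET(self):
        # The status monitor: report the job's current state.
        job_id = self.path.rsplit("/", 1)[-1]
        with JOBS_LOCK:
            job = JOBS.get(job_id)
            job = dict(job) if job else None
        if job is None:
            self._send_json(404, {"error": "unknown job"})
        else:
            self._send_json(200, job)

    def log_message(self, fmt, *args):
        pass  # keep the sketch quiet


def start_server(port=0):
    """Start the server on a background thread; port 0 picks a free port."""
    server = ThreadingHTTPServer(("127.0.0.1", port), Handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

A client would POST to `/jobs`, read the `monitor` field (or the `Location` header) from the 202 response, and poll that URL until the state becomes `done`. A real service would more likely persist jobs and hand them to a proper work queue than spawn a thread per request.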
Of course if you’re looking to do asynchronous HTTP with a standards-based framework then you’re probably thinking of using JAX-RS. There is a plethora of resources on the Web, and Bill Burke’s book that I mentioned earlier is another good one. He’s even written about JAX-RS 2.0 elsewhere, which is worth a look. The standards group, of which Bill was a member, explicitly added an asynchronous client API with callbacks in the most recent version of the specification, and let’s not forget his presentation on earlier versions in 2009.
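The callback style of client can be sketched in a few lines. To be clear, this is not JAX-RS: it is a thread-backed Python analogue of the same idea, and `get_async`, `blocking_get`, `completed` and `failed` are names invented for the illustration. The caller hands over two callbacks and gets control back immediately, and exactly one of the callbacks fires once the request finishes.

```python
import urllib.request
from concurrent.futures import ThreadPoolExecutor

_POOL = ThreadPoolExecutor(max_workers=4)


def blocking_get(url):
    """Plain blocking HTTP GET; returns (status, body bytes)."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return resp.status, resp.read()


def get_async(url, completed, failed, fetch=blocking_get):
    """Submit the GET to a worker thread and return immediately.

    Exactly one of completed(result) or failed(exc) is called once the
    request finishes, normally on a worker thread. `fetch` can be swapped
    out, for example for a stub in tests.
    """
    future = _POOL.submit(fetch, url)

    def dispatch(done):
        exc = done.exception()
        if exc is not None:
            failed(exc)
        else:
            completed(done.result())

    future.add_done_callback(dispatch)
    return future
```

Usage would look something like `get_async(url, on_result, on_error)`, after which the caller carries on with other work. A non-blocking I/O library would avoid tying up a thread per request, which is closer to what projects like Vert.x and Undertow do.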
Is it really asynchronous?
OK, so if you’ve read this far I hope it’s clear that it is entirely possible to do asynchronous processing with HTTP; however, there’s something else I wanted to try to point out as a flaw in some of the postings from other groups on the topic – things I did hint at in the original rant.
When people talk about asynchronous interactions they tend to mean one of two things. In the first, the service request is delivered synchronously to the service, which returns an acknowledgement or “ack” to the caller to indicate that the work will be done; later the result is sent back to the caller, which has been making forward progress concurrently (think Promises and Callbacks). In the second, the request is sent in a “fire and forget” manner, such that the caller gets no indication of successful delivery.
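The difference between the two categories is easy to lose in prose, so here is a small in-process sketch. Everything in it is hypothetical: the `Service` class, its method names, and the `deliver` flag that simulates a message lost in transit. The first method hands back an ack plus a promise for the later result. The second hands back nothing at all.

```python
import queue
import threading
from concurrent.futures import Future


class Service:
    """Hypothetical in-process 'service' that handles requests on its own thread."""

    def __init__(self):
        self._inbox = queue.Queue()
        threading.Thread(target=self._loop, daemon=True).start()

    def _loop(self):
        while True:
            payload, reply = self._inbox.get()
            result = payload * 2  # stand-in for real work
            if reply is not None:
                reply.set_result(result)  # later, the result goes back to the caller

    def request_with_ack(self, payload):
        """Category 1: delivery is acknowledged now, the result arrives later."""
        reply = Future()
        self._inbox.put((payload, reply))
        return "ack", reply

    def fire_and_forget(self, payload, deliver=True):
        """Category 2: nothing comes back, not even proof of delivery."""
        if deliver:  # deliver=False simulates a message lost in transit
            self._inbox.put((payload, None))
```

With `request_with_ack` the caller can attach a callback via `reply.add_done_callback(...)` or block on `reply.result()`. With `fire_and_forget` the caller sees exactly the same thing whether or not the message arrived.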
Fortunately (or unfortunately, depending on your perspective) most people who talk about the latter approach are really thinking about the former; they just forget or ignore the delivery “ack”. The distinction is important to understand and here’s why: in a truly asynchronous system (the second category) it’s impossible to rely on the concept of time to determine whether an endpoint has failed or is just slow. This has a significant impact on deterministic consensus.
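A toy timeout-based failure detector shows why time is no help here. This is only a sketch with made-up names (`probe`, `slow_endpoint`, `crashed_endpoint`), with threads standing in for remote endpoints. With a short timeout, a slow but healthy endpoint and a crashed one look identical to the caller.

```python
import queue
import threading
import time


def probe(endpoint, timeout):
    """Call `endpoint` (a hypothetical callable) and wait up to `timeout` seconds.

    Returns "alive" if a reply arrived in time, otherwise "suspected".
    """
    replies = queue.Queue()
    threading.Thread(target=lambda: replies.put(endpoint()), daemon=True).start()
    try:
        replies.get(timeout=timeout)
        return "alive"
    except queue.Empty:
        return "suspected"  # crashed, or merely slow? The caller cannot tell.


def slow_endpoint():
    time.sleep(0.5)  # healthy, just slower than the caller's patience
    return "pong"


def crashed_endpoint():
    threading.Event().wait()  # never replies
```

Calling `probe(slow_endpoint, timeout=0.1)` and `probe(crashed_endpoint, timeout=0.1)` both give `"suspected"`, while `probe(slow_endpoint, timeout=2.0)` gives `"alive"`. Raising the timeout only moves the problem, because in a truly asynchronous system there is no bound on how slow "slow" can be.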
Managing the costs
In their 1985 paper, which later won the Dijkstra Prize given to the most influential papers in distributed computing, Fischer, Lynch and Paterson proved the result often referred to as the “FLP Theorem”. Because it is impossible to rely on the concept of time to determine whether an endpoint has failed, consensus (agreeing on a value between participants) is possible in synchronous systems but cannot be guaranteed by a deterministic protocol in an asynchronous system with even just a single faulty processor.
You might ask why this is important to you. The obvious answer is that if you move to a truly asynchronous invocation mechanism, then you need to understand what is and is not possible as a direct result. This isn’t speculation either; the FLP paper proved it. So be aware and develop accordingly. There’s a good reason all ACID transaction protocols, such as those in Narayana, are synchronous.
The other thing to note is that some people assume that Brewer’s CAP theorem, which discusses trade-offs that need to be made between Consistency, Availability and Partition tolerance when developing a distributed system, is the same as FLP; some even completely confuse the two theorems.
Although CAP and FLP are related in so much as they are both about behaviors in asynchronous distributed systems, there are some important differences:
For example, CAP says that it is not possible to build an implementation of read-write storage in an asynchronous network that satisfies all of the following three properties:
- Availability – each request eventually receives a response.
- Consistency – each server returns the right response to each request (they are atomic or linearizably consistent).
- Partition tolerance – the network is allowed to drop messages.
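To summarize the differences: FLP permits one failed node to be totally partitioned from the network, and that node does not have to respond to requests, whereas CAP’s availability requires every request to receive a response. FLP does not allow messages to be lost (the network is asynchronous, not lossy), whereas CAP’s partition tolerance does. And consensus is a different problem to atomic storage.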
Now maybe the above is a little more information than you need as a developer, but I think it’s always better to know about all of the possible pitfalls that are waiting for you in distributed systems. Plus, it really is important to understand when people throw around CAP or FLP that sometimes they’re not truly understanding the basis behind them. Unfortunately, I have to agree with Ken Birman, that CAP is often overused and misunderstood.