Most of the larger-scale services that we design nowadays depend, more or less, on external APIs. You've heard it multiple times: as soon as your codebase starts to look like a monolith, it's time to start splitting it into smaller services that can evolve independently and aren't strongly coupled with the monolith.
Even if you don't really employ microservices, chances are that you already depend on external services, such as elasticsearch, redis or a payment gateway, and need to integrate with them via some kind of API.
What happens when those services are slow or unavailable? Well, you can’t process search queries, or payments, but your app would still be working “fine” — right?
That is not always the case, and I want to run a few benchmarks to show you how a small tweak, timeouts, proves beneficial when dealing with external services.
Our case
We’ve started a new Hello World! startup that, surprisingly, makes money by deploying a useless service that prints a string retrieved from another service: as you understand, this is an oversimplification of a real-world scenario, but it will serve our purpose well enough.
Our clients will be connecting to our main frontend, `server1.js`, which will then make an HTTP request towards another service, `server2.js`, which will reply back: once we have an answer from `server2.js` we can then return the response body to our client.
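The real code is available on GitHub; here is a minimal sketch of what `server1.js` could look like, reconstructed from the description in this post (the exact unirest call and error message are my assumptions):

```js
// server1.js -- a sketch: listen on port 3000 and fetch the response body
// from the backend running on port 3001
const http = require('http');
const unirest = require('unirest');

http.createServer((req, res) => {
  unirest.get('http://localhost:3001/').end((response) => {
    if (response.error) {
      // something went wrong while talking to the backend: fail immediately
      res.statusCode = 500;
      return res.end('An error occurred, please try again');
    }

    res.end(response.body);
  });
}).listen(3000);
```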
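And a sketch of the backend, `server2.js`, which simply waits 100ms before replying (the response string is a guess, the real one is on GitHub):

```js
// server2.js -- a sketch: wait 100ms to simulate a real-world backend, then reply
const http = require('http');

http.createServer((req, res) => {
  setTimeout(() => res.end('Hello World!'), 100);
}).listen(3001);
```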
A few things to note:
- the servers run on ports `3000` (main app) and `3001` ("backend" server): once the client makes a request to `localhost:3000`, a new HTTP request will be sent to `localhost:3001`
- the backend service will wait 100ms (this is to simulate real-world use cases) before returning a response
- I'm using the unirest HTTP client: I like it a lot and, even though we could have simply used the built-in `http` module, I'm confident this will give us a better feel for real-world applications
- unirest is nice enough to tell us if there was an error on our request, so we can just check `response.error` and handle the drama from there
- I'm going to be using docker to run these tests, and the code is available on github
Let’s run our first tests
Let's run our servers and start bombing `server1.js` with requests: we'll be using siege (I'm too hipster for AB), which provides some useful info upon executing the load test.
The `-c` option, in siege, defines how many concurrent requests we should send to the server, and you can even specify how many repetitions (`-r`) you'd like to run: for example, `-c 10 -r 5` would mean sending 50 total requests to the server, in batches of 10 concurrent requests. For the purpose of our benchmark, though, I decided to keep the tests running for 3 minutes and analyze the results afterwards, without setting a max number of repetitions.
Additionally, in my next examples I will be trimming down the results to the most important items provided by siege:
- availability: how many of our requests the server was able to handle
- transaction rate: how many requests per second we were able to make
- successful / failed transactions: how many requests ended up with success / failure status codes (i.e. 2xx vs 5xx)
Let’s start by sending 500 concurrent requests to observe how the services behave.
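The exact siege invocation isn't shown here, but based on the setup described above it would look something like this (no `-r` and no time limit, since we'll simply stop siege by hand after roughly 3 minutes):

```sh
# 500 concurrent users hammering the main app
siege -c 500 http://localhost:3000
```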
After around 3 minutes, it's time to stop siege (`ctrl+c`) and look at the results: not bad, as we've been able to serve 1156 transactions per second; even better than that, it doesn't seem like we got any errors, which means our success rate is 100%. What if we up our game and go for 1k concurrent transactions?
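Again, assuming the same invocation with a higher concurrency:

```sh
# same test, this time with 1000 concurrent users
siege -c 1000 http://localhost:3000
```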
Well, well done: we slightly increased throughput, as now our app is able to handle 1283 requests per second. Since the apps do very little (print a string and that's it), it is likely that the more concurrent requests we send, the higher the throughput will be.
These numbers might be useless now (we’re not comparing them to anything) but will prove essential in the next paragraphs.
Introducing failure
This is not how real-world webservices behave: you have to accept failures and build resilient applications that are capable of overcoming them.
For example, suppose our backend service is going through a hard phase and starts lagging every now and then:
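A sketch of the lagging backend, assuming the delay is picked at random per request (the real implementation, on GitHub, may differ):

```js
// server2.js -- a sketch: roughly 1 request out of 10 hangs for 10 seconds,
// the others keep the usual 100ms delay
const http = require('http');

http.createServer((req, res) => {
  const delay = Math.random() < 0.1 ? 10000 : 100;

  setTimeout(() => res.end('Hello World!'), delay);
}).listen(3001);
```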
In this example, 1 out of 10 requests will be served after a 10s delay, whereas the other ones will be processed with the "standard" delay of 100ms: this kind of simulates a scenario where we have multiple servers behind a load balancer, and one of them starts throwing random errors or becomes slower due to excessive load.
Let's go back to our benchmark and see how our `server1.js` performs now that its dependency has started to slow down.
What a bummer: our transaction rate has plummeted, down by more than 30%, just because some of the responses are lagging. This means that `server1.js` needs to hold on for longer in order to receive responses from `server2.js`, thus using more resources and being able to serve fewer requests than it theoretically could.
An error now is better than a response tomorrow
The case for timeouts starts by recognizing one simple fact: users won’t wait for slow responses.
After 1 or 2 seconds their attention will fade away, and the chances that they might still be hooked to your content will vanish as soon as you cross the 4/5s threshold. This means that it's generally better to give them immediate feedback, even if negative ("An error occurred, please try again"), rather than letting them get frustrated over how slow your service is.
In the spirit of "fail fast", we add a timeout in order to make sure that our responses meet a certain SLA: in this case we decide that our SLA is 3s, which is the time our users will possibly wait to use our service [1].
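A sketch of the change in `server1.js`, using unirest's `.timeout(...)` and treating a timed-out request like any other error (the surrounding server code is reconstructed as before):

```js
// server1.js -- a sketch: same request as before, but we now give up after 3 seconds
const http = require('http');
const unirest = require('unirest');

http.createServer((req, res) => {
  unirest.get('http://localhost:3001/').timeout(3000).end((response) => {
    if (response.error) {
      // timeouts surface as errors too: better an error now than a response in 10s
      res.statusCode = 500;
      return res.end('An error occurred, please try again');
    }

    res.end(response.body);
  });
}).listen(3000);
```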
Let's see how the numbers look with timeouts enabled.
Oh boy, we’re back into the game: the transaction rate is again higher than 1k per second and we can almost serve as many requests as we’d do under ideal conditions (when there’s no lag in the backend service).
Of course, one of the drawbacks is that we have now increased the number of failures (10% of total requests), which means that some users will be presented with an error page. That's still better, though, than serving them after 10s, as most of them would have abandoned our service anyway.
We've now seen that timeouts help you preserve a near-ideal rps (requests per second), but what about resource consumption? Will they help make sure that our servers won't require extra resources if one of their dependencies becomes less responsive?
Let’s dig into it.
The RAM factor
In order to figure out how much memory our `server1.js` is consuming, we need to measure, at intervals, the amount of memory the server is using; in production we would use tools such as NewRelic or KeyMetrics but, for our simple scripts, we'll resort to the poor man's version of such tools: we'll print the amount of memory from `server1.js` and use another script to record the output and print some simple stats.
Let's make sure `server1.js` prints the amount of memory used every 100ms.
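A sketch of how this could be done with `process.memoryUsage()` (I'm assuming the resident set size is the metric being logged; the original script may use a different one):

```js
// somewhere in server1.js -- print the resident set size, in MB, every 100ms
setInterval(() => {
  console.log((process.memoryUsage().rss / 1024 / 1024).toFixed(2));
}, 100);
```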
If we start the server, we'll see a new value, in MB, printed every 100ms. In order to crunch those numbers I wrote a simple script that reads the input from `stdin` and computes the stats.
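A sketch of such a script (the file name and output format below are made up; the published module may look different):

```js
// stats.js -- read one number per line from stdin, print min / max / average at the end
const readline = require('readline');

const samples = [];
const rl = readline.createInterface({ input: process.stdin });

rl.on('line', (line) => {
  const value = parseFloat(line);

  if (!isNaN(value)) {
    samples.push(value);
  }
});

rl.on('close', () => {
  const avg = samples.reduce((sum, value) => sum + value, 0) / samples.length;

  console.log('min:', Math.min(...samples));
  console.log('max:', Math.max(...samples));
  console.log('avg:', avg);
});
```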
The module is public and available on NPM, so we can just install it globally and redirect the output of the server to it.
Let's now run our benchmark again (3 minutes, 1k concurrent requests), first without timeouts and then with the 3s timeout enabled.
Whoa, at first glance it seems like timeouts aren't helping after all: our memory usage hits a higher peak with timeouts enabled and is, on average, 5% higher as well. Is there any reasonable explanation for that?
There is, of course, as we just need to go back to siege and notice that the two runs served a different number of requests per second.
Here we're comparing apples to oranges: it's useless to look at the memory usage of 2 servers that are serving a different number of rps. We should instead make sure they both offer the same throughput, and only measure memory at that point, otherwise the server that's serving more requests will always start at a disadvantage!
To do so, we need some kind of tool that makes it easy to generate rps-based load, and siege is not very suitable for that: it’s time to call our friend vegeta, a modern load testing tool written in golang.
Enter vegeta
Vegeta is very simple to use: just start "attacking" a server and let it report the results.
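A minimal attack looks something like this (the target, rate and duration here are illustrative, not the ones used for the benchmark):

```sh
# attack the target at 50 requests per second for 10 seconds, then print a report
echo "GET http://example.com/" | vegeta attack -rate=50 -duration=10s | vegeta report
```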
2 very interesting options here:
- `--duration`, so that vegeta will stop after a certain time
- `--rate`, as in rps
Looks like vegeta is the right tool for us: we can then issue a command tailored to our server and see the results.
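Something along these lines, assuming we keep the 3-minute duration and pick a rate both versions of the server can sustain (the exact rate used for the original benchmark isn't shown, so the value below is a guess):

```sh
# fixed-rate load against server1.js for 3 minutes
echo "GET http://localhost:3000/" | vegeta attack -rate=500 -duration=3m | vegeta report
```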
Comparing what vegeta reports without timeouts with what we get when `server1.js` has the 3s timeout enabled, the total number of requests and the elapsed time are the same between the 2 benchmarks, meaning that we've put the servers under the same level of stress: now that we've got them performing the same tasks, under the same load, we can look at memory usage to see if timeouts help us keep a lower memory footprint.
Comparing the memory stats of the two runs, this looks more like it: timeouts have helped us keep the memory usage, on average, 30% lower [2].
All of this thanks to a simple `.timeout(3000)`: what a win!
Avoiding the domino effect
Quoting myself:

> What happens when those services are slow or unavailable? Well, you can't process search queries, or payments, but your app would still be working "fine" – right?
Fun fact: a missing timeout can cripple your entire infrastructure!
In our basic example we saw how a service that starts to fail at a 10% rate can significantly increase memory usage of the services depending on it, and that’s not an unrealistic scenario — it’s basically just a single, wonky server in a pool of ten.
Imagine you have a webpage that relies on a backend service, behind a load balancer, that starts to perform slower than usual: the service still works (it's just way slower than it should be), your healthcheck is probably still getting a `200 OK` from the service (even though it comes after several seconds rather than milliseconds), so the service won't be removed from the load balancer.
You've just created a trap for your frontends: they will start requiring more memory and serving fewer requests… a recipe for disaster.
This is what a domino effect looks like: a system slows down (or incurs downtime) and other pieces of the architecture are affected by it, highlighting a design that didn't consider failure an option and is neither robust nor resilient enough.
The key thing to keep in mind is: embrace failures, let them come and make sure you can fight them with ease.
A note on timeouts
If you thought waiting was dangerous, let's add fuel to the fire:
- we're not talking HTTP only: every time we rely on an external system we should use timeouts
- a server could have an open port and drop every packet you send, which will result in a TCP connection timeout. Try this in your terminal: `time curl example.com:81`. Good luck!
- a server could reply instantly, but be very slow at sending each packet (as in, seconds between packets): you would then need to protect yourself against a read timeout
…and many more edge cases to list. I know, distributed systems are nasty.
Luckily, high-level APIs (like the one exposed by unirest) are generally helpful since they take care of all of the hiccups that might happen on the way.
Closing remarks: I’ve broken every single benchmarking rule
If you have any "aggressive" feedback about my rusty benchmarking skills… well, I would agree with you, as I purposely took some shortcuts, both to simplify my job and to make it easy for you to reproduce these benchmarks.
Things you should do if you’re serious about benchmarks:
- do not run the code you're benchmarking and the tool you use to benchmark on the same machine. Here I ran everything on my XPS, which is powerful enough to let me run these tests, though running siege / vegeta on the same machine the servers run on definitely has an impact on the results (I say `ulimit` and you figure out the rest). My advice is to try to get some hardware on AWS and benchmark from there: more isolation, less doubt
- do not measure memory by logging it out with a `console.log`; instead use a tool such as NewRelic which, I think, is less invasive
- measure more data: benchmarking for 3 minutes is ok for the sake of this post, but if we want to look at real-world data, to give a better estimate of how helpful timeouts are, we should leave the benchmarks running for way longer
- keep Gmail closed while you run `siege ...`, the tenants living in `/proc/cpuinfo` will be grateful
And… I'm done for the day: I hope you enjoyed this post; if not, feel free to rant in the comment box below!
[1] For system-to-system integrations it could even be 10 to 20 seconds, depending on how slow the backend you have to connect to is (sigh: don't ask me).

[2] But don't be naive and think that the 30% of memory saved will be the same in your production infrastructure. It really depends on your setup – it could be lower (most likely) or even higher. Benchmark for yourself and see what you're missing out on!