<aside> 🌟 ⬆️ The original page contains interactive demo animations
</aside>
Past a certain point, web applications outgrow a single server deployment. Companies either want to increase their availability, scalability, or both! To do this, they deploy their application across multiple servers with a load balancer in front to distribute incoming requests. Big companies may need thousands of servers running their web application to handle the load.
In this post we're going to focus on the ways that a single load balancer might distribute HTTP requests to a set of servers. We'll start from the bottom and work our way up to modern load balancing algorithms.
Let's start at the beginning: a single load balancer sending requests to a single server. Requests are being sent at a rate of 1 request per second (RPS), and each request reduces in size as the server processes it.
For a lot of websites, this setup works just fine. Modern servers are powerful and can handle a lot of requests. But what happens when they can't keep up?
Here we see that a rate of 3 RPS causes some requests to get dropped. If a request arrives at the server while another request is being processed, the server will drop it. This will result in an error being shown to the user and is something we want to avoid. We can add another server to our load balancer to fix this.
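To make that drop rule concrete, here is a minimal sketch of such a server in Python. It is not the simulation code behind the animations; the simulated clock, the `handle` method, and the one-request-at-a-time rule are illustrative assumptions.

```python
class Server:
    """Toy server that can process only one request at a time.

    A request that arrives while another is still being processed is
    dropped, mirroring the behaviour described above.
    """

    def __init__(self):
        self.busy_until = 0.0  # simulated time at which the server frees up

    def handle(self, now, cost):
        """Return True if the request is accepted, False if it is dropped."""
        if now < self.busy_until:
            return False  # still working on the previous request: drop it
        self.busy_until = now + cost
        return True


server = Server()
server.handle(now=0.0, cost=1.5)  # True: accepted
server.handle(now=1.0, cost=1.5)  # False: arrived while busy, dropped
server.handle(now=2.0, cost=1.5)  # True: the server freed up at t=1.5
```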
No more dropped requests! The way our load balancer is behaving here, sending a request to each server in turn, is called "round robin" load balancing. It's one of the simplest forms of load balancing, and works well when your servers are all equally powerful and your requests are all equally expensive.
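To show just how simple round robin is, here is one way to express it on top of the toy `Server` sketch above. The class and method names are illustrative, not taken from any real load balancer.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hand each incoming request to the next server in a fixed rotation."""

    def __init__(self, servers):
        self._rotation = cycle(servers)  # repeats the server list in order, forever

    def dispatch(self, now, cost):
        return next(self._rotation).handle(now, cost)


balancer = RoundRobinBalancer([Server(), Server()])
balancer.dispatch(now=0.0, cost=1.5)  # goes to the first server
balancer.dispatch(now=0.5, cost=1.5)  # goes to the second server, so nothing is dropped
```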
In the real world, it's rare for servers to be equally powerful and requests to be equally expensive. Even if you use the exact same server hardware, performance may differ. Applications may have to service many different types of requests, and these will likely have different performance characteristics.
Let's see what happens when we vary request cost. In the following simulation, requests aren't equally expensive. You'll be able to see this in the way some requests take longer to shrink than others.
While most requests get served successfully, we do drop some. One of the ways we can mitigate this is to have a "request queue."
Request queues help us deal with uncertainty, but it's a trade-off. We will drop fewer requests, but at the cost of some requests having a higher latency. If you watch the above simulation long enough, you might notice the requests subtly changing colour. The longer they go without being served, the more their colour will change. You'll also notice that thanks to the request cost variance, servers start to exhibit an imbalance. Queues will get backed up on servers that get unlucky and have to serve multiple expensive requests in a row. If a queue is full, we will drop the request.
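One way to picture this trade-off is to extend the toy server sketch with a bounded queue. The queue size, field names, and bookkeeping below are illustrative, not how any particular server or load balancer implements queueing.

```python
from collections import deque


class QueueingServer:
    """Toy server with a bounded queue of waiting requests.

    A request that arrives while the server is busy waits in the queue
    instead of being dropped; only when the queue is full is it rejected.
    The time a request spends waiting before being served is the extra
    latency described above.
    """

    def __init__(self, queue_size=3):
        self.queue_size = queue_size
        self.finish_times = deque()  # completion time of each accepted request

    def handle(self, now, cost):
        # Forget requests that have already finished by `now`.
        while self.finish_times and self.finish_times[0] <= now:
            self.finish_times.popleft()

        waiting = max(len(self.finish_times) - 1, 0)  # queued behind the one in service
        if waiting >= self.queue_size:
            return False  # queue full: drop the request

        start = self.finish_times[-1] if self.finish_times else now
        # (start - now) is the queueing delay this request will experience.
        self.finish_times.append(start + cost)
        return True


server = QueueingServer(queue_size=2)
server.handle(now=0.0, cost=2.0)  # True: served immediately
server.handle(now=0.5, cost=2.0)  # True: queued, waits ~1.5s before being served
server.handle(now=0.6, cost=2.0)  # True: queued behind the previous one
server.handle(now=0.7, cost=2.0)  # False: queue full, dropped
```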
Everything said above applies equally to servers that vary in power. In the next simulation we also vary the power of each server, which is represented visually with a darker shade of grey.
The servers are given a random power value, but odds are some are less powerful than others and quickly start to drop requests. At the same time, the more powerful servers sit idle most of the time. This scenario shows the key weakness of round robin: variance.
Despite its flaws, round robin is still the default HTTP load balancing method for nginx.
It's possible to tweak round robin to perform better with variance. There's an algorithm called "weighted round robin" which involves getting humans to tag each server with a weight that dictates how many requests to send to it.
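Building on the toy classes above, here is a sketch of that idea. Expanding the weights into a flat rotation is just one simple way to do it, and the weights here are made up for illustration.

```python
from itertools import cycle


class WeightedRoundRobinBalancer:
    """Round robin where a weight-3 server gets three requests for every
    one sent to a weight-1 server. Weights are assigned by hand."""

    def __init__(self, weighted_servers):
        # weighted_servers: list of (server, weight) pairs
        self._rotation = cycle(
            [server for server, weight in weighted_servers for _ in range(weight)]
        )

    def dispatch(self, now, cost):
        return next(self._rotation).handle(now, cost)


big, small = Server(), Server()
balancer = WeightedRoundRobinBalancer([(big, 3), (small, 1)])
# Rotation: big, big, big, small, big, big, big, small, ...
```

Flattening the weights like this sends short bursts of consecutive requests to the heavier server; a production implementation would typically interleave the weighted picks more smoothly, but the idea is the same.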