Edit: 23rd May, 2019

Here is a not-so-brief introduction to load balancers: the various ways you can do it, what possibly happens behind the scenes, and the algorithms used.

I also learned that Power of Two Random Choices (pick two workers at random and give the task to the less loaded one) is one of the popular load balancing algorithms due to its performance and simplicity.
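
A minimal sketch of that idea in Python (the worker names and task counts here are made up for illustration):

```python
import random

# Hypothetical workers and how many tasks each currently has in hand.
workers = {"workerA": 3, "workerB": 0, "workerC": 7}

def pick_power_of_two(workers):
    """Pick two workers at random, return the one with fewer active tasks."""
    a, b = random.sample(list(workers), 2)
    return a if workers[a] <= workers[b] else b

chosen = pick_power_of_two(workers)
workers[chosen] += 1  # hand over the task
print("next task goes to", chosen)
```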

I think most developers have at least heard of the concept of load balancing, or of a load balancer. Maybe some of them have used one in production, provided by cloud services, language tooling, etc. I personally had never thought of making one though!

Recently I had to make one, and to do so I studied how they work and some of the algorithms behind them. In this post I will try to document my findings.

It turns out load balancing is a simple topic. Let's say we have multiple workers and a list of tasks to do. In this context load balancing could mean distributing the tasks so that all workers get the same number of tasks, or so that the tasks get done as fast as possible. During the distribution we can consider multiple variables about the workers, like who is more muscular (resources), who can get a task done fast (response time), who gets tired from working faster than others (latency!) and many more.

Algorithms... the following are the ones I am going to discuss here. All of them, in one way or another, try to determine who will get the next task.

  1. Round Robin /+ weighted version
  2. Least Connection /+ weighted version
  3. Fixed weight /aka predefined order
  4. Response Time based
  5. Server and resource load based

Round Robin is the simplest of them all. Each worker gets their turn one by one. The first task goes to workerA, the second one to workerB, the third one to workerC and so on. And when all the workers have had a turn, it starts again from workerA. There's also a weighted version of it, which means that workers get priority depending on their weight. Now this weight could be determined from many variables, like the ones we talked about above. Or it could simply be predefined, in which case it becomes Fixed weight!
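
Here's a rough sketch of both versions, with made-up worker names and weights. The weighted version shown is the naive one (a heavier worker simply appears more often in the rotation); real load balancers usually interleave the turns more smoothly.

```python
import itertools

workers = ["workerA", "workerB", "workerC"]

# Plain round robin: just cycle through the workers forever.
round_robin = itertools.cycle(workers)
print([next(round_robin) for _ in range(4)])   # A, B, C, A, ...

# Naive weighted round robin: a heavier worker appears more often in the cycle.
weights = {"workerA": 3, "workerB": 1, "workerC": 2}
weighted = itertools.cycle([w for w, n in weights.items() for _ in range(n)])
print([next(weighted) for _ in range(6)])      # A, A, A, B, C, C
```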

Least Connection, or in our context least task, basically means whichever worker has the fewest tasks in hand gets the next task, because he probably has more free time than the others. This one also comes with a weighted version, but you get the idea.
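
A sketch of both flavours, again with invented numbers (the task counts stand in for open connections):

```python
# Number of tasks each worker currently has in hand.
active = {"workerA": 3, "workerB": 0, "workerC": 7}

def least_connection(active):
    """Whoever has the fewest tasks gets the next one."""
    return min(active, key=active.get)

def weighted_least_connection(active, weights):
    """Weighted variant: compare the load relative to each worker's weight."""
    return min(active, key=lambda w: active[w] / weights[w])

chosen = least_connection(active)
active[chosen] += 1  # hand over the task
print("next task goes to", chosen)  # workerB, since it is idle
```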

Response Time based mainly tries to get tasks done faster, so the task is sent to the fastest worker. But he might be busy or overloaded, and we are trying to balance the load, remember? So there could be a constraint: even if he is a fast worker, he can only handle, say, 5 tasks at a time, in which case the 6th task is sent to the next fastest worker.
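
One way that could look, assuming we track an average response time per worker and cap everyone at 5 tasks (all numbers invented):

```python
# Average response time in ms and current task count, per worker.
workers = {
    "workerA": {"avg_ms": 20, "active": 5},
    "workerB": {"avg_ms": 35, "active": 1},
    "workerC": {"avg_ms": 50, "active": 0},
}
MAX_TASKS = 5  # even a fast worker can only handle this many at a time

def pick_fastest_available(workers):
    """Fastest worker first, but skip anyone already at the cap."""
    available = {w: s for w, s in workers.items() if s["active"] < MAX_TASKS}
    if not available:
        return None  # everyone is saturated; caller must queue or reject
    return min(available, key=lambda w: available[w]["avg_ms"])

chosen = pick_fastest_available(workers)
print("next task goes to", chosen)  # workerA is at the cap, so workerB wins
```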

The server and resource based model tries to determine the next worker for a task based on each worker's ability. So whichever worker can handle more tasks gets more tasks to do than the others.
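
A sketch where the "ability" is just a made-up capacity number (it could come from CPU cores or free memory) and tasks are handed out in proportion to it:

```python
import random

# Hypothetical capacities, e.g. derived from CPU cores or free memory.
capacity = {"workerA": 8, "workerB": 2, "workerC": 4}

def pick_by_capacity(capacity):
    """More capable workers get proportionally more tasks."""
    workers = list(capacity)
    return random.choices(workers, weights=[capacity[w] for w in workers])[0]

counts = {w: 0 for w in capacity}
for _ in range(1000):
    counts[pick_by_capacity(capacity)] += 1
print(counts)  # roughly an 8:2:4 split
```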

As we can see, although the topic is simple, it doesn't stay simple at all. And I don't think any one of these algorithms is perfect for the job. Maybe a combination of two or more of them is more likely to produce a better algorithm to balance the load.
What do you think?
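
If you want to play with the idea, here is one rough (and entirely made-up) way to blend two of the signals above, scoring each worker by its relative load plus a response time penalty:

```python
# Invented numbers and an invented scoring formula, purely for illustration.
workers = {
    "workerA": {"active": 4, "weight": 3, "avg_ms": 20},
    "workerB": {"active": 2, "weight": 1, "avg_ms": 35},
    "workerC": {"active": 3, "weight": 2, "avg_ms": 50},
}

def pick_combined(workers):
    def score(w):
        s = workers[w]
        # Lower is better: load relative to weight, plus a response time penalty.
        return s["active"] / s["weight"] + s["avg_ms"] / 100.0
    return min(workers, key=score)

print("next task goes to", pick_combined(workers))  # workerA with these numbers
```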