Discussion on Load Balancing

Issues

Load balancing is an important technique in distributed systems for keeping applications performant, available, and scalable. It is a concept every programmer should know, because it helps a system stay stable when user traffic surges unpredictably. However, when you first learn about it, many things can feel quite confusing.

A simple example is Nginx, a web server that almost everyone knows. Nginx can also act as a load balancer, meaning it is configured to distribute requests across different servers or services. Then Docker comes along with its own load balancing through swarm mode, where requests to a service are spread across its scaled containers. Not stopping there, Kubernetes can do the same. Why is that? If load balancing appears everywhere, how should it be set up correctly and optimally?

The simplest way to set up load balancing is to dedicate one server to distributing user requests. But what happens if that server becomes overloaded or crashes? The service will be disrupted, because nothing is left to coordinate user requests. So instead of setting up just one, why not set up a cluster of servers whose sole job is load balancing? If one fails, another can take over, and sharing the work between them reduces the risk. I completely agree! But is there another way?
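The "if one fails, another takes over" idea can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the balancer names `lb-primary` and `lb-standby` are hypothetical, and `is_healthy` stands in for a real health check such as a TCP probe.

```python
def pick_balancer(balancers, is_healthy):
    """Return the first healthy balancer, in priority order.

    `balancers` is an ordered list of names; `is_healthy` is a callable
    standing in for a real health check (assumption for this sketch).
    """
    for name in balancers:
        if is_healthy(name):
            return name
    raise RuntimeError("no healthy load balancer available")


BALANCERS = ["lb-primary", "lb-standby"]  # hypothetical names
down = {"lb-primary"}                     # pretend the primary has crashed

print(pick_balancer(BALANCERS, lambda name: name not in down))  # lb-standby
```

With one balancer the `RuntimeError` branch is reached as soon as it goes down; with two or more, traffic keeps flowing as long as any of them is healthy.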

We all know that both hardware and software have their limits, and the limits of software usually come from the hardware it runs on. No matter how optimized the software is, it can only do so much on limited hardware. Conversely, what happens if you combine very powerful hardware with highly optimized software?

Is there anything about load balancing that we have yet to learn? Surely there is, because knowledge is endless and technology changes rapidly. So in today's article, I will present my understanding of load balancing. It is a patchwork picture pieced together from fragments I have gathered over time, and I hope it gives everyone as objective a view as possible. If there are any omissions or errors, I hope readers will point them out.

Load Balancing Methods

First, let's talk about methods or ways of load balancing. There are two popular methods: algorithm-based and network layer-based.

Algorithm-based distribution includes Round Robin, Least Connections, IP Hash, and others. Round Robin hands requests to the servers one after another in a fixed circular order, so each server receives roughly the same number. Least Connections chooses the server that currently has the fewest active connections. What these methods have in common is an algorithm that decides where each request goes, so that all servers share the work efficiently.
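To make the three algorithms named above concrete, here is a minimal Python sketch. It is not how any particular load balancer implements them; the server names are hypothetical, and the IP Hash variant uses SHA-256 purely as a convenient stable hash, which is my own choice for illustration.

```python
import hashlib
from itertools import cycle

SERVERS = ["app-1", "app-2", "app-3"]  # hypothetical backend names


def round_robin(servers):
    """Return a picker that walks the server list in a circle."""
    ring = cycle(servers)
    return lambda: next(ring)


def least_connections(active):
    """Pick the server with the fewest open connections.

    `active` maps server name -> current connection count.
    """
    return min(active, key=active.get)


def ip_hash(client_ip, servers):
    """Map a client IP to a server so the same IP always lands on the same one."""
    digest = hashlib.sha256(client_ip.encode()).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]


pick = round_robin(SERVERS)
print([pick() for _ in range(4)])  # app-1, app-2, app-3, then app-1 again
print(least_connections({"app-1": 7, "app-2": 2, "app-3": 5}))  # app-2
print(ip_hash("203.0.113.9", SERVERS) == ip_hash("203.0.113.9", SERVERS))  # True
```

Round Robin needs no knowledge of the servers' state, Least Connections needs live connection counts, and IP Hash trades even distribution for stickiness: the same client keeps reaching the same server as long as the server list does not change.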

Network layer-based distribution works at Layer 4 or Layer 7 of the network. We are familiar with the OSI 7-layer model: data exchanged between two machines passes through each of the seven layers in turn. Among them, Layer 4 is the Transport Layer, while Layer 7 is the Application Layer. Layer 7 is usually known through protocols like HTTP, SMTP, FTP, and DNS. Layer 4 is much more basic: it deals with transport protocols such as TCP and UDP, and sees little beyond addresses and ports.

To make it easier to visualize, a connection passes through the 7 layers before reaching the point where it is processed, and requests can be routed elsewhere at either Layer 4 or Layer 7. Typically, we route at Layer 7, meaning we read the HTTP request (its method, such as GET or POST, along with its path and headers) before classifying and forwarding it. This consumes some computational resources. In return, we have more complete information about the request, which makes routing logic easier to write. Layer 4 offers better performance, but it has access to less information, so it can only classify requests to a limited extent.
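The difference in available information can be illustrated with a small Python sketch. The `Connection` class, the pool names, and the routing rules are all hypothetical; the only point is that a Layer 4 decision uses addresses and ports, while a Layer 7 decision can also use the parsed request.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Connection:
    # What a Layer 4 balancer can see: addresses and ports only.
    client_ip: str
    dest_port: int
    # Only visible after parsing the HTTP request (Layer 7).
    method: Optional[str] = None
    path: Optional[str] = None


def route_layer4(conn: Connection) -> str:
    """Route on transport-level facts alone; the payload is never parsed."""
    return "tls-pool" if conn.dest_port == 443 else "plain-pool"


def route_layer7(conn: Connection) -> str:
    """Route on request content, which costs a parse but allows finer rules."""
    if conn.path and conn.path.startswith("/api/"):
        return "api-pool"
    if conn.method == "GET":
        return "static-pool"
    return "default-pool"


conn = Connection("198.51.100.4", 443, method="POST", path="/api/orders")
print(route_layer4(conn))  # tls-pool
print(route_layer7(conn))  # api-pool
```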

Load Balancing Tools

Tools are the means to implement load balancing methods. Tools are divided into two types: hardware and software.

First, let's talk about software. These are programs written to distribute requests among servers. Nginx is an example; besides being a web server, it provides load balancing features at both Layer 4 and Layer 7. Next is HAProxy, a very powerful load balancer. There are also many other names like Apache, Traefik, and Envoy. Some vendors even build entire operating systems (OS) dedicated to load balancing. Why is that? Is software alone not enough? The answer is simple: it depends on whether your needs are simple or complex. If ordinary software is sufficient, you do not need a dedicated OS. Conversely, an OS built solely for load balancing can offer better performance and more features than general-purpose software.

Next is hardware. These are physical products built solely for load balancing at maximum performance. I have not worked with them directly, but you can find information about them on the internet. They are often specialized routers designed to direct network traffic, or servers optimized to handle millions of connections simultaneously.

Besides hardware and software, the "virtualization" brought by cloud technology lets us benefit from the networks of the major players. This has led to new cloud-based load balancing tools. AWS and Google Cloud are well known to be very strong in this field: they have server systems in many locations worldwide along with a dense network. At this point, a "single point of failure" in the load balancer is far less of a worry; the question becomes how many servers of our own we have to connect behind these giants.

Putting Methods and Tools Together

Back to the question raised at the beginning of the article: if Nginx already does load balancing, why do other tools like Docker add their own? Is that redundant and confusing for users? Absolutely not! It is completely normal, because the purpose of load balancing is to enhance application performance, and Nginx and Docker are independent applications that do not have to be used together. Just think: if you do not use Nginx, can Docker still balance the load?

In this section, I will present how to combine hardware and software together to create a load balancing system. Of course, there are many other ways to combine, but I hope readers can envision what I want to convey.

First, let's look at a simple diagram of a server using Nginx to distribute requests to 3 running services.

Example 1

Here, "Request" is a request from a client. When it reaches the server, Nginx receives it and distributes requests evenly across the three services using an algorithm such as Round Robin.

Next, add Docker or a similar tool. At this point, load balancing happens in both Nginx and Docker.

Example 2
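A rough way to picture two layers of balancing is the Python simulation below. It assumes, for illustration only, that Nginx picks a service and Docker then picks a container inside that service, both using Round Robin; the service and container names are made up, and real deployments may wire the tiers differently.

```python
from collections import Counter
from itertools import cycle


class RoundRobin:
    """Hand out targets from a fixed list in circular order."""

    def __init__(self, targets):
        self._ring = cycle(targets)

    def pick(self):
        return next(self._ring)


# Tier 1 (the role Nginx plays here): choose a service.
services = RoundRobin(["service-a", "service-b", "service-c"])

# Tier 2 (the role Docker plays here): choose a container inside that service.
containers = {
    "service-a": RoundRobin(["a-1", "a-2"]),
    "service-b": RoundRobin(["b-1", "b-2"]),
    "service-c": RoundRobin(["c-1", "c-2"]),
}


def handle_request():
    """Pass one request through both tiers and return the container that got it."""
    service = services.pick()
    return containers[service].pick()


hits = Counter(handle_request() for _ in range(12))
print(hits)  # every container receives 2 of the 12 requests
```

Neither tier knows about the other: the first only sees services, the second only sees containers, which is why the two tools can be combined or used alone.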

Wow, the system is stronger now! But everything is still on one server, so resource utilization is not very good. Let's separate them.

Example 3

So we have built a dedicated Nginx load-balancing server, while the other two servers continue to use Nginx and Docker to process requests. That's quite powerful! But what if the load-balancing server fails? Let's move to a new diagram.

Example 4

At this point, both hardware and software are present. The router is very powerful and divides requests among the load-balancing operating systems, which in turn distribute them to the smaller servers. But what happens if the router fails? You can combine multiple routers, or:

Example 5

All those complex components are gone, replaced by the cloud. A cloud load balancing tool takes the place of the whole network of hardware and software, because the provider has virtualized this layer for us.

Conclusion

As we have seen, load balancing appears everywhere, from hardware to software, and even in cloud services. Hardware is often complex, requires strong technical expertise, and suits enterprises that need high performance. Software is more diverse: from dedicated operating systems to programs installed on an ordinary server, all of them can handle load balancing.
