Load Balancing Overview
What Is Load Balancing?
Load balancing is a service that distributes incoming traffic across multiple backend servers on demand. By spreading requests over a server pool, it increases an application's throughput, removes individual servers as single points of failure, and improves overall availability.
Why Do We Need Load Balancing?
Load balancing addresses the following common challenges in network services:
Insufficient Performance of a Single Server
Medium- to large-scale systems and websites often receive more traffic than a single server can handle. Deploying multiple servers then requires a load balancer to distribute traffic across them and increase total service capacity.
Need for Flexible Traffic Scheduling
Websites and systems with massive traffic volumes demand efficient, precise traffic management. Load balancers provide high performance, advanced scheduling algorithms, and Layer 7 forwarding capabilities for flexible traffic control (a minimal example follows this list).
Single Point of Failure
Critical systems and websites require high availability through multi-server deployment. Load balancers run health checks to detect failures and automatically isolate unhealthy servers, so the failure of a single server does not disrupt the service.
Secure External Exposure of Services
Acting as a reverse proxy, a load balancer offers a single, unified external access point. Backend services never need to be exposed directly to the internet, which improves overall security.
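To make the scheduling and reverse-proxy roles above concrete, here is a minimal sketch in Go of a Layer 7 load balancer that round-robins requests across a fixed backend pool. The backend addresses, port, and pool size are hypothetical; this illustrates the idea, not any specific product's implementation.

```go
package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
	"sync/atomic"
)

func main() {
	// Hypothetical backend pool; real deployments discover backends dynamically.
	backends := []*url.URL{
		mustParse("http://10.0.0.1:8080"),
		mustParse("http://10.0.0.2:8080"),
		mustParse("http://10.0.0.3:8080"),
	}

	var next uint64 // round-robin counter shared across requests

	proxy := &httputil.ReverseProxy{
		Rewrite: func(r *httputil.ProxyRequest) {
			// Pick the next backend in rotation and rewrite the request to it.
			target := backends[atomic.AddUint64(&next, 1)%uint64(len(backends))]
			r.SetURL(target)
		},
	}

	// Clients connect only to this address; the backends stay private.
	log.Fatal(http.ListenAndServe(":80", proxy))
}

func mustParse(raw string) *url.URL {
	u, err := url.Parse(raw)
	if err != nil {
		panic(err)
	}
	return u
}
```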
How Does Load Balancing Work?
The following diagram illustrates the working principle of load balancing:
[Diagram: client requests travel over the internet to the load balancer's VIP address, which forwards them to servers in a backend server group.]
Client requests addressed to the virtual IP (VIP) travel over the internet and reach the load balancer. Upon receiving a request, the load balancing service applies a predefined scheduling algorithm to decide which backend server the request should be forwarded to.
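Scheduling algorithms vary by product. As one illustration, the sketch below (all type and field names are hypothetical) implements a least-connections picker, which forwards each request to the backend currently serving the fewest in-flight requests:

```go
package main

import (
	"fmt"
	"sync"
)

// Backend tracks the number of in-flight requests it is serving.
type Backend struct {
	Addr   string
	Active int
}

// Pool guards a set of backends behind a mutex so concurrent
// requests see consistent counters.
type Pool struct {
	mu       sync.Mutex
	backends []*Backend
}

// PickLeastConnections returns the backend with the fewest active
// requests and increments its counter; callers must call Release
// once the request finishes.
func (p *Pool) PickLeastConnections() *Backend {
	p.mu.Lock()
	defer p.mu.Unlock()
	best := p.backends[0]
	for _, b := range p.backends[1:] {
		if b.Active < best.Active {
			best = b
		}
	}
	best.Active++
	return best
}

func (p *Pool) Release(b *Backend) {
	p.mu.Lock()
	defer p.mu.Unlock()
	b.Active--
}

func main() {
	pool := &Pool{backends: []*Backend{
		{Addr: "10.0.0.1:8080"},
		{Addr: "10.0.0.2:8080"},
	}}
	b := pool.PickLeastConnections()
	fmt.Println("forwarding to", b.Addr)
	pool.Release(b)
}
```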
The load balancing service maintains one or more backend server groups, each consisting of multiple backend servers. Based on the health status and current load of these servers, the load balancer selects an appropriate server to handle the request. This ensures that requests are evenly distributed across backend servers, preventing performance degradation or service unavailability caused by server overload.
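The health status that drives this selection typically comes from periodic probes. Below is a minimal sketch, assuming each backend exposes a hypothetical /healthz endpoint that returns HTTP 200 when healthy; the probe interval, timeout, and addresses are likewise assumptions:

```go
package main

import (
	"fmt"
	"net/http"
	"sync/atomic"
	"time"
)

// Backend pairs an address with an atomically updated health flag,
// so the scheduler can read it without locking.
type Backend struct {
	Addr    string
	healthy atomic.Bool
}

func (b *Backend) Healthy() bool { return b.healthy.Load() }

// probe marks the backend healthy only if its /healthz endpoint
// (an assumed convention) answers 200 within the client timeout.
func probe(b *Backend, client *http.Client) {
	resp, err := client.Get("http://" + b.Addr + "/healthz")
	ok := err == nil && resp.StatusCode == http.StatusOK
	if resp != nil {
		resp.Body.Close()
	}
	b.healthy.Store(ok)
}

func main() {
	backends := []*Backend{{Addr: "10.0.0.1:8080"}, {Addr: "10.0.0.2:8080"}}
	client := &http.Client{Timeout: 2 * time.Second}

	// Re-check every backend on a fixed interval; unhealthy ones are
	// skipped by the scheduler until a later probe succeeds again.
	for range time.Tick(5 * time.Second) {
		for _, b := range backends {
			probe(b, client)
			fmt.Println(b.Addr, "healthy:", b.Healthy())
		}
	}
}
```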