HAProxy is an open source load-balancer with very impressive performance. It is used to front-end even planet-scale web applications like Twitter. It can load-balance networked applications over TCP (Layer 4) and HTTP (Layer 7). It also fails over requests to active server instances when required, and therefore offers high availability (HA).
E2E Networks offers designated nodes with computing power suitable for running a high-performance load-balancer like HAProxy. These nodes are called Virtual Load Balancer (VLB) nodes. In this blog we will explain how to set up HAProxy on a Virtual Load Balancer node on E2E Cloud. We will use HAProxy at Layer 7, to front-end PHP applications running on Apache web servers. In the second and concluding part of this blog, we will run through a few additional steps most website administrators need to perform, over and above the basic set-up: implementing session stickiness, and securing access to the website using SSL. To a large extent, these two blogs are self-contained and together should help system administrators set up HAProxy smoothly as the front-end load-balancer for web applications deployed on E2E Cloud.
Installation And Basic Set-up
Although our focus is on HAProxy, we’ll start with installation of two web servers, to which HTTP requests can be routed. After logging into our E2E Networks Dashboard, we follow the links to ‘Create Node’, and choose a Ubuntu 16.04 distro with 1 CPU and 2GB RAM. Let this Virtual Compute Node be named ‘websrv1‘. Then we bring up another identical node to install a second web-server instance, named ‘websrv2‘. On each web-server node we install Apache 2.4 and PHP, by running the following commands on the terminal (of each web-server node):
- apt-get update
- apt-get install apache2 # install web server
- apt-get install php libapache2-mod-php php-mcrypt # php and related packages
- systemctl enable apache2 # enable web server to start on system reboot
- systemctl restart apache2 # restart web server
- systemctl status apache2 # check that web server is up and running
In order to test that the web server instances can serve PHP, we'll create a simple PHP file, greet.php, at the document root (/var/www/html directory) on each web server using the vi editor:
<?php echo "Hello World"; ?>
Now, if we point our browser at http://<web_server_external_ip_address>/greet.php , this “Hello World” greeting should be displayed. We can substitute the IP address of each of the nodes websrv1 and websrv2 in turn.
So, we are ready to install HAProxy to frontend the web servers. From our E2E Networks Dashboard, we again create a node, but this time from an ‘Appliance’. We choose an appliance of type ‘Load Balancing HAProxy’. There are several available configurations, of which we choose VLB-B-1 (5 VCPUs and 4GB RAM) for our HAProxy instance.
Figure 1: Creating a Virtual Load Balancer Node
The VLB nodes on E2E Cloud are powerful compute nodes running on CentOS 7. To install HAProxy on each of these nodes, we run the following commands on the terminal:
- yum install haproxy # install HAProxy
- systemctl enable haproxy # enable HAProxy to start on system reboot
- systemctl restart haproxy # restart HAProxy
- systemctl status haproxy # check that HAProxy is up and running
In our deployment we will need two additional packages for HAProxy to function: syslog and openssl. Since the VLB nodes (running on CentOS) come pre-installed with these two packages, no additional installation step is involved. However, we do need a few configuration steps for HAProxy even before load-balancing starts to work.
HAProxy Generic Configuration
The HAProxy configuration file is located at /etc/haproxy/haproxy.cfg , and comes with some out-of-the-box settings, a few of which may have to be tweaked. This configuration file is expected to have at least one ‘frontend‘ and at least one ‘backend‘ defined. Each frontend load-balances to one or more backends. A backend consists of several web servers serving the same web application, for scalability and redundancy. But there is also a section in this file labeled ‘global‘ that pertains to the overall HAProxy installation and a ‘defaults‘ section that specifies parameter values applicable to all frontends and backends defined here.
Max Connections: In the ‘global‘ section, we set the ‘maxconn‘ parameter (the maximum number of client connections this load balancer can handle across all backends it caters to). This parameter can be set using the HAProxy sizing guidelines, or conversely, we can choose a VLB node from E2E Cloud, based on the expected peak load our website may encounter. With our chosen VLB-B-1 node, 4096 is a reasonable value for ‘maxconn‘ at the global level.
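As a rough sketch of what this looks like (the lines other than ‘maxconn‘ reflect typical defaults shipped with the CentOS package, not necessarily the exact contents of the screenshot below):

```
global
    log         127.0.0.1 local2    # send logs to the local syslog daemon, facility local2
    chroot      /var/lib/haproxy
    pidfile     /var/run/haproxy.pid
    maxconn     4096                # max concurrent client connections across all backends
    user        haproxy
    group       haproxy
    daemon
```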
Figure 2: Global and Default Configuration Parameters for HAProxy
Protocol: In our deployment HAProxy will intercept and load-balance HTTP traffic. This is configured by setting ‘mode http‘ in the ‘defaults‘ section shown above.
Logging: It is recommended that HAProxy should use syslog. On CentOS, this means setting the ‘log’ parameter to ‘local2‘ as shown in the screenshot above. Furthermore, the syslog configuration on the HAProxy node (/etc/rsyslog.conf) must also be modified so that it listens on the UDP port 514.
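On CentOS 7, enabling the UDP listener typically means uncommenting two lines that are already present in the stock /etc/rsyslog.conf:

```
# Provides UDP syslog reception
$ModLoad imudp
$UDPServerRun 514
```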
Figure 3: Syslog configuration on HAProxy node
To make sure that all log messages on ‘local2‘ go to a separate file exclusively for HAProxy (/var/log/haproxy.log), we have to create a file haproxy.conf in the directory /etc/rsyslog.d containing the following line:
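The line in /etc/rsyslog.d/haproxy.conf directs all messages on the local2 facility to the dedicated log file:

```
local2.*    /var/log/haproxy.log
```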
For end-to-end request traceability, we should also tweak the Apache Web Server log format in the web server configuration file (/etc/apache2/apache2.conf). On each web server node, we modify LogFormat to include the actual client address (X-Forwarded-For) from which the HTTP request originated (instead of displaying the IP address of the HAProxy node).
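One common way to do this is to replace the leading %h (the connecting host, which would be the HAProxy node) with the X-Forwarded-For request header in the ‘combined‘ LogFormat. The exact format string below is an illustration of this technique, not necessarily what the screenshot shows:

```
LogFormat "%{X-Forwarded-For}i %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\"" combined
```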
Figure 4: Apache Web Server LogFormat
HAProxy Statistics: HAProxy can be configured to display request statistics on a web-based UI. We can set a URL for displaying HAProxy stats (/lb-stats) protected using basic authentication.
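A sketch of the stats directives (which can be placed in the ‘defaults‘ section; the credentials here are placeholders, not the ones actually used):

```
stats enable                 # turn on the web-based statistics UI
stats uri /lb-stats          # URL path at which statistics are served
stats auth admin:<password>  # protect the stats page with basic authentication
```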
Other Defaults: The ‘defaults‘ section has a number of time-out parameters which we left unchanged. But in production, each site administrator may want to fine-tune the time-outs based on factors like network latency, processing time at the server end, etc. We did introduce the option ‘httpclose‘ so that HTTP client connections are closed as soon as the response is sent back, without consuming resources unnecessarily (unless there is a Keep-Alive setting for HTTP connections). And we ensured that the actual client IP address is forwarded all the way to the web server for traceability (using the ‘option forwardfor‘ directive). For detailed definitions of the HAProxy configuration parameters, the HAProxy documentation should be consulted.
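Putting these settings together, the ‘defaults‘ section might look roughly like the following (the timeout and maxconn values here are illustrative, not prescriptive):

```
defaults
    mode                http        # load-balance at Layer 7 (HTTP)
    log                 global      # inherit the logging settings from the global section
    option              httplog     # detailed HTTP request logging
    option              httpclose   # close client connection once the response is sent
    option              forwardfor  # add X-Forwarded-For header with the real client IP
    retries             3
    timeout connect     5s
    timeout client      50s
    timeout server      50s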
Simple Round-Robin Load-Balancing
Initially, we set up only a single website (consisting of the simple PHP Greeter app mentioned previously), to be load-balanced by our HAProxy installation. We have already set up two web servers and one load-balancer (HAProxy) as listed on the E2E Cloud Dashboard.
Figure 5: Web-server and Load-balancer nodes
First we bind HAProxy to a suitable IP address and port (usually port 80 for http mode) on our VLB node. The IP address has to be external-facing, to accept client requests. This leads us to defining a ‘frontend‘, named ‘httptraffic‘, like the following (in /etc/haproxy/haproxy.cfg). Here, by default, this frontend load-balances any client request to the backend named ‘site‘.
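A sketch of such a frontend definition (the bind address is a placeholder for the VLB node's external IP):

```
frontend httptraffic
    bind <external_ip_address>:80   # accept client HTTP requests on port 80
    default_backend site            # route all requests to the backend named 'site'
```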
Figure 6: Frontend configuration for HAProxy
So we must define a ‘backend‘ named ‘site‘. It consists of the IP addresses of the web servers to which requests are routed (we should use internal/private addresses for this purpose). We also specify the load-balancing policy as ‘roundrobin‘. HAProxy additionally allows us to enable periodic health checks of the backend nodes, using the ‘check‘ parameter.
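A sketch of such a backend definition (the server addresses are placeholders for the private IPs of websrv1 and websrv2):

```
backend site
    balance roundrobin                           # distribute requests in turn across servers
    server websrv1 <websrv1_private_ip>:80 check # 'check' enables periodic health checks
    server websrv2 <websrv2_private_ip>:80 check
```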
Figure 7: Backend Configuration for HAProxy
At this point we must restart syslog and HAProxy on the load-balancer node:
- systemctl restart rsyslog
- systemctl restart haproxy
Before we access the web application through HAProxy, we need to open port 80 (for HAProxy set up in http mode) on the virtual load balancer node, and the UDP port 514 for syslog. On CentOS, this requires setting up iptables rules appropriately:
- iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
- iptables -A INPUT -i eth0 -p udp --dport 514 -j ACCEPT
- iptables -A INPUT -i eth0 -p tcp --dport 80 -m state --state NEW,ESTABLISHED -j ACCEPT
- systemctl restart iptables
At this stage we can actually test out load-balancing functionality and some of the configuration settings (including logging and stats collection) mentioned in the previous section.
To detect requests landing on HAProxy, we run the following command on the load-balancer node:
- tail -f /var/log/haproxy.log
For HAProxy, this log file (/var/log/haproxy.log) is updated through syslog.
Similarly, we can check out the access logs on each web-server node to trace the request end-to-end.
- tail -f /var/log/apache2/access.log
Now, we access the simple PHP Greeter application from two different machines. On each client machine we can point a browser to the URL http://<load_balancer_external_ip_address>/greet.php :
We should find updates to the HAProxy logs. (This also verifies that our syslog configuration is working.)
Figure 8: HAProxy Logs (Round-robin load-balancing)
One of the requests is served by the node websrv1 while the other is served by the node websrv2. This is evident from the access logs at each web server, and by closely matching the timestamps. The client IP addresses are also logged within the HAProxy and in the web-server logs.
Figure 9: Access Logs from node: websrv1
Figure 10: Access Logs from node: websrv2
The following greetings will be displayed on the browser on each client machine:
Figure 11: Browser Display
Load-balancer Statistics: Finally we can check out the HAProxy statistics by pointing our browser to http://<load_balancer_external_ip_address>/lb-stats (the stats URL configured in the ‘defaults‘ section):
Figure 12: Load-balancer Statistics
Conclusion And Next Steps
So far we have successfully set up load-balancing on the E2E cloud using HAProxy. In the next (and concluding) part of this blog, we will take this configuration to the next level by enabling sticky sessions and secure access to the web-site, and also set up a larger web-site with multiple applications. These steps will take us closer to a production deployment.
Please follow the link below for Part 2: