Getting Started with HA Proxy [High Availability]

February 17, 2013

HAProxy is a very fast, very awesome and very free TCP/HTTP load balancer and failover program. It's very simple but also quite powerful. The configuration is flexible enough to fit into many high-traffic infrastructures, yet simple enough for any design that just needs basic high availability. It can handle tens of thousands of connections at high speed and can even provide a private-to-public address “translation”, if you will, by hiding web servers from the internet.

One of the most popular scenarios for HAProxy is load balancing in its simplest form. A web server struggling to keep up with traffic demands would benefit from putting HAProxy in front of it, adding a second server and balancing between the two. This roughly halves the load on each individual server by distributing it between the two (or however many you want) and should yield a much more stable web site.

However, load balancing is only the beginning. Because the config is so powerful, many webmasters have found that much of its value lies in its ability to be a “traffic cop”, if you will. Imgur has a really interesting post on how they use the software, not for load balancing, but for directing image requests to another server. Essentially, HAProxy grabs any GET request for a JPG, PNG or other image and sends it to a separate Lighttpd process, while everything else goes to Apache. It would be trivial to take this concept and use it to your advantage. Rather than writing custom code to separate out certain elements, you could simply write rules in HAProxy that send requests for your images, JavaScript, CSS files, whatever, to a CDN and send all other traffic to a dedicated server.

It's the flexibility and power that make this a necessary piece of software for any Linux fan to get a handle on. There are dozens of other use cases (conveniently listed on their site) and quite a few more advanced features. For now, let's just get this software installed and set up a basic round-robin config!

The Setup

The setup that we will be working with consists of one HAProxy box and two web servers, all on a single LAN. You can see the layout in this image:

HAProxy Example Layout

All configuration will be done on the HAProxy machine, and it is assumed that the web servers are already configured and serving pages correctly. The HAProxy box I am using is running CentOS 5. Let's get started!

Installing

If you are using CentOS, you will need to enable the EPEL repo and then simply install it via yum:

yum install haproxy

Debian should have this package in the default repo:

sudo apt-get install haproxy

If you are like me and love installing things from source, first make sure you have gcc installed. On CentOS, I just like installing all of the development tools. If you are low on space or resources you may not want to install everything, but it's just easier in my opinion:

yum groupinstall "Development Tools"

Then download and compile the program:

cd /usr/local/src/
wget http://haproxy.1wt.eu/download/1.4/src/haproxy-1.4.22.tar.gz
tar xzf haproxy-1.4.22.tar.gz
cd haproxy-1.4.22
make TARGET=linux26 #Change TARGET accordingly
cp haproxy /usr/sbin/

On the fifth line, make sure that you change the TARGET to match your Linux kernel version. If you are unsure, run the following and note the first two numbers:

uname -r

You can also append “ARCH=i386” to build the 32-bit binary if you are running on a 64-bit system. HAProxy builds on quite a few platforms; you have the following options for the TARGET variable:

linux22 for Linux 2.2
linux24 for Linux 2.4 and above (default)
linux24e for Linux 2.4 with support for a working epoll (> 0.21)
linux26 for Linux 2.6 and above
linux2628 for Linux 2.6.28 and above (enables splice and tproxy)
solaris for Solaris 8 or 10 (others untested)
freebsd for FreeBSD 5 to 8.0 (others untested)
openbsd for OpenBSD 3.1 to 4.6 (others untested)
aix52 for AIX 5.2
cygwin for Cygwin
generic for any other OS.
custom to manually adjust every setting
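
If you would rather not eyeball the kernel version, the mapping from the uname -r output to a TARGET value can be sketched in a few lines of Python. This is only an illustration of the list above: the function name haproxy_target and the sample kernel string are made up for the example, and it only considers the Linux targets (it ignores linux24e, which depends on a patched kernel).

```python
import re


def haproxy_target(kernel_release):
    """Map a `uname -r` style string to one of the Linux TARGET values."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", kernel_release)
    if match is None:
        # Unrecognised format: fall back to the catch-all target.
        return "generic"
    # A missing patch level (e.g. "3.2") is treated as 0.
    version = tuple(int(part or 0) for part in match.groups())
    if version >= (2, 6, 28):
        return "linux2628"
    if version >= (2, 6, 0):
        return "linux26"
    if version >= (2, 4, 0):
        return "linux24"
    if version >= (2, 2, 0):
        return "linux22"
    return "generic"


# Hypothetical CentOS 5 style kernel string, used only as sample input.
print(haproxy_target("2.6.18-348.el5"))
```

On a real Linux box you could pass in platform.release() from the standard library instead of a hard-coded string.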

You can launch HAProxy by running the following command:

haproxy -f /etc/haproxy/haproxy.cfg -sf `pgrep haproxy`

This can be run multiple times without fear of spawning overlapping processes. The “-sf `pgrep haproxy`” part passes the PIDs of any running HAProxy processes to the new one, which tells them to finish their existing connections and exit once it has started. The new process uses /etc/haproxy/haproxy.cfg as its config, which we will write in the next section.

You can also create an init script from the example file included in the source tree:

cp examples/haproxy.init /etc/init.d/haproxy
chmod a+x /etc/init.d/haproxy

Note that this will not run until /etc/haproxy/haproxy.cfg is configured…so let's do that!

Build a Basic Config

We are going to assume that your config lives in /etc/haproxy/haproxy.cfg. Please note that if you installed from a precompiled package such as a deb or RPM, this path may be different. For us though, we first need to create the /etc/haproxy folder and then touch a file called haproxy.cfg inside it. Open the file with your favorite editor and paste the following:

defaults
        log     global
        mode    http
        option  httplog
        option  dontlognull
        retries 3
        option redispatch
        maxconn 20000
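        timeout connect 5000     #Max time (ms) to wait for a connection to a server
        timeout client  50000    #Max inactivity time (ms) on the client side
        timeout server  50000    #Max inactivity time (ms) on the server side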

listen webservers 192.168.1.4:80
       mode http
       stats enable     #Optional: Enables stats interface
       stats uri /haproxy?stats     #Optional: Statistics URL
       stats realm Haproxy\ Statistics
       stats auth haproxy:stats     #Optional: username:password of site
       balance roundrobin     #balance type
       cookie webserver insert indirect nocache 
       option httpclose
       option forwardfor     #Add an X-Forwarded-For header with the client IP
       server server1 192.168.1.5:80 cookie server1 weight 1 check
       server server2 192.168.1.6:80 cookie server2 weight 1 check

Modify this if needed and save it to /etc/haproxy/haproxy.cfg. Once you’re finished, we can check the syntax of the config file by running the following:

haproxy -f /etc/haproxy/haproxy.cfg -c

If it's good, you should see “Configuration file is valid”. Go ahead and start everything up!

/etc/init.d/haproxy start

Go ahead and visit http://192.168.1.4 in your web browser and you should see the page from one of your two web servers! Pretty cool right?!
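
For later config changes, the syntax check and the -sf launch command from the install section can be combined so that a broken config never replaces a working process. Here is a rough Python sketch of that idea. The function names are invented for this example, and the -D flag (run as a daemon) is my addition so that the call returns instead of sitting in the foreground; drop it if your config already has "daemon" in a global section.

```python
import subprocess

CFG = "/etc/haproxy/haproxy.cfg"


def build_reload_command(cfg, old_pids):
    """Build the argv for starting HAProxy and retiring the old processes."""
    cmd = ["haproxy", "-D", "-f", cfg]
    if old_pids:
        # -sf asks the listed processes to finish up once the new one is running.
        cmd += ["-sf"] + [str(pid) for pid in old_pids]
    return cmd


def running_pids():
    """Return the PIDs that `pgrep haproxy` reports (empty list if none)."""
    result = subprocess.run(["pgrep", "haproxy"], capture_output=True, text=True)
    return [int(pid) for pid in result.stdout.split()]


def safe_reload(cfg=CFG):
    """Reload only if `haproxy -c` accepts the config; return True on reload."""
    check = subprocess.run(["haproxy", "-f", cfg, "-c"])
    if check.returncode != 0:
        return False  # leave the running process untouched
    subprocess.run(build_reload_command(cfg, running_pids()), check=True)
    return True
```

Calling safe_reload() after editing the config is then the whole workflow; it needs to run as a user allowed to bind port 80.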

Config Commentary

I added a few comments to some of the lines within the config to help better explain what it does. Here is a little more info on some of those commands.

Stats Commands
- stats enable - Simply enables the statistics web interface
- stats uri /haproxy?stats - This defines the url where you can visit the stats. ex: http://192.168.1.4/haproxy?stats
- stats realm Haproxy\ Statistics - The site info in the authentication box
- stats auth haproxy:stats - The username and password of the url above. If this is removed the stats site will not require authentication
balance roundrobin - The method used to balance between the two servers. Below are the more popular options; the full list can be found in the documentation.
- roundrobin - Each server is used in turns, according to their weights. This is the smoothest and fairest algorithm when the server’s processing time remains equally distributed.
- source - The source IP address is hashed and divided by the total weight of the running servers to designate which server will receive the request. This ensures that the same client IP address will always reach the same server as long as no server goes down or up. If the hash result changes due to the number of running servers changing, many clients will be directed to a different server.
- leastconn - The server with the lowest number of connections receives the connection. Round-robin is performed within groups of servers of the same load to ensure that all servers will be used. Use of this algorithm is recommended where very long sessions are expected, such as LDAP, SQL, TSE, etc… but is not very well suited for protocols using short sessions such as HTTP
server server1 192.168.1.5:80 cookie server1 weight 1 check - There are a lot of options you can use here. Let's break some down:
- cookie [value] - Sets the cookie value that identifies this server. The cookie itself is named by the “cookie webserver insert indirect nocache” line (here “webserver”); a client holding the value server1 keeps being sent to this server.
- weight [int] - Put simply, this sets the server's relative power/importance. In our scenario both are set to 1, meaning both are treated as equally powerful and receive an equal share of traffic. The default is 1 and the max is 256. The higher the number, the larger the share of load the server receives relative to the weights of the other servers.
- backup - If this value is added in the server line, it is disabled unless all non backup servers go down. This is good if you want to only use this as a failover.
- check - This enables health checks on the server. If this is not added a server will always be considered available even if it actually is not physically up.
- disabled - When added the server is marked as down in maintenance mode
- fall [int] - The number of consecutive failed health checks after which the server is considered down. Default is 3
- rise [int] - The number of consecutive successful health checks after which the server is considered back up. Default is 2
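
To get a feel for how weights and the source option shape traffic, here is a small Python sketch. It is not HAProxy's internal code: the round-robin part uses the well-known "smooth weighted round-robin" scheme as a stand-in, and the source part uses a CRC32 hash purely for illustration. The function names are invented for the example.

```python
import zlib


def smooth_weighted_round_robin(weights, picks):
    """Return `picks` server names chosen in proportion to `weights`."""
    current = {name: 0 for name in weights}
    total = sum(weights.values())
    order = []
    for _ in range(picks):
        # Every server earns credit equal to its weight on each round.
        for name, weight in weights.items():
            current[name] += weight
        # The server with the most credit wins; ties go to the first listed.
        chosen = max(current, key=current.get)
        current[chosen] -= total
        order.append(chosen)
    return order


def pick_by_source(client_ip, weights):
    """Hash the client IP onto the weighted list of running servers."""
    slot = zlib.crc32(client_ip.encode()) % sum(weights.values())
    for name, weight in weights.items():
        if slot < weight:
            return name
        slot -= weight


# Equal weights, as in the config above: the two servers simply alternate.
print(smooth_weighted_round_robin({"server1": 1, "server2": 1}, 4))
```

With equal weights the servers alternate, while unequal weights skew the share toward the heavier server without sending it long bursts. pick_by_source returns the same server for the same IP as long as the set of servers and weights does not change, which is the property the source option relies on.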

There are many more options and more explanations which can be found in the documentation.
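
The fall and rise options are easier to picture as counters of consecutive health check results than as timers. The Python sketch below is a simplified model based on how the documentation describes them, not HAProxy's actual bookkeeping; the class name HealthTracker is made up for the example.

```python
class HealthTracker:
    """Track one server's up/down state from a stream of check results."""

    def __init__(self, rise=2, fall=3, up=True):
        self.rise = rise      # consecutive successes needed to come back up
        self.fall = fall      # consecutive failures needed to be marked down
        self.up = up
        self.streak = 0       # consecutive results that contradict the state

    def record(self, check_ok):
        """Feed in one check result and return the resulting state."""
        if check_ok == self.up:
            # The result agrees with the current state, so the streak resets.
            self.streak = 0
        else:
            self.streak += 1
            threshold = self.fall if self.up else self.rise
            if self.streak >= threshold:
                self.up = not self.up
                self.streak = 0
        return self.up


tracker = HealthTracker()  # defaults match the config defaults: rise 2, fall 3
results = [False, False, True, False, False, False, True, True]
print([tracker.record(ok) for ok in results])
```

Two failures followed by a success do not take the server down; it takes three failures in a row, and then two successes in a row to bring it back.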

What Next?

About that documentation, you can find it here. There are far too many options to detail in a beginner's guide like this one. Here are a few more commands that are pretty cool and should help you out!

monitor-uri /haproxy - When this is enabled, visiting http://192.168.1.4/haproxy simply returns a 200 OK as long as the HAProxy service is working.
option httpchk /check.html - Essentially a much more powerful version of the plain “check” option. Instead of only testing that the port accepts connections, HAProxy requests /check.html from each backend server. Any 2xx or 3xx response counts as healthy; anything else marks the server as down. You can take this a step further and create a PHP page that checks Apache, MySQL, server load, etc… and sends an error 500 if any one of those fails. This takes the server out of rotation until you can correct it.
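
The "smart" check page idea is not tied to PHP. As a rough illustration, here is the same pattern in Python using only the standard library: run a set of probes and answer 200 if they all pass, 500 otherwise. Everything named here (evaluate_checks, load_ok, CheckHandler, the load limit of 8.0) is an assumption made up for the sketch; you would swap in probes for whatever your own stack depends on.

```python
import os
from http.server import BaseHTTPRequestHandler


def evaluate_checks(checks):
    """Run each named probe and return (status_code, body)."""
    failed = []
    for name, probe in checks.items():
        try:
            ok = bool(probe())
        except Exception:
            ok = False  # a probe that crashes counts as a failure
        if not ok:
            failed.append(name)
    if failed:
        return 500, "FAIL: " + ", ".join(failed)
    return 200, "OK"


def load_ok(limit=8.0):
    """Example probe: 1-minute load average below an arbitrary limit."""
    return os.getloadavg()[0] < limit


class CheckHandler(BaseHTTPRequestHandler):
    """Answer health checks with the combined result of all probes."""

    checks = {"load": load_ok}

    def _reply(self):
        code, body = evaluate_checks(self.checks)
        payload = body.encode()
        self.send_response(code)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        if self.command != "HEAD":
            self.wfile.write(payload)

    # Answer whichever method the health check is configured to send.
    do_GET = do_HEAD = do_OPTIONS = _reply
```

To try it out, the handler can be served with HTTPServer from the same http.server module on any free port, and the httpchk line pointed at that port and path.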

ACLs and Frontends

I wanted to wrap this up with another sample configuration. This one incorporates some of my favorite features: frontends and ACLs. Take a look at this:

global
    daemon
    #maxconn 30000
    #ulimit-n 655360
    #chroot /home/haproxy
    #uid 500
    #gid 500
    #nbproc 1
    pidfile /var/run/haproxy.pid
    log 127.0.0.1     local4 notice

defaults 
    mode http

    timeout client      600000   # maximum inactivity time on the client side
    timeout server      600000   # maximum inactivity time on the server side
    timeout connect     8000     # maximum time to wait for a connection attempt to a server to succeed

    stats enable
    stats auth      admin:password
    stats uri       /admin?stats
    stats refresh   5s
    stats hide-version
    stats realm Loadbalancer\ stats
    #option httpchk GET /status
    retries     5
    option redispatch
    option forwardfor
    option httpclose

    monitor-uri /test

    balance roundrobin  # each server is used in turns, according to assigned weight

    default_backend default

frontend http
    bind :80

    acl nagios path_reg ^/nagios/?
    acl app path_reg ^/app/?
    acl mysite path_reg ^/blog/?

    use_backend nagios if nagios
    use_backend app if app
    use_backend mysite if mysite

backend default
    server web1 172.16.1.201:80 cookie A check inter 1000 rise 2 fall 5 maxqueue 50 weight 30
    server web2 172.16.1.202:80 cookie B check inter 1000 rise 2 fall 5 maxqueue 50 weight 75
    stats admin if TRUE
    option httpchk HEAD /check.txt HTTP/1.0
    errorfile 503 /var/www/html/503.html

backend nagios
    server nagios 172.16.1.217:80 check inter 1000 rise 2 fall 5 maxqueue 50 weight 30
    errorfile 503 /var/www/html/503.html

backend app
    server clear 172.16.0.39:81 check inter 1000 rise 2 fall 5 maxqueue 50 weight 30

backend mysite
    server mysite 172.16.1.203:80 check inter 1000 rise 2 fall 5 maxqueue 50 weight 30

The defaults section is a set of instructions that applies to every frontend and backend that follows it. It's really useful if you have a few settings that need to apply to everything. In the frontend, the “nagios” ACL uses a regex that matches any path starting with /nagios. The use_backend line below it says that if the incoming request matches that ACL, it is passed to the “nagios” backend! So if someone hits http://192.168.1.4/nagios, HAProxy will display data from http://172.16.1.217/nagios! Requests that match none of the ACLs go to the default_backend. You can also match requests for images, JavaScript files or whatever, as mentioned in the beginning, and use an ACL to route them wherever you want!
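
If the ACL logic feels abstract, it boils down to "first matching rule wins, otherwise use the default". The Python sketch below mimics the frontend above with the same three regexes. It is only a model of the decision, assuming the rules are checked in the order they are written and that the regex is tested against the path without the query string; the function name choose_backend is invented for the example.

```python
import re

# Same patterns and order as the acl / use_backend lines in the frontend.
RULES = [
    ("nagios", re.compile(r"^/nagios/?")),
    ("app", re.compile(r"^/app/?")),
    ("mysite", re.compile(r"^/blog/?")),
]


def choose_backend(url, default="default"):
    """Return the name of the backend that would receive this request."""
    path = url.split("?", 1)[0]  # only the path takes part in the match
    for backend, pattern in RULES:
        if pattern.search(path):
            return backend
    return default  # plays the role of default_backend


for url in ["/nagios/index.php", "/blog", "/", "/application"]:
    print(url, "->", choose_backend(url))
```

One thing this makes visible: because the trailing slash in the pattern is optional and nothing anchors the end, a path like /application also lands on the “app” backend. If that is not what you want, a stricter pattern such as ^/app(/|$) is one way to tighten it.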

HAProxy’s config is really easy to use and it has a lot of value in a web stack. I hope this has given you at least the basics of HAProxy and that you have found some value in it. Enjoy!
