NginX is one of the most popular web servers and reverse proxy load balancers. It is widely used to accelerate websites as a caching reverse proxy for HTTP, and it also supports several other internet protocols such as IMAP and SMTP.

While NginX can act as an application server as well, in most cases it is used as a reverse proxy and load balancer. As a reverse proxy it can detect upstream failures and route traffic to the healthy nodes. There are plenty of articles on configuring NginX as a reverse proxy, so this article focuses on dealing with failures of the NginX instances themselves.

In this example we will set up two NginX servers that proxy incoming traffic to several upstream nodes. Each NginX instance is configured with a public and a private IP address, so traffic arriving on the public addresses is proxied to the upstreams over the private network. Public DNS is configured with both public IP addresses, either as aliases or as A records. With this setup incoming traffic is balanced across the NginX instances via round-robin DNS and then proxied internally to the upstream servers.
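For illustration, a correctly configured round-robin record returns both public addresses for the same name. A quick check with dig, using the example floating addresses 10.10.10.2 and 10.10.10.3 that are introduced later in this article, would look like this:

# Both proxy addresses should be returned for the same name
dig +short A example.com
10.10.10.2
10.10.10.3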

The image above illustrates a quite common and widely used setup. Traffic is balanced evenly across the NginX instances and the upstreams, and NginX detects upstream failures and proxies traffic only to the healthy nodes. Both machines have identical configurations, so it is safe to configure one server and copy the config files to the other.

The code below is an example of a reverse proxy configuration with SSL termination.

# Pool of upstream application servers; a node is marked unavailable
# for 60 seconds after 2 failed attempts
upstream appservers {
	server 192.168.10.120 max_fails=2 fail_timeout=60;
	server 192.168.10.121 max_fails=2 fail_timeout=60;
	server 192.168.10.122 max_fails=2 fail_timeout=60;
	server 192.168.10.123 max_fails=2 fail_timeout=60;
}

# Redirect all plain HTTP requests to HTTPS
server {
	listen   80;
	server_name  example.com  *.example.com;
	return 301 https://$host$request_uri;
}

# HTTPS server terminating SSL and proxying to the upstream pool
server {
	listen 443 ssl http2 reuseport;
	server_name  example.com  *.example.com;
	add_header Strict-Transport-Security "max-age=31536000";
	ssl_certificate_key /etc/nginx/ssl/example_private.key;
	ssl_certificate /etc/nginx/ssl/example_bundle.crt;
	ssl_prefer_server_ciphers on;
	ssl_ecdh_curve prime256v1;
	ssl_protocols TLSv1.1 TLSv1.2;
	location / {
		proxy_pass http://appservers;
		include proxy_params;
	}
}
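The proxy_params file included above is shipped by the Debian and Ubuntu NginX packages and sets the usual forwarding headers; its stock contents are shown below (verify the path and contents on your own system), followed by a config check and reload:

# Headers forwarded to the upstream servers (stock Debian defaults)
cat /etc/nginx/proxy_params
proxy_set_header Host $http_host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;

# Validate the configuration and apply it without dropping connections
nginx -t && systemctl reload nginx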

Now we need to make the NginX proxy servers fault tolerant, so that if one of them fails the incoming traffic is automatically routed to the healthy server and routed back once the failed server recovers. For this we will need two IP addresses that float between the NginX instances.

We will use KeepaliveD for this task. It is free open-source software and can be installed via the package manager of most Linux distributions. On Debian-family systems, just run this on both NginX servers:

apt-get install keepalived 

The next step is to configure KeepaliveD. Let's assume that the NginX servers have private IP addresses in the same 192.168.10.0/24 range as the upstream servers, and that the floating IP addresses come from the 10.10.10.0/24 range. (Note: 10.10.10.0/24 is not actually a public IP range; you should use public IP addresses assigned by your ISP.)

lb1.example.local
Private IP:  192.168.10.2/24
Public IP:   10.10.10.20/26
Floating IP: 10.10.10.2/26

lb2.example.local
Private IP:  192.168.10.3/24
Public IP:   10.10.10.30/26
Floating IP: 10.10.10.3/26

So we need two private IP addresses and four public ones. Two of the four public addresses are assigned to the servers automatically at boot time and controlled by the OS; the other two are controlled by the KeepaliveD service.
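For reference, the two OS-controlled addresses on lb1 could be set up at boot roughly like this, assuming a Debian-family system that still uses ifupdown, with eth0 as the public and eth1 as the private interface (the gateway address is a made-up example; adapt the stanzas to netplan or systemd-networkd if that is what your system uses):

# Boot-time, OS-managed addresses on lb1 (relevant stanzas only)
cat /etc/network/interfaces

auto eth0
iface eth0 inet static
    address 10.10.10.20/26
    gateway 10.10.10.1

auto eth1
iface eth1 inet static
    address 192.168.10.2/24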

KeepaliveD also supports failure detection via health-check scripts, so it's a good idea to configure one.

vrrp_script chk_nginx {
    script "pidof nginx"
    interval 2
}

The snippet above is a very simple health check: KeepaliveD runs "pidof nginx" every 2 seconds, and if no PID is returned the node is considered faulty.
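Note that "pidof nginx" only proves that an NginX process exists, not that it is actually serving requests. If you want a stricter check, the script directive could run a local HTTP request instead; a minimal sketch, assuming curl is installed on both load balancers:

# Exits non-zero if NginX does not answer on localhost,
# which KeepaliveD treats as a failed check
curl -fsS -o /dev/null http://127.0.0.1/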

Unlike NginX, the KeepaliveD instances can't have identical configuration files: the resources are configured as two master/backup pairs, so a resource that is the master on lb1 is the backup on lb2 and vice versa. This configuration allows the resources (IP addresses) to move when a failure is detected and to move back on recovery.

The full KeepaliveD configuration on lb1 will look like this:

vrrp_script chk_nginx {
    script "pidof nginx"
    interval 2
}

# Floating IP 10.10.10.2: lb1 is the master for this address
vrrp_instance ip_2 {
    state MASTER
    # VRRP advertisements are exchanged over the private interface
    interface eth1
    virtual_router_id 15
    priority 110
    advert_int 4
    authentication {
        auth_type AH
        auth_pass SomePa$$word
    }

    track_script {
        chk_nginx
    }

    # Private address of the peer (lb2)
    unicast_peer {
        192.168.10.3
    }

    # The floating address itself lives on the public interface
    virtual_ipaddress {
        10.10.10.2/26 dev eth0 label eth0
    }
}

# Floating IP 10.10.10.3: lb1 is only the backup for this address
vrrp_instance ip_3 {
    state BACKUP
    interface eth1
    virtual_router_id 14
    priority 109
    advert_int 4
    authentication {
        auth_type AH
        auth_pass SomePa$$word
    }

    track_script {
        chk_nginx
    }

    unicast_peer {
        192.168.10.3
    }

    virtual_ipaddress {
        10.10.10.3/26 dev eth0 label eth0
    }
}

The corresponding configuration for lb2:

vrrp_script chk_nginx {
    script "pidof nginx"
    interval 2
}

# Floating IP 10.10.10.2: lb2 is only the backup for this address
vrrp_instance ip_2 {
    state BACKUP
    interface eth1
    virtual_router_id 15
    priority 109
    advert_int 4
    authentication {
        auth_type AH
        auth_pass SomePa$$word
    }

    track_script {
        chk_nginx
    }

    # Private address of the peer (lb1)
    unicast_peer {
        192.168.10.2
    }

    virtual_ipaddress {
        10.10.10.2/26 dev eth0 label eth0
    }
}

# Floating IP 10.10.10.3: lb2 is the master for this address
vrrp_instance ip_3 {
    state MASTER
    interface eth1
    virtual_router_id 14
    priority 110
    advert_int 4
    authentication {
        auth_type AH
        auth_pass SomePa$$word
    }

    track_script {
        chk_nginx
    }

    unicast_peer {
        192.168.10.2
    }

    virtual_ipaddress {
        10.10.10.3/26 dev eth0 label eth0
    }
}

Restart keepalived on both servers. After a few seconds floating IP 10.10.10.2 should be assigned to lb1 and 10.10.10.3 to lb2. If you stop NginX or power off one of the machines, the floating IP of that instance will be assigned to the other server. When the faulty server is back online with the NginX daemon up and running, the floating IP address will be moved back.
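A quick way to verify the behaviour from lb1 (the commands are standard, but the exact output below is only a sketch and depends on your distribution and the dev/label settings above):

# Apply the configuration on both servers
systemctl restart keepalived

# On lb1 the floating address should appear next to the boot-time one
ip -4 addr show eth0 | grep inet
    inet 10.10.10.20/26 brd 10.10.10.63 scope global eth0
    inet 10.10.10.2/26 scope global secondary eth0

# Stopping NginX on lb1 should make 10.10.10.2 move over to lb2
systemctl stop nginx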