Here is all you need to know about bonding, routing, and network performance tuning in Linux. It should serve as a single blueprint for setting up networking on your Linux server.


Let's assume the example infrastructure below.

Subnet            Gateway        Server IP
192.168.1.0/24    192.168.1.1    192.168.1.11
10.13.43.0/24     10.13.43.1     10.13.43.8

Choose the default gateway from one of the subnets in the /etc/sysconfig/network file, as shown below.

$ cat /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=myhostname
GATEWAY=10.13.43.1
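
Once the network service has been restarted, the active default route can be confirmed with iproute2. A quick sanity check (it should point at the GATEWAY chosen above):

$ ip route show default          #e.g. default via 10.13.43.1 dev <NIC>
$ route -n | grep '^0.0.0.0'     #same information from the legacy route tool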

1. SCENARIO 1: Different NIC bonding modes
2. SCENARIO 2: Two separate NICs, one for each subnet
3. SCENARIO 3: Single NIC – multiple subnets
4. SCENARIO 4: bond0 with slave NICs and multiple VLANs (Bond Mode 4 LACP)
5. SCENARIO 5: Two IPs from the same subnet using an alias (on a single NIC or bonded NIC)
6. SCENARIO 6: Using the server as a router or gateway
7. Testing Server Reachability with tcpdump
8. Network Performance Tuning

SCENARIO 1: Different NIC bonding modes

Configure NIC1 as slave to bond0

$ cat /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE=eth0
HWADDR=08:00:27:5C:A8:8F
TYPE=Ethernet
ONBOOT=yes
NM_CONTROLLED=no
MASTER=bond0
SLAVE=yes

Configure NIC2 as slave to bond0

$ cat /etc/sysconfig/network-scripts/ifcfg-eth1
DEVICE=eth1
TYPE=Ethernet
ONBOOT=yes
NM_CONTROLLED=no
MASTER=bond0
SLAVE=yes

Configure bond0 as master
$ cat /etc/sysconfig/network-scripts/ifcfg-bond0
DEVICE=bond0
IPADDR=192.168.1.11
NETMASK=255.255.255.0
GATEWAY=192.168.1.1
ONBOOT=yes
NM_CONTROLLED=no
BOOTPROTO=static
BONDING_OPTS="mode=0 miimon=100" # mode 0: round-robin (balance-rr) for fault tolerance and load balancing.
#BONDING_OPTS="mode=1 miimon=100" # mode 1: active-backup for fault tolerance.
#BONDING_OPTS="mode=2 miimon=100" # mode 2: XOR (balance-xor) for fault tolerance and load balancing.
#BONDING_OPTS="mode=3 miimon=100" # mode 3: broadcast for fault tolerance. All transmissions are sent on all slave interfaces.
#BONDING_OPTS="mode=4 miimon=100 lacp_rate=0" # mode 4: 802.3ad (LACP) active-active aggregation for performance. Requires LACP configuration on the switch; lacp_rate=0 means slow LACPDUs.
#BONDING_OPTS="mode=5 miimon=100" # mode 5: adaptive transmit load balancing (balance-tlb) for fault tolerance & load balancing.
#BONDING_OPTS="mode=6 miimon=100" # mode 6: adaptive load balancing (balance-alb) for fault tolerance & load balancing.

SCENARIO 2: Two separate NICs, one for each subnet

This scenario applies when two different VLANs or subnets need to reach the server and a dedicated NIC is available for each subnet. (If high availability is also required, each NIC can itself be replaced by a bond; LACP bonding is covered in Scenario 4.)

We also want to achieve symmetric routing: packets should not leave via one route (NIC) in the outbound direction and return via a different route (NIC) in the inbound direction. By default, the Linux kernel drops packets that have followed such an asymmetric path. For example, if you ping a host on subnet B through the default gateway of subnet A on NIC1, and that host replies directly to NIC2 instead of via the original route, the kernel rejects those inbound packets. As a dirty workaround, the kernel parameter below can be set to 0 or 2; however, this permits asymmetric routing and is not recommended.

$ cat /etc/sysctl.conf | grep -i rp_filter
net.ipv4.conf.default.rp_filter = 2 #Default is 1. The different modes are explained below.

0 — No source validation. Reply packets from a host are accepted no matter which NIC they arrive on. (NOT RECOMMENDED)
1 — Strict mode. Reply packets from a host are accepted only if they arrive on the same NIC that would be used to send traffic back to that host. (DEFAULT/RECOMMENDED)
2 — Loose mode. Reply packets from a host are accepted if that host is reachable via any NIC. (NOT RECOMMENDED)
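
The mode actually in force can be read back at runtime; the kernel uses the higher of the "all" and per-interface values for each interface. A quick check, assuming standard sysctl paths:

$ sysctl net.ipv4.conf.all.rp_filter net.ipv4.conf.default.rp_filter   #global and default-for-new-interfaces values
$ sysctl -a 2>/dev/null | grep '\.rp_filter'                           #per-interface values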

To be able to ping both gateways while keeping the routing symmetric, static routes and routing rules (policy routing) can be implemented as follows.

Network file for NIC 1 – ens192

$ cat /etc/sysconfig/network-scripts/ifcfg-ens192
DEVICE=ens192
BOOTPROTO=none
ONBOOT=yes
NM_CONTROLLED=no
IPADDR=192.168.1.11
NETMASK=255.255.255.0

Network file for NIC 2 – ens133
$ cat /etc/sysconfig/network-scripts/ifcfg-ens133
DEVICE=ens133
BOOTPROTO=none
ONBOOT=yes
NM_CONTROLLED=no
IPADDR=10.13.43.8
NETMASK=255.255.255.0

Create backup of /etc/iproute2/rt_tables
cp -p /etc/iproute2/rt_tables /etc/iproute2/rt_tables.bkp

Create entries mapping routing table names to table IDs, one per NIC:
echo '1 vlan_815' >> /etc/iproute2/rt_tables
echo '2 vlan_840' >> /etc/iproute2/rt_tables

(If you have a third NIC, add another entry: echo '3 other_vlan' >> /etc/iproute2/rt_tables)

Create route files for NIC1 and NIC2 with the entries below. These specify which gateway and interface to use for each routing table.

$ cat /etc/sysconfig/network-scripts/route-ens192
default via 192.168.1.1 dev ens192 table vlan_815 #Specify default gateway and NIC to use, for vlan_815

$ cat /etc/sysconfig/network-scripts/route-ens133
default via 10.13.43.1 dev ens133 table vlan_840 #Specify default gateway and nic to use, for vlan_840

Create rule files for NIC1 and NIC2 with the entries below. These specify which routing table to consult for traffic arriving on, or sourced from, each NIC/subnet.
$ cat /etc/sysconfig/network-scripts/rule-ens192
iif ens192 table vlan_815  #route requests from ens192 back via ens192
from 192.168.1.11 table vlan_815 #reply to requests from 192.168.1.11 (IP of ens192) back via ens192
from 192.168.1.0/24 table vlan_815 #route requests from 192.168.1.0/24 subnet back via ens192

$ cat /etc/sysconfig/network-scripts/rule-ens133
iif ens133 table vlan_840 #route requests from ens133 back via ens133
from 10.13.43.8 table vlan_840 #reply to requests from 10.13.43.8 (IP of ens133) back via ens133
from 10.13.43.0/24 table vlan_840 #route requests from 10.13.43.0/24 subnet back via ens133
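
Once the interfaces are brought up (for example with service network restart), the policy routing can be verified with iproute2. A quick check, assuming the route and rule files above were picked up by the network scripts:

$ ip rule show                   #should list the iif/from rules pointing at tables vlan_815 and vlan_840
$ ip route show table vlan_815   #should contain: default via 192.168.1.1 dev ens192
$ ip route show table vlan_840   #should contain: default via 10.13.43.1 dev ens133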

SCENARIO 3: Single NIC – multiple subnets

This scenario is used when the server needs to be reachable from two different VLANs or subnets but only one NIC is available. Since a single NIC serves both VLANs, we create a tagged virtual NIC (VLAN sub-interface) for each VLAN.

$ cat /etc/sysconfig/network-scripts/ifcfg-ens192
DEVICE=ens192
BOOTPROTO=none
ONBOOT=yes
NM_CONTROLLED=no

Create virtual NIC-1 on ens192 for vlan 815
$ cat /etc/sysconfig/network-scripts/ifcfg-ens192.815
DEVICE=ens192.815
BOOTPROTO=none
ONBOOT=yes
NM_CONTROLLED=no
IPADDR=192.168.1.11
NETMASK=255.255.255.0
VLAN=yes

Create virtual NIC-2 on ens192 for vlan 840
$ cat /etc/sysconfig/network-scripts/ifcfg-ens192.840
DEVICE=ens192.840
BOOTPROTO=none
ONBOOT=yes
NM_CONTROLLED=no
IPADDR=10.13.43.8
NETMASK=255.255.255.0
VLAN=yes

Create backup of /etc/iproute2/rt_tables
cp -p /etc/iproute2/rt_tables /etc/iproute2/rt_tables.bkp

Create entries mapping routing table names to table IDs, one per VLAN:
echo '1 vlan_815' >> /etc/iproute2/rt_tables
echo '2 vlan_840' >> /etc/iproute2/rt_tables

(If you have a third VLAN, add another entry: echo '3 other_vlan' >> /etc/iproute2/rt_tables)

Create route files for virtual NIC1 and virtual NIC2 with the entries below. These specify which gateway and interface to use for each subnet.

$ cat /etc/sysconfig/network-scripts/route-ens192.815
192.168.1.0/24 via 192.168.1.1 dev ens192.815

$ cat /etc/sysconfig/network-scripts/route-ens192.840
10.13.43.0/24 via 10.13.43.1 dev ens192.840

Create rule files for virtual NIC1 and virtual NIC2 with the entries below. These specify which routing table to consult for traffic arriving on, or sourced from, each virtual NIC/subnet.
$ cat /etc/sysconfig/network-scripts/rule-ens192.815
iif ens192.815 table vlan_815 #route requests from ens192.815 back via ens192.815
from 192.168.1.11 table vlan_815 #reply to requests from 192.168.1.11 (IP of ens192.815) back via ens192.815
from 192.168.1.0/24 table vlan_815 #route requests from 192.168.1.0/24 subnet back via ens192.815

$ cat /etc/sysconfig/network-scripts/rule-ens192.840
iif ens192.840 table vlan_840 #route requests arriving on ens192.840 back via ens192.840
from 10.13.43.8 table vlan_840 #reply to requests from 10.13.43.8 (IP of ens192.840) back via ens192.840
from 10.13.43.0/24 table vlan_840 #route requests from 10.13.43.0/24 subnet back via ens192.840
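
The tagged sub-interfaces rely on the 8021q kernel module. A quick sanity check, assuming a RHEL/CentOS-style system like the one configured above:

$ modprobe 8021q                 #load the VLAN module if it is not already loaded
$ lsmod | grep 8021q             #confirm the module is present
$ ip -d link show ens192.815     #the detail line with 'id 815' confirms the VLAN tag
$ ip rule show                   #confirm the vlan_815 and vlan_840 rules are installed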

SCENARIO 4: bond0 with slave NICs and multiple VLANs (Bond Mode 4 LACP)

In the previous scenario we used a single NIC. Here the setup is the same, except that a bonded NIC (bond0) takes the place of the single NIC, and we create virtual NICs on top of bond0 for each VLAN.

Configure NIC1 as slave to bond0

$ cat /etc/sysconfig/network-scripts/ifcfg-ens192
DEVICE=ens192
BOOTPROTO=none
ONBOOT=yes
NM_CONTROLLED=no
MASTER=bond0
SLAVE=yes

Configure NIC2 as slave to bond0
$ cat /etc/sysconfig/network-scripts/ifcfg-ens133
DEVICE=ens133
BOOTPROTO=none
ONBOOT=yes
NM_CONTROLLED=no
MASTER=bond0
SLAVE=yes

Configure bond0 as master
Bond ens192 and ens133 in LACP mode 4 (802.3ad), i.e. active-active, for high availability.
$ cat /etc/sysconfig/network-scripts/ifcfg-bond0
DEVICE=bond0
BOOTPROTO=none
ONBOOT=yes
NM_CONTROLLED=no
BONDING_OPTS="mode=4 miimon=100 lacp_rate=0"

Create virtual NIC-1 on bond0 for vlan 815
$ cat /etc/sysconfig/network-scripts/ifcfg-bond0.815
DEVICE=bond0.815
BOOTPROTO=none
ONBOOT=yes
NM_CONTROLLED=no
IPADDR=192.168.1.11
NETMASK=255.255.255.0
VLAN=yes

Create virtual NIC-2 on bond0 for vlan 840
$ cat /etc/sysconfig/network-scripts/ifcfg-bond0.840
DEVICE=bond0.840
BOOTPROTO=none
ONBOOT=yes
NM_CONTROLLED=no
IPADDR=10.13.43.8
NETMASK=255.255.255.0
VLAN=yes

Create a backup of /etc/iproute2/rt_tables and make sure the vlan_815 and vlan_840 table entries exist (they were added in the earlier scenarios).
cp -p /etc/iproute2/rt_tables /etc/iproute2/rt_tables.bkp

Create route files for virtual NIC1 and virtual NIC2 with the entries below. These specify which gateway and interface to use for each subnet.
$ cat /etc/sysconfig/network-scripts/route-bond0.815
192.168.1.0/24 via 192.168.1.1 dev bond0.815

$ cat /etc/sysconfig/network-scripts/route-bond0.840
10.13.43.0/24 via 10.13.43.1 dev bond0.840

Create rule files for virtual NIC1 and virtual NIC2 with the entries below. These specify which routing table to consult for traffic arriving on, or sourced from, each virtual NIC/subnet.
$ cat /etc/sysconfig/network-scripts/rule-bond0.815
iif bond0.815 table vlan_815 #route requests from bond0.815 back via bond0.815
from 192.168.1.11 table vlan_815 #reply to requests from 192.168.1.11 (IP of bond0.815) back via bond0.815
from 192.168.1.0/24 table vlan_815 #route requests from 192.168.1.0/24 subnet back via bond0.815

$ cat /etc/sysconfig/network-scripts/rule-bond0.840
iif bond0.840 table vlan_840 #route requests arriving on bond0.840 back via bond0.840
from 10.13.43.8 table vlan_840 #reply to requests from 10.13.43.8 (IP of bond0.840) back via bond0.840
from 10.13.43.0/24 table vlan_840 #route requests from 10.13.43.0/24 subnet back via bond0.840
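
Whether the switch has actually negotiated LACP can be seen in the bonding status file. A quick check (the exact field names vary slightly between kernel versions):

$ grep -A 6 "802.3ad info" /proc/net/bonding/bond0   #aggregator details negotiated with the switch
$ grep "Aggregator ID" /proc/net/bonding/bond0       #slaves in the active aggregator share the same ID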

SCENARIO 5: Two IPs from the same subnet using an alias (on a single NIC or bonded NIC)

If a second IP from the same subnet is needed, an alias can be created.
If there are two NICs, it is suggested to bond them, assign the first IP to bond0, and add the second IP as an alias on the same bond.

Create an alias IP on bond0 so that the server can also be reached on a second IP. Make sure the alias IP comes from one of the configured subnets (VLAN 840 or 815).

$ cat /etc/sysconfig/network-scripts/ifcfg-bond0.840:0
DEVICE=bond0.840:0
BOOTPROTO=none
ONBOOT=yes
NM_CONTROLLED=no
IPADDR=10.13.43.9
NETMASK=255.255.255.0
VLAN=yes

Example of Alias IP on a non-bonded NIC.
$ cat /etc/sysconfig/network-scripts/ifcfg-ens133:0
DEVICE=ens133:0
BOOTPROTO=none
ONBOOT=yes
NM_CONTROLLED=no
IPADDR=10.13.43.9
NETMASK=255.255.255.0

NOTE: :0 can be incremented to :1, :2, etc. for additional aliases.
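
Before making the alias permanent, it can also be added on the fly to test reachability; a minimal sketch using iproute2 (the label keeps the alias visible to ifconfig-style tools):

$ ip addr add 10.13.43.9/24 dev bond0.840 label bond0.840:0   #temporary, lost on reboot
$ ip addr show bond0.840                                      #the alias should appear as a secondary address
$ ip addr del 10.13.43.9/24 dev bond0.840                     #remove the temporary address again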

SCENARIO 6: Using the server as a router or gateway

To use the server as a router or gateway, it must be able to forward packets between its interfaces. Enable IP forwarding with the parameter below.

$ cat /etc/sysctl.conf | grep -i ip_forward

net.ipv4.ip_forward = 0 #By default this is 0. To allow this server to be used as default gateway, set this to 1.
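
IP forwarding can also be switched on immediately, without rebooting or reloading the whole file; a minimal sketch:

$ sysctl -w net.ipv4.ip_forward=1   #enable forwarding at runtime
$ sysctl net.ipv4.ip_forward        #verify the running value

Remember that the client machines must point their default gateway (or a static route) at this server's IP for it to actually act as their router.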

Testing Server Reachability with tcpdump:

Ping the different IPs assigned to the server from a host on an external network.
For example, let's ping from 10.11.174.6 to our server IPs (192.168.1.11 and 10.13.43.8), which correspond to NIC1 and NIC2 respectively.

On our server, we can run the tcpdump commands below to see the incoming requests from 10.11.174.6.

$ tcpdump -i any host 10.11.174.6 #listen on all NICs
$ tcpdump -i ens192 host 10.11.174.6 #listen on ens192 only
$ tcpdump -i ens133 host 10.11.174.6 #listen on ens133 only
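
To cut down on noise, the capture can also be restricted to ICMP and name resolution disabled; a variation of the commands above:

$ tcpdump -n -i ens192 icmp and host 10.11.174.6   #only ping traffic to/from 10.11.174.6 on ens192
$ tcpdump -n -i ens133 icmp and host 10.11.174.6   #only ping traffic to/from 10.11.174.6 on ens133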

All of the assigned IPs should be pingable from the external network, indicating a successful routing implementation. The tcpdump output will show the packets arriving on the interface that corresponds to the IP being pinged from outside.

Below is sample output for a typical two-subnet configuration with two NICs, one NIC per subnet.
ip route show displays the main routing table (use ip route show table vlan_815 or vlan_840 to see the custom tables), and route -n displays the kernel routing table including the default route.

$ ip route show

192.168.1.0/24 dev ens192  proto kernel  scope link  src 192.168.1.11
10.13.43.0/24 dev ens133  proto kernel  scope link  src 10.13.43.8
169.254.0.0/16 dev ens192  scope link  metric 1002
169.254.0.0/16 dev ens133  scope link  metric 1003
default via 192.168.1.1 dev ens192

$ route -n

Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
192.168.1.0     0.0.0.0         255.255.255.0   U     0      0        0 ens192
10.13.43.0      0.0.0.0         255.255.255.0   U     0      0        0 ens133
169.254.0.0     0.0.0.0         255.255.0.0     U     1002   0        0 ens192
169.254.0.0     0.0.0.0         255.255.0.0     U     1003   0        0 ens133
0.0.0.0         192.168.1.1     0.0.0.0         UG    0      0        0 ens192

Network Performance Tuning

The following are important Network performance parameters to update in /etc/sysctl.conf.

Enable TCP window scaling, i.e. allow the TCP window size to grow based on the available bandwidth and latency.

net.ipv4.tcp_window_scaling = 1  

Calculate the socket receive buffer size (net.core.rmem_max) and send buffer size (net.core.wmem_max) from the bandwidth-delay product:
Optimal size = (link bandwidth in MB/s) x (round-trip delay in seconds)
Note that ethtool reports the link speed in Mb/s (megabits per second), so divide by 8 to convert to MB/s.
Determine the link speed:
$ ethtool eth1 | grep Speed

        Speed: 10000Mb/s

Calculate Round trip delay in seconds

$ ping 74.125.28.147

PING 74.125.28.147 (74.125.28.147) 56(84) bytes of data.
64 bytes from pc-in-f147.1e100.net (74.125.28.147): icmp_seq=1 ttl=36 time=25.8 ms
64 bytes from pc-in-f147.1e100.net (74.125.28.147): icmp_seq=2 ttl=36 time=25.8 ms
64 bytes from pc-in-f147.1e100.net (74.125.28.147): icmp_seq=3 ttl=36 time=25.8 ms
^C
--- 74.125.28.147 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2335ms
rtt min/avg/max/mdev = 25.874/25.888/25.896/0.131 ms

Round-trip delay in seconds = 25.8/1000 = 0.0258
Link bandwidth = 10000 Mb/s / 8 = 1250 MB/s
Thus, optimal size = 1250 x 0.0258 ≈ 32.25 MB
32.25 MB = 33816576 Bytes

net.core.rmem_max = 33816576
net.core.wmem_max = 33816576
net.ipv4.tcp_rmem = 4096        87380   33816576
net.ipv4.tcp_wmem = 4096        16384   33816576

Turn on SYN-flood protections
net.ipv4.tcp_syncookies=1

Max number of “backlogged sockets” (connection requests that can be queued for any given listening socket)
net.core.somaxconn = 50000

Increase max number of sockets allowed in TIME_WAIT
net.ipv4.tcp_max_tw_buckets = 1440000

Maximum number of half-open connection requests (SYN_RECV) the kernel will remember before it starts dropping new ones
net.ipv4.tcp_max_syn_backlog = 3240000

Decrease the default values for the tcp_keepalive_* parameters as follows.
net.ipv4.tcp_keepalive_time = 600
net.ipv4.tcp_keepalive_intvl = 10
net.ipv4.tcp_keepalive_probes = 9

The TCP FIN timeout sets the amount of time a port must be inactive before it can be reused for another connection. The default is often 60 seconds, but it can normally be safely reduced to 30 or even 15 seconds.
net.ipv4.tcp_fin_timeout = 7

Apply the changes (settings in /etc/sysctl.conf also persist across reboots):
$ sysctl -p /etc/sysctl.conf
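
To confirm that the new values are active, individual keys can be queried back; for example:

$ sysctl net.ipv4.tcp_window_scaling net.core.rmem_max net.ipv4.tcp_fin_timeout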