Here is all you need to know about bonding, routing, and network performance tuning in Linux. This should serve as a single blueprint for setting up networking on your Linux server.
Let's assume the example infrastructure below.
Subnet         | Gateway     | IP
192.168.1.0/24 | 192.168.1.1 | 192.168.1.11
10.13.43.0/24  | 10.13.43.1  | 10.13.43.8
Choose the default gateway from one of the subnets in the /etc/sysconfig/network file, as shown below.
$ cat /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=myhostname
GATEWAY=10.13.43.1
1. SCENARIO 1: Different NIC bonding modes
2. SCENARIO 2: Two separate NICs for each subnet
3. SCENARIO 3: Single NIC – multiple subnets
4. SCENARIO 4: bond0 with slave NICs – bond mode 4 (LACP)
5. SCENARIO 5: Two IPs from the same subnet using an alias (on a single NIC or a bonded NIC)
6. SCENARIO 6: Using the server as a router or gateway
7. Testing server reachability with tcpdump
8. Network performance tuning
SCENARIO 1: Different NIC bonding modes

Configure NIC1 as slave to bond0
$ cat /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE=eth0
HWADDR=08:00:27:5C:A8:8F
TYPE=Ethernet
ONBOOT=yes
NM_CONTROLLED=no
MASTER=bond0
SLAVE=yes
Configure NIC2 as slave to bond0
$ cat /etc/sysconfig/network-scripts/ifcfg-eth1
DEVICE=eth1
TYPE=Ethernet
ONBOOT=yes
NM_CONTROLLED=no
MASTER=bond0
SLAVE=yes
Configure bond0 as master
$ cat /etc/sysconfig/network-scripts/ifcfg-bond0
DEVICE=bond0
IPADDR=192.168.1.11
NETMASK=255.255.255.0
GATEWAY=192.168.1.1
ONBOOT=yes
NM_CONTROLLED=no
BOOTPROTO=static
BONDING_OPTS="mode=0 miimon=100" # mode 0: round-robin, for fault tolerance and load balancing
#BONDING_OPTS="mode=1 miimon=100" # mode 1: active-backup, for fault tolerance
#BONDING_OPTS="mode=2 miimon=100" # mode 2: XOR (exclusive-or), for fault tolerance and load balancing
#BONDING_OPTS="mode=3 miimon=100" # mode 3: broadcast, for fault tolerance; all transmissions are sent on all slave interfaces
#BONDING_OPTS="mode=4 miimon=100 lacp_rate=0" # mode 4: LACP (802.3ad) active-active, for performance; needs LACP on the switch
#BONDING_OPTS="mode=5 miimon=100" # mode 5: transmit load balancing (TLB), for fault tolerance and load balancing
#BONDING_OPTS="mode=6 miimon=100" # mode 6: adaptive load balancing (ALB), for fault tolerance and load balancing
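Once the bond is up (for example after a network service restart), the kernel reports its state under /proc/net/bonding/bond0. The sketch below parses a captured sample of that output rather than the live file, so it runs anywhere; the sample text and interface names are illustrative:

```shell
# Illustrative copy of /proc/net/bonding/bond0 for a mode-0 bond.
# On a live system, run: cat /proc/net/bonding/bond0
sample='Ethernet Channel Bonding Driver: v3.7.1
Bonding Mode: load balancing (round-robin)
MII Status: up
Slave Interface: eth0
MII Status: up
Slave Interface: eth1
MII Status: up'

# Confirm the active mode and count the enslaved interfaces
echo "$sample" | grep 'Bonding Mode'
slaves=$(echo "$sample" | grep -c 'Slave Interface')
echo "slaves: $slaves"
```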
SCENARIO 2: Two separate NICs for each subnet

This scenario applies when two different VLANs or subnets need to access the server, each through its own dedicated NIC.
We also want to achieve symmetric routing: packets should not leave via one NIC and have the replies arrive via the other. By default, the Linux kernel rejects packets that have followed such an asymmetric route. Thus, if you ping a host on subnet B through the default gateway of subnet A on NIC1, and that host replies directly inbound via NIC2 without retracing the original route, the kernel rejects those inbound packets. As a dirty workaround, the kernel parameter below can be set to 0 or 2. However, this enables asymmetric routing and is not recommended.
$ cat /etc/sysctl.conf | grep -i rp_filter
net.ipv4.conf.default.rp_filter = 2 #Default is 1. The different modes are explained below.
0 — No restrictions. After sending a request to a host via a certain NIC, accept the reply regardless of which NIC it arrives on. (NOT RECOMMENDED)
1 — Strict mode. After sending a request to a host via a certain NIC, accept the reply only if it arrives on that same NIC. (DEFAULT/RECOMMENDED)
2 — Loose mode. After sending a request to a host via a certain NIC, accept the reply if it arrives on any NIC through which that host is reachable. (NOT RECOMMENDED)
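If rp_filter must be changed without a reboot, the same keys can be set at runtime. This is a sketch of the commands (requires root; ens192 is a placeholder interface name):

```shell
# Runtime change; lost on reboot, persist via /etc/sysctl.conf
sysctl -w net.ipv4.conf.all.rp_filter=1
sysctl -w net.ipv4.conf.default.rp_filter=1
# Per-interface override (placeholder interface name)
sysctl -w net.ipv4.conf.ens192.rp_filter=1
# Reload everything declared in /etc/sysctl.conf
sysctl -p
```

Note that the effective value for an interface is the maximum of conf.all.rp_filter and the per-interface setting.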
To be able to ping both gateways while keeping routing symmetric, static routes with policy-based routing can be implemented.
Network file for NIC 1 – ens192
$ cat /etc/sysconfig/network-scripts/ifcfg-ens192
DEVICE=ens192
BOOTPROTO=none
ONBOOT=yes
NM_CONTROLLED=no
IPADDR=192.168.1.11
NETMASK=255.255.255.0
Network file for NIC 2 – ens133
$ cat /etc/sysconfig/network-scripts/ifcfg-ens133
DEVICE=ens133
BOOTPROTO=none
ONBOOT=yes
NM_CONTROLLED=no
IPADDR=10.13.43.8
NETMASK=255.255.255.0
Create backup of /etc/iproute2/rt_tables
cp -p /etc/iproute2/rt_tables /etc/iproute2/rt_tables.bkp
Create routing table name entries for the different NICs
echo '1 vlan_815' >> /etc/iproute2/rt_tables
echo '2 vlan_840' >> /etc/iproute2/rt_tables
(If you have a 3rd NIC then it will be: echo '3 other_vlan' >> /etc/iproute2/rt_tables)
Create route files for NIC1 and NIC2 with the following contents, which specify the gateway to use when communicating with each subnet.
$ cat /etc/sysconfig/network-scripts/route-ens192
default via 192.168.1.1 dev ens192 table vlan_815 #Specify default gateway and NIC to use, for vlan_815

$ cat /etc/sysconfig/network-scripts/route-ens133
default via 10.13.43.1 dev ens133 table vlan_840 #Specify default gateway and NIC to use, for vlan_840
Create rule files for NIC1 and NIC2 with the following contents, which specify the routing table to use when communicating with each subnet.
$ cat /etc/sysconfig/network-scripts/rule-ens192
iif ens192 table vlan_815 #route requests from ens192 back via ens192
from 192.168.1.11 table vlan_815 #reply to requests from 192.168.1.11 (IP of ens192) back via ens192
from 192.168.1.0/24 table vlan_815 #route requests from 192.168.1.0/24 subnet back via ens192

$ cat /etc/sysconfig/network-scripts/rule-ens133
iif ens133 table vlan_840 #route requests from ens133 back via ens133
from 10.13.43.8 table vlan_840 #reply to requests from 10.13.43.8 (IP of ens133) back via ens133
from 10.13.43.0/24 table vlan_840 #route requests from 10.13.43.0/24 subnet back via ens133
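After restarting the network, the policy routing setup can be verified with read-only iproute2 commands (the output depends on the live system, so none is shown here):

```shell
# List policy rules; the iif/from rules above should appear
ip rule show
# Show the per-table default routes created by the route-* files
ip route show table vlan_815
ip route show table vlan_840
```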
SCENARIO 3: Single NIC – multiple subnets

This is used when the server needs to be accessed from two different VLANs or subnets but only one NIC is available. Since we have a single NIC for both VLANs, we will create a virtual NIC (VLAN subinterface) for each VLAN.
$ cat /etc/sysconfig/network-scripts/ifcfg-ens192
DEVICE=ens192
BOOTPROTO=none
ONBOOT=yes
NM_CONTROLLED=no
Create virtual NIC-1 on ens192 for vlan 815
$ cat /etc/sysconfig/network-scripts/ifcfg-ens192.815
DEVICE=ens192.815
BOOTPROTO=none
ONBOOT=yes
NM_CONTROLLED=no
IPADDR=192.168.1.11
NETMASK=255.255.255.0
VLAN=yes
Create virtual NIC-2 on ens192 for vlan 840
$ cat /etc/sysconfig/network-scripts/ifcfg-ens192.840
DEVICE=ens192.840
BOOTPROTO=none
ONBOOT=yes
NM_CONTROLLED=no
IPADDR=10.13.43.8
NETMASK=255.255.255.0
VLAN=yes
Create backup of /etc/iproute2/rt_tables
cp -p /etc/iproute2/rt_tables /etc/iproute2/rt_tables.bkp
Create routing table name entries for the different NICs
echo '1 vlan_815' >> /etc/iproute2/rt_tables
echo '2 vlan_840' >> /etc/iproute2/rt_tables
(If you have a 3rd NIC then it will be: echo '3 other_vlan' >> /etc/iproute2/rt_tables)
Create route files for virtual NIC 1 and virtual NIC 2 with the following contents, which specify the gateway to use when communicating with each subnet.
$ cat /etc/sysconfig/network-scripts/route-ens192.815
192.168.1.0/24 via 192.168.1.1 dev ens192.815

$ cat /etc/sysconfig/network-scripts/route-ens192.840
10.13.43.0/24 via 10.13.43.1 dev ens192.840
Create rule files for virtual NIC 1 and virtual NIC 2 with the following contents, which specify the routing table to use when communicating with each subnet.
$ cat /etc/sysconfig/network-scripts/rule-ens192.815
iif ens192.815 table vlan_815 #route requests from ens192.815 back via ens192.815
from 192.168.1.11 table vlan_815 #reply to requests from 192.168.1.11 (IP of ens192.815) back via ens192.815
from 192.168.1.0/24 table vlan_815 #route requests from 192.168.1.0/24 subnet back via ens192.815

$ cat /etc/sysconfig/network-scripts/rule-ens192.840
iif ens192.840 table vlan_840 #route requests from ens192.840 back via ens192.840
from 10.13.43.8 table vlan_840 #reply to requests from 10.13.43.8 (IP of ens192.840) back via ens192.840
from 10.13.43.0/24 table vlan_840 #route requests from 10.13.43.0/24 subnet back via ens192.840
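For a quick test before writing the ifcfg files, the same VLAN subinterfaces can be created at runtime with the 8021q module and iproute2 (a sketch; requires root, the names and VLAN IDs follow the example above, and the change does not persist across reboots):

```shell
modprobe 8021q                                            # VLAN tagging support
ip link add link ens192 name ens192.815 type vlan id 815  # tagged subinterface
ip addr add 192.168.1.11/24 dev ens192.815
ip link set dev ens192.815 up
```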
SCENARIO 4: bond0 with slave NICs – bond mode 4 (LACP)

In the previous scenario we used a single NIC. Here it is much the same, except that a bonded NIC (bond0) takes the place of the single NIC, and we create the virtual NICs for each VLAN on top of bond0.
Configure NIC1 as slave to bond0
$ cat /etc/sysconfig/network-scripts/ifcfg-ens192
DEVICE=ens192
BOOTPROTO=none
ONBOOT=yes
NM_CONTROLLED=no
MASTER=bond0
SLAVE=yes
Configure NIC2 as slave to bond0
$ cat /etc/sysconfig/network-scripts/ifcfg-ens133
DEVICE=ens133
BOOTPROTO=none
ONBOOT=yes
NM_CONTROLLED=no
MASTER=bond0
SLAVE=yes
Configure bond0 as master
Bonding ens192 and ens133 in LACP mode 4, i.e. active-active, for high availability. For this to work, LACP must also be configured on the switch.
$ cat /etc/sysconfig/network-scripts/ifcfg-bond0
DEVICE=bond0
BOOTPROTO=none
ONBOOT=yes
NM_CONTROLLED=no
BONDING_OPTS="mode=4 miimon=100 lacp_rate=0"
Create virtual NIC-1 on bond0 for vlan 815
$ cat /etc/sysconfig/network-scripts/ifcfg-bond0.815
DEVICE=bond0.815
BOOTPROTO=none
ONBOOT=yes
NM_CONTROLLED=no
IPADDR=192.168.1.11
NETMASK=255.255.255.0
VLAN=yes
Create virtual NIC-2 on bond0 for vlan 840
$ cat /etc/sysconfig/network-scripts/ifcfg-bond0.840
DEVICE=bond0.840
BOOTPROTO=none
ONBOOT=yes
NM_CONTROLLED=no
IPADDR=10.13.43.8
NETMASK=255.255.255.0
VLAN=yes
Create backup of /etc/iproute2/rt_tables
cp -p /etc/iproute2/rt_tables /etc/iproute2/rt_tables.bkp
Create route files for virtual NIC 1 and virtual NIC 2 with the following contents, which specify the gateway to use when communicating with each subnet.
$ cat /etc/sysconfig/network-scripts/route-bond0.815
192.168.1.0/24 via 192.168.1.1 dev bond0.815

$ cat /etc/sysconfig/network-scripts/route-bond0.840
10.13.43.0/24 via 10.13.43.1 dev bond0.840
Create rule files for virtual NIC 1 and virtual NIC 2 with the following contents, which specify the routing table to use when communicating with each subnet.
$ cat /etc/sysconfig/network-scripts/rule-bond0.815
iif bond0.815 table vlan_815 #route requests from bond0.815 back via bond0.815
from 192.168.1.11 table vlan_815 #reply to requests from 192.168.1.11 (IP of bond0.815) back via bond0.815
from 192.168.1.0/24 table vlan_815 #route requests from 192.168.1.0/24 subnet back via bond0.815

$ cat /etc/sysconfig/network-scripts/rule-bond0.840
iif bond0.840 table vlan_840 #route requests from bond0.840 back via bond0.840
from 10.13.43.8 table vlan_840 #reply to requests from 10.13.43.8 (IP of bond0.840) back via bond0.840
from 10.13.43.0/24 table vlan_840 #route requests from 10.13.43.0/24 subnet back via bond0.840
SCENARIO 5: Two IPs from the same subnet using an alias (on a single NIC or a bonded NIC)

If a second IP is needed from the same subnet, an alias can be created.
If there are two NICs, it is suggested to bond them, assign the first IP to bond0, and add the second IP as an alias on it.
Create an alias IP for bond0 so that the server can be reached on a second IP as well. Make sure the alias IP comes from one of the configured subnets (VLAN 815 or 840).
$ cat /etc/sysconfig/network-scripts/ifcfg-bond0.840:0
DEVICE=bond0.840:0
BOOTPROTO=none
ONBOOT=yes
NM_CONTROLLED=no
IPADDR=10.13.43.9
NETMASK=255.255.255.0
VLAN=yes
Example of Alias IP on a non-bonded NIC.
$ cat /etc/sysconfig/network-scripts/ifcfg-ens133:0
DEVICE=ens133:0
BOOTPROTO=none
ONBOOT=yes
NM_CONTROLLED=no
IPADDR=10.13.43.9
NETMASK=255.255.255.0
NOTE: :0 can be incremented to :1, :2, etc. for more aliases.
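The iproute2 equivalent of an alias is simply a second address on the same interface; the legacy :0 label is optional but keeps ifconfig-style tools working (requires root; addresses follow the example above):

```shell
# Add a second IP from the same subnet, with a legacy-style label
ip addr add 10.13.43.9/24 dev bond0.840 label bond0.840:0
# Verify both addresses are present
ip addr show dev bond0.840
```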
SCENARIO 6: Using the server as a router or gateway

To use the server as a router or gateway, it must be able to forward packets between its interfaces. To enable forwarding, set the kernel parameter below.
$ cat /etc/sysctl.conf | grep -i ip_forward
net.ipv4.ip_forward = 0 #Default is 0. To allow this server to be used as a gateway, set this to 1.
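A sketch of enabling forwarding, both immediately and persistently (requires root):

```shell
# Immediate effect, lost on reboot:
sysctl -w net.ipv4.ip_forward=1
# Persistent: set net.ipv4.ip_forward = 1 in /etc/sysctl.conf, then reload:
sysctl -p
# Verify the running value:
cat /proc/sys/net/ipv4/ip_forward
```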
Testing server reachability with tcpdump

Ping the different IPs assigned to the server from a host on an external network.
For example, let's ping from 10.11.174.6 to our server IPs 192.168.1.11 and 10.13.43.8, which correspond to NIC1 and NIC2 respectively.
On our server, we can run the tcpdump commands below to see the incoming requests from 10.11.174.6.
$ tcpdump -i any host 10.11.174.6  #listen on all interfaces
$ tcpdump -i eth0 host 10.11.174.6 #listen on eth0 only
$ tcpdump -i eth1 host 10.11.174.6 #listen on eth1 only
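A few additional standard tcpdump flags are useful here; the interface names follow the example above (requires root and live traffic):

```shell
# -nn: no DNS/port name resolution; icmp: capture only the pings themselves
tcpdump -nn -i eth0 icmp and host 10.11.174.6
# -e: also print link-layer (MAC) headers, handy for confirming which
# gateway forwarded the packet
tcpdump -e -i eth1 host 10.11.174.6
```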
All the assigned IPs should be pingable from the external network, indicating a successful routing implementation. The tcpdump output will show the traffic arriving on the specific interface that corresponds to the IP being pinged from outside.
Below is sample output for a typical two-subnet configuration with two NICs, one NIC per subnet.
ip route show should display the main routing table (use ip route show table all to also see the custom tables vlan_815 and vlan_840), and route -n should display the same, including the default route, in the legacy format.
$ ip route show
192.168.1.0/24 dev ens192 proto kernel scope link src 192.168.1.11
10.13.43.0/24 dev ens133 proto kernel scope link src 10.13.43.8
169.254.0.0/16 dev ens192 scope link metric 1002
169.254.0.0/16 dev ens133 scope link metric 1003
default via 192.168.1.1 dev ens192
$ route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
192.168.1.0     0.0.0.0         255.255.255.0   U     0      0        0 ens192
10.13.43.0      0.0.0.0         255.255.255.0   U     0      0        0 ens133
169.254.0.0     0.0.0.0         255.255.0.0     U     1002   0        0 ens192
169.254.0.0     0.0.0.0         255.255.0.0     U     1003   0        0 ens133
0.0.0.0         192.168.1.1     0.0.0.0         UG    0      0        0 ens192
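As a sanity check, the default gateway can be extracted programmatically. This sketch parses an illustrative captured copy of the table with awk so it runs anywhere; on a live system, pipe route -n in directly:

```shell
# Captured `route -n` output (illustrative values)
table='Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
192.168.1.0     0.0.0.0         255.255.255.0   U     0      0        0 ens192
10.13.43.0      0.0.0.0         255.255.255.0   U     0      0        0 ens133
0.0.0.0         192.168.1.1     0.0.0.0         UG    0      0        0 ens192'

# The default route has destination 0.0.0.0; field 2 is its gateway
gw=$(echo "$table" | awk '$1 == "0.0.0.0" { print $2 }')
echo "default gateway: $gw"
```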
Network performance tuning

The following are the important network performance parameters to set in /etc/sysctl.conf.
Enable TCP window scaling, i.e. allow the TCP window size to grow with the available bandwidth and latency
net.ipv4.tcp_window_scaling = 1
Calculate the socket receive buffer size (net.core.rmem_max) and send buffer size (net.core.wmem_max) from the bandwidth-delay product:

Optimal size in bytes = (link bandwidth in bits/s) x (round-trip time in seconds) / 8

Find the link speed

$ ethtool eth1 | grep Speed
Speed: 10000Mb/s

Measure the round-trip time

$ ping 74.125.28.147
PING 74.125.28.147 (74.125.28.147) 56(84) bytes of data.
64 bytes from pc-in-f147.1e100.net (74.125.28.147): icmp_seq=1 ttl=36 time=25.8 ms
64 bytes from pc-in-f147.1e100.net (74.125.28.147): icmp_seq=2 ttl=36 time=25.8 ms
64 bytes from pc-in-f147.1e100.net (74.125.28.147): icmp_seq=3 ttl=36 time=25.8 ms
^C
--- 74.125.28.147 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2335ms
rtt min/avg/max/mdev = 25.874/25.888/25.896/0.131 ms

Round-trip time in seconds = 25.8 / 1000 = 0.0258

Note that ethtool reports megabits per second, so divide by 8 to convert to bytes:

Optimal size = 10,000,000,000 x 0.0258 / 8 = 32,250,000 bytes, roughly 32 MB

Rounding up to 32 MiB (33554432 bytes):

net.core.rmem_max = 33554432
net.core.wmem_max = 33554432
net.ipv4.tcp_rmem = 4096 87380 33554432
net.ipv4.tcp_wmem = 4096 16384 33554432
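The buffer-size arithmetic can be scripted. A minimal sketch using the example's figures (10000 Mb/s from ethtool and a 25.8 ms RTT from ping); since ethtool reports megabits, a division by 8 converts the result to bytes:

```shell
speed_mbit=10000   # link speed from ethtool, in Mb/s
rtt_ms=25.8        # average round-trip time from ping, in ms

# BDP in bytes = bandwidth (bits/s) * RTT (s) / 8
bdp=$(awk -v s="$speed_mbit" -v r="$rtt_ms" \
    'BEGIN { printf "%d", s * 1000 * 1000 * (r / 1000) / 8 }')
echo "optimal buffer size: $bdp bytes"
```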
Turn on SYN-flood protections
net.ipv4.tcp_syncookies=1
Max number of “backlogged sockets” (connection requests that can be queued for any given listening socket)
net.core.somaxconn = 50000
Increase max number of sockets allowed in TIME_WAIT
net.ipv4.tcp_max_tw_buckets = 1440000
Maximum number of half-open connections (SYNs received but not yet acknowledged by the client) that the kernel will remember
net.ipv4.tcp_max_syn_backlog = 3240000
We decrease the default values for the tcp_keepalive_* parameters as follows.
net.ipv4.tcp_keepalive_time = 600
net.ipv4.tcp_keepalive_intvl = 10
net.ipv4.tcp_keepalive_probes = 9
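With these values, an unresponsive idle peer is declared dead after keepalive_time plus intvl times probes seconds; a quick check of that arithmetic:

```shell
keepalive_time=600  # seconds of idle time before the first probe
intvl=10            # seconds between unanswered probes
probes=9            # unanswered probes before the connection is dropped
total=$(( keepalive_time + intvl * probes ))
echo "dead peer detected after ${total}s"
```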
The TCP FIN timeout sets how long a connection stays in the FIN-WAIT-2 state before the kernel reclaims it, which determines how soon a port can be reused for another connection. The default is often 60 seconds and can normally be reduced safely to 30 or even 15 seconds; here an aggressive 7 seconds is used.
net.ipv4.tcp_fin_timeout = 7
Make changes permanent
$ sysctl -p /etc/sysctl.conf