In this article we will show how to monitor network health from the client’s perspective using our AreWeDown tool. We will then disrupt communication between the client and the server with a ping flood, and solve the problem with traffic shaping. Let’s start with a healthy network:
| 2005-08-06 08:13:50 | are@10.50.100.190 | 101 |
| 2005-08-06 08:14:12 | are@10.10.10.11 | 100 |
| 2005-08-06 08:14:12 | are@10.10.10.11 | 101 |
| 2005-08-06 08:14:20 | are@10.50.100.190 | 100 |
| 2005-08-06 08:14:20 | are@10.50.100.190 | 101 |
| 2005-08-06 08:14:42 | are@10.10.10.11 | 100 |
| 2005-08-06 08:14:42 | are@10.10.10.11 | 101 |
| 2005-08-06 08:14:50 | are@10.50.100.190 | 100 |
| 2005-08-06 08:14:50 | are@10.50.100.190 | 101 |
| 2005-08-06 08:15:12 | are@10.10.10.11 | 100 |
| 2005-08-06 08:15:12 | are@10.10.10.11 | 101 |
| 2005-08-06 08:15:20 | are@10.50.100.190 | 100 |
| 2005-08-06 08:15:20 | are@10.50.100.190 | 101 |
See this article for information on the utility we are testing with. It measures network health by showing how long two consecutive TCP requests take from the client’s perspective. The 101 entry is sent right after the 100 entry, so on a healthy network the two should carry nearly the same timestamp. A ping looks like this:
    [root@srv-1 usr-1]# ping 10.10.10.11
    PING 10.10.10.11 (10.10.10.11) 56(84) bytes of data.
    64 bytes from 10.10.10.11: icmp_seq=0 ttl=127 time=18.6 ms
    64 bytes from 10.10.10.11: icmp_seq=1 ttl=127 time=18.7 ms
    64 bytes from 10.10.10.11: icmp_seq=2 ttl=127 time=18.6 ms
    64 bytes from 10.10.10.11: icmp_seq=3 ttl=127 time=18.7 ms
    64 bytes from 10.10.10.11: icmp_seq=4 ttl=127 time=18.6 ms

    --- 10.10.10.11 ping statistics ---
    5 packets transmitted, 5 received, 0% packet loss, time 4004ms
    rtt min/avg/max/mdev = 18.623/18.697/18.762/0.053 ms, pipe 2
    [root@srv-1 usr-1]#
Our router stats look like this:
    router#show interfaces Async 5
    Async5 is up, line protocol is up
      Hardware is Async Serial
      Internet address is 10.10.10.10/24
      MTU 1500 bytes, BW 9 Kbit, DLY 100000 usec,
         reliability 255/255, txload 1/255, rxload 1/255
      Encapsulation PPP, loopback not set
      Keepalive not set
      DTR is pulsed for 5 seconds on reset
      LCP Open
      Open: IPCP
      Last input 00:00:09, output 00:00:09, output hang never
      Last clearing of "show interface" counters 00:29:33
      Input queue: 1/75/0 (size/max/drops); Total output drops: 0
      Queueing strategy: weighted fair
      Output queue: 0/1000/64/0 (size/max total/threshold/drops)
         Conversations 0/1/16 (active/max active/max total)
         Reserved Conversations 0/0 (allocated/max allocated)
      5 minute input rate 0 bits/sec, 0 packets/sec
      5 minute output rate 0 bits/sec, 0 packets/sec
         615 packets input, 45313 bytes, 0 no buffer
         Received 0 broadcasts, 0 runts, 0 giants, 0 throttles
         1 input errors, 1 CRC, 0 frame, 0 overrun, 0 ignored, 0 abort
         565 packets output, 35812 bytes, 0 underruns
         0 output errors, 0 collisions, 0 interface resets
         0 output buffer failures, 0 output buffers swapped out
         0 carrier transitions
    router#
The workstation is connected via PPP through the Aux port of a Cisco 1720 router we have in our lab. More information on this configuration is available in this article. Now, let’s kill the network connection with a ping flood:
    [root@srv-1 usr-1]# ping -f -s 1000 10.10.10.11
    PING 10.10.10.11 (10.10.10.11) 1000(1028) bytes of data.
    ................................................................................
You can see the ping round-trip times climb:
    64 bytes from 10.10.10.11: icmp_seq=3 ttl=127 time=18.6 ms
    64 bytes from 10.10.10.11: icmp_seq=4 ttl=127 time=18.6 ms
    64 bytes from 10.10.10.11: icmp_seq=5 ttl=127 time=4882 ms
    64 bytes from 10.10.10.11: icmp_seq=34 ttl=127 time=6474 ms
    64 bytes from 10.10.10.11: icmp_seq=47 ttl=127 time=6697 ms
    64 bytes from 10.10.10.11: icmp_seq=53 ttl=127 time=6787 ms
    64 bytes from 10.10.10.11: icmp_seq=68 ttl=127 time=7011 ms
    64 bytes from 10.10.10.11: icmp_seq=69 ttl=127 time=6935 ms
    64 bytes from 10.10.10.11: icmp_seq=87 ttl=127 time=7327 ms
    64 bytes from 10.10.10.11: icmp_seq=88 ttl=127 time=7252 ms
We are starting to see some delays between 100 and 101 on the AreWeDown tool:
| 2005-08-06 08:25:17 | are@10.10.10.11 | 100 |
| 2005-08-06 08:25:18 | are@10.10.10.11 | 101 |
| 2005-08-06 08:25:21 | are@10.50.100.190 | 100 |
| 2005-08-06 08:25:21 | are@10.50.100.190 | 101 |
| 2005-08-06 08:25:49 | are@10.10.10.11 | 100 |
| 2005-08-06 08:25:50 | are@10.50.100.190 | 100 |
| 2005-08-06 08:25:50 | are@10.50.100.190 | 101 |
| 2005-08-06 08:25:51 | are@10.10.10.11 | 101 |
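The delay between a client's 100 and 101 entries can also be computed rather than read off by eye. The following is a minimal Python sketch, not part of AreWeDown itself: the function names `parse_rows` and `request_gaps` are made up for illustration, and it assumes the rows arrive in the pipe-delimited form shown above.

```python
from datetime import datetime

# Rows copied from the AreWeDown output above.
ROWS = """\
| 2005-08-06 08:25:17 | are@10.10.10.11 | 100 |
| 2005-08-06 08:25:18 | are@10.10.10.11 | 101 |
| 2005-08-06 08:25:21 | are@10.50.100.190 | 100 |
| 2005-08-06 08:25:21 | are@10.50.100.190 | 101 |
| 2005-08-06 08:25:49 | are@10.10.10.11 | 100 |
| 2005-08-06 08:25:50 | are@10.50.100.190 | 100 |
| 2005-08-06 08:25:50 | are@10.50.100.190 | 101 |
| 2005-08-06 08:25:51 | are@10.10.10.11 | 101 |
"""


def parse_rows(text):
    """Yield (datetime, client, code) for each pipe-delimited row."""
    for line in text.splitlines():
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        if len(cells) != 3:
            continue  # skip anything that is not a three-column row
        stamp, client, code = cells
        yield datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S"), client, int(code)


def request_gaps(rows):
    """Pair each 100 entry with the next 101 entry from the same client."""
    pending = {}  # client -> timestamp of its unanswered 100 entry
    gaps = []
    for stamp, client, code in rows:
        if code == 100:
            pending[client] = stamp
        elif code == 101 and client in pending:
            start = pending.pop(client)
            gaps.append((client, start, (stamp - start).total_seconds()))
    return gaps


GAPS = request_gaps(parse_rows(ROWS))

if __name__ == "__main__":
    for client, start, seconds in GAPS:
        print(f"{start:%H:%M:%S} {client}: {seconds:.0f} s between 100 and 101")
```

For these rows the client behind the flooded link shows gaps of one and two seconds, while 10.50.100.190 stays at zero.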
Our output queue is at the drop threshold on the router:
    router#show interfaces Async 5
    Async5 is up, line protocol is up
      Hardware is Async Serial
      Internet address is 10.10.10.10/24
      MTU 1500 bytes, BW 9 Kbit, DLY 100000 usec,
         reliability 255/255, txload 141/255, rxload 140/255
      Encapsulation PPP, loopback not set
      Keepalive not set
      DTR is pulsed for 5 seconds on reset
      LCP Open
      Open: IPCP
      Last input 00:00:00, output 00:00:00, output hang never
      Last clearing of "show interface" counters 00:35:37
      Input queue: 1/75/0 (size/max/drops); Total output drops: 7997
      Queueing strategy: weighted fair
      Output queue: 64/1000/64/7997 (size/max total/threshold/drops)
         Conversations 1/2/16 (active/max active/max total)
         Reserved Conversations 0/0 (allocated/max allocated)
      5 minute input rate 49000 bits/sec, 11 packets/sec
      5 minute output rate 50000 bits/sec, 11 packets/sec
         2347 packets input, 1714050 bytes, 0 no buffer
         Received 0 broadcasts, 0 runts, 0 giants, 0 throttles
         3 input errors, 2 CRC, 0 frame, 1 overrun, 0 ignored, 0 abort
         2324 packets output, 1734040 bytes, 0 underruns
         0 output errors, 0 collisions, 0 interface resets
         0 output buffer failures, 0 output buffers swapped out
         0 carrier transitions
    router#
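If you capture this output regularly, the queue condition can be checked automatically. Here is a rough Python sketch that pulls the output queue counters and the load values out of the text with regular expressions. The `queue_status` helper is hypothetical, and the patterns assume the exact field layout shown in the output above; other IOS versions or queueing strategies may print these lines differently.

```python
import re

# Patterns match the field layout in the "show interfaces" output above.
OUTPUT_QUEUE_RE = re.compile(
    r"Output queue:\s*(\d+)/(\d+)/(\d+)/(\d+)\s*\(size/max total/threshold/drops\)"
)
LOAD_RE = re.compile(r"txload (\d+)/255, rxload (\d+)/255")

# Excerpt of the router output shown above.
SAMPLE = """\
  MTU 1500 bytes, BW 9 Kbit, DLY 100000 usec,
     reliability 255/255, txload 141/255, rxload 140/255
  Queueing strategy: weighted fair
  Output queue: 64/1000/64/7997 (size/max total/threshold/drops)
"""


def queue_status(show_interfaces_text):
    """Return queue depth, drops, and load parsed from the interface output."""
    queue = OUTPUT_QUEUE_RE.search(show_interfaces_text)
    load = LOAD_RE.search(show_interfaces_text)
    if queue is None or load is None:
        raise ValueError("expected queue and load lines were not found")
    size, max_total, threshold, drops = (int(v) for v in queue.groups())
    txload, rxload = (int(v) for v in load.groups())
    return {
        "size": size,
        "max_total": max_total,
        "threshold": threshold,
        "drops": drops,
        "at_threshold": size >= threshold,  # queue has reached the drop threshold
        "txload_fraction": txload / 255,    # load is reported on a 0-255 scale
        "rxload_fraction": rxload / 255,
    }


STATUS = queue_status(SAMPLE)

if __name__ == "__main__":
    print(STATUS)
```

With the sample above, `at_threshold` is true because the queue size (64) equals the threshold (64), and the transmit load works out to a little over half of the 255 scale.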
Things are getting worse:
| 2005-08-06 08:27:26 | are@10.10.10.11 | 100 |
| 2005-08-06 08:27:29 | are@10.10.10.11 | 101 |
| 2005-08-06 08:27:51 | are@10.50.100.190 | 100 |
| 2005-08-06 08:27:51 | are@10.50.100.190 | 101 |
| 2005-08-06 08:27:58 | are@10.10.10.11 | 100 |
| 2005-08-06 08:28:02 | are@10.10.10.11 | 101 |
There are now four seconds between 100 and 101. Our ping:
    64 bytes from 10.10.10.11: icmp_seq=209 ttl=127 time=8998 ms
    64 bytes from 10.10.10.11: icmp_seq=215 ttl=127 time=9254 ms
    64 bytes from 10.10.10.11: icmp_seq=217 ttl=127 time=9186 ms
    64 bytes from 10.10.10.11: icmp_seq=262 ttl=127 time=9989 ms
The client is now unresponsive:
| 2005-08-06 08:29:36 | are@10.10.10.11 | 100 |
| 2005-08-06 08:29:41 | are@10.10.10.11 | 101 |
| 2005-08-06 08:29:51 | are@10.50.100.190 | 100 |
| 2005-08-06 08:29:51 | are@10.50.100.190 | 101 |
| 2005-08-06 08:30:21 | are@10.50.100.190 | 100 |
| 2005-08-06 08:30:21 | are@10.50.100.190 | 101 |
We should see an entry from each client every 30 seconds, but 10.10.10.11 has stopped sending requests. Our ping response times are climbing as well:
    64 bytes from 10.10.10.11: icmp_seq=385 ttl=127 time=11831 ms
    64 bytes from 10.10.10.11: icmp_seq=386 ttl=127 time=11756 ms
    64 bytes from 10.10.10.11: icmp_seq=387 ttl=127 time=11680 ms
    64 bytes from 10.10.10.11: icmp_seq=388 ttl=127 time=11604 ms
    64 bytes from 10.10.10.11: icmp_seq=406 ttl=127 time=12122 ms
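Because each client is expected to report every 30 seconds, a silent client can be flagged mechanically. The Python sketch below is one simple way to do that, under assumptions of our own: the `overdue_clients` helper is hypothetical, and it treats a client as overdue when its last 100 entry is more than one interval older than the newest entry in the log. A real check would probably want some slack on top of the interval.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"
REPORT_INTERVAL = timedelta(seconds=30)  # one entry per client every 30 seconds

# The last AreWeDown rows shown above, as (timestamp, client, code).
ENTRIES = [
    ("2005-08-06 08:29:36", "are@10.10.10.11", 100),
    ("2005-08-06 08:29:41", "are@10.10.10.11", 101),
    ("2005-08-06 08:29:51", "are@10.50.100.190", 100),
    ("2005-08-06 08:29:51", "are@10.50.100.190", 101),
    ("2005-08-06 08:30:21", "are@10.50.100.190", 100),
    ("2005-08-06 08:30:21", "are@10.50.100.190", 101),
]


def overdue_clients(entries, interval=REPORT_INTERVAL):
    """Map each overdue client to the time since its last 100 entry."""
    last_seen = {}
    for stamp, client, code in entries:
        if code != 100:
            continue
        when = datetime.strptime(stamp, FMT)
        last_seen[client] = max(when, last_seen.get(client, when))
    # Use the newest timestamp in the log as "now".
    newest = max(datetime.strptime(stamp, FMT) for stamp, _, _ in entries)
    return {
        client: newest - seen
        for client, seen in last_seen.items()
        if newest - seen > interval
    }


OVERDUE = overdue_clients(ENTRIES)

if __name__ == "__main__":
    for client, silence in OVERDUE.items():
        print(f"{client} silent for {silence.total_seconds():.0f} s")
```

On the rows above this flags only are@10.10.10.11, whose last request started 45 seconds before the newest log entry.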
Our router stats:
    Async5 is up, line protocol is up
      Hardware is Async Serial
      Internet address is 10.10.10.10/24
      MTU 1500 bytes, BW 9 Kbit, DLY 100000 usec,
         reliability 255/255, txload 252/255, rxload 255/255
      Encapsulation PPP, loopback not set
      Keepalive not set
      DTR is pulsed for 5 seconds on reset
      LCP Open
      Open: IPCP
      Last input 00:00:00, output 00:00:00, output hang never
      Last clearing of "show interface" counters 00:40:27
      Input queue: 1/75/0 (size/max/drops); Total output drops: 23735
      Queueing strategy: weighted fair
      Output queue: 64/1000/64/23735 (size/max total/threshold/drops)
         Conversations 1/2/16 (active/max active/max total)
         Reserved Conversations 0/0 (allocated/max allocated)
      5 minute input rate 88000 bits/sec, 10 packets/sec
      5 minute output rate 91000 bits/sec, 11 packets/sec
         5632 packets input, 4974777 bytes, 0 no buffer
         Received 0 broadcasts, 0 runts, 0 giants, 0 throttles
         3 input errors, 2 CRC, 0 frame, 1 overrun, 0 ignored, 0 abort
         5652 packets output, 5042433 bytes, 0 underruns
         0 output errors, 0 collisions, 0 interface resets
         0 output buffer failures, 0 output buffers swapped out
         0 carrier transitions
    router#
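Comparing this snapshot with the previous one gives a feel for how fast packets are being dropped. The Python sketch below does the arithmetic, using the "Last clearing" times (00:35:37 and 00:40:27) as the clock. The helper names are made up, and treating "dropped plus output" as the number of packets offered to the queue is our own simplification.

```python
def hms_to_seconds(text):
    """Convert an HH:MM:SS counter age into seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# Values copied from the two "show interfaces" snapshots taken during the flood.
FIRST = {"since_clear": "00:35:37", "output_drops": 7997, "packets_output": 2324}
SECOND = {"since_clear": "00:40:27", "output_drops": 23735, "packets_output": 5652}


def drop_summary(first, second):
    """Summarise output drops between two snapshots of the same interface."""
    elapsed = hms_to_seconds(second["since_clear"]) - hms_to_seconds(first["since_clear"])
    dropped = second["output_drops"] - first["output_drops"]
    sent = second["packets_output"] - first["packets_output"]
    return {
        "elapsed_s": elapsed,
        "dropped": dropped,
        "sent": sent,
        "drops_per_s": dropped / elapsed,
        # Assumption: packets offered to the queue = dropped + sent.
        "drop_fraction": dropped / (dropped + sent),
    }


SUMMARY = drop_summary(FIRST, SECOND)

if __name__ == "__main__":
    print(
        f"{SUMMARY['dropped']} drops in {SUMMARY['elapsed_s']} s "
        f"({SUMMARY['drops_per_s']:.1f}/s), "
        f"{SUMMARY['drop_fraction']:.0%} of offered packets"
    )
```

Over the 290 seconds between the two snapshots the router dropped 15738 packets and sent 3328, so under that simplification roughly four out of five packets headed for the link were discarded.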
Our output queue is still at the threshold and we are dropping a lot of packets. One fix would be to disallow ICMP, but we are going to assume that we want ICMP allowed. Another fix is to apply traffic shaping:
    router#conf term
    Enter configuration commands, one per line.  End with CNTL/Z.
    router(config)#int Async 5
    router(config-if)#traffic-shape rate 80000
    router(config-if)#exit
    router(config)#exit
    router#
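To get an intuition for what a shaper does to a burst, here is a generic token-bucket sketch in Python. It is an illustration of the general idea only and is not a model of how IOS implements `traffic-shape`. The rate of 80000 bits/sec comes from the command above and the 1028-byte packet size from the flood ping output; the 16000-bit burst size and the `shape` function are assumptions made for the example, and PPP framing overhead is ignored.

```python
def shape(arrivals, rate_bps, burst_bits):
    """Return release times for packets passed through a token bucket.

    arrivals is a list of (arrival_time_s, size_bits), sorted by time.
    Packets are delayed, never dropped, until enough tokens have accumulated.
    """
    tokens = float(burst_bits)  # the bucket starts full
    clock = 0.0                 # time at which the previous packet was released
    departures = []
    for arrival_s, size_bits in arrivals:
        if size_bits > burst_bits:
            raise ValueError("a packet larger than the bucket can never be sent")
        start = max(arrival_s, clock)  # packets leave in order
        # Refill for the time that passed since the last release, up to the cap.
        tokens = min(burst_bits, tokens + (start - clock) * rate_bps)
        if tokens < size_bits:
            # Wait just long enough for the missing tokens to arrive.
            start += (size_bits - tokens) / rate_bps
            tokens = float(size_bits)
        tokens -= size_bits
        clock = start
        departures.append(start)
    return departures


RATE_BPS = 80000          # from "traffic-shape rate 80000"
BURST_BITS = 16000        # hypothetical burst size for this example
PACKET_BITS = 1028 * 8    # flood ping packets: 1000(1028) bytes of data

# Five flood packets arriving at the same instant.
FLOOD = [(0.0, PACKET_BITS)] * 5
DEPARTURES = shape(FLOOD, RATE_BPS, BURST_BITS)

if __name__ == "__main__":
    for number, when in enumerate(DEPARTURES, start=1):
        print(f"packet {number} released at {when:.4f} s")
```

Once the initial burst allowance is used up, the packets leave about 0.1 seconds apart (8224 bits at 80000 bits/sec), however quickly they arrived.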
Our output queue is back down:
    router#show interfaces Async 5
    Async5 is up, line protocol is up
      Hardware is Async Serial
      Internet address is 10.10.10.10/24
      MTU 1500 bytes, BW 9 Kbit, DLY 100000 usec,
         reliability 255/255, txload 18/255, rxload 255/255
      Encapsulation PPP, loopback not set
      Keepalive not set
      DTR is pulsed for 5 seconds on reset
      LCP Open
      Open: IPCP
      Last input 00:00:00, output 00:00:00, output hang never
      Last clearing of "show interface" counters 00:43:08
      Input queue: 1/75/0 (size/max/drops); Total output drops: 32551
      Queueing strategy: weighted fair
      Output queue: 0/1000/64/30046 (size/max total/threshold/drops)
         Conversations 0/2/16 (active/max active/max total)
         Reserved Conversations 0/0 (allocated/max allocated)
      5 minute input rate 89000 bits/sec, 11 packets/sec
      5 minute output rate 86000 bits/sec, 10 packets/sec
         7439 packets input, 6773226 bytes, 0 no buffer
         Received 0 broadcasts, 0 runts, 0 giants, 0 throttles
         11 input errors, 10 CRC, 0 frame, 1 overrun, 0 ignored, 0 abort
         7475 packets output, 6851326 bytes, 0 underruns
         0 output errors, 0 collisions, 0 interface resets
         0 output buffer failures, 0 output buffers swapped out
         0 carrier transitions
    router#
Our client can now talk again:
| 2005-08-06 08:33:21 | are@10.50.100.190 | 101 |
| 2005-08-06 08:33:51 | are@10.50.100.190 | 100 |
| 2005-08-06 08:33:51 | are@10.50.100.190 | 101 |
| 2005-08-06 08:34:21 | are@10.50.100.190 | 100 |
| 2005-08-06 08:34:21 | are@10.50.100.190 | 101 |
| 2005-08-06 08:34:51 | are@10.50.100.190 | 100 |
| 2005-08-06 08:34:51 | are@10.50.100.190 | 101 |
| 2005-08-06 08:35:08 | are@10.10.10.11 | 100 |
| 2005-08-06 08:35:10 | are@10.10.10.11 | 101 |
| 2005-08-06 08:35:20 | are@10.10.10.11 | 100 |
| 2005-08-06 08:35:21 | are@10.50.100.190 | 100 |
| 2005-08-06 08:35:21 | are@10.50.100.190 | 101 |
| 2005-08-06 08:35:21 | are@10.10.10.11 | 101 |
| 2005-08-06 08:35:43 | are@10.10.10.11 | 100 |
| 2005-08-06 08:35:43 | are@10.10.10.11 | 101 |
Now, the ping times from the flood machine are still high:
    64 bytes from 10.10.10.11: icmp_seq=538 ttl=127 time=14117 ms
    64 bytes from 10.10.10.11: icmp_seq=557 ttl=127 time=14413 ms
    64 bytes from 10.10.10.11: icmp_seq=558 ttl=127 time=14338 ms
    64 bytes from 10.10.10.11: icmp_seq=559 ttl=127 time=18114 ms
But, from another machine:
    [root@pippi ~]# ping 10.10.10.11
    PING 10.10.10.11 (10.10.10.11) 56(84) bytes of data.
    64 bytes from 10.10.10.11: icmp_seq=0 ttl=54 time=80.3 ms
    64 bytes from 10.10.10.11: icmp_seq=1 ttl=54 time=81.5 ms
    64 bytes from 10.10.10.11: icmp_seq=2 ttl=54 time=81.5 ms
    64 bytes from 10.10.10.11: icmp_seq=3 ttl=54 time=82.8 ms
If we then ping flood from that same host, most of its packets are eventually dropped:
    64 bytes from 10.10.10.11: icmp_seq=2 ttl=54 time=774 ms
    64 bytes from 10.10.10.11: icmp_seq=8 ttl=54 time=763 ms

    --- 10.10.10.11 ping statistics ---
    27 packets transmitted, 2 received, 92% loss, time 26204ms
    rtt min/avg/max/mdev = 763.927/769.305/774.684/5.449 ms
    [root@mondo root]#
This is as it should be.
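The loss figure in that last ping summary can be recomputed from the packet counts. This short Python sketch does so; the `packet_loss` helper is hypothetical and assumes the summary wording shown in the output above.

```python
import re

SUMMARY_RE = re.compile(r"(\d+) packets transmitted, (\d+) received")


def packet_loss(summary_line):
    """Return the percentage of packets lost, from a ping summary line."""
    match = SUMMARY_RE.search(summary_line)
    if match is None:
        raise ValueError("not a ping summary line")
    sent, received = int(match.group(1)), int(match.group(2))
    if sent == 0:
        raise ValueError("no packets were transmitted")
    return 100.0 * (sent - received) / sent


# Summary line copied from the flood ping output above.
LOSS = packet_loss("27 packets transmitted, 2 received, 92% loss, time 26204ms")

if __name__ == "__main__":
    print(f"{LOSS:.2f}% of packets lost")
```

With 25 of 27 packets lost the exact figure is about 92.6%, which matches the 92% in the output once the fraction is truncated.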