Using sar Part 2 – Making it Play
In Part 1 of this article, we looked at basic invocation of sar and how it works. In Part 2 of this article, we’ll use sar to look at disk, network, and CPU activity. In Part 3, we’ll look at the options available to report on the virtual memory system and what some of the statistics mean.
The observations on performance tuning made in this article are generalities based on our own experience. As always, your mileage may vary. Tuning is a complex activity that requires a sysadmin to make a careful analysis of the hardware, users, and applications that comprise each unique system. One way to do this is to collect baseline data. None of these sar reports, run when a performance problem rears its ugly head, is much use without baselines. So archive some performance data.
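The sketch below shows one way to keep baselines. It assumes a Red Hat style sysstat setup in which the sa1 cron job writes one binary file per day, named saDD, into /var/log/sa. Because those files are named by day of the month, sysstat reuses each name a month later at the latest, and many installations prune old files sooner, so baselines have to be copied somewhere safe. The function name archive_sa and the destination /srv/sar-baselines are made up for illustration.

```shell
# archive_sa: copy sysstat's binary daily files (saDD) into a dated
# directory so they outlive rotation and can be read back later with
# "sar -f FILE". Both default paths are assumptions; adjust them.
#   $1  source directory  (default /var/log/sa)
#   $2  archive root      (default /srv/sar-baselines, hypothetical)
# Prints the directory the files were copied into.
archive_sa() {
    src=${1:-/var/log/sa}
    dest=${2:-/srv/sar-baselines}/$(date +%Y-%m-%d)
    mkdir -p "$dest" || return 1
    for f in "$src"/sa[0-9][0-9]; do
        [ -f "$f" ] || continue          # the glob matched nothing
        cp -p "$f" "$dest"/ || return 1  # -p keeps the original timestamps
    done
    echo "$dest"
}
```

Run it from cron often enough that each file is copied before your sysstat configuration removes it. The text reports (sarDD) are deliberately skipped, since the binary files can be replayed with any sar option later.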
The sar option for looking at block devices is -d (disk).
[root@sv-1 sa]# sar -d | more
03:20:01 PM       DEV       tps  rd_sec/s  wr_sec/s
03:30:01 PM    dev1-0      0.00      0.00      0.00
03:30:01 PM    dev1-1      0.00      0.00      0.00
03:30:01 PM    dev1-2      0.00      0.00      0.00
03:30:01 PM    dev1-3      0.00      0.00      0.00
03:30:01 PM    dev1-4      0.00      0.00      0.00
03:30:01 PM    dev1-5      0.00      0.00      0.00
03:30:01 PM    dev1-6      0.00      0.00      0.00
03:30:01 PM    dev1-7      0.00      0.00      0.00
03:30:01 PM    dev1-8      0.00      0.00      0.00
03:30:01 PM    dev1-9      0.00      0.00      0.00
03:30:01 PM   dev1-10      0.00      0.00      0.00
03:30:01 PM   dev1-11      0.00      0.00      0.00
03:30:01 PM   dev1-12      0.00      0.00      0.00
03:30:01 PM   dev1-13      0.00      0.00      0.00
03:30:01 PM   dev1-14      0.00      0.00      0.00
03:30:01 PM   dev1-15      0.00      0.00      0.00
03:30:01 PM    dev3-0      0.30      0.12      6.08
03:30:01 PM   dev3-64      0.00      0.00      0.00
03:30:01 PM   dev22-0      0.00      0.00      0.00
03:30:01 PM  dev22-64      0.00      0.00      0.00
03:30:01 PM    dev2-0      0.00      0.00      0.00
03:30:01 PM    dev9-0      0.00      0.00      0.00
The statistics given here are straightforward: transfers per second, 512-byte sectors read per second, and 512-byte sectors written per second. What makes the block I/O statistics from sar hard to read is the list of devices. On this system, 22 block devices show up, each identified by its major and minor number. Here is the output of df for this system:
[urbana@sv-1 ~]$ df
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/hda2            287945408  56170484 217148144  21% /
/dev/hdb1            288435168 108982600 164800888  40% /secondary
none                    517544         0    517544   0% /dev/shm
There are just two hard drives on this system, each formatted with a single file system. So what are all these devices? A look at /dev tells us that dev3-0 is hda and dev3-64 is hdb. The block devices with major number 1 are the RAM disks (/dev/ram0 through /dev/ram15). Major number 22 is the secondary IDE channel, which here holds the CD-ROM drives (hdc and hdd); 2 is the floppy, and 9-0 is the software RAID device md0. So before these stats can be of much use to you, you'll have to identify the major and minor numbers of the drives you want to examine. On Linux, IDE drives on the primary channel (hda and hdb) have major number 3, and each drive and partition gets its own minor number. Check this with a listing of /dev:
[urbana@sv-1 ~]$ ls -l /dev/hd*
brw-rw----  1 root  disk  3,  0 Feb 26 22:16 /dev/hda
brw-rw----  1 root  disk  3,  1 Feb 26 22:16 /dev/hda1
brw-rw----  1 root  disk  3,  2 Feb 26 22:16 /dev/hda2
brw-rw----  1 root  disk  3, 64 Feb 26 22:16 /dev/hdb
brw-rw----  1 root  disk  3, 65 Feb 26 22:16 /dev/hdb1
brw-------  1 usr-1 disk 22,  0 Feb 26 22:16 /dev/hdc
brw-------  1 usr-1 disk 22, 64 Feb 26 22:16 /dev/hdd
The major and minor numbers are listed between the group ownership and the date. To cut through all of the extraneous listings for devices you don’t care about, just run sar through grep with something like this:
[urbana@sv-1 ~]$ sar -d | grep dev3-
12:10:01 AM    dev3-0      0.13      0.00      2.29
12:10:01 AM   dev3-64      0.00      0.00      0.00
12:20:01 AM    dev3-0      0.10      0.00      1.24
12:20:01 AM   dev3-64      0.00      0.00      0.00
*snip*
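Grepping still leaves you translating devM-N labels in your head and working in sectors. The sketch below, a hypothetical shell function named sar_disk_named, does both jobs: it reads sar -d output on standard input, looks up each major/minor pair in /proc/partitions (whose columns are major, minor, size in blocks, and name), and halves the sector rates to get kilobytes per second, since each sector is 512 bytes. It assumes the DEV, tps, rd_sec/s and wr_sec/s headers shown above; devices with no entry in /proc/partitions keep their devM-N label. Recent sysstat releases can print device names themselves with sar -d -p, so check your man page first.

```shell
# sar_disk_named [PARTITIONS_FILE]: reformat `sar -d` output read on stdin.
#  - replaces devMAJOR-MINOR labels with names from a partitions table
#    (default /proc/partitions; pass another file as $1, e.g. for testing)
#  - converts rd_sec/s and wr_sec/s (512-byte sectors) to kB/s
# Assumes the DEV, tps, rd_sec/s and wr_sec/s headers used by older sar.
sar_disk_named() {
    awk -v ptab="${1:-/proc/partitions}" '
        BEGIN {
            # Lines look like "   3     0  293057352 hda"; skip the header.
            while ((getline line < ptab) > 0) {
                if (split(line, f, " ") >= 4 && f[1] ~ /^[0-9]+$/)
                    name["dev" f[1] "-" f[2]] = f[4]
            }
            close(ptab)
            printf "%-11s %-10s %8s %10s %10s\n", "time", "device", "tps", "rd_kB/s", "wr_kB/s"
        }
        # Header line: store each column as an offset from the end of the
        # line, which stays valid for one-word "Average:" time stamps.
        / DEV / && /rd_sec\/s/ {
            for (i = 1; i <= NF; i++) off[$i] = NF - i
            next
        }
        # Data line: last field is numeric and a header has been seen.
        ("DEV" in off) && $NF ~ /^[0-9.]+$/ {
            t = $1
            for (i = 2; i < NF - off["DEV"]; i++) t = t " " $i
            dev = $(NF - off["DEV"])
            if (dev in name) dev = name[dev]
            printf "%-11s %-10s %8.2f %10.2f %10.2f\n", t, dev,
                $(NF - off["tps"]), $(NF - off["rd_sec/s"]) / 2, $(NF - off["wr_sec/s"]) / 2
        }
    '
}
```

Usage: sar -d | sar_disk_named. Combine it with grep (for example, grep -v dev1-) if you also want to hide the RAM disks.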
Note: on older kernels (pre-2.5), use -b for the disk I/O report.
The -n (network) option to sar displays statistics on network interface traffic, errors, and sockets. The -n option takes arguments DEV (devices), EDEV (error count for devices), SOCK (sockets), and ALL (all of the above).
[urbana@sv-1 ~]$ sar -n DEV | more
12:00:01 AM     IFACE   rxpck/s   txpck/s   rxbyt/s   txbyt/s   rxcmp/s   txcmp/s  rxmcst/s
12:10:01 AM        lo      0.00      0.00      0.00      0.00      0.00      0.00      0.00
12:10:01 AM      eth0      0.04      0.01      3.19      1.03      0.00      0.00      0.00
12:10:01 AM      eth1      0.00      0.00      0.00      0.00      0.00      0.00      0.00
12:10:01 AM      sit0      0.00      0.00      0.00      0.00      0.00      0.00      0.00
12:10:01 AM    vmnet1      0.00      0.00      0.00      0.00      0.00      0.00      0.00
12:10:01 AM    vmnet8      0.00      0.00      0.00      0.00      0.00      0.00      0.00
The output is easy to parse visually. There are three pairs of statistics, each given as received and transmitted: packets per second, bytes per second, and compressed packets per second. They are followed by the lone multicast packets received per second. The output for the EDEV option is wide, but it's often the first three numbers that you want: receive errors per second, transmit errors per second, and collisions per second.
[urbana@sv-1 ~]$ sar -n EDEV | more
Linux 2.6.9-34.EL (sv-1.example.com)    03/03/2007

12:00:01 AM     IFACE   rxerr/s   txerr/s    coll/s  rxdrop/s  txdrop/s  txcarr/s  rxfram/s  rxfifo/s  txfifo/s
12:10:01 AM        lo      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
12:10:01 AM      eth0      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
12:10:01 AM      eth1      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
12:10:01 AM      sit0      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
12:10:01 AM    vmnet1      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
12:10:01 AM    vmnet8      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
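As with the disk report, most of these rows belong to idle interfaces. A minimal filter, written here as a hypothetical shell function named sar_net_active, keeps the IFACE header lines and drops every row whose statistics are all zero. It works on both the DEV and EDEV reports because it only checks whether a row contains a nonzero value; it assumes the values are printed with a decimal point, as in the output above, so that time stamps are never mistaken for data.

```shell
# sar_net_active: pass through IFACE header lines plus any interface row
# (from `sar -n DEV` or `sar -n EDEV` on stdin) with a nonzero statistic.
# Only fields that look like decimals (e.g. 0.04) are tested, so time
# stamps such as 12:10:01 are ignored.
sar_net_active() {
    awk '
        / IFACE / { print; next }
        {
            for (i = 1; i <= NF; i++)
                if ($i ~ /^[0-9]+\.[0-9]+$/ && $i + 0 > 0) { print; next }
        }
    '
}
```

Piping sar -n EDEV through it shows at a glance whether any interface logged errors, drops, or collisions; with sar -n DEV it shows which interfaces actually carried traffic.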
Use the SOCK argument to see the total number of sockets in use, broken down into TCP, UDP, and raw sockets, along with the number of IP fragments in use. This is a nice quick way to see trends in socket usage and correlate them with system changes.
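To follow a trend like this, it helps to pull a single column out of the report. Here is a sketch of a hypothetical helper, sar_column, that finds the requested column by its header name and prints it next to each time stamp. It assumes a report with one row per interval, such as SOCK; totsck is the total-sockets column in sysstat's SOCK report, but check the header line of your own output for the exact names.

```shell
# sar_column NAME: print the time stamp and the value of column NAME for
# each data row of sar output read on stdin. Intended for reports with a
# single row per interval (e.g. sar -n SOCK).
sar_column() {
    awk -v want="$1" '
        {
            # Header line: remember the column position counted from the end,
            # which also fits "Average:" rows (one time field instead of two).
            for (i = 1; i <= NF; i++)
                if ($i == want) { off = NF - i; found = 1; next }
        }
        found && $NF ~ /^[0-9.]+$/ {
            t = $1
            if ($2 == "AM" || $2 == "PM") t = t " " $2
            print t, $(NF - off)
        }
    '
}
```

Usage: sar -n SOCK | sar_column totsck. Run the same command against an older data file with sar -f to compare today's socket counts with a baseline.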
For reporting on CPU activity, run sar in its default mode. On multiprocessor systems, sar reports CPU activity for the system as a whole, and for each processor if you use sar -P ALL. This is useful for discovering imbalances in CPU use, which can then be tracked down to problems with applications or the operating system.
[root@sv-1 ~]# sar -P ALL
---- snipped ----
03:20:01 PM       CPU     %user     %nice   %system   %iowait     %idle
03:30:01 PM       all      0.03      0.00      0.06      0.09     99.82
Average:          all      0.05      0.00      0.03      0.02     99.90
This system has only one processor, so there is no per-processor breakdown. The first three numbers (%user, %nice, and %system) show the percentage of CPU time broken down by execution level. If these three total close to 100, the system is CPU bound. The %iowait column is particularly important. It shows the percentage of time the CPU(s) were idle while the system had an outstanding disk I/O request. A high value here usually points to a disk bottleneck, which can then be assessed further with sar -d and tools like iostat. The last column, idle time, is a good one to eyeball, but a low value here is not necessarily a bad thing. It's great to have your CPUs working most of the time; that's what they're for. But 0% idle time is not good. Look at these stats in conjunction with the load and run queue statistics from sar -q. High loads can be caused by I/O bottlenecks and by network problems such as unresponsive NFS or NIS servers or DNS problems. If that's the case, you will see low percentages in the CPU utilization stats.
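The rules of thumb above can be turned into a quick filter. The sketch below, a hypothetical shell function named sar_cpu_flags, reads the default sar CPU report (or sar -P ALL) on standard input and prints the intervals where %user + %nice + %system reaches a "CPU bound" threshold or where %iowait reaches a "high iowait" threshold. The default thresholds of 90% and 20% are arbitrary placeholders, not recommendations; your own baselines should set them. Columns are located by header name, so it assumes the CPU, %user, %nice, %system and %iowait headers shown above.

```shell
# sar_cpu_flags [BUSY] [IOWAIT]: flag intervals in sar CPU output (stdin)
# where %user+%nice+%system >= BUSY (default 90) or %iowait >= IOWAIT
# (default 20). Both defaults are placeholders; set them from baselines.
sar_cpu_flags() {
    awk -v cpumax="${1:-90}" -v iomax="${2:-20}" '
        BEGIN { cpumax += 0; iomax += 0 }   # force numeric comparison
        # Header line: record each column as an offset from the end of the
        # line, which stays correct for one-word "Average:" time stamps.
        / %user / {
            for (i = 1; i <= NF; i++) off[$i] = NF - i
            hdr = 1
            next
        }
        hdr && $NF ~ /^[0-9.]+$/ {
            t = $1
            if ($2 == "AM" || $2 == "PM") t = t " " $2
            cpu  = $(NF - off["CPU"])
            used = $(NF - off["%user"]) + $(NF - off["%nice"]) + $(NF - off["%system"])
            io   = ("%iowait" in off) ? $(NF - off["%iowait"]) + 0 : 0
            if (used >= cpumax)
                printf "%s %s CPU bound: %.2f%%\n", t, cpu, used
            if (io >= iomax)
                printf "%s %s high iowait: %.2f%%\n", t, cpu, io
        }
    '
}
```

Usage: sar -P ALL | sar_cpu_flags 80 10. Intervals it flags are the ones worth checking against sar -q and sar -d for the same time stamps.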
The statistics presented with these sar options are pretty easy to interpret. In Part 3 of this article, we will work with the four options that provide statistics on virtual memory, and these are a little trickier to use. We will also use sar to track the activity of a specific process.