Monday, November 7, 2011

Monitoring Linux Servers during Performance Tests

Resource usage on the servers involved in a load or performance test should be monitored and saved so that system bottlenecks and behavior can be verified later.  Linux servers can be monitored in a number of different ways.

If you are using Performance Center to drive load, Linux monitors can be enabled as part of the load test, allowing the Linux metrics to be conveniently combined with other metrics in a single report.  This requires the rstat daemon (rpc.rstatd) to be running on each monitored Linux server.  On CentOS, the rstat daemon can be installed as follows:

sudo yum install rpcbind
sudo wget http://www.wyzaerd.com/public/rpc.rstatd-4.0.1.tar.gz
sudo tar xvzf rpc.rstatd-4.0.1.tar.gz
cd rpc.rstatd-4.0.1
sudo ./configure
sudo make
sudo make install
sudo /usr/local/sbin/rpc.rstatd

Monitors can then be configured in Performance Center to collect stats from the Linux server running rstatd.
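
Before configuring the monitors, it can help to confirm that rstatd has registered itself with rpcbind. The following is a minimal sketch, assuming the rpcinfo utility is installed on the server; the helper name has_rstatd is made up for illustration.

```shell
#!/bin/sh
# has_rstatd (hypothetical helper): reads "rpcinfo -p" output on stdin
# and succeeds if an rstatd service is listed.
has_rstatd() {
    grep -q 'rstatd'
}

# Usage on the monitored server (assumes rpcinfo is available):
#   rpcinfo -p localhost | has_rstatd && echo "rstatd is registered"
```

If the check fails, the usual suspects are rpcbind not running or rpc.rstatd not having been started.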

Another tool for viewing and collecting server resource usage on Linux is jperfmeter (http://jperfmeter.sourceforge.net/).  Jperfmeter provides a GUI view of resource usage metrics for a list of servers, showing values over time.  For example, the following shows the jperfmeter console for a group of 4 Linux servers:

The jperfmeter metrics can be saved to a delimited text file for later analysis:

A third tool for monitoring Linux server resource usage is sar. The following script collects CPU, memory, disk, and network usage:


# Run sar every 10 seconds until stopped
# cpu
sar -u 10  > sar.cpu.log &
# free memory
sar -r 10  > sar.freememory.log &
# disk total
sar -b 10  > sar.disk.log &
# network by device
#    - Note that you need to filter by the adaptor in use.
#    - Run "sar -n DEV 10" to see which adaptor is being used
#    - --line-buffered (GNU grep) writes each line to the log as it arrives
sar -n DEV 10 | grep --line-buffered eth1 > sar.network.log &

On some versions of sar, it is necessary to specify the number of data points to collect, as in the following:

# Run sar every 10 seconds for 99999 samples (about 11.5 days) or until stopped
# cpu
sar -u 10 99999 > sar.cpu.log &
# free memory
sar -r 10 99999 > sar.freememory.log &
# disk total
sar -b 10 99999 > sar.disk.log &
# network by device
#    - Note that you need to filter by the adaptor in use.
#    - Run "sar -n DEV 10" to see which adaptor is being used
#    - --line-buffered (GNU grep) writes each line to the log as it arrives
sar -n DEV 10 99999 | grep --line-buffered eth1 > sar.network.log &


For a test of one or two hours, a 10 second collection interval is convenient, providing several hundred data points (360 per hour).  For an overnight test, a 300 second collection interval is usually appropriate.
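
Instead of a large fixed count such as 99999, the count can be worked out from the planned test length and the interval. This is a small sketch only; the function name sample_count is an illustrative assumption, not part of sar.

```shell
#!/bin/sh
# sample_count DURATION_SECONDS INTERVAL_SECONDS
# Prints the number of samples needed to cover the duration,
# rounding up so the end of the test is not cut off.
sample_count() {
    duration=$1
    interval=$2
    echo $(( (duration + interval - 1) / interval ))
}

# A 2-hour test sampled every 10 seconds needs 720 samples:
#   sar -u 10 "$(sample_count 7200 10)" > sar.cpu.log &
# A 12-hour overnight test sampled every 300 seconds needs 144 samples:
#   sar -u 300 "$(sample_count 43200 300)" > sar.cpu.log &
```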

Output

The CPU log file shows user and system CPU utilization as percentages:

03:07:55 PM     CPU     %user     %nice   %system   %iowait    %steal     %idle
03:08:55 PM     all     73.99      0.00      2.43      0.21      0.00     23.37
03:09:55 PM     all     81.79      0.00      2.67      0.21      0.00     15.34
03:10:55 PM     all     82.29      0.00      2.68      0.17      0.00     14.86
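
For a quick summary of a log like the one above, the average CPU busy percentage can be computed as 100 minus %idle. The following is a rough sketch, assuming %idle is the last column as in the sample output; the function name avg_cpu_busy is made up for illustration.

```shell
#!/bin/sh
# avg_cpu_busy (hypothetical helper): reads a "sar -u" log on stdin and
# prints the mean of (100 - %idle) over the "all" CPU rows.
# The "Average:" row that sar may print at the end is skipped.
avg_cpu_busy() {
    awk '!/^Average/ && / all / { sum += 100 - $NF; n++ }
         END { if (n) printf "%.2f\n", sum / n }'
}

# Usage:
#   avg_cpu_busy < sar.cpu.log
```

For the three sample rows shown above, this prints 82.14.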

The free memory log file shows how much memory is free and used:

03:07:55 PM kbmemfree kbmemused  %memused kbbuffers  kbcached  kbcommit   %commit
03:08:55 PM 110106128  88246712     44.49    356468  42850352  30363972      7.61
03:09:55 PM 110053452  88299388     44.52    356472  42879192  30371420      7.61
03:10:55 PM 109989584  88363256     44.55    356484  42914152  30372688      7.61

The disk log file shows total, read, and write transfers per second and blocks read and written per second:

03:07:55 PM       tps      rtps      wtps   bread/s   bwrtn/s
03:08:55 PM   7889.59      0.00   7889.59      0.00  58582.09
03:09:55 PM   8454.59      0.00   8454.59      0.00  62458.76
03:10:55 PM   8456.30      0.00   8456.30      0.00  62645.15
03:11:55 PM   7257.61      0.00   7257.61      0.00  57384.76

The network log file shows packets received and transmitted per second and kilobytes received and transmitted per second:

03:00:01 PM     IFACE   rxpck/s   txpck/s    rxkB/s    txkB/s   rxcmp/s   txcmp/s  rxmcst/s
03:08:55 PM      eth1   3285.46   2965.12    956.75   1824.97      0.00      0.00      1.05
03:09:55 PM      eth1   3640.33   3307.06   1053.38   2074.92      0.00      0.00      1.14
03:10:55 PM      eth1   3617.67   3283.23   1047.62   2061.22      0.00      0.00      1.65
03:11:55 PM      eth1   2917.34   2657.74    842.35   1686.10      0.00      0.00      1.38
03:12:55 PM      eth1   3859.74   3502.98   1119.06   2194.43      0.00      0.00      1.15

These can be charted to provide a simple overview of server resource usage as follows:
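
Most charting tools expect delimited input, so one way to prepare the sar logs is to convert them to CSV first. The following is a sketch under a few assumptions: it treats any row whose last field is not a number as a header, and the function name sar_to_csv is made up for illustration.

```shell
#!/bin/sh
# sar_to_csv (hypothetical helper): reads a sar log on stdin and prints
# comma-separated rows. Blank lines, the "Linux ..." banner, and
# "Average:" rows are dropped; only the first header row is kept.
sar_to_csv() {
    awk '
        NF == 0 || /^Linux/ || /^Average/ { next }
        {
            # Keep the AM/PM marker with the time on 12-hour clocks
            if ($2 == "AM" || $2 == "PM") { stamp = $1 " " $2; first = 3 }
            else                          { stamp = $1;        first = 2 }
            # A row whose last field is not numeric is treated as a header
            if ($NF !~ /^[0-9.]+$/) {
                if (seen_header) next
                seen_header = 1
                stamp = "time"
            }
            line = stamp
            for (i = first; i <= NF; i++) line = line "," $i
            print line
        }'
}

# Usage:
#   sar_to_csv < sar.cpu.log > sar.cpu.csv
```

The resulting file can be opened in a spreadsheet or fed to a plotting tool, with the first column as the time axis.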





