sub-second: network

Thursday, September 13, 2012

How to Test the Stability of an Application

Testing the stability of an application is critical. It can prevent system outages by identifying problems before they occur in production. Outages can severely damage a business, in some cases permanently. The following outline provides a reasonable template for testing application stability.

Ramp load up incrementally to the breaking point of the system. Do not stop at expected peak load because bursts or unexpected traffic can entail load far higher than anticipated.

Load should cover critical dimensions such as transaction rate/throughput, connections, concurrent users, and the range of use cases/functionality.
When the application breaks, investigate what broke.

If the test infrastructure broke (test client capacity hit, test network capacity hit, test case crashed, etc.), the test infrastructure must be repaired so that the application is what breaks, not the test infrastructure.
If the application broke, diagnose the type of breakage and what broke.
Is breakage recoverable?
Does breakage affect already connected users, or just block new users?
Did the application code break (errors, deadlocks, thread blocking, etc.)?
Was a system resource limit hit (CPU, memory, network, disk)?
If system resource limits were not hit, does the application need to be fixed so that it is not the bottleneck? The system should scale up until a system limit is hit, whether CPU, memory, disk I/O, or network bandwidth.
Did a downstream service break?

How can the downstream service be improved to provide more capacity and stability?

Did the system just slow down, remaining functional?
Is a restart required, and what must be restarted (services, server, downstream services, etc.)?
Can the system be scaled out or scaled up to improve the capacity?

If not, why not? Is there an architectural limitation preventing further scalability? How can scalability be improved?

From the test, determine the peak capacity of the application and verify that proper production monitoring is in place to detect this threshold.

Run at near peak capacity for an extended period of time (this could be one day or more, depending on uptime requirements).

Is the application stable when run for a long time or does it eventually crash?

Why does it crash?

Does performance degrade over time?

Why does it degrade?

Perform administrative operations that may need to be performed during production usage while the system is near peak load.

Is the system stable when this happens?

Perform the full suite of functional tests while the system is near peak load.

Is the system stable when this happens?

Document the results of the test carefully. Do not ignore crashes and instability. Spend the time and effort to understand the behavior and harden the application to behave well under any conditions, anticipated or not.
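The ramp-up step at the start of this outline can be sketched in shell. This is a minimal sketch rather than a load tool: `find_breaking_point` and its arguments are hypothetical names, and the load command it calls stands in for whatever drives your test clients at a given level (users, transactions per second, connections) and exits non-zero when the application fails your pass criteria.

```shell
# find_breaking_point LOAD_CMD START STEP MAX
# Runs "LOAD_CMD LEVEL" at increasing load levels and prints the first level
# at which LOAD_CMD exits non-zero. Prints "none" and returns 1 if every
# level up to MAX passes. LOAD_CMD is a placeholder for your own test driver.
find_breaking_point() {
  load_cmd=$1
  level=$2
  step=$3
  max=$4
  while [ "$level" -le "$max" ]; do
    if ! "$load_cmd" "$level"; then
      echo "$level"
      return 0
    fi
    level=$((level + step))
  done
  echo "none"
  return 1
}
```

For example, `find_breaking_point run_load 100 100 5000` would step from 100 to 5000 in increments of 100, where `run_load` is your own function or script. If it prints `none`, the maximum should probably be raised rather than accepting expected peak load as the stopping point.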

Monday, August 13, 2012

Recording LoadRunner Traffic in Fiddler

It can be useful to capture LoadRunner HTTP traffic in Fiddler, for example to see the timeline of HTTP requests, which ones are slow, and what parallelism exists, if any. It is also useful to compare the VuGen request timeline with the same transaction captured directly from the browser, because VuGen may not match browser behavior: for example, it may serialize requests that the browser sends asynchronously in parallel.

Configure VuGen to send its traffic through Fiddler under Runtime Settings, Internet Protocol, Proxy: select Use Proxy Server and set address=localhost, port=8888 (Fiddler's default listening port).

Then put breakpoints in the script before and after the section you want to record. Run the script in VuGen to the first breakpoint (with logging disabled), start capturing traffic in Fiddler, run the script to the second breakpoint, and stop capturing traffic in Fiddler. You can then see a visualization of the request pipeline in Fiddler for your LoadRunner traffic:

It is also a good idea to capture the traffic by running the script from the Controller on the load generator to reduce VuGen overhead. You will get a more compact and realistic pipeline that way.

You can then record the same transaction manually using the web browser for comparison with LoadRunner.

Wednesday, May 23, 2012

Monitoring Linux Server Usage With Sar

A simple way to monitor server resource usage is with sar. The following shell script, sar.sh, monitors CPU, memory, network, and disk every 10 seconds and writes each to a separate log file that can be easily imported into a spreadsheet for charting.

Script

#!/bin/sh
# Run sar every 10 seconds until stopped
# cpu
sar -u 10 > sar.cpu.log &
# free memory
sar -r 10 > sar.freememory.log &
# disk total
sar -b 10 > sar.disk.log &
# network by device
# - Note that you need to filter by the adaptor in use.
# - Run "sar -n DEV 10" to see which adaptor is being used
# - --line-buffered (GNU grep) writes each matching line immediately
#   instead of waiting for grep's output buffer to fill
sar -n DEV 10 | grep --line-buffered eth1 > sar.network.log &

Output

The cpu log file shows user and system CPU % utilization:

03:07:55 PM  CPU  %user  %nice  %system  %iowait  %steal  %idle
03:08:55 PM  all  73.99   0.00     2.43     0.21    0.00  23.37
03:09:55 PM  all  81.79   0.00     2.67     0.21    0.00  15.34
03:10:55 PM  all  82.29   0.00     2.68     0.17    0.00  14.86

The free memory log file shows how much memory is free and used:

03:07:55 PM  kbmemfree  kbmemused  %memused  kbbuffers  kbcached  kbcommit  %commit
03:08:55 PM  110106128   88246712     44.49     356468  42850352  30363972     7.61
03:09:55 PM  110053452   88299388     44.52     356472  42879192  30371420     7.61
03:10:55 PM  109989584   88363256     44.55     356484  42914152  30372688     7.61

The disk log file shows read and write transfers per second and blocks read and written per second (sar counts 512-byte blocks here, not bytes):

03:07:55 PM      tps  rtps     wtps  bread/s   bwrtn/s
03:08:55 PM  7889.59  0.00  7889.59     0.00  58582.09
03:09:55 PM  8454.59  0.00  8454.59     0.00  62458.76
03:10:55 PM  8456.30  0.00  8456.30     0.00  62645.15
03:11:55 PM  7257.61  0.00  7257.61     0.00  57384.76

The network log file shows packets received and transmitted per second and kilobytes received and transmitted per second:

03:00:01 PM  IFACE  rxpck/s  txpck/s   rxkB/s   txkB/s  rxcmp/s  txcmp/s  rxmcst/s
03:08:55 PM   eth1  3285.46  2965.12   956.75  1824.97     0.00     0.00      1.05
03:09:55 PM   eth1  3640.33  3307.06  1053.38  2074.92     0.00     0.00      1.14
03:10:55 PM   eth1  3617.67  3283.23  1047.62  2061.22     0.00     0.00      1.65
03:11:55 PM   eth1  2917.34  2657.74   842.35  1686.10     0.00     0.00      1.38
03:12:55 PM   eth1  3859.74  3502.98  1119.06  2194.43     0.00     0.00      1.15
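To import these logs into a spreadsheet, it may help to convert them to CSV first. The following is a minimal sketch, assuming the 12-hour timestamp layout shown above; `sar_to_csv` is a hypothetical helper name, and the rules for skipping the banner, the repeated column headers, and the trailing "Average:" summary are assumptions about typical sar output rather than a complete parser.

```shell
# sar_to_csv: read a sar log on stdin, write comma-separated rows on stdout
sar_to_csv() {
  awk '
    NF == 0          { next }   # blank line
    $1 == "Average:" { next }   # summary block sar prints when it exits
    $1 == "Linux"    { next }   # banner line at the top of the output
    {
      stamp = $1
      first = 2
      if ($2 == "AM" || $2 == "PM") {   # 12-hour clock: keep "03:08:55 PM" together
        stamp = $1 " " $2
        first = 3
      }
      # A row whose last field is not a number is a column header;
      # sar repeats it periodically, so keep only the first one.
      if ($NF !~ /^[0-9.]+$/ && seen_header++) next
      line = stamp
      for (i = first; i <= NF; i++) line = line "," $i
      print line
    }'
}
```

A possible invocation is `sar_to_csv < sar.cpu.log > sar.cpu.csv`. Because the network log is filtered through grep, it contains no header row, so its column names would have to be added by hand.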

Friday, January 27, 2012

How to Test Software Resiliency to Network Problems Using a Network Emulator

Software applications can be destabilized by many factors that are difficult to cover in a test or lab environment. These include timing-related issues, unanticipated bursts, cascading effects, unexpected administrative or batch processes, and integration complexities. Some of the most nefarious factors that can destabilize applications are network problems. Network infrastructure is often a complex black box and can perform inconsistently for various reasons. Network problems can cause serious application stability problems, including cascading failures, unrecoverable states, and outages.

Network problems can also be difficult to emulate in a test or lab environment. One technique for handling this is to use a network emulator in the test environment. Take two servers between which you want to test various network problems or impairments. These could be two services in a SOA environment, or a web and application server, an application server and a database server, etc. Plug one of the servers into a network emulator port. Plug the other server into another network emulator port which is tied to the first emulator port, as illustrated below:

With the network emulator in place between the two servers, run the application under load and introduce various network impairments, observing how the application behaves. The following are examples of network impairments that could be introduced:

Network latency. Introduce latency at various levels, such as 1ms, 10ms, 100ms, 1000ms, 10000ms. Resume normal functioning after varying lengths of time, such as 10 sec, 1 min, 10 min.

Bandwidth throttling. Introduce throttling at various levels, such as 100 Mb/s, 10 Mb/s, 1 Mb/s, 100 kb/s. Resume normal functioning after varying lengths of time.

Network down. Introduce 100% packet loss for varying lengths of time.
Emulate dropped packets for varying lengths of time.

Emulate packet accumulation/burst for varying lengths of time.
Other network impairments
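If a hardware emulator is not available, some of the impairments above can be approximated in software on a Linux server with the `tc` command and its netem queueing discipline. This swaps in a different technique from the emulator described here, so treat it as a sketch under assumptions: the interface name eth1, the helper names `impair` and `run_or_print`, and the `DRY_RUN` variable are all hypothetical, root privileges are required, and a netem qdisc on the root of an interface shapes only traffic leaving that interface.

```shell
# run_or_print: execute the command, or only print it when DRY_RUN=1
run_or_print() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# impair IFACE SECONDS NETEM_ARGS...
# Applies a netem impairment to IFACE, holds it for SECONDS, then removes it
# so the network returns to normal and application recovery can be observed.
impair() {
  iface=$1
  seconds=$2
  shift 2
  run_or_print tc qdisc add dev "$iface" root netem "$@"
  run_or_print sleep "$seconds"
  run_or_print tc qdisc del dev "$iface" root
}

# Example scenarios (not run here):
#   impair eth1 60  delay 100ms   # added latency for 1 minute
#   impair eth1 600 rate 1mbit    # bandwidth throttling for 10 minutes
#   impair eth1 10  loss 100%     # network down for 10 seconds
```

Setting `DRY_RUN=1` prints the commands instead of running them, which is a cheap way to review a scenario before applying it to a shared test server.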

With each network problem scenario, the application behavior should be carefully studied. Answers to questions such as the following should be determined:

Does the application behave as expected under the network impairments?
Is the application behavior appropriate? Do timeout, retry, and reconnect mechanisms work as expected?
When the network recovers to a normal state, does the application recover, or is the application in an unrecoverable state?
Is any manual intervention required to bring the application to a normal state?
Do any applications or servers require restarting?
Are appropriate messages logged?
Does excessive message spamming occur?

Testing network problems in the lab provides an extra measure of security and could be well worth the time, expense and effort. If network problems still destabilize the application after doing this type of network problem testing, the test suite should be enhanced to cover the type of scenario that was missed.

Wednesday, January 4, 2012

networkspeedtest - a free network bandwidth test application

Download

Network performance problems can cause a variety of difficult-to-diagnose application performance problems. It can be necessary to test the network to verify that its bandwidth is as expected.
Network speed test is a free, simple network test application that will copy files from one test server to another, showing the network bandwidth used in bytes per second. This could be compared to a previous baseline to show a network problem. For example, if 50 MB/sec of bandwidth was previously seen between two test servers, and now there is 1 MB/sec between the same two test servers, there is likely a network problem.

Installation
Unzip the networkspeedtest directory to c:\temp

USAGE
copyto <SERVERNAME> repeat

(Copies current directory to SERVERNAME\c$\temp\networkspeedtest showing network speed in bytes per second.)
(repeat parameter causes it to repeat the operation indefinitely)

EXAMPLE
(Shows network bandwidth of 48 MB/sec from current server to SERVER1)

C:\Temp\networkspeedtest> copyto SERVER1 repeat

-- Speed of copying files of varying sizes to SERVER1 in bytes per second
Speed : 48339531 bps.
Speed : 58255333 bps.
Speed : 41308327 bps.
Speed : 48339531 bps.
Speed : 48339531 bps.
Speed : 48339531 bps.
Speed : 48339531 bps.
Speed : 48339531 bps.
Speed : 48339531 bps.
Speed : 48339531 bps.
Speed : 41308327 bps.
Speed : 58255333 bps.
Speed : 48339531 bps.
Speed : 41308327 bps.
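The baseline comparison described earlier can be scripted once the output has been saved to a file. The following is a minimal sketch, assuming the "Speed : N bps." line format shown above with N in bytes per second, and a machine with a POSIX shell and awk (the tool itself runs on Windows). `mean_mb_per_sec` and `below_baseline` are hypothetical helper names, and the choice of threshold fraction is up to you.

```shell
# mean_mb_per_sec: read "Speed : N bps." lines on stdin and print the mean
# in MB/s (1 MB = 1,000,000 bytes), to one decimal place
mean_mb_per_sec() {
  awk '$1 == "Speed" && $2 == ":" { sum += $3; n++ }
       END { if (n) printf "%.1f\n", sum / n / 1000000 }'
}

# below_baseline MEASURED BASELINE FRACTION
# Succeeds (exit 0) when MEASURED is less than BASELINE * FRACTION
below_baseline() {
  awk -v m="$1" -v b="$2" -v f="$3" 'BEGIN { exit !(m < b * f) }'
}
```

For example, with the output saved to a hypothetical speed.log, `below_baseline "$(mean_mb_per_sec < speed.log)" 50 0.5` would succeed only if the measured mean fell below half of a 50 MB/sec baseline.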

The following shows the Windows Task Manager Networking tab during this test: