Friday, September 14, 2012

Dumping Very Large Java Heaps

When a Java application has a memory leak or uses far more memory than expected, heap information is needed to identify the source of the problem.  A heap dump is ideal because it can be analyzed with a variety of tools.  With very large Java heaps (more than 100 GB, say), however, a heap dump may be impractical for several reasons:

  • the heap dump may crash the Java process before completing
  • the heap dump may hang indefinitely
  • there may not be enough disk space to accommodate the dump
  • the dump may be so large that analysis tools are unable to process it

One solution in this scenario is to use the jmap utility to obtain a heap histogram from the running process.  In practice this appears to be very lightweight: it completes quickly on very large heaps and generates a very small summary file that can be used for troubleshooting.

The syntax is as follows, where <pid> is the process ID of the Java process:

jmap -histo <pid>

The output is a compact summary showing, for each class in the heap, the number of instances, the total size in bytes, and the class name, for example:

 num     #instances         #bytes  class name
----------------------------------------------
   1:         70052       11118624  <constMethodKlass>
   2:         70052        8422160  <methodKlass>
   3:          6320        8258472  <constantPoolKlass>
   4:          6320        6117216  <instanceKlassKlass>
   5:        116656        5732520  <symbolKlass>
   6:         17467        5729824  [I
   7:          5682        5050352  <constantPoolCacheKlass>
   8:         57275        4818512  [C
   9:         24818        2660384  [B
  10:         59327        1898464  java.lang.String
  11:          2847        1766720  [J
  12:          2978        1542008  <methodDataKlass>
  13:         11687         797256  [S
  14:         13307         706440  [Ljava.lang.Object;
  15:          6777         704808  java.lang.Class
  16:         18904         604928  java.util.HashMap$Entry
  17:         10088         522512  [[I
  18:          5736         499408  [Ljava.util.HashMap$Entry;
  19:         12838         410816  java.util.Hashtable$Entry
  20:          5580         267840  java.util.HashMap
  21:           428         249952  <objArrayKlassKlass>
  22:          5888         235520  java.util.concurrent.ConcurrentHashMap$Segment
  23:          6243         199776  java.util.concurrent.locks.ReentrantLock$NonfairSync
  24:          5888         146544  [Ljava.util.concurrent.ConcurrentHashMap$HashEntry;
  ...
3029:             1             16  sun.awt.X11.XToolkit$4
3030:             1             16  java.util.Collections$EmptyIterator
3031:             1             16  com.sun.tools.visualvm.core.explorer.ExplorerContextMenuFactory
3032:             1             16  sun.reflect.generics.tree.TypeVariableSignature
3033:             1             16  sun.awt.X11.XKeyboardFocusManagerPeer$1
3034:             1             16  org.openide.xml.EntityCatalog$Forwarder
Total        714547       74730032
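
Because the histogram is plain text, it is easy to post-process, for example to compare two snapshots taken some time apart and see which classes are growing.  The following is a minimal sketch in C, assuming rows shaped like the sample above.  The names HistoRow, parse_histo_row, and avg_instance_size are hypothetical and introduced only for illustration; they are not part of jmap.

```c
#include <stdio.h>

/* One data row of `jmap -histo` output: rank, instance count, bytes, class name. */
typedef struct {
    int rank;
    long long instances;
    long long bytes;
    char class_name[256];
} HistoRow;

/* Parses a data row such as "  10:  59327  1898464  java.lang.String".
   Returns 1 on success and 0 for header, separator, "Total", or other lines. */
int parse_histo_row(const char *line, HistoRow *row)
{
    return sscanf(line, " %d: %lld %lld %255s",
                  &row->rank, &row->instances, &row->bytes,
                  row->class_name) == 4;
}

/* Average number of bytes per instance for one row (0.0 if there are no instances). */
double avg_instance_size(const HistoRow *row)
{
    if (row->instances <= 0)
        return 0.0;
    return (double)row->bytes / (double)row->instances;
}
```

Calling parse_histo_row on each line read with fgets skips the header, separator, and Total lines, since it returns 0 for anything that is not a data row.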

LoadRunner Convert To Unicode or UTF-8

The following shows how to convert strings in VuGen to Unicode or UTF-8, which is necessary for certain applications with globalization functionality.

Script


Action()
{
    char *converted_buffer_unicode = NULL; // not used in this example

    // convert to unicode
    lr_convert_string_encoding("Text to convert to unicode",
        LR_ENC_SYSTEM_LOCALE,
        LR_ENC_UNICODE,
        "paramUnicode");

    // convert to UTF-8
    lr_convert_string_encoding("Text to convert to utf8",
        LR_ENC_SYSTEM_LOCALE,
        LR_ENC_UTF8,
        "paramUtf8");

    return 0;
}



Console Output


Running Vuser...
Starting iteration 1.
Starting action Action.
Action.c(6): Notify: Saving Parameter "paramUnicode = T\x00e\x00x\x00t\x00 \x00t\x00o\x00 \x00c\x00o\x00n\x00v\x00e\x00r\x00t\x00 \x00t\x00o\x00 \x00u\x00n\x00i\x00c\x00o\x00d\x00e\x00\x00\x00".
Action.c(12): Notify: Saving Parameter "paramUtf8 = Text to convert to utf8\x00".
Ending action Action.
Ending iteration 1.
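
In the console output above, every character of paramUnicode is followed by \x00, which is consistent with two-byte little-endian code units, while the UTF-8 result is byte-for-byte the same as the ASCII input.  The following standalone sketch reproduces that byte layout for plain ASCII input only.  The function ascii_to_utf16le is a hypothetical helper for illustration and is not a LoadRunner function; it does not describe how lr_convert_string_encoding is implemented.

```c
#include <stddef.h>

/* Widens a plain ASCII string into two-byte little-endian code units:
   each character byte is followed by a zero byte, and the result ends
   with a two-byte zero terminator.
   Returns the number of bytes written (terminator included), or 0 if the
   output buffer is too small or the input contains non-ASCII bytes. */
size_t ascii_to_utf16le(const char *in, unsigned char *out, size_t out_size)
{
    size_t n = 0;

    for (; *in != '\0'; in++) {
        if ((unsigned char)*in > 0x7F || n + 2 > out_size)
            return 0;
        out[n++] = (unsigned char)*in; /* low byte: the ASCII value */
        out[n++] = 0x00;               /* high byte: always zero for ASCII */
    }

    if (n + 2 > out_size)
        return 0;
    out[n++] = 0x00;                   /* two-byte terminator */
    out[n++] = 0x00;
    return n;
}
```

For the input "Te" this writes the six bytes T, \x00, e, \x00, \x00, \x00, which matches the pattern at the start and end of the paramUnicode value.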

Thursday, September 13, 2012

How to Test the Stability of an Application


Testing the stability of an application is critical.  It can prevent system outages by identifying problems before they occur in production.  Outages can severely damage a business, in some cases permanently.  The following outline provides a reasonable template for testing application stability.

  • Ramp load up incrementally to the breaking point of the system.  Do not stop at expected peak load because bursts or unexpected traffic can entail load far higher than anticipated.
    • Load should cover critical dimensions such as transaction rate/throughput, connections, concurrent users, range of use cases/functionality
    • When the application breaks, investigate what broke
      • If the test infrastructure broke (test client capacity hit, test network capacity hit, test case crashed, etc.), the test infrastructure must be repaired so that the application is what breaks, not the test infrastructure.
      • If the application broke, diagnose the type of breakage and what broke.
      • Is breakage recoverable?
      • Does breakage affect already connected users, or just block new users?
      • Did the application code break (errors, deadlocks, thread blocking, etc.)?
      • Was a system resource limit hit (cpu, memory, network, disk)?
      • If system resource limits were not hit, does the application need to be fixed so that it is not the bottleneck?  The system should scale up until a system limit is hit, whether CPU, memory, disk I/O, or network bandwidth.
      • Did a downstream service break?
        • How can the downstream service be improved to provide more capacity and stability?
      • Did the system just slow down, remaining functional?
      • Is a restart required, and what must be restarted (services, server, downstream services, etc.)? 
      • Can the system be scaled out or scaled up to improve the capacity?  
        • If not, why not?  Is there an architectural limitation preventing further scalability?  How can scalability be improved?
    • From the test, determine the peak capacity of the application and verify that proper production monitoring is in place to detect this threshold.
  • Run at near peak capacity for an extended period of time (this could be one day or more depending on uptime requirements)
    • Is the application stable when run for a long time or does it eventually crash?
      • Why does it crash?
    • Does performance degrade over time?
      • Why does it degrade?
  • Perform administrative operations that may need to be performed during production usage while the system is near peak load.
    • Is the system stable when this happens?
  • Perform the full suite of functional tests while the system is near peak load.
    • Is the system stable when this happens?

Document the results of the test carefully.  Do not ignore crashes and instability.  Spend the time and effort to understand the behavior and harden the application to behave well under any conditions, anticipated or not.

LoadRunner Unique Names

Human-readable unique names are useful for filenames, database keys, logging, etc.  One way to build such a name is to combine the following information into a single string:

  • load generator name
  • vuser id
  • iteration number
  • timestamp

This can be done easily as follows:

  • get load generator name using lr_get_host_name()
  • get vuser id using lr_whoami()
  • get iteration number using a parameter of type iteration
  • get current timestamp using lr_save_datetime()

The following script shows how to do this.

Script

Action()
{
    int id;
    char *host;

    // get load generator name
    host = lr_get_host_name( );
    //lr_output_message("Host: %s", host);

    // get vuser id
    lr_whoami(&id, NULL, NULL);
    //lr_message( "Vuser id: %d",  id);

    // for iteration add a parameter named "Iteration" of type Iteration
    //lr_output_message(lr_eval_string("Iteration: {Iteration}"));

    // get timestamp, formatted as wanted
    lr_save_datetime("%m-%d-%Y-%I%M%S%p", DATE_NOW, "now");
    //lr_output_message(lr_eval_string("Time: {now}"));

    // now generate the nice, readable unique id
    lr_output_message(lr_eval_string("Host-%s-VuserId-%d-Iteration-{Iteration}-Time-{now}"), host, id);

    return 0;
}

Output
Action.c(20): Host-generator02-VuserId-1-Iteration-1-Time-09-13-2012-035856PM [MsgId: MMSG-17999]
Action.c(20): Host-generator02-VuserId-2-Iteration-1-Time-09-13-2012-035856PM [MsgId: MMSG-17999]
Action.c(20): Host-generator02-VuserId-1-Iteration-2-Time-09-13-2012-035857PM [MsgId: MMSG-
Action.c(20): Host-generator02-VuserId-2-Iteration-2-Time-09-13-2012-035857PM [MsgId: MMSG-17999
Action.c(20): Host-generator02-VuserId-1-Iteration-3-Time-09-13-2012-035858PM [MsgId: MMSG-17999]
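
To experiment with the name layout outside VuGen, the same string can be assembled with standard C library calls.  The following is only a sketch under assumptions: build_unique_name is a hypothetical helper, the host, vuser id, and iteration are passed in by the caller rather than obtained from LoadRunner, and the timestamp comes from strftime using the same format string as the script.

```c
#include <stdio.h>
#include <time.h>

/* Builds "Host-<host>-VuserId-<id>-Iteration-<n>-Time-<timestamp>" into buf.
   Returns 1 on success, 0 if the timestamp or the full name does not fit. */
int build_unique_name(char *buf, size_t size, const char *host,
                      int vuser_id, int iteration, const struct tm *when)
{
    char stamp[32];
    int written;

    /* Same format string as the lr_save_datetime call in the script. */
    if (strftime(stamp, sizeof stamp, "%m-%d-%Y-%I%M%S%p", when) == 0)
        return 0;

    written = snprintf(buf, size, "Host-%s-VuserId-%d-Iteration-%d-Time-%s",
                       host, vuser_id, iteration, stamp);

    /* snprintf reports the length it wanted; anything >= size was truncated. */
    return written >= 0 && (size_t)written < size;
}
```

Checking the snprintf return value matters when the name is used as a filename or key, because a silently truncated name may no longer be unique.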


LoadRunner Charting Custom Metrics

In LoadRunner, you can chart your own custom metrics with the function lr_user_data_point.  The values could come from calls to the server that collect system stats, from application metrics derived from responses, or from any other source.  The custom metrics are then available in LoadRunner Analysis.  The following provides an example:

Script


double getMyMetric();

Action()
{
    double mymetric;
    int i;
    for (i=0;i<100;i++) {

        mymetric = getMyMetric();
        lr_user_data_point("mymetric", mymetric);
        lr_log_message( "--- Added custom metric value = %f", mymetric );
    }

    return 0;
}


double getMyMetric()
{
    int randomInt = (rand() % 100)+1;  // Get a random number 1 - 100
    double mymetric = randomInt / 100.0; // Scale to a double between 0.01 and 1.00
    return mymetric;
}



Console Output


Running Vuser...
Starting iteration 1.
Starting action Action.
Action.c(10): Notify: Data Point "mymetric" value = 0.4800.
--- Added custom metric value = 0.480000
Action.c(10): Notify: Data Point "mymetric" value = 0.7900.
--- Added custom metric value = 0.790000
...


Analysis Output

The custom metric is then available in LoadRunner Analysis, where a chart can be generated showing the trend of the custom data points added during the test.



LoadRunner Current Timestamp

In VuGen, it is helpful to include a date and time with log messages, whether informational or errors.  This can be done very easily using the lr_save_datetime function.  The timestamp can be formatted in many different ways using the format codes listed in the comments of the script below.  The following script shows how this can be done:

Script


Action()
{
// handle an event such as an error or important message including a timestamp
lr_save_datetime("%m/%d/%Y %I:%M:%S %p", DATE_NOW, "now"); 
lr_output_message(lr_eval_string("{now}: My important message here.")); 

//  Format codes for lr_save_datetime
// %a  day of week, using locale's abbreviated weekday names
// %A  day of week, using locale's full weekday names
// %b  month, using locale's abbreviated month names
// %B  month, using locale's full month names
// %c  date and time as %x %X
// %d  day of month (01-31)
// %H  hour (00-23)
// %I  hour (01-12)
// %j  number of day in year (001-366)
// %m  month number (01-12)
// %M  minute (00-59)
// %p  locale's equivalent of AM or PM, whichever is appropriate
// %S  seconds (00-59)
// %U  week number of year (01-52), Sunday is the first day of the week. Week number 01 is the first week with four or more January days in it.
// %w  day of week; Sunday is day 0
// %W  week number of year (01-52), Monday is the first day of the week. Week number 01 is the first week with four or more January days in it.
// %x  date, using locale's date format
// %X  time, using locale's time format
// %y  year within century (00-99)
// %Y  year, including century (for example, 1988)
// %Z  time zone abbreviation
// %%  to include the "%" character in your output string

return 0;
}

Console Output


Running Vuser...
Starting iteration 1.
Starting action Action.
Action.c(29): 09/13/2012 02:56:41 PM: My important message here.
Ending action Action.
Ending iteration 1.
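
The format codes listed in the script look like those of the standard C strftime function, so a format string can be tried out in a small standalone program before it goes into a script.  The following sketch assumes the codes behave the same way in both places, which is not confirmed here; format_log_line is a hypothetical helper and not a LoadRunner function.

```c
#include <stdio.h>
#include <time.h>

/* Formats `when` with the given format codes and prefixes the result to a
   log message, producing "<timestamp>: <message>".
   Returns 1 on success, 0 if the timestamp or the full line does not fit. */
int format_log_line(char *buf, size_t size, const char *fmt,
                    const struct tm *when, const char *message)
{
    char stamp[64];
    int written;

    if (strftime(stamp, sizeof stamp, fmt, when) == 0)
        return 0;

    written = snprintf(buf, size, "%s: %s", stamp, message);
    return written >= 0 && (size_t)written < size;
}
```

With the format "%m/%d/%Y %I:%M:%S %p" this yields the 12-hour layout used in the script.  A year-first, 24-hour format such as "%Y-%m-%d %H:%M:%S" has the added property that the timestamps sort chronologically as plain text.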