sub-second: histogram

Friday, September 14, 2012

Dumping Very Large Java Heaps

When a Java application has a memory leak or uses much more memory than expected, you need heap information to identify the source of the problem. A heap dump is ideal because it can be analyzed with various tools. However, with very large Java heaps (for example, over 100 GB), a heap dump may be impractical for several reasons:

the heap dump may crash the java process before completing
the heap dump may hang indefinitely
there may not be enough disk space to accommodate the dump
the dump may be so large that analysis tools are unable to process it

One solution is to use the jmap utility to obtain a heap histogram from the running process. This appears to be very lightweight: it completes quickly on very large heaps and generates a very small summary file that can be used for troubleshooting.

The syntax is as follows, where <pid> is the process ID of the Java process:

jmap -histo <pid>

The output is a summary showing, for each class in the heap, the number of instances, the total size in bytes, and the class name, for example:

 num     #instances         #bytes  class name
----------------------------------------------
   1:         70052       11118624  <constMethodKlass>
   2:         70052        8422160  <methodKlass>
   3:          6320        8258472  <constantPoolKlass>
   4:          6320        6117216  <instanceKlassKlass>
   5:        116656        5732520  <symbolKlass>
   6:         17467        5729824  [I
   7:          5682        5050352  <constantPoolCacheKlass>
   8:         57275        4818512  [C
   9:         24818        2660384  [B
  10:         59327        1898464  java.lang.String
  11:          2847        1766720  [J
  12:          2978        1542008  <methodDataKlass>
  13:         11687         797256  [S
  14:         13307         706440  [Ljava.lang.Object;
  15:          6777         704808  java.lang.Class
  16:         18904         604928  java.util.HashMap$Entry
  17:         10088         522512  [[I
  18:          5736         499408  [Ljava.util.HashMap$Entry;
  19:         12838         410816  java.util.Hashtable$Entry
  20:          5580         267840  java.util.HashMap
  21:           428         249952  <objArrayKlassKlass>
  22:          5888         235520  java.util.concurrent.ConcurrentHashMap$Segment
  23:          6243         199776  java.util.concurrent.locks.ReentrantLock$NonfairSync
  24:          5888         146544  [Ljava.util.concurrent.ConcurrentHashMap$HashEntry;
...
3029:             1             16  sun.awt.X11.XToolkit$4
3030:             1             16  java.util.Collections$EmptyIterator
3031:             1             16  com.sun.tools.visualvm.core.explorer.ExplorerContextMenuFactory
3032:             1             16  sun.reflect.generics.tree.TypeVariableSignature
3033:             1             16  sun.awt.X11.XKeyboardFocusManagerPeer$1
3034:             1             16  org.openide.xml.EntityCatalog$Forwarder
Total        714547       74730032
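Once the histogram has been saved to a file, it can be post-processed, for example to list the classes that hold the most bytes. The following is a minimal sketch that assumes the row layout shown above (rank, instance count, bytes, class name). The names HistoParser, Row, and topByBytes are hypothetical and are not part of the JDK or jmap.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for reading "jmap -histo" output; not part of the JDK. */
final class HistoParser {

    /** One histogram row: instance count, total bytes, class name. */
    static final class Row {
        final long instances;
        final long bytes;
        final String className;

        Row(long instances, long bytes, String className) {
            this.instances = instances;
            this.bytes = bytes;
            this.className = className;
        }
    }

    // Matches data rows such as "10: 59327 1898464 java.lang.String".
    private static final Pattern ROW =
            Pattern.compile("^\\s*\\d+:\\s+(\\d+)\\s+(\\d+)\\s+(\\S+).*$");

    /** Parses data rows; header, separator, "..." and "Total" lines are skipped. */
    static List<Row> parse(List<String> lines) {
        List<Row> rows = new ArrayList<>();
        for (String line : lines) {
            Matcher m = ROW.matcher(line);
            if (m.matches()) {
                rows.add(new Row(Long.parseLong(m.group(1)),
                                 Long.parseLong(m.group(2)),
                                 m.group(3)));
            }
        }
        return rows;
    }

    /** Returns up to n rows with the most bytes, largest first. */
    static List<Row> topByBytes(List<Row> rows, int n) {
        List<Row> sorted = new ArrayList<>(rows);
        sorted.sort(Comparator.comparingLong((Row r) -> r.bytes).reversed());
        return sorted.subList(0, Math.min(n, sorted.size()));
    }
}
```

Comparing the top rows of two histograms taken some time apart is one way to spot which classes are growing.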

Monday, June 4, 2012

Reporting Response Times - Percentile Distributions and Trends

Response time metrics can be very misleading. An average of 2 seconds can mask occasional very high values that could be relevant to performance goals. It can also mask cyclical trends. The same is true of reporting only the median or a single percentile such as the 95th. An informative response time report should include two elements:

Response time percentile distribution
Response time trends with enough granularity to see spikes and patterns

Here is an example of these two elements:

Response Time Percentile Distribution

The distribution chart shows clearly what percentage of response times might be above the response time goal.
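The numbers behind such a distribution chart can be sketched as follows. This is a minimal illustration, not the method used to produce the chart: it assumes the nearest-rank definition of a percentile, and the class and method names are hypothetical.

```java
import java.util.Arrays;

/** Hypothetical helper for summarizing a set of response times. */
final class ResponseTimeDistribution {

    /** Nearest-rank percentile of the samples, for p in (0, 100]. */
    static double percentile(double[] samples, double p) {
        if (samples.length == 0 || p <= 0 || p > 100) {
            throw new IllegalArgumentException("need samples and 0 < p <= 100");
        }
        double[] sorted = samples.clone();
        Arrays.sort(sorted);
        // Rank is the smallest position covering at least p percent of the samples.
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[Math.max(rank, 1) - 1];
    }

    /** Percentage of samples strictly above the response time goal. */
    static double percentAboveGoal(double[] samples, double goal) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("need samples");
        }
        int above = 0;
        for (double s : samples) {
            if (s > goal) {
                above++;
            }
        }
        return 100.0 * above / samples.length;
    }
}
```

Evaluating percentile for a range of values of p (for example 50, 80, 90, 95, 98, 99) gives the points of the distribution, and percentAboveGoal gives the share of requests that miss the goal.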

Response Time Trend

The trend chart shows clearly an upward trend over time, an important property that would be lost in any single response time metric. It also includes enough granularity to show the distribution pattern of response time spikes.
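One way to keep spikes visible in a trend is to plot the maximum response time per short interval rather than a single overall figure. The following is a minimal sketch under that assumption; the class name, the method name, and the choice of the maximum as the per-interval value are illustrative only.

```java
import java.util.TreeMap;

/** Hypothetical helper that groups response times into fixed time intervals. */
final class ResponseTimeTrend {

    /**
     * Maps each interval start (in milliseconds) to the largest response time
     * observed in that interval. Intervals with no samples are omitted.
     */
    static TreeMap<Long, Double> maxPerInterval(long[] timestampsMs,
                                                double[] responseTimes,
                                                long intervalMs) {
        if (timestampsMs.length != responseTimes.length || intervalMs <= 0) {
            throw new IllegalArgumentException("mismatched arrays or bad interval");
        }
        TreeMap<Long, Double> result = new TreeMap<>();
        for (int i = 0; i < timestampsMs.length; i++) {
            // Round the timestamp down to the start of its interval.
            long bucketStart = Math.floorDiv(timestampsMs[i], intervalMs) * intervalMs;
            result.merge(bucketStart, responseTimes[i], Math::max);
        }
        return result;
    }
}
```

A smaller intervalMs gives more granularity and shows individual spikes; a larger one smooths them away.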

Friday, November 11, 2011

VisualCSV - A Free CSV Analyzer - CSV File to Chart, CSV File to Stats, CSV File to Histograms, CSV File Correlation Charts

VisualCSV Download Now

VisualCSV is a free CSV analyzer, written in Java, that converts a CSV file to:

charts
statistics
histograms
correlation charts

Performance testing, system monitoring, etc. often generate CSV files that must be analyzed, and typically you need to chart columns, calculate stats on columns, correlate columns, etc.

Given a CSV file as input, VisualCSV generates most of the data you need automatically, including:

statistics on each column: min, avg, max, standard deviation, variance, median, and percentiles (such as the 95th and 98th percentiles, or any list of percentiles you specify)
trend charts for each column in JPEG format
histogram charts for each column
correlation metrics between any pair of columns
correlation charts showing correlations between any pair of columns
transposed correlation charts with the x and y axes swapped

Charts are created using JFreeChart, and statistics are computed using Apache Commons Math.
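To make the per-column statistics concrete, here is a minimal sketch in plain Java. It is not VisualCSV's code and does not use Apache Commons Math, so that it stays self-contained. The class name ColumnStats is hypothetical, and the use of the sample variance (dividing by n - 1) is an assumption.

```java
import java.util.Arrays;

/** Hypothetical summary of one numeric CSV column. */
final class ColumnStats {
    final double min;
    final double max;
    final double avg;
    final double variance;
    final double stdDev;
    final double median;

    ColumnStats(double[] values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("no values");
        }
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        int n = sorted.length;

        min = sorted[0];
        max = sorted[n - 1];

        double sum = 0;
        for (double v : sorted) {
            sum += v;
        }
        avg = sum / n;

        double squaredDeviations = 0;
        for (double v : sorted) {
            squaredDeviations += (v - avg) * (v - avg);
        }
        // Assumption: sample variance (n - 1); a single value has variance 0.
        variance = n > 1 ? squaredDeviations / (n - 1) : 0.0;
        stdDev = Math.sqrt(variance);

        // Median: middle value, or the mean of the two middle values.
        median = (n % 2 == 1)
                ? sorted[n / 2]
                : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
    }
}
```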

The following sections show sample output generated from processing a CSV file from Windows Performance Monitor (perfmon) having the following columns:

(PDH-CSV 4.0) (Pacific Standard Time)(480)

\\CHELSTRPL006\Memory\Available Bytes

\\CHELSTRPL006\Network Interface(HP NC382i DP Multifunction Gigabit Server Adapter _4)\Bytes Total/sec

\\CHELSTRPL006\Network Interface(HP NC382i DP Multifunction Gigabit Server Adapter _4)\Output Queue Length

\\CHELSTRPL006\PhysicalDisk(_Total)\% Disk Time

\\CHELSTRPL006\PhysicalDisk(_Total)\Avg. Disk sec/Read

\\CHELSTRPL006\PhysicalDisk(_Total)\Avg. Disk sec/Write

\\CHELSTRPL006\Processor(_Total)\% Privileged Time

\\CHELSTRPL006\Processor(_Total)\% Processor Time

Summary stats

The following csv file showing summary stats for each column is generated.

Trend Charts

For each column, a chart is generated such as this:

Histograms

For each column, a frequency histogram showing number of occurrences by bucket is generated such as this:

The number of buckets is configurable.
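The bucketing behind a frequency histogram can be sketched as follows. This is an illustration only: it assumes equal-width buckets spanning the column's minimum to maximum, which may differ from how VisualCSV chooses its bucket boundaries, and the names are hypothetical.

```java
/** Hypothetical equal-width histogram over one numeric column. */
final class Histogram {

    /** Counts values into numBuckets equal-width buckets spanning [min, max]. */
    static int[] counts(double[] values, int numBuckets) {
        if (values.length == 0 || numBuckets <= 0) {
            throw new IllegalArgumentException("need values and a positive bucket count");
        }
        double min = values[0];
        double max = values[0];
        for (double v : values) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        double width = (max - min) / numBuckets;
        int[] counts = new int[numBuckets];
        for (double v : values) {
            // If all values are equal, width is 0 and everything goes in bucket 0.
            int index = width == 0 ? 0 : (int) ((v - min) / width);
            // The maximum value would land one past the end, so clamp it.
            counts[Math.min(index, numBuckets - 1)]++;
        }
        return counts;
    }

    /** Converts counts to relative frequencies that sum to 1. */
    static double[] relativeFrequencies(int[] counts) {
        long total = 0;
        for (int c : counts) {
            total += c;
        }
        double[] relative = new double[counts.length];
        for (int i = 0; i < counts.length; i++) {
            relative[i] = total == 0 ? 0.0 : (double) counts[i] / total;
        }
        return relative;
    }
}
```

The numBuckets argument plays the role of the configurable bucket count, and relativeFrequencies gives the values for a relative frequency histogram.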

For each column, a relative frequency histogram showing relative frequency by bucket is also generated, such as this:

Correlations Between Columns

For each pair of columns, a csv file showing correlations between pairs of columns is generated such as this:
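The post does not say which correlation measure is used. As an illustration, the sketch below computes the Pearson correlation coefficient between two columns, which is an assumption; the class and method names are hypothetical.

```java
/** Hypothetical helper computing the Pearson correlation of two columns. */
final class Correlation {

    /**
     * Returns a value between -1 and 1. The result is NaN if either
     * column is constant, because its variance is then zero.
     */
    static double pearson(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need two columns of equal length >= 2");
        }
        int n = x.length;
        double meanX = 0;
        double meanY = 0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        double sumXY = 0;
        double sumXX = 0;
        double sumYY = 0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            sumXY += dx * dy;
            sumXX += dx * dx;
            sumYY += dy * dy;
        }
        return sumXY / Math.sqrt(sumXX * sumYY);
    }
}
```

Because the coefficient is symmetric in x and y, each pair of columns needs to be computed only once.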

Correlation Charts

For each pair of columns, a correlation chart and a transposed correlation chart (x and y swapped) are generated such as this:

Download

VisualCSV can be downloaded here:

https://docs.google.com/open?id=0B4gruS14dMq4N2M4ZDZjNWYtMTIzYy00NmRmLWJiNTYtOWVhYTc4NjJiMWU2

Usage

VisualCSV requires Java in the system path. It has been tested with Java 1.6 and probably runs fine with 1.5 and 1.7 as well.
Download and unzip the files.
Analyze the sample CSV file from the command line by running:

run.bat TestPerfmonFile.csv

This will analyze the test file and generate output in directory .\out. The output should

Configuration

Client configuration is found in the file client.config.

Columns can be explicitly included in or excluded from analysis by listing them in include and exclude files, which by default (as specified in client.config) are called:

Column-Include-Names.properties

Column-Exclude-Names.properties

The include file can be empty or missing, in which case all columns are analyzed. Otherwise, the columns analyzed are all columns in the include file minus those in the exclude file.
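The include/exclude rule can be sketched as a small set operation. This is an illustration with hypothetical names, not VisualCSV's code. The text does not say whether the exclude file still applies when the include file is empty; this sketch assumes it does.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper applying include and exclude lists to column names. */
final class ColumnSelector {

    /**
     * Returns the selected columns in their original order.
     * An empty include set means "start from all columns".
     * Assumption: excludes are applied in both cases.
     */
    static Set<String> select(List<String> allColumns,
                              Set<String> include,
                              Set<String> exclude) {
        Set<String> selected = new LinkedHashSet<>();
        for (String column : allColumns) {
            boolean included = include.isEmpty() || include.contains(column);
            if (included && !exclude.contains(column)) {
                selected.add(column);
            }
        }
        return selected;
    }
}
```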

Likewise, columns can be explicitly included in or excluded from correlation with each other by listing them in include and exclude files, which by default (as specified in client.config) are called:

Correlation-Include-Names.properties

Correlation-Exclude-Names.properties

Again, the include file can be empty or missing in which case all columns are correlated. Otherwise, the columns in the exclude file are subtracted from the includes.

The client.config file includes the following values by default:

# --- Functionality
createTrendCharts=true
computeCorrelations=true
createCorrelationCharts=true

# --- whether to include or exclude columns from analysis
# --- put columns in this file to include them in analysis. empty file or no file means analyze all columns.
columnIncludeFile=Column-Include-Names.properties

# --- put columns in this file to exclude them from analysis.
columnExcludeFile=Column-Exclude-Names.properties

# --- whether to include or exclude columns from correlation with other columns
correlationIncludeFile=Correlation-Include-Names.properties
correlationExcludeFile=Correlation-Exclude-Names.properties

# --- Timestamp column index (0 = first column)
hasTimestampColumn=true
timestampColumnIndex=0

# --- Java SimpleDateFormat date format for parsing timestamps
timestampFormat=MM/dd/yyyy HH:mm:ss.S

# Letter  Date or Time Component  Presentation       Examples
# G       Era designator          Text               AD
# y       Year                    Year               1996; 99
# M       Month in year           Month              July; Jul; 07
# w       Week in year            Number             27
# W       Week in month           Number             2
# D       Day in year             Number             189
# d       Day in month            Number             10
# F       Day of week in month    Number             2
# E       Day in week             Text               Tuesday; Tue
# a       Am/pm marker            Text               PM
# H       Hour in day (0-23)      Number             0
# k       Hour in day (1-24)      Number             24
# K       Hour in am/pm (0-11)    Number             0
# h       Hour in am/pm (1-12)    Number             12
# m       Minute in hour          Number             30
# s       Second in minute        Number             55
# S       Millisecond             Number             978
# z       Time zone               General time zone  Pacific Standard Time; PST; GMT-08:00
# Z       Time zone               RFC 822 time zone  -0800

# --- chart settings
chartEveryNthDataPoint=1
histogramLogScale=false

# --- statistics settings
createChartCorrelationGreaterThan=0.0
createChartCorrelationLessThan=-0.0
numHistogramBuckets=10
percentiles=80,90,95,98,99

# --- Output file locations
outDir=out
deleteStatsDirs=true
outTrendDir=out/Trend
outCorrelationDir=out/Correlation
outHistogramDir=out/Histogram
outputStatsFile=SummaryStats.csv
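The file above uses key=value pairs with # comments, which java.util.Properties can read. The sketch below shows how values such as percentiles and timestampFormat could be interpreted. It is an illustration only and not VisualCSV's own loading code; the class and method names are hypothetical, and the sample timestamp is an assumed perfmon-style value.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;
import java.util.Properties;

/** Hypothetical reader for settings in the style of client.config. */
final class ClientConfigSketch {

    /** Loads key=value settings from text; lines starting with # are comments. */
    static Properties load(String configText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(configText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    /** Parses the comma-separated "percentiles" setting, e.g. 80,90,95,98,99. */
    static List<Integer> percentiles(Properties props) {
        List<Integer> result = new ArrayList<>();
        for (String part : props.getProperty("percentiles", "").split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                result.add(Integer.parseInt(trimmed));
            }
        }
        return result;
    }

    /** Parses a timestamp using the SimpleDateFormat pattern in "timestampFormat". */
    static Date parseTimestamp(Properties props, String value) {
        SimpleDateFormat format = new SimpleDateFormat(props.getProperty("timestampFormat"));
        try {
            return format.parse(value);
        } catch (ParseException e) {
            throw new IllegalArgumentException("bad timestamp: " + value, e);
        }
    }
}
```

With the default pattern MM/dd/yyyy HH:mm:ss.S, a value such as 11/11/2011 13:45:30.250 parses to that date and time with 250 milliseconds, interpreted in the JVM's default time zone.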