Friday, November 11, 2011

VisualCSV - A Free CSV Analyzer - CSV File to Chart, CSV File to Stats, CSV File to Histograms, CSV File Correlation Charts

VisualCSV         Download Now

VisualCSV is a free CSV analyzer, written in Java, that:
  • Converts CSV files to charts
  • Converts CSV files to statistics
  • Converts CSV files to histograms
  • Converts CSV files to correlation charts
Performance testing, system monitoring, and similar activities often generate CSV files that must be analyzed. Typically you need to chart columns, calculate stats on columns, and correlate columns.

VisualCSV is a Java application that, given a CSV file as input, automatically generates most of the data you need, including:
  • statistics on each column in the CSV file
    • min
    • avg
    • max
    • standard deviation
    • variance
    • median
    • percentiles (such as the 95th and 98th percentiles, or any list of percentiles you specify)
  • trend charts for each column in JPEG format
  • histogram charts for each column
  • correlation metrics between any pair of columns
  • correlation charts showing correlations between any pair of columns
  • transposed correlation charts with the x and y axes swapped
Charts are created using JFreeChart, and stats are computed using Apache Commons Math.
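To make the per-column statistics concrete, here is a minimal plain-Java sketch of how they can be computed. It is not VisualCSV's code (the tool uses Apache Commons Math). The class name ColumnStats is hypothetical, the variance is assumed to be the sample variance (dividing by n - 1), and the percentile uses the simple nearest-rank method, which can give slightly different values than a library's estimator.

```java
import java.util.Arrays;

// Hypothetical helper, not part of VisualCSV: basic stats for one CSV column.
final class ColumnStats {
    static double min(double[] v) {
        double m = v[0];
        for (double x : v) m = Math.min(m, x);
        return m;
    }

    static double max(double[] v) {
        double m = v[0];
        for (double x : v) m = Math.max(m, x);
        return m;
    }

    static double mean(double[] v) {
        double sum = 0;
        for (double x : v) sum += x;
        return sum / v.length;
    }

    // Assumption: sample variance (n - 1 in the denominator).
    static double variance(double[] v) {
        double m = mean(v), ss = 0;
        for (double x : v) ss += (x - m) * (x - m);
        return ss / (v.length - 1);
    }

    static double stdDev(double[] v) {
        return Math.sqrt(variance(v));
    }

    // Middle value of the sorted data, or the average of the two middle values.
    static double median(double[] v) {
        double[] s = v.clone();
        Arrays.sort(s);
        int n = s.length;
        return n % 2 == 1 ? s[n / 2] : (s[n / 2 - 1] + s[n / 2]) / 2.0;
    }

    // Assumption: nearest-rank percentile, with p in (0, 100].
    static double percentile(double[] v, double p) {
        double[] s = v.clone();
        Arrays.sort(s);
        int rank = (int) Math.ceil(p / 100.0 * s.length);
        return s[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        double[] col = {2, 4, 4, 4, 5, 5, 7, 9};
        System.out.println("min=" + min(col) + " avg=" + mean(col) + " max=" + max(col));
        System.out.println("stddev=" + stdDev(col) + " variance=" + variance(col));
        System.out.println("median=" + median(col) + " p95=" + percentile(col, 95));
    }
}
```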
The following sections show sample output generated from processing a CSV file from Windows Performance Monitor (perfmon) with the following columns:
(PDH-CSV 4.0) (Pacific Standard Time)(480)
\\CHELSTRPL006\Memory\Available Bytes
\\CHELSTRPL006\Network Interface(HP NC382i DP Multifunction Gigabit Server Adapter _4)\Bytes Total/sec
\\CHELSTRPL006\Network Interface(HP NC382i DP Multifunction Gigabit Server Adapter _4)\Output Queue Length
\\CHELSTRPL006\PhysicalDisk(_Total)\% Disk Time
\\CHELSTRPL006\PhysicalDisk(_Total)\Avg. Disk sec/Read
\\CHELSTRPL006\PhysicalDisk(_Total)\Avg. Disk sec/Write
\\CHELSTRPL006\Processor(_Total)\% Privileged Time
\\CHELSTRPL006\Processor(_Total)\% Processor Time

Summary stats
A CSV file showing summary stats for each column is generated.


Trend Charts
For each column, a trend chart is generated.

Histograms
For each column, a frequency histogram showing the number of occurrences per bucket is generated.
The number of buckets is configurable (numHistogramBuckets in client.config).

For each column, a relative frequency histogram showing the relative frequency per bucket is also generated.
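As an illustration of what the two histogram types contain, the following sketch buckets a column's values and then converts the counts to relative frequencies. It assumes equal-width buckets spanning the column's minimum to maximum, which the post does not state. HistogramSketch and its method names are hypothetical.

```java
// Hypothetical helper, not part of VisualCSV.
final class HistogramSketch {
    // Counts values per bucket. Assumption: numBuckets equal-width buckets over [min, max].
    static int[] counts(double[] v, int numBuckets) {
        double min = v[0], max = v[0];
        for (double x : v) {
            min = Math.min(min, x);
            max = Math.max(max, x);
        }
        double width = (max - min) / numBuckets;
        int[] c = new int[numBuckets];
        for (double x : v) {
            // If all values are equal, width is 0 and everything goes in bucket 0.
            int i = width == 0 ? 0 : (int) ((x - min) / width);
            // The maximum value would land one past the end, so clamp it into the last bucket.
            c[Math.min(i, numBuckets - 1)]++;
        }
        return c;
    }

    // Relative frequency = bucket count divided by the total number of values.
    static double[] relative(int[] counts) {
        int total = 0;
        for (int c : counts) total += c;
        double[] r = new double[counts.length];
        for (int i = 0; i < counts.length; i++) r[i] = (double) counts[i] / total;
        return r;
    }

    public static void main(String[] args) {
        double[] col = {1, 2, 2, 3, 9};
        int[] c = counts(col, 4);
        double[] r = relative(c);
        for (int i = 0; i < c.length; i++) {
            System.out.println("bucket " + i + ": count=" + c[i] + " relative=" + r[i]);
        }
    }
}
```

The relative frequencies always sum to 1, which makes histograms for columns with different numbers of samples directly comparable.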

Correlations Between Columns
For each pair of columns, a CSV file showing correlations between pairs of columns is generated.
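The post does not say which correlation coefficient VisualCSV reports. Assuming it is the common Pearson coefficient, a correlation metric for one pair of columns could be computed as in this sketch (CorrelationSketch is a hypothetical name, not part of the tool):

```java
// Hypothetical helper, not part of VisualCSV.
final class CorrelationSketch {
    // Assumption: Pearson correlation coefficient, a value in [-1, 1].
    // Returns NaN if either column is constant (zero variance).
    static double pearson(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("columns must have equal length >= 2");
        }
        int n = x.length;
        double mx = 0, my = 0;
        for (int i = 0; i < n; i++) {
            mx += x[i];
            my += y[i];
        }
        mx /= n;
        my /= n;
        double sxy = 0, sxx = 0, syy = 0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - mx, dy = y[i] - my;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        return sxy / Math.sqrt(sxx * syy);
    }

    public static void main(String[] args) {
        double[] cpu = {10, 20, 30, 40};
        double[] disk = {12, 19, 33, 41};
        System.out.println("correlation = " + pearson(cpu, disk));
    }
}
```

Under this assumption the metric is symmetric: swapping the two columns gives the same value, so a transposed chart shows the same relationship with the axes exchanged.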

Correlation Charts
For each pair of columns, a correlation chart and a transposed correlation chart (x and y swapped) are generated.

Download
VisualCSV can be downloaded here:

Usage
  • VisualCSV requires Java on the system path.  It has been tested with Java 1.6 and probably runs fine with 1.5 and 1.7 as well.
  • Download and unzip the files.
  • Analyze the sample CSV file from the command line by running:
    • run.bat TestPerfmonFile.csv
  • This will analyze the test file and generate output in directory .\out.  The output should 

Configuration

Client configuration is found in the file client.config.

Columns can be included in or excluded from analysis by listing them in include and exclude files. By default (as specified in client.config), these files are called:
Column-Include-Names.properties
Column-Exclude-Names.properties

The include file can be empty or missing in which case all columns are analyzed.  Otherwise, the columns analyzed are all columns in the include file minus those in the exclude file.

Likewise, columns can be included in or excluded from correlation with each other by listing them in include and exclude files. By default (as specified in client.config), these files are called:
Correlation-Include-Names.properties
Correlation-Exclude-Names.properties

Again, the include file can be empty or missing in which case all columns are correlated. Otherwise, the columns in the exclude file are subtracted from the includes.
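The include/exclude rule described above amounts to a set difference. The sketch below shows one way to express it. ColumnFilter is a hypothetical name, and the code assumes that the exclude list still applies when the include list is empty, which the description leaves open.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of VisualCSV.
final class ColumnFilter {
    // Returns the columns to process, in their original CSV order.
    // An empty include set means "all columns".
    // Assumption: excludes are applied whether or not an include set is given.
    static Set<String> select(List<String> all, Set<String> include, Set<String> exclude) {
        Set<String> selected = new LinkedHashSet<String>();
        for (String column : all) {
            boolean included = include.isEmpty() || include.contains(column);
            if (included && !exclude.contains(column)) {
                selected.add(column);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<String> all = Arrays.asList("Available Bytes", "% Disk Time", "% Processor Time");
        Set<String> include = new HashSet<String>();  // empty: analyze all columns
        Set<String> exclude = new HashSet<String>(Arrays.asList("% Disk Time"));
        System.out.println(select(all, include, exclude));
    }
}
```

The same function would serve for both pairs of files: once with the column include/exclude lists to pick columns for analysis, and once with the correlation lists to pick columns for pairing.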

The client.config file includes the following values by default:


# --- Functionality
createTrendCharts=true
computeCorrelations=true
createCorrelationCharts=true


# --- whether to include or exclude columns from analysis
# --- put columns in this file to include them in analysis.  empty file or no file means analyze all columns.
columnIncludeFile=Column-Include-Names.properties


# --- put columns in this file to exclude them from analysis.
columnExcludeFile=Column-Exclude-Names.properties


# --- whether to include or exclude columns from correlation with other columns
correlationIncludeFile=Correlation-Include-Names.properties
correlationExcludeFile=Correlation-Exclude-Names.properties




# --- Timestamp column index (0 = first column)
hasTimestampColumn=true
timestampColumnIndex=0


# --- Java SimpleDateFormat date format for parsing timestamps
timestampFormat=MM/dd/yyyy HH:mm:ss.S

# Letter  Date or Time Component  Presentation        Examples
# G       Era designator          Text                AD
# y       Year                    Year                1996; 99
# M       Month in year           Month               July; Jul; 07
# w       Week in year            Number              27
# W       Week in month           Number              2
# D       Day in year             Number              189
# d       Day in month            Number              10
# F       Day of week in month    Number              2
# E       Day in week             Text                Tuesday; Tue
# a       Am/pm marker            Text                PM
# H       Hour in day (0-23)      Number              0
# k       Hour in day (1-24)      Number              24
# K       Hour in am/pm (0-11)    Number              0
# h       Hour in am/pm (1-12)    Number              12
# m       Minute in hour          Number              30
# s       Second in minute        Number              55
# S       Millisecond             Number              978
# z       Time zone               General time zone   Pacific Standard Time; PST; GMT-08:00
# Z       Time zone               RFC 822 time zone   -0800


# --- chart settings
chartEveryNthDataPoint=1
histogramLogScale=false


# --- statistics settings
createChartCorrelationGreaterThan=0.0
createChartCorrelationLessThan=-0.0
numHistogramBuckets=10
percentiles=80,90,95,98,99


# --- Output file locations
outDir=out
deleteStatsDirs=true
outTrendDir=out/Trend
outCorrelationDir=out/Correlation
outHistogramDir=out/Histogram
outputStatsFile=SummaryStats.csv
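
The settings above appear to use standard Java properties syntax (key=value pairs with # comments). Assuming so, the sketch below shows how such a file can be read with java.util.Properties, and how the timestampFormat and percentiles values could then be interpreted. ConfigSketch and its methods are hypothetical and are not VisualCSV's actual loading code.

```java
import java.io.IOException;
import java.io.StringReader;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Properties;

// Hypothetical helper, not part of VisualCSV.
final class ConfigSketch {
    // Parses properties-style text; lines starting with # are ignored as comments.
    static Properties load(String text) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(text));
        } catch (IOException e) {
            throw new IllegalStateException("could not read config", e);
        }
        return p;
    }

    // Parses one timestamp cell using the configured SimpleDateFormat pattern.
    static Date parseTimestamp(Properties p, String value) {
        SimpleDateFormat format = new SimpleDateFormat(p.getProperty("timestampFormat"));
        format.setLenient(false); // reject out-of-range values such as month 13
        try {
            return format.parse(value);
        } catch (ParseException e) {
            throw new IllegalArgumentException("bad timestamp: " + value, e);
        }
    }

    // Splits the comma-separated percentile list into numbers.
    static double[] percentiles(Properties p) {
        String[] parts = p.getProperty("percentiles").split(",");
        double[] result = new double[parts.length];
        for (int i = 0; i < parts.length; i++) {
            result[i] = Double.parseDouble(parts[i].trim());
        }
        return result;
    }

    public static void main(String[] args) {
        Properties p = load("# --- sample settings\n"
                + "timestampFormat=MM/dd/yyyy HH:mm:ss.S\n"
                + "percentiles=80,90,95,98,99\n");
        System.out.println(parseTimestamp(p, "11/11/2011 13:45:30.250"));
        System.out.println(percentiles(p).length + " percentiles configured");
    }
}
```

A timestamp that does not match the configured pattern fails to parse, so timestampFormat needs to agree with the first column of the CSV file being analyzed.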




1 comment:

  1. Good evening,

    First: let me tell you that I don't understand how you don't already have comments thanking you for this and telling you how helpful this tool is! I mean statistics, trends, and correlations, all in a couple of minutes without effort. It's amazing. So here you go: THANK YOU VERY MUCH.

    Second: regarding my personal usage, I found your application very easy to use, and it quickly gave me important input on performance data that I have gathered over time.

    Third: I have two small suggestions / additions that could be interesting, I think:
    a. Regarding the size of the graphs: sometimes the data that I gather spans multiple days, and at the graphs' resolution (600*400) they start to be hard to read. Therefore I think it would be a good addition to your config to add a desired graph size, to allow flexibility in the output according to the data being analyzed.

    b. The config does not seem to distinguish the values of the column and correlation include and exclude files: correlationIncludeFile always equals the value of columnIncludeFile, and correlationExcludeFile always equals the value of columnExcludeFile.

    As you can see, these are minor suggestions and additions to a program that already does a lot!

    Again thank you very much

    Ricardo Fonseca
    www.databaseskills.net

