Saturday, December 03, 2011

Quickstatd: A simple tool to get performance data into Graphite

I recently wanted to start tracking OS-level performance metrics for a group of servers, and see the results charted in Graphite. My initial thought was to do this using collectd, with the graphite plugin. In many cases, using collectd is a great way to go -- it's been well tested and has a robust feature set.

But collectd wasn't a good fit for my environment. My servers do not have internet access, and I was not able to install collectd by copying the .rpm over to the servers, due to unmet dependencies. I decided I wanted something dead simple to install and operate, with no external dependencies.

The result was quickstatd: a small set of bash scripts, easily extendable, that forwards the performance data generated from standard system tools like vmstat, iostat, and sar, to a running instance of Graphite. I am providing it here in the hopes that it will be useful to others.

https://bitbucket.org/travis_bear/quickstatd
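To give a feel for what "forwarding to Graphite" involves, here is a minimal Python sketch. It is not quickstatd's code (quickstatd is written in bash). It assumes Carbon's plaintext listener, which accepts lines of the form `metric.path value timestamp` and normally listens on port 2003. The metric prefix, host name, and sample values are made up for illustration.

```python
import socket
import time


def format_metrics(prefix, values, timestamp):
    """Build Carbon plaintext lines: '<path> <value> <unix timestamp>'."""
    lines = []
    for name, value in sorted(values.items()):
        lines.append("%s.%s %s %d" % (prefix, name, value, timestamp))
    return "\n".join(lines) + "\n"


def send_to_graphite(payload, host="localhost", port=2003):
    """Send the payload to Carbon's plaintext listener (host and port are assumptions)."""
    sock = socket.create_connection((host, port), timeout=5)
    try:
        sock.sendall(payload.encode("ascii"))
    finally:
        sock.close()


if __name__ == "__main__":
    # Hypothetical sample: a few CPU columns taken from one line of vmstat output.
    sample = {"cpu.user": 12, "cpu.system": 3, "cpu.idle": 85}
    payload = format_metrics("servers.web01.vmstat", sample, int(time.time()))
    print(payload)
    # send_to_graphite(payload)  # uncomment when a Carbon instance is reachable
```

Each new metric path that Carbon sees becomes a new entry in the Graphite tree, so it pays to pick a stable naming scheme up front.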

Here are two graph examples. Graphite generated these using data provided by quickstatd's vmstat plugin.


Quickstatd has no external dependencies. Just extract the tarball, run the installer, and you're good to go. You can configure the services you want to record in /etc/quickstatd.conf. The tool is started and stopped via /etc/init.d/quickstatd.





Quickstatd is tested and works on CentOS (Fedora) and Ubuntu (Debian).  With small modifications it could be made to work on OS X, Solaris, etc.

The vmstat plugin should work on most versions of Linux, but the sysstat package must be installed if you wish to use the 'sar' and 'iostat' monitoring tools on Debian/Ubuntu.

Friday, December 02, 2011

Visualizing Grinder Data With Other External Metrics

OVERVIEW

The Grinder is a great tool for doing load and performance testing. But there has never been a good set of tools available to integrate data generated by The Grinder with data from other sources. For example, a performance engineer might reasonably want to see a chart containing the transactions per second data generated by The Grinder along with the server-side CPU data generated by vmstat. While not impossible to do, there have never been any standard tools or processes available to make this task easy.
I recently decided to work on that problem. I wanted to find a good way to tie Grinder data together with a wide variety of other available performance metrics. I was willing to write new code as needed, but started with a heavy bias for using existing tools. I came up with a solution that uses three different data-collection tools, with Graphite as the unifying back end and data visualizer.

1. GRAPHITE FOR VISUALIZATION

Graphite is a great tool for storing and visualizing time-series data collected from a variety of different sources.

1.1 Key Features

Simplicity -- Graphite is always on. Your data collectors are always on. You no longer have to remember to start server monitoring along with your Grinder run, or write wrapper scripts that start your Grinder agent and your server monitoring at the same time. Just kick off your grinder test, then look at the results when you're done.
It's easy to integrate data from a variety of sources into Graphite. For many common forms of data, tools to move it into Graphite have been in place for a long time.
Graph creation is flexible and simple. In the Graphite UI it's easy to build new types of graphs interactively. And graphs can be generated programmatically using the Graphite API.
It's possible to save dashboards with a pre-configured collection of your favorite graph types. This is handy, since it saves you from having to reconfigure your aggregate graphs after each Grinder run.
Graphite is not the only tool available to manage time series data, but it is in wide use. Lots of shops use Graphite, which makes it more likely you can find the help (or the tools) you need.

1.2 Why not Cacti or Ganglia?
Cacti and Ganglia are both great tools. In the place I work, our ops team uses Ganglia extensively to monitor what’s going on in production, so I initially started this effort with a preference for using Ganglia as the back-end instead of Graphite.
Ganglia and Cacti are both built atop RRD. Unfortunately, RRD assumes all its incoming data is happening in real time. There are no good options for sending old, timestamped data to an RRD back-end. This rules it out for processing the non-realtime data contained in logs from completed Grinder runs.

Graphite is not built on top of RRD. It uses an alternate data storage layer named Whisper. Whisper was specifically designed to get around this issue, and to be able to store intermittent data. This is perfect for the Grinder, which produces data in separate blocks of time for each test.

For more information on Graphite (including documentation, download links, setup instructions, etc.) see the Graphite web site: http://graphite.wikidot.com/start

2. GATHERING GRINDER METRICS

Up until now, there has been no good way to get Grinder data into Graphite. This was the piece that had no pre-existing solution, and required a new tool to be written. What I came up with is Graphite Log Feeder, available under the GPL at https://bitbucket.org/travis_bear/graphitelogfeeder

Graphite Log Feeder (GLF) parses your Grinder data logs and forwards the performance data to a running instance of Graphite. As with the existing Grinder Analyzer tool (http://track.sourceforge.net/), you have the option to specify a list of response time groupings. GLF runs in CPython, Jython, and PyPy.
Once your Grinder data is imported into Graphite, you can use it to construct arbitrary graphs.
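The key idea is that each data point is sent with the timestamp recorded in the log, not the time of the upload. The sketch below illustrates that idea in Python; it is not GLF's actual code. It assumes the Grinder data log is comma-separated with the thread, run, test number, start time in milliseconds since the epoch, test time, and error count in the first six columns. The `grinder` prefix and the metric naming are hypothetical.

```python
from collections import defaultdict


def tps_lines(log_lines, prefix="grinder"):
    """Count successful transactions per start second and emit Carbon plaintext lines.

    Assumes comma-separated rows of the form:
    thread, run, test, start time (ms since epoch), test time (ms), errors
    """
    counts = defaultdict(int)
    for line in log_lines:
        fields = [f.strip() for f in line.split(",")]
        if len(fields) < 6 or not fields[0].isdigit():
            continue  # skip the header row and anything malformed
        test, start_ms, errors = fields[2], int(fields[3]), int(fields[5])
        if errors == 0:
            counts[(test, start_ms // 1000)] += 1
    # The timestamp on each line comes from the log, so old runs land in the right place.
    return ["%s.test_%s.tps %d %d" % (prefix, test, n, second)
            for (test, second), n in sorted(counts.items())]


if __name__ == "__main__":
    sample = [
        "Thread, Run, Test, Start time (ms since Epoch), Test time, Errors",
        "0, 0, 1, 1322870400100, 35, 0",
        "1, 0, 1, 1322870400900, 41, 0",
        "0, 1, 1, 1322870401200, 38, 0",
        "1, 1, 2, 1322870401300, 120, 1",
    ]
    for line in tps_lines(sample):
        print(line)
```

The resulting lines can be written to Carbon's plaintext port like any other metric.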

2.1 Example Graphs
Here are examples of graphs I just threw together in a few minutes. You are certainly not limited to what you see here; the number of possible ways to combine your data is vast, so with time and experimentation, you can come up with whatever presentation you need. In this test, load was increasing steadily over time for half an hour.

2.2 GLF limitations

With GLF you have no direct visibility into the test summary data generated for you at the end of the Grinder agent out_* file. For this, Grinder Analyzer is still your best bet.
There is no easy way (that I have yet discovered) to zoom in the Graphite UI to the specific block of time where your test ran.
Although your OS and application-level metrics are available to Graphite in real-time (see below), GLF is only able to make your Grinder data available after your test run has completed.

2.3 Why not use Logster?
Before writing GLF, I assumed that I would be using Logster (https://github.com/etsy/logster) to transfer my Grinder data into Graphite. Unfortunately, when I started digging into Logster, I discovered that it (like Cacti and Ganglia, see section 1.2, above) assumes all the data it processes is real-time. There is no support for ingesting old or timestamped data. This made it unsuitable for processing Grinder logs.

3. GATHERING OS-LEVEL METRICS

There are a variety of tools available for getting OS-level performance metrics (memory, disk use, CPU use) into Graphite.

3.1 quickstatd
Quickstatd is a realtime, bash-script based approach that has no external dependencies. It’s a good match in cases where you just want to get something simple up and running quickly. (Thus the name, quickstatd.) For additional detail and background, see my posting on quickstatd. For downloads and other info, see https://bitbucket.org/travis_bear/quickstatd

3.2 collectd
Collectd (with the graphite plugin) is a good choice for a production environment. It’s well-tested, with a robust feature set.

3.3 Example graphs

Here are graphs made from quickstatd metrics. The grinder test is the same one run in section 2.1 (above).

4. GATHERING APPLICATION-LEVEL METRICS

4.1 JAVA / JMXTRANS
Where I work, most of our servers are running on Java in Tomcat. Tomcat exposes a ton of information about its run state via JMX. We expose quite a bit of information in our own application code that we'd like to track as well.
We use jmxtrans (http://code.google.com/p/jmxtrans/) to capture these JVM-internal metrics, and forward them to Graphite. With this approach, we can look inside our running apps to see what's happening internally any time we want.

Here are some graphs of JMX statistics captured by jmxtrans. The examples here are of Tomcat metrics (memory use and thread counts) but they could just as easily be for anything your app exposes via JMX.

4.2 StatsD
I found the topics discussed in this blog post both powerful and compelling.
http://codeascraft.etsy.com/2011/02/15/measure-anything-measure-everything/
Although the place I work is primarily a Java shop, I’m investigating using StatsD for the few cases where we run apps inside other platforms like Apache HTTP Server, or Django.
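For a sense of how lightweight this is on the application side, here is a minimal Python sketch of a StatsD client. It assumes the standard StatsD wire format, `bucket:value|type` sent as a UDP datagram, with 8125 as the default port. The bucket names are hypothetical.

```python
import socket


def statsd_packet(bucket, value, metric_type):
    """Format one StatsD datagram: '<bucket>:<value>|<type>' ('c' counter, 'ms' timer)."""
    return ("%s:%s|%s" % (bucket, value, metric_type)).encode("ascii")


def send_statsd(packet, host="localhost", port=8125):
    """Fire-and-forget UDP send; host and port are assumptions (8125 is the usual default)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(packet, (host, port))
    finally:
        sock.close()


if __name__ == "__main__":
    # Hypothetical buckets for a Django login view.
    packets = [
        statsd_packet("django.login.attempts", 1, "c"),
        statsd_packet("django.login.response_time", 320, "ms"),
    ]
    for packet in packets:
        print(packet)
        # send_statsd(packet)  # uncomment when a StatsD daemon is listening
```

Because the transport is UDP, the application never blocks waiting on the metrics daemon, which is a large part of the appeal.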

5. INTEGRATING THE DATA

With all the separate pieces described above up and running, we can go into the Graphite UI to mix-and-match our metrics, creating graphs of data from different sources. Here's a chart containing data from both The Grinder and vmstat.

It took about ten seconds to set that graph up. This kind of simplicity allows us to interactively correlate all kinds of different data. We are currently only scratching the surface of what's possible, and are very excited to see where this takes us.

6. BEST PRACTICES

There are a few things you can do with this collection of tools that will make your life a little easier.
6.1 Repeatability
In general, repeatability is good, whereas running with a wide variety of test scripts on an ever-changing mix of hardware is asking for a headache. Every single metric you generate, on each machine, generates a new tree-view item in the Graphite UI. This can clutter up your UI, and make it harder to find the data you want. When possible, avoid cycling a bunch of different hardware in and out of your environment. And when possible, avoid changing the names of the different transactions in your Grinder scripts.
Also, each new metric results in a new Whisper database file being created on your Carbon (Graphite) server. Depending on your data-retention settings, this can wind up consuming a significant amount of space. In my environment, every metric results in a Whisper file of 73 MB, with over 90 GB of disk space dedicated to my relatively small environment of 14 machines.
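Whisper files are allocated at full size when they are created, so the disk cost per metric depends only on the retention settings. The Python sketch below estimates that cost. It assumes the on-disk layout as I understand it (a 16-byte header, 12 bytes per archive definition, and 12 bytes per data point); the retention schema in the example is hypothetical and is not the one used in my environment.

```python
def whisper_file_size(archives):
    """Estimate a Whisper file's size in bytes.

    archives: list of (seconds_per_point, retention_seconds) tuples.
    Assumed layout: 16-byte header, 12 bytes per archive entry, 12 bytes per point.
    """
    header = 16 + 12 * len(archives)
    points = sum(retention // step for step, retention in archives)
    return header + 12 * points


if __name__ == "__main__":
    day = 24 * 60 * 60
    # Hypothetical schema: 10-second data for 30 days, 1-minute data for 1 year.
    schema = [(10, 30 * day), (60, 365 * day)]
    size = whisper_file_size(schema)
    print("%.1f MB per metric" % (size / 1e6))
    print("%.1f GB for 1000 metrics" % (size * 1000 / 1e9))
```

Running the numbers like this before settling on a retention policy makes it easier to trade resolution against disk space.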
6.2 Time synchronization
Time synchronization among all the machines in your environment (preferably with NTP) is a must! Otherwise the data from the different machines in your charts will get out of alignment, and you won't be able to accurately visualize what's really going on.

7. POSSIBLE FUTURE EFFORTS

GLF gives Grinder users abilities they have never had before. And the setup I have described here is quite useful, today. But there are other desirable features that will require additional work to achieve. Depending on time and motivation, I may take a stab at implementing some of these things in the future.

7.1 Grinder run manager

Graphite runs as a Django app. Another app could run in the same Django instance to help with a variety of test-management tasks:

Let you zero in immediately on the time range for a given test
Save metadata (goals, hosts, test type, notes, etc.) on individual Grinder runs
Include some way to store and display the summary data at the end of the Grinder out_* file, similar to the way this information is displayed in Grinder Analyzer, with sortable columns, etc.

7.2 Additional GLF functionality
In addition to the stuff mentioned in section 2.2 (above), there's some stuff GLF doesn't do that would definitely be nice to have.

Extend the log feeder tool to include other types of logs: Apache and Tomcat access logs, scribe logs, etc.
Support for other backends besides Graphite. (Saturnalia, etc.)

7.3 Additional Grinder functionality

Add support in The Grinder for exposing (via JMX or StatsD) info on the number of threads that are running vs. the number waiting for their initial sleep to conclude
Add support in The Grinder for the agent process to report TPS and other info normally sent to the console to StatsD instead of (or in addition to) the console UI

8. REFERENCES

graphitelogfeeder
https://bitbucket.org/travis_bear/graphitelogfeeder

quickstatd
https://bitbucket.org/travis_bear/quickstatd
http://blackanvil.blogspot.com/2011/12/quickstatd-simple-tool-to-get.html

GrinderAnalyzer
http://track.sourceforge.net/

jmxtrans
http://code.google.com/p/jmxtrans/

graphite
http://graphite.wikidot.com/start

The Grinder
http://grinder.sourceforge.net/

Logster
https://github.com/etsy/logster

Friday, April 16, 2010

Grinder Analyzer V2.b12 is released

There's some pretty good stuff in this one. From the release notes:

Overview: http://track.sourceforge.net/analyzer.html

New in V2.b12:

Grinder Analyzer is now compatible with log files generated by The Grinder 3.4
Much better support for scripts generated with the TCP proxy.
New response time graphs are included in the generated reports. These are stacked graphs showing the percentage of response time spent in different areas -- resolve host time, first byte time, etc.
It is now an option to define response time groups in analyzer.properties. Grinder Analyzer will calculate the percentage of responses that complete within specified time ranges.
Fix for bug #2219789 -- In cases where multiple transactions have been assigned identical names in the grinder script, append the test number to the transaction name to ensure uniqueness and prevent graphs from overwriting each other. Thanks to Thomas Falkenberg for nudging me on this.
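To illustrate what the response time groups calculation amounts to, here is a small Python sketch. It is not Grinder Analyzer's code, and it does not reproduce the analyzer.properties syntax; the boundaries and sample times are hypothetical.

```python
def response_time_groups(times_ms, boundaries_ms):
    """Return the percentage of responses falling in each time range.

    boundaries_ms: ascending upper bounds in milliseconds.
    A final open-ended group catches everything at or above the last bound.
    """
    counts = [0] * (len(boundaries_ms) + 1)
    for t in times_ms:
        for i, bound in enumerate(boundaries_ms):
            if t < bound:
                counts[i] += 1
                break
        else:
            counts[-1] += 1  # slower than every bound
    total = float(len(times_ms)) or 1.0  # avoid dividing by zero on empty input
    return [100.0 * c / total for c in counts]


if __name__ == "__main__":
    times = [50, 120, 480, 900, 2500]
    bounds = [100, 500, 1000]
    labels = ["< 100 ms", "100-500 ms", "500-1000 ms", ">= 1000 ms"]
    for label, pct in zip(labels, response_time_groups(times, bounds)):
        print("%-12s %5.1f%%" % (label, pct))
```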

Download http://sourceforge.net/projects/track/files/

Wednesday, January 30, 2008

Grinder Analyzer is Released

In the last blog entry, I posted some examples of useful graphs that could be made from Grinder log data. Now I've published a tool that generates graphs like these. It analyzes data generated by The Grinder during scale runs, and outputs a series of graphs that show up in a summary table. Since I'm always wanting to know things like "which transaction is the slowest" or "which transaction has the highest error rate," columns in the report are sortable. Check it out.

Info on the tool is here: http://track.sourceforge.net/analyzer.html
Example output can be found here: http://track.sourceforge.net/example/report.html

Any questions or issues, let me know. Thanks! -Travis

Sunday, November 05, 2006

Centralized Analysis of Scale Data

The Grinder is a good tool for generating large levels of load against a server. But after the run is over, the love ends. There are no post-run analysis capabilities built in. No graphs of server-side performance metrics, or of client-side data. I have begun working on an open-source data warehousing tool for test data generated by The Grinder. This tool parses the logs generated by each agent, and feeds all the information into a database. Once the data has been centralized there, all sorts of post-run analysis become possible. I have two goals for this project:

Allow for simple analysis of Grinder runs: generate summary tables, graph server-side perf data such as CPU and disk activity, graph client side data such as response times and transactions per second.
Allow for long-term storage of test data. This will enable comparing performance of the server across many different builds, and longer-term, across many different server versions.

Current Situation

Here is what is implemented today:

The database -- tables
The database -- sql scripts to generate TPS and response time graphs
The log parser and feeder classes which run on each agent
The grapher classes

This is an example of a transactions per second graph generated by the current code. In this scale run, the load was increasing over time (using the scheduling mechanism discussed at the end of my previous blog entry.)

Here is an example of a response times graph generated by the current code:

Future - Short Term

In the short term, this basic functionality must be completed.

SQL code to generate reports summarizing a given scale run. Presenting information on response times, pass/fail rates, etc. in a tabular format.
Create feeders and graphers for server-side performance data, such as produced by perfmon, vmstat, sar, prstat, etc.

Future - Medium Term

After the immediate needs are met, a good way of browsing the test results in the database is needed.

Storing the graphs themselves in the database rather than on the filesystem.
Implementing a Web UI for browsing scale test info in the database

Future - Long Term

Currently, running a large-scale test with many agents is a cumbersome process. Each agent process must be manually started and connected to the console. In situations with, for example, 30 agent machines, this is problematic. Shell scripts exist today that can ssh in to remote agents and start them up, but they require Linux and are not a generic, cross-platform solution.

Extending the web UI to provide additional configuration management.
Extending the web UI to enable remote agent startup in a cross-platform manner, and the ability to drive a Grinder test in an AJAX environment via the console API.

Saturday, November 04, 2006

The Grinder: Addressing the Warts

Last June, I discussed the merits of The Grinder as a load testing tool, and compared it with Load Runner and JMeter. Since then, I have been using The Grinder at work to perform load tests against our server product. In the nearly five months since my original analysis, all of my serious complaints have been addressed. In addition, several of my minor ones have been addressed as well, or workarounds have been discovered. So I thought it would be fair to revisit my original complaints, and in the cases where solutions have been found, share them.

Major Issues

Poor performance with large downloads.

This turned out to be a memory issue driven by my poor configuration. The Grinder does not have the option of eating the bytes of the HTTP response as it is coming in, or of spooling it off to disk. It keeps the entire body of the response in memory until the download is complete. In my test, I was requesting a web resource that was very large (4MB - 6MB). With its default heap size, and many threads running simultaneously, the agent process simply could not handle this level of traffic.

The trick was in how to give the agent more heap. The standard technique -- passing in heap-related arguments (-Xms, -Xmx) to the JVM when starting the agent -- was having no effect. This is because the main Java process spawns new Java instances to do the actual work, and these sub-instances had no knowledge of my settings. To pass these heap arguments to the JVM sub-processes, add a line to your grinder.properties file, and everything will work well:

grinder.jvm.arguments=-Xms300M -Xmx1600M

In the original post, there was some discussion about using native code such as libcurl to do the heavy lifting here, but fortunately that has proven not to be necessary.

Minor Issues

Bandwidth Throttling / Slow Sockets

Bandwidth throttling, AKA slow sockets, is the name for a feature that allows each socket connection to the server to communicate at a specified bitrate, as opposed to downloading everything at wire speed as fast as possible. This is desirable because for any given level of transactions per second, the server will be working harder if it is servicing a large number of slow connections as opposed to a small number of very fast connections. So this allows the load tester to set up a far more realistic scenario, modeling the real-world use of many devices with varying network speeds connecting to the server.

And there is good news on this issue. By request, Phil Aston, the primary developer of The Grinder, has implemented this feature. I tested early Grinder builds with this feature, and am happy to say with confidence that the current implementation behaves correctly, is robust, and scalable. Awesome.

The feature is implemented as the method HTTPPluginConnection.setBandwidthLimit -- details here:

http://grinder.sourceforge.net/g3/script-javadoc/net/grinder/plugin/http/HTTPPluginConnection.html#setBandwidthLimit(int)

Load Scheduling

There is still no full-on load scheduler for The Grinder like there is for Load Runner. Thread execution is all or nothing. However, there is a neat little workaround that takes away most of the sting. If you just want your threads to ramp in smoothly over time in a linear fashion, you can have each thread sleep for a period of time before beginning execution, with higher-numbered threads sleeping progressively longer.

Be sure the module containing your TestRunner class imports the grinder:

from net.grinder.script.Grinder import grinder

Then put this code into the __init__ method of your TestRunner class:

      # rampTime is defined outside this snippet, it is the amount
      # of time (in ms) between the starting of each thread.
      sleepTime=grinder.threadID * rampTime
      grinder.sleep( sleepTime, 0 )
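To see what schedule this produces before committing to a long run, the arithmetic can be checked outside The Grinder with plain Python. This sketch assumes thread IDs start at 0 and count up within a worker process; the thread count and ramp time are hypothetical.

```python
def ramp_schedule(thread_count, ramp_time_ms):
    """Return the start delay (ms) for each thread ID under a linear ramp."""
    return [thread_id * ramp_time_ms for thread_id in range(thread_count)]


if __name__ == "__main__":
    delays = ramp_schedule(5, 2000)
    for thread_id, delay in enumerate(delays):
        print("thread %d starts after %d ms" % (thread_id, delay))
    # The last thread starts at (thread_count - 1) * ramp_time_ms,
    # so that is how long the ramp takes to reach full load.
    print("full load reached after %d ms" % delays[-1])
```

If the agent runs several worker processes, each process would follow this same schedule in parallel, so the total load ramps proportionally faster.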

Results Reporting

The Grinder's results reporting and analysis features are weak. I am currently working on a data-warehousing system that stores the transaction data from each run, enabling detailed post-run processing and analysis. Details in subsequent blog entries.

Wednesday, June 14, 2006

Shootout: Load Runner vs The Grinder vs Apache JMeter

1 INTRODUCTION

I recently needed to recommend a tool to use for a scalability testing project, and I was in the fortunate situation of having some time to survey the field, and to look into the top contenders in greater depth. From an original list of over 40 candidates, I selected three finalists in the open-source and commercial categories. I then took some time to look at them in detail, to determine which tool to recommend for the ongoing scale testing effort. Since I have seen several questions about how these tools compare to each other on various mailing lists, I'm sharing my findings here in the hopes that others will find them useful.

My three finalists were Load Runner, from Mercury Interactive; JMeter, from the Apache Foundation; and The Grinder, an open-source project hosted on SourceForge.

2 SUMMARY OF RESULTS

I found that I could use any of them and get a reasonably good amount of scale test coverage. Each tool has unique things it does very well, so in that sense, there is no “wrong answer.” Conversely, each of the tools I considered has unique deficiencies that will impede or block one or more of the scenarios in our test plan. So there is no “right answer” either – any option selected will be something of a trade-off.

Based on this research, I recommended The Grinder as the tool to go forward with. It has a simple, clean UI that clearly shows what is going on without trying to do too much, and offers great power and simplicity with its unique Jython-based scripting approach. Jython allows complex scripts to be developed much more rapidly than in more formal languages like Java, yet it can access any Java library or class easily, allowing us to re-use elements of our existing work.

Mercury's Load Runner had a largely attractive feature set, but I ultimately disqualified it due to shortcomings in these make-or-break areas:

Very high price to license the software.
Generating unlimited load is not permitted. With the amount of load our license allows, I will be unable to effectively test important clustered server configurations, as well as many of our “surge” scenarios.
Very weak server monitoring for Solaris environments. No support for monitoring Solaris 10.

JMeter was initially seen as an attractive contender, with its easy, UI-based script development, as well as script management and deployment features. Its UI is feature-rich and this product has the Apache branding. It was ultimately brought down by the bugginess of its UI though, as several of its key monitors gave incorrect information or simply didn't work at all.

3 Comparison Tables

All the items in the tables below are discussed in greater detail in the following sections. These tables are meant to give a quick overview.

3.1 Critical Items

There are several features that are key to any scale testing effort. Items in this table are key to our efforts. Not having any of these will seriously impact our ability to generate complete scale test coverage.

Item	Load Runner	JMeter	The Grinder
Solaris Monitoring	-	neutral	neutral
Unlimited Load generation	-	+	+
Supports IP spoofing	+	-	+
Large download performance	+	-	- *

* Multiple workarounds are being investigated, including calling native (libcurl) code for the most intensive downloads.

3.2 Non-Critical Items

Items in this section are not make-or-break to our test effort, but will go a long way to making the test effort more effective.

3.2.1 General

Item	Load Runner	JMeter	The Grinder
Server monitoring	mixed	neutral	neutral
Batch Mode	-	+	+
Ease - Installation	-	+	+
Ease – Script Authoring	+	+	mixed
Ease – Running Tests	neutral	+	neutral
Results Reporting	+	-	-
Agent Management	+	+	-
Cross Platform	-	+	+
Cost	-	+	+
Technical Level	+	+	-
Stability/Bugginess	neutral	-	neutral

3.2.2 Agents

Item	Load Runner	JMeter	The Grinder
Transaction power	+	neutral	+
Custom protocols	+	+	+
Out-of-the-box protocols	+	neutral	-
Transaction aggregation	+	-	+
Scalability of Agent	+	neutral	neutral
Slow sockets	+	neutral	-
External libs usable	+	+	+
Load Scheduling	+	+	-
Ease of porting JCS	-	neutral	+

3.2.3 Controller

Item	Load Runner	JMeter	The Grinder
Scalability of Controller	neutral	-	+
Real-time test monitoring	+	-	neutral
Real-time load adjustment	+	-	-
Script management	+	+	-
Script Development Environment	+	+	-

4 GENERAL

4.1 Server Monitoring -- Windows, Solaris, etc.

4.1.1 Load Runner

Mercury is extremely strong in this area for Windows testing. Unfortunately, it is very weak in Unix/Solaris. For Windows hosts, Load Runner uses the native performance counters available in perfmon. This allows monitoring myriad information from the OS, as well as metrics from individual applications (such as IIS) that make their information available to perfmon.

For Solaris hosts, Load Runner is restricted to the performance counters available via rpc.rstatd. This means some very basic information on CPU and memory use, but not much else. Note that Load Runner does not currently support any kind of performance monitoring on Solaris 10.

4.1.2 JMeter

JMeter has no monitoring built in. Thus, wrapper scripts are required to synchronize test data with external perf monitoring data. This is the approach I used to great effect with our previous test harness. The advantage of this method is I can monitor (and graph!) any information the OS makes available to us. Since the amount of data available to us is quite large, this is a powerful technique.
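The wrapper idea can be sketched in a few lines of Python. This is an illustration only, not the wrapper from our previous test harness: it runs the monitor on the local machine, whereas a real setup would typically start it on the server under test (over ssh, for example). The commands and file names in the usage comment are hypothetical.

```python
import subprocess


def run_with_monitor(monitor_cmd, test_cmd, monitor_log):
    """Start a monitor, run the test to completion, then stop the monitor.

    Returns the exit code of the test command. The monitor's output is
    captured in monitor_log so it covers exactly the duration of the test.
    """
    with open(monitor_log, "w") as log:
        monitor = subprocess.Popen(monitor_cmd, stdout=log, stderr=subprocess.STDOUT)
        try:
            return subprocess.call(test_cmd)
        finally:
            monitor.terminate()
            monitor.wait()


# Example usage (hypothetical test plan and log names):
# run_with_monitor(["vmstat", "5"],
#                  ["jmeter", "-n", "-t", "plan.jmx"],
#                  "vmstat.log")
```

Because the monitor starts just before the test and stops just after it, the two sets of data share the same time window, which is the whole point of the wrapper.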

4.1.3 The Grinder

The same wrapper-based approach would be required here as I detailed above for JMeter.

4.2 Can generate unlimited load

This is a make-or-break item. There are many scenarios I just can't cover if I can only open a few thousand socket connections to the server.

4.2.1 Load Runner

Load Runner restricts the number of vusers you can run. Even large amounts of money only buy a license for a modest number of users. Historically, the rate for 10,000 HTTP vusers has been $250,000. However, on a per-agent basis, load is generated very efficiently, so it may take less hardware to generate the same amount of load. (But for the money you spend on the Load Runner license, you could buy a LOT of load generation hardware!)

4.2.2 JMeter

Since this is Free/Open Source, you may run as many agents as you have hardware to put them on. You can add more and more load virtually forever, as long as you have more hardware to run additional agents on. However, in specific unicast scenarios, such as repeatedly downloading very large files (like PIPEDSCHEDULE), the ability of agents to generate load falls off abruptly due to memory issues.

4.2.3 The Grinder

In this matter The Grinder's story is the same as JMeter's. The only limit is the number of agents. The Grinder suffers from the same inability to download large files effectively as JMeter. A workaround that uses native code (libcurl) to send requests is being investigated.

4.3 Can run in batch (non-interactive) mode

4.3.1 Load Runner

No. Hands-free runs can be scheduled with the scheduler, but multiple specific scenarios cannot be launched from the command line. This may be adequate for single tests; it's not clear how this would work if a series of automated tests was desired.

4.3.2 JMeter

Yes, the ability to do this is supported out of the box. However, it can only be run from a single agent; the distributed testing mechanism requires the UI. So for automated nightly benchmarks it may be ok, but for push-to-failure testing where much load is required, the UI is needed. It would presumably be possible to have a wrapper script launch JMeter in batch mode at the same time on multiple agents. This would achieve arbitrary levels of load, but would not have valid data for collective statistics like total transactions per second, total transactions, etc.

4.3.3 The Grinder

As with JMeter, a single agent can be run from the command line. See JMeter comments, above.

4.4 Ease of Use

4.4.1 Load Runner

4.4.1.1 Installation

Installation takes a ton of time, a lot of disk space, and a very specific version of Windows. But it's as simple as running a Windows installer, followed by 3 or 4 product updaters.

4.4.1.2 Setting up Simple tests

For HTTP tests, Load Runner is strong in this category, with its browser recorder and icon-based scripts.

4.4.1.3 Running Tests

The UI of the controller is complex and a bit daunting. There is great power in the UI if you can find it.

4.4.2 JMeter

4.4.2.1 Installation

Be sure Sun's JRE is installed. Unpack the tar file. Simple.

4.4.2.2 Setting up Simple tests

Very quick. Start up the console, a few clicks of the mouse, and you are ready to generate load. Add thread group, add a sampler, and you have the basics. Throw in an assertion or two on your sampler to validate server responses.

4.4.2.3 Running Tests

Both distributed and local tests can be started from the UI. A menu shows the available agents, and grays out the ones that are already busy. Standalone tests can be started from the command line. JMeter wins this category hands down.

4.4.3 The Grinder

4.4.3.1 Installation

Installation is as simple as installing Java and unpacking a tar file.

4.4.3.2 Setting up Simple tests

Setting up tests, even simple tests, requires writing Jython code. So developer experience is important. A proxy script recorder is included to simplify this. In addition, there are many useful example scripts included to help you get started.

4.4.3.3 Running Tests

Running a test involves configuring a grinder.properties file, manually starting an agent process, manually starting the console, then telling the test to run from within the console UI.

4.5 Results Reporting

4.5.1 Auto-generated?

Having key graphs generated at the conclusion of a scale run, such as load over time, server CPU, transactions per second, etc, can save a lot of tedium, since manually generating these graphs from log files is quite time consuming.

4.5.1.1 Load Runner

Load Runner has an excellent integrated analysis tool that can dynamically generate graphs on any of the myriad performance counters available to it. The downfall of this approach is that there are only a small number of performance metrics it can gather on Solaris. And while I can gather additional server metrics using sar, vmstat, dtrace, iostat, mpstat, etc., integrating this information into the Load Runner framework will be difficult at best.

4.5.1.2 JMeter

JMeter does not gather any server-side performance metrics. But it can generate a limited number of client-side graphs while the test is in progress. These graphs can be saved after the test is over. Fortunately, all the test data is written in a standard format. So it probably makes more sense to generate all the desired graphs via shell scripts during post-processing. This is the same approach I used with our previous test harness.

4.5.1.3 The Grinder

As with JMeter, there are no graphs generated out of the box, but with the standard-format log files, scripted post-processing is reasonably straightforward, giving us a powerful and flexible view on the results.
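To make the idea of scripted post-processing concrete, here is a minimal sketch that turns a results log into a transactions-per-second series ready for charting. The three-column line format (epoch milliseconds, elapsed milliseconds, error count) is a hypothetical stand-in, not the actual JMeter or Grinder log layout, and the function name is invented for this example.

```python
from collections import Counter


def transactions_per_second(lines):
    """Count successful transactions in each one-second bucket.

    Each line is assumed to look like 'epoch_millis,elapsed_millis,errors'.
    Blank lines and lines starting with '#' are skipped.
    """
    buckets = Counter()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        timestamp_ms, _elapsed_ms, errors = line.split(",")
        if int(errors) == 0:
            # Integer division groups timestamps by whole second.
            buckets[int(timestamp_ms) // 1000] += 1
    return sorted(buckets.items())


if __name__ == "__main__":
    sample = ["# time,elapsed,errors", "1000,12,0", "1500,20,0",
              "2100,15,1", "2900,9,0"]
    for second, count in transactions_per_second(sample):
        print("%d %d" % (second, count))
```

The resulting (second, count) pairs can be fed to whatever plotting tool is already in use.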

4.5.2 Analysis tools?

4.5.2.1 Load Runner

Yes. There is a very powerful tool for doing analysis after a run. An infinite number of customized graphs can be generated. These graphs can be exported into an HTML report.

4.5.2.2 JMeter

Nothing included. I would want to transfer over some of the scripts used in our previous test harness, or write a simple tool that dumps test data into a DB for post-analysis.

4.5.2.3 The Grinder

Nothing included. See the JMeter comments, above.

4.6 Simplicity of Agent management

4.6.1 Load Runner

This works well in Load Runner; each agent can run as a service or an application, simplifying management. Test scripts are auto-deployed to agents.

4.6.2 JMeter

JMeter is good here. Each agent is a server that the controller can connect to at will in real-time. Test scripts are automatically sent to each agent, centralizing management.

4.6.3 The Grinder

Grinder is the weakest here. The properties files that define how much load to apply must be manually deployed to all agents. A wrapper shell script like the one used by our previous test harness could address this by always deploying the Jython scripts to the agents before each run.
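A minimal sketch of such a wrapper is shown below, written in Python rather than shell to match the Jython scripts. It assumes scp is installed and that passwordless SSH to each agent is already set up; the host names, file names, and remote directory are hypothetical.

```python
import subprocess


def deploy_commands(files, agents, remote_dir):
    """Build one scp command per agent that copies every file to remote_dir."""
    return [["scp"] + list(files) + ["%s:%s/" % (agent, remote_dir)]
            for agent in agents]


def deploy(files, agents, remote_dir):
    """Copy the files to every agent, stopping at the first failure."""
    for command in deploy_commands(files, agents, remote_dir):
        subprocess.check_call(command)


if __name__ == "__main__":
    # Print the commands instead of running them; all names are made up.
    for command in deploy_commands(["grinder.properties", "download_test.py"],
                                   ["agent01", "agent02"], "/opt/grinder"):
        print(" ".join(command))
```

Calling deploy() before each run would keep every agent's copy of the scripts and properties in sync with the controller's.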

4.7 Tool is cross-platform

4.7.1 Load Runner

Not really. A subset of the complete agent functionality can be had for agents running on Linux or Solaris. Non-Windows agents run each vuser as a process rather than a thread, reducing the amount of load an agent can produce. The controller and VUGen are both Windows-only. And Load Runner is poor at measuring non-Windows server statistics.

4.7.2 JMeter

Yes. Java/Swing app is platform-agnostic.

4.7.3 The Grinder

Yes. This app is based on Java, Swing, and Jython. Like JMeter, it will run anywhere you can set up a JVM.

4.8 Cost

4.8.1 Load Runner

Expect to pay in the low to mid six figures for a license allowing any kind of robust load-generation capacity. But that's not all; there are high ongoing support costs as well. For the same kind of money I could get over 100 powerful machines to use as scale agents, as well as associated network switches, cabling, etc.

4.8.2 JMeter

Free. (Apache License)

4.8.3 The Grinder

Free. (Grinder License)

4.9 Intended audience/technical level

4.9.1 Load Runner

Load Runner has the widest audience of all these tools, perhaps not surprising given its maturity as a commercial product. Its browser recording and icon-based script development give it the lowest technical barriers to entry of any of the three products. A QA engineer with a modest technical background and little to no coding skills can still be productive with the tool. And its ability to load Windows .dll's and other libraries gives it a power and flexibility useful to developers and other more advanced users.

4.9.2 JMeter

JMeter does not require developer skills to perform basic tests in any of the protocols it supports out of the box. A form-driven UI allows the user to design their own scenario. This scenario is then auto-deployed to all agents during test initialization.

4.9.3 The Grinder

While it's possible that a regular QA engineer could be used to run the console and perform some testing, the tool is really more aimed at developers. This is the only tool of the three that did not include any kind of icon-based or UI-based script development. At a minimum, users will need to know how to write Python/Jython code to create simple test scripts, and the ability to write custom Java classes may be required as well, depending on the scenario.

4.10 Stability/Bugginess

4.10.1 Load Runner

The controller crashes occasionally under heavy load, but this is infrequent and largely manageable. Other than this, the product seems robust enough.

4.10.2 JMeter

JMeter fares poorly in this area.

TODO

4.10.3 The Grinder

I found no issues with the Grinder, other than the previously-mentioned memory issue with large file downloads.

5 AGENTS

5.1 Power of transactions

5.1.1 how flexible on what can be passed/failed?

5.1.1.1 Load Runner

Any arbitrary criteria can be set to define if a transaction passes. This includes but is not limited to response time, contents of response body, response code, or just about anything else.

5.1.1.2 JMeter

In JMeter, samplers generate your test requests. You can add a wide variety of assertion types to any of your samplers. These will allow you to assert on response code, match regular expressions against the response body, and assert on the size or md5sum of the response.
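The kinds of checks listed above can be sketched in a few lines of plain Python. This is not JMeter's assertion API, only an illustration of what each check tests; the function name and parameters are invented for this example.

```python
import hashlib
import re


def check_response(status, body, expected_status=200, pattern=None,
                   max_bytes=None, md5_hex=None):
    """Return a list of failure messages; an empty list means the sample passed.

    body is the raw response as bytes. Every check other than the status
    check is optional and is skipped when its argument is None.
    """
    failures = []
    if status != expected_status:
        failures.append("status %d != %d" % (status, expected_status))
    if pattern is not None:
        text = body.decode("utf-8", "replace")
        if re.search(pattern, text) is None:
            failures.append("body does not match %r" % pattern)
    if max_bytes is not None and len(body) > max_bytes:
        failures.append("body is %d bytes, limit is %d" % (len(body), max_bytes))
    if md5_hex is not None and hashlib.md5(body).hexdigest() != md5_hex:
        failures.append("md5 mismatch")
    return failures
```

A caller would treat the sample as failed whenever the returned list is non-empty, and could log the messages alongside the timing data.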

5.1.1.3 The Grinder

As with Load Runner, pass/fail criteria merely have to be defined within the test script. The criteria can be whatever you want.

5.1.2 user-defined transaction/statistic types?

5.1.2.1 Load Runner

Yes – if you get away from the icon-based view in VUGen and go to the code level, you can wrap anything you want in a transaction to get timing information, pass/fail data, etc.

5.1.2.2 JMeter

Yes – done through plugins.

5.1.2.3 The Grinder

Yes – an API exists to easily wrap any Java or Jython method in a transaction.

5.2 Other Protocols

5.2.1 Which protocols are supported out of the box?

5.2.1.1 Load Runner

This varies by the type of license purchased, with each protocol having a separate cost and a separate limit for the number of allowable VUsers. The potential number of protocols is extremely high, including Java, ODBC, FTP, HTTP, and others.

5.2.1.2 JMeter

Supports several protocols out of the box:

jdbc
http
ftp
jndi

5.2.1.3 The Grinder

The Grinder only supports HTTP out of the box.

5.2.2 Can transactions wrap custom (non-http) protocols? Can transactions wrap multiple (http or other) requests to the server?

5.2.2.1 Load Runner

Yes. There are multiple ways to do this. You can implement your own protocol handler in a .dll or in Load Runner's pseudo-C. Then you can invoke this handler from any type of VUser that you have a license for. Alternatively, unless your protocol is something uncommon, you can probably buy a pre-existing implementation of your protocol, and licenses for VUsers to run this protocol.

5.2.2.2 JMeter

Yes. An external Java plugin that supports your protocol must be added in to JMeter to support this.

5.2.2.3 The Grinder

Any protocol can be tested with the Grinder. An HTTP plugin is included. In other cases, you will create a separate Java class that implements a handler for your protocol. In your test script, you will wrap this Java class in a Grinder test object. Your protocol is invoked by calling any method you want from your Java class via the test wrapper. The wrapper will pass/fail the transaction based on response time.

This default behavior can be overridden with additional code in your Jython script. For example, after invoking your protocol method, you could inspect the state of your Java object and pass/fail the transaction based on information there.
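The wrap-then-inspect pattern can be illustrated with a plain-Python analogue. This is deliberately not the Grinder API: TimedTest and its arguments are hypothetical names, and in this sketch a call passes by default as long as it returns without raising, with an optional check standing in for the extra pass/fail code described above.

```python
import time


class TimedTest(object):
    """Wrap a callable so that each call is timed and recorded as pass or fail."""

    def __init__(self, name, target, passed=None):
        self.name = name
        self.target = target
        # Optional predicate on the return value. Without it, any call
        # that returns without raising counts as a pass.
        self.passed = passed or (lambda result: True)
        self.results = []  # list of (elapsed_seconds, ok) tuples

    def __call__(self, *args, **kwargs):
        start = time.time()
        try:
            result = self.target(*args, **kwargs)
            ok = bool(self.passed(result))
        except Exception:
            # An exception from the wrapped call is recorded as a failure.
            result, ok = None, False
        self.results.append((time.time() - start, ok))
        return result


if __name__ == "__main__":
    # A stand-in for a custom protocol handler method.
    fetch = TimedTest("fetch", lambda size: b"x" * size,
                      passed=lambda body: len(body) > 0)
    fetch(10)
    fetch(0)
    print([ok for _elapsed, ok in fetch.results])
```

In a real Grinder script the wrapped object would be the Java protocol handler, and the timing and pass/fail bookkeeping would be done by the Grinder itself.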

5.3 Capacity of single agent to generate load, particularly in high-bandwidth scenarios

I have typically seen libraries like Apache's HTTPClient max out the CPU to 100% when it's conducting high-bandwidth, large file downloads. The library supports high bandwidth use and many transactions per second just fine, but has issues with repeated large file downloads.

5.3.1 Load Runner

Per-agent load generation capacity is strong. Licensing constraints may limit actual load generated.

5.3.2 JMeter

With the exception of the high-bandwidth case, per-agent capacity is good.

5.3.3 The Grinder

Runs out of memory when repeatedly downloading large documents in many threads. Currently, there does not seem to be a workaround inside The Grinder itself. However, with my previous test harness I was able to work around this same issue by calling native code, and there is reason to believe that approach may work here as well.

5.4 Can support IP spoofing

Assuming a large range of valid IP addresses assigned to the agent machines, does the test harness support binding outgoing requests to arbitrary IP addresses? The ability to support this is critical for our test effort. If all broker requests come in from the same IP address, the broker thrashes unrealistically as it continually updates customer settings.

5.4.1 Load Runner

Yes. (see link in appendix 1)

5.4.2 JMeter

JMeter is weak here. There is a new mechanism (not yet released but available in nightly builds) where outbound requests can round-robin on a predetermined list of local IPs. This is not good enough for Fat Client simulation.

5.4.3 The Grinder

The local IP address to bind the outbound request to can be specified in the Jython scripts. This is just what I need.
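At the socket level, binding an outbound request to a chosen local address looks roughly like the sketch below. It uses Python's standard socket module rather than the Grinder's HTTP plugin, and the function name and addresses are hypothetical; the local address must already be assigned to one of the machine's interfaces.

```python
import socket


def connect_from(local_ip, host, port, timeout=5.0):
    """Open a TCP connection to (host, port) that originates from local_ip."""
    # Source port 0 lets the OS pick an ephemeral port on local_ip.
    return socket.create_connection((host, port), timeout=timeout,
                                    source_address=(local_ip, 0))


if __name__ == "__main__":
    # Demonstrate against a throwaway listener on the loopback interface.
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))
    server.listen(1)
    client = connect_from("127.0.0.1", "127.0.0.1", server.getsockname()[1])
    connection, peer = server.accept()
    print("server saw a connection from %s" % peer[0])
    connection.close()
    client.close()
    server.close()
```

With many addresses assigned to an agent, each simulated user would call this with its own local_ip so that the server sees a distinct source address per user.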

5.5 Can support variable connection speed/bandwidth throttling

5.5.1 Load Runner

Load Runner supports this out of the box.

5.5.2 JMeter

JMeter does not support this out of the box, but there is a slow socket implementation in the wild, written for the Apache HTTP Client (which JMeter uses), that should be possible to drop in fairly easily.

5.5.3 The Grinder

The Grinder does not support this. It may be possible with additional code hacking, but the path for this is not clear. Their third-party HTTP implementation means writing a custom solution may be challenging. Perhaps it would be possible using JNI and libCurl, although the author of the libCurl binding suggests there may be a memory leak in the C layer.
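For readers unfamiliar with the slow-socket idea, the core of it is simply pausing between reads so that throughput stays near a target rate. The sketch below shows that idea in plain Python; it is not taken from any of the three tools, and the function name and parameters are hypothetical.

```python
import io
import time


def throttled_read(stream, bytes_per_second, chunk_size=1024, sleep=time.sleep):
    """Read stream to the end, pausing so throughput stays near bytes_per_second.

    The sleep argument can be replaced (for example in tests) to avoid
    real delays.
    """
    chunks = []
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        chunks.append(chunk)
        # Pause for as long as this chunk would take at the target rate.
        sleep(len(chunk) / float(bytes_per_second))
    return b"".join(chunks)


if __name__ == "__main__":
    pauses = []
    data = throttled_read(io.BytesIO(b"x" * 2500), 1000, chunk_size=1000,
                          sleep=pauses.append)
    print("%d bytes read, pauses: %s" % (len(data), pauses))
```

A production implementation would also account for the time the read itself takes, but the fixed per-chunk pause is enough to simulate a slow client connection.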

5.6 Can run arbitrary logic and external libraries within agent

5.6.1 Load Runner

Windows .dll's may be loaded. Home-made libraries written in Load Runner's pseudo-C work fine as well. Additionally, function libraries can be embedded directly in the virtual user script.

5.6.2 JMeter

External Java libraries can be accessed via the plugin architecture.

5.6.3 The Grinder

The Grinder offers lots of flexibility for loading and executing third-party libraries. With Jython, any Java code may be called, and most Python code may be run unchanged. And there is a decent collection of example scripts that comes with the Grinder distribution.

5.7 Scheduling

5.7.1 Load Runner

Load Runner has a powerful, UI-based scheduling tool which allows you great flexibility to schedule arbitrary amounts of load over time. Load can be incrementally stepped up and stepped down, by single threads or entire groups. There is a graphical schedule builder that can generate schedules of arbitrary complexity.

5.7.2 JMeter

JMeter has UI-based scheduling that allows per-thread startup delays, as well as runs that start in the future. JMeter tests can run forever, for a specified time interval, or for a specified number of iterations for each thread.

5.7.3 The Grinder

No per-thread ramp-in. No generic scheduling tool. Primitive per-process (instead of per-thread) scheduling is possible but use of this feature probably reduces an Agent's maximum load-generation capacity, as the overhead of running a new process is far greater than the overhead of creating a new thread.
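Per-thread ramp-in amounts to giving each thread its own startup delay. A minimal sketch of the arithmetic, with a hypothetical function name, is shown below; it is not a feature of any of the three tools.

```python
def ramp_in_delays(thread_count, ramp_seconds):
    """Spread thread start times evenly across ramp_seconds.

    The first thread starts immediately and the last one starts exactly
    ramp_seconds later.
    """
    if thread_count <= 1:
        return [0.0] * thread_count
    step = ramp_seconds / float(thread_count - 1)
    return [index * step for index in range(thread_count)]


if __name__ == "__main__":
    for index, delay in enumerate(ramp_in_delays(5, 60)):
        print("thread %d starts after %.1f seconds" % (index, delay))
```

In a test script, each worker thread could look up its own delay by thread number and sleep that long before issuing its first request, which approximates a ramp-in without starting extra processes.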

6 CONTROLLER

6.1 Ability of Controller to handle high volume of agent data

6.1.1 Load Runner

Load Runner probably handles as much or more real-time data as any product out there, and it does so effectively. If you give the controller a beefy box to run on, you should have no problems.

6.1.2 JMeter

Limited. The number of transaction monitors you can have running is configurable. If more than one or two are going and the agents are producing a lot of transaction data, the UI takes all the CPU, bogs down, and becomes unusable.

6.1.3 The Grinder

The Grinder does very well here, probably better than Load Runner. By design, the agents only send a limited amount of real-time data back to the controller during a test run. And the sampling period is adjustable with a big friendly slider. This is a handy feature I didn't fully appreciate at first – if the network bandwidth numbers are updating too fast, it's hard to see how many digits are in the number before it updates again. But with the slider, you can lock that number down for enough time to really consider it.

6.2 Real-Time Monitoring (Controller)

6.2.1 Load Runner

Load Runner features very strong real-time monitoring in the controller. Client side graphs, such as total transactions per second, errors per second, can be displayed next to server side graphs like CPU use and disk activity. The user can drag and drop from a list of dozens of graph types.

6.2.2 JMeter

Basic, table-based monitoring similar to what is in our previous test harness works properly. Other monitors threw null pointer exceptions.

6.2.3 The Grinder

The Grinder is good here. It has simple, sliding performance graphs for all transactions in one tab. These graphs are similar to what you see in the Windows Task Manager, where performance metrics older than a given amount of time slide off the left side of the graph. In addition, as in our previous test harness or JMeter, there is numeric data that periodically updates in a table.

6.3 Real-Time Load Adjustment

Sometimes while a test is in progress, you want to make adjustments. Increase the load. Decrease the load. Bring another agent online.

6.3.1 Load Runner

Load Runner wrote the book on this topic, with its highly-flexible ability to start and stop load in the middle of a test, with individual agents, groups of agents, or the entire set of agents.

6.3.2 JMeter

JMeter has the ability to interactively start and stop load on an agent-by-agent basis. It cannot interactively be done at the per-thread level, but agents and thread groups can have schedulers assigned to them.

6.3.3 The Grinder

The Grinder console does not have the ability to dynamically adjust the levels of load being generated by the agents. Coupled with its lack of a scheduler, this makes the Grinder the least flexible of the three tools when it comes to interactively setting load levels.

6.4 Controller-side script management/deployment

6.4.1 Load Runner

Yes.

6.4.2 JMeter

Yes.

6.4.3 The Grinder

Yes.

6.5 Can write simple scripts in the UI?

6.5.1 Load Runner

Load Runner comes with a powerful script-development tool, VUGen. This gives the test developer the option of developing icon-based test scripts, as well as the traditional code-view development environment. In addition, Load Runner can record web browser sessions to auto-generate scripts based on the recorded actions.

6.5.2 JMeter

Scripts are based on XML. They can be written in your preferred text editor, or created in an icon-based UI in the controller window. I found this feature to be both easy to use and surprisingly flexible. There is also a recorder feature to let you interactively create your scripts.

6.5.3 The Grinder

The Grinder is the weakest of the three here. It does have a TCP Proxy feature that can record browser sessions into Jython scripts. But there is no integrated graphical environment for script development.

7 CONCLUSION

I selected The Grinder due to several make-or-break issues. However, each tool has unique strengths and weaknesses. Which tool is ultimately best for you depends on a number of things, such as:

Does your budget allow for an expenditure ranging from several tens to hundreds of thousands of dollars?
Will you be testing in a Windows-only environment?
What is the technical level of your scale testers?

Both of the open source projects have merits, but neither one is ideal. My approach will be to work with the Grinder development team to resolve the most serious offenders.

8 Appendix 1 – Additional information

Load Runner system requirements (controller must be on Windows!)

http://www.mercury.com/us/products/performance-center/loadrunner/requirements.html

Linux/Solaris server monitoring (weak)

http://www.mercury.com/us/products/performance-center/loadrunner/monitors/unix.html

JMeter home page

http://jakarta.apache.org/jmeter/

JMeter Manual

http://jakarta.apache.org/jmeter/usermanual/index.html

The Grinder home page

http://grinder.sourceforge.net/

The Grinder Manual

http://grinder.sourceforge.net/g3/getting-started.html

Windows IP address multi homing

http://support.microsoft.com/kb/q149426/

9 Appendix 2 – Distinguishing features

These are some of the distinguishing features of each product:

Cool with Load Runner

highly developed, mature product
strong support
It is complex, but feature-rich

Problems w/ Load Runner

Extreme cost, both up front and ongoing
Limited load generation capacity based on license/key.
Limited ability to monitor server stats outside of Windows.

Cool w/ Grinder

Jython scripting means rapid script development
Jython simplifies coding complex tasks
Good real-time feedback in the UI in most tabs.
Sockets based agent/controller communications. Trouble-free in our testing.

Problems w/ Grinder:

(Since this original article was posted, many of these issues have been addressed. See the blog entry titled "The Grinder: Addressing the Warts.")

no scheduling; load is all-or-nothing
no slow sockets, no prospects for easily fixing this
Memory failures in a few large file download scenarios.

Cool w/ JMeter:

Less technical expertise required
Overall more “slick” or “polished” feel – availability of startup scripts, more utility in the UI.

Problems w/ JMeter:

Limited feedback in the UI when the test is running
Memory and CPU issues when downloading very large files
The UI is buggy. Big pieces, including monitors, just don't work. Many Null Pointer Exceptions in the log, etc.