I recently needed to recommend a tool to use for a scalability testing project, and I was in the fortunate situation of having some time to survey the field, and to look into the top contenders in greater depth. From an original list of over 40 candidates, I selected three finalists in the open-source and commercial categories. I then took some time to look at them in detail, to determine which tool to recommend for the ongoing scale testing effort. Since I have seen several questions about how these tools compare to each other on various mailing lists, I'm sharing my findings here in the hopes that others will find them useful.
My three finalists were Load Runner, from Mercury Interactive; JMeter, from the Apache Software Foundation; and The Grinder, an open-source project hosted on SourceForge.
2 SUMMARY OF RESULTS
I found that I could use any of them and get a reasonably good amount of scale test coverage. Each tool has unique things it does very well, so in that sense, there is no “wrong answer.” Conversely, each of the tools I considered has unique deficiencies that will impede or block one or more of the scenarios in our test plan. So there is no “right answer” either – any option selected will be something of a trade-off.
Based on this research, I recommended The Grinder as the tool to go forward with. It has a simple, clean UI that clearly shows what is going on without trying to do too much, and offers great power and simplicity with its unique Jython-based scripting approach. Jython allows complex scripts to be developed much more rapidly than in more formal languages like Java, yet it can access any Java library or class easily, allowing us to re-use elements of our existing work.
Mercury's Load Runner had a largely attractive feature set, but I ultimately disqualified it due to shortcomings in these make-or-break areas:
Very high price to license the software.
Generating unlimited load is not permitted. With the amount of load our license allows, I will be unable to effectively test important clustered server configurations, as well as many of our “surge” scenarios.
Very weak server monitoring for Solaris environments. No support for monitoring Solaris 10.
JMeter was initially seen as an attractive contender, with its easy, UI-based script development, as well as script management and deployment features. Its UI is feature-rich and the product has the Apache branding. It was ultimately brought down by the bugginess of its UI, though, as several of its key monitors gave incorrect information or simply didn't work at all.
3 Comparison Tables
All the items in the tables below are discussed in greater detail in the following sections. These tables give a quick overview.
3.1 Critical Items
Several features are key to any scale testing effort, and the items in this table are key to ours. Lacking any one of them will seriously impact our ability to generate complete scale test coverage.
* Multiple workarounds are being investigated, including calling native (libcurl) code for the most intensive downloads.
3.2 Non-Critical Items
Items in this section are not make-or-break to our test effort, but will go a long way to making the test effort more effective.
4.1.1 Load Runner
Mercury is extremely strong in this area for Windows testing. Unfortunately, it is very weak in Unix/Solaris. For Windows hosts, Load Runner uses the native performance counters available in perfmon. This allows monitoring myriad information from the OS, as well as metrics from individual applications (such as IIS) that make their information available to perfmon.
For Solaris hosts, Load Runner is restricted to the performance counters available via rpc.rstatd. This means some very basic information on CPU and memory use, but not much else. Note that Load Runner does not currently support any kind of performance monitoring on Solaris 10.
4.1.2 JMeter
JMeter has no monitoring built in. Thus, wrapper scripts are required to synchronize test data with external perf monitoring data. This is the approach I used to great effect with our previous test harness. The advantage of this method is that I can monitor (and graph!) any information the OS makes available to us. Since the amount of data available to us is quite large, this is a powerful technique.
4.1.3 The Grinder
The same wrapper-based approach would be required here as I detailed above for JMeter.
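To make the wrapper idea a little more concrete, here is a minimal Python sketch of the synchronization step: given timestamped server readings collected outside the load tool, keep only those that fall inside the test window. The sample data, the `samples_in_window` helper, and the idea that the wrapper records the start and end time of the run are all assumptions made for illustration; this is not the code from our previous test harness.

```python
from bisect import bisect_left, bisect_right


def samples_in_window(samples, start, end):
    """Return the (timestamp, value) samples that fall inside [start, end].

    `samples` must already be sorted by timestamp (seconds since the epoch).
    """
    times = [t for t, _ in samples]
    lo = bisect_left(times, start)
    hi = bisect_right(times, end)
    return samples[lo:hi]


# Hypothetical CPU readings from an external monitor (for example, vmstat
# output that a wrapper script has timestamped): (epoch seconds, % CPU busy).
cpu_samples = [(100, 5.0), (105, 40.0), (110, 85.0), (115, 90.0), (120, 10.0)]

# Hypothetical test window recorded by the wrapper around the load run.
test_start, test_end = 105, 115

during_test = samples_in_window(cpu_samples, test_start, test_end)
average_cpu = sum(value for _, value in during_test) / len(during_test)

print(during_test)
print(round(average_cpu, 1))
```

The same filtering works for any metric the OS exposes, as long as the monitoring host and the load agents have reasonably synchronized clocks.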
This is a make-or-break item. There are many scenarios I just can't cover if I can only open a few thousand socket connections to the server.
4.2.1 Load Runner
Load Runner restricts the number of vusers you can run. Even large amounts of money only buy a license for a modest number of users. Historically, the rate for 10,000 HTTP vusers has been $250,000. However, on a per-agent basis, load is generated very efficiently, so it may take less hardware to generate the same amount of load. (But for the money you spend on the Load Runner license, you could buy a LOT of load generation hardware!)
4.2.2 JMeter
Since JMeter is Free/Open Source, you may run as many agents as you have hardware to put them on. You can add more and more load virtually forever, as long as you have more hardware to run additional agents on. However, in specific unicast scenarios, such as repeatedly downloading very large files (like PIPEDSCHEDULE), the ability of agents to generate load falls off abruptly due to memory issues.
4.2.3 The Grinder
In this matter the Grinder's story is the same as JMeter's. The limit is only the number of Agents. The Grinder suffers from the same inability to download large files effectively as JMeter. A workaround that uses native code (libcurl) to send requests is being investigated.
4.3.1 Load Runner
No. Hands-free runs can be scheduled with the scheduler, but multiple specific scenarios cannot be launched from the command line. This may be adequate for single tests; it's not clear how this would work if a series of automated tests was desired.
4.3.2 JMeter
Yes, the ability to do this is supported out of the box. However, it can only be run from a single agent; the distributed testing mechanism requires the UI. So for automated nightly benchmarks it may be OK, but for push-to-failure testing where much load is required, the UI is needed. It would presumably be possible to have a wrapper script launch JMeter in batch mode at the same time on multiple agents. This would achieve arbitrary levels of load, but would not have valid data for collective statistics like total transactions per second, total transactions, etc.
4.3.3 The Grinder
As with JMeter, a single agent can be run from the command line. See JMeter comments, above.
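If several agents were launched independently by a wrapper script, the collective statistics would have to be rebuilt afterwards from the per-agent result files. The Python sketch below shows one way that merge could look. The in-memory sample format (a timestamp in seconds plus a pass/fail flag per transaction) and the `collective_stats` name are assumptions for illustration, and the calculation only makes sense if the agents' clocks are synchronized.

```python
def collective_stats(per_agent_samples):
    """Merge per-agent lists of (timestamp_seconds, succeeded) samples.

    Returns total transactions, total errors, and overall transactions per
    second measured across the window covered by all agents together.
    """
    merged = [s for samples in per_agent_samples.values() for s in samples]
    if not merged:
        return {"total": 0, "errors": 0, "tps": 0.0}
    timestamps = [t for t, _ in merged]
    duration = max(timestamps) - min(timestamps)
    total = len(merged)
    errors = sum(1 for _, ok in merged if not ok)
    # Guard against a zero-length window (e.g. a single sample).
    tps = total / duration if duration > 0 else float(total)
    return {"total": total, "errors": errors, "tps": tps}


# Hypothetical results parsed from two agents' log files.
results = {
    "agent1": [(0.0, True), (1.0, True), (2.0, False)],
    "agent2": [(0.5, True), (1.5, True), (4.0, True)],
}

stats = collective_stats(results)
print(stats)
```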
4.4 Ease of Use
4.4.1 Load Runner
Installation takes a ton of time, a lot of disk space, and a very specific version of Windows. But it's as simple as running a Windows installer, followed by 3 or 4 product updaters.
Setting up Simple tests
For HTTP tests, Load Runner is strong in this category, with its browser recorder and icon-based scripts.
Running Tests
The UI of the controller is complex and a bit daunting. There is great power in the UI if you can find it.
4.4.2 JMeter
Be sure Sun's JRE is installed. Unpack the tar file. Simple.
Setting up Simple tests
Very quick. Start up the console, a few clicks of the mouse, and you are ready to generate load. Add thread group, add a sampler, and you have the basics. Throw in an assertion or two on your sampler to validate server responses.
Running Tests
Both distributed and local tests can be started from the UI. A menu shows the available agents, and grays out the ones that are already busy. Standalone tests can be started from the command line. JMeter wins this category hands down.
4.4.3 The Grinder
Installation is as simple as installing Java and unpacking a tar file.
Setting up Simple tests
Setting up tests, even simple tests, requires writing Jython code. So developer experience is important. A proxy script recorder is included to simplify this. In addition, there are many useful example scripts included to help you get started.
Running Tests
Running a test involves configuring a grinder.properties file, manually starting an agent process, manually starting the console, then telling the test to run from within the console UI.
Having key graphs generated at the conclusion of a scale run, such as load over time, server CPU, transactions per second, etc, can save a lot of tedium, since manually generating these graphs from log files is quite time consuming.
4.5.1.1 Load Runner
Load Runner has an excellent integrated analysis tool that can dynamically generate graphs on any of the myriad performance counters available to it. The downfall of this approach is that there are only a small number of performance metrics it can gather on Solaris. And while I can gather additional server metrics using sar, vmstat, dtrace, iostat, mpstat, etc., integrating this information into the Load Runner framework will be difficult at best.
4.5.1.2 JMeter
JMeter does not gather any server-side performance metrics. But it can generate a limited number of client-side graphs while the test is in progress. These graphs can be saved after the test is over. Fortunately, all the test data is written in a standard format. So it probably makes more sense to generate all the desired graphs via shell scripts during post-processing. This is the same approach I used with our previous test harness.
4.5.1.3 The Grinder
As with JMeter, there are no graphs generated out of the box, but with the standard-format log files, scripted post-processing is reasonably straightforward, giving us a powerful and flexible view on the results.
4.5.2 Analysis tools?
4.5.2.1 Load Runner
Yes, a very powerful tool is included for doing analysis after a run. An effectively unlimited number of customized graphs can be generated. These graphs can be exported into an HTML report.
4.5.2.2 JMeter
Nothing included. I would want to transfer over some of the scripts used in our previous test harness, or write a simple tool that dumps test data into a DB for post-analysis.
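A "simple tool that dumps test data into a DB" could be quite small. The following Python sketch uses the standard-library `sqlite3` module with an in-memory database; the table layout, column names, and sample rows are hypothetical, and a real tool would first parse the load tool's result files into rows like these.

```python
import sqlite3


def load_results(rows):
    """Create an in-memory SQLite database and insert the result rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE results ("
        "test_name TEXT, start_ms INTEGER, elapsed_ms INTEGER, ok INTEGER)"
    )
    conn.executemany("INSERT INTO results VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return conn


# Hypothetical rows: (test name, start time in ms, elapsed ms, 1 = passed).
rows = [
    ("login", 1000, 120, 1),
    ("login", 2000, 180, 1),
    ("download", 2500, 900, 0),
]

conn = load_results(rows)

# Per-test transaction count, mean response time, and error count.
summary = conn.execute(
    "SELECT test_name, COUNT(*), AVG(elapsed_ms), SUM(1 - ok) "
    "FROM results GROUP BY test_name ORDER BY test_name"
).fetchall()

print(summary)
```

Once the data is in a database, new questions about a run become ad hoc queries rather than new scripts.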
4.5.2.3 The Grinder
Nothing included. See the JMeter comments, above.
4.6.1 Load Runner
This works well in Load Runner; each agent can run as a service or an application, simplifying management. Test scripts are auto-deployed to agents.
4.6.2 JMeter
JMeter is good here. Each agent is a server that the controller can connect to at will in real-time. Test scripts are automatically sent to each agent, centralizing management.
4.6.3 The Grinder
Grinder is the weakest here. The properties files that define how much load to apply must be manually deployed to all agents. A wrapper shell script like the one used by our previous test harness could address this by always deploying the Jython scripts to the agents before each run.
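As a sketch of what such a wrapper might do before copying files out, the Python below renders one properties file per agent from a shared base configuration. The host names and the script name are hypothetical, and the property keys (`grinder.script`, `grinder.processes`, `grinder.threads`, `grinder.runs`, `grinder.consoleHost`) are written as I understand them from The Grinder's documentation, so check them against the version you run. Copying the rendered files to the agents (for example with scp) is deliberately left out.

```python
def render_properties(settings):
    """Render a dict of settings as Java-style key=value properties text."""
    return "".join(f"{key}={value}\n" for key, value in sorted(settings.items()))


# Shared settings for every agent (values are illustrative only).
base = {
    "grinder.script": "download_test.py",
    "grinder.processes": 1,
    "grinder.threads": 10,
    "grinder.runs": 100,
    "grinder.consoleHost": "console.example.com",
}

# Hypothetical agents, each with optional per-agent overrides.
agents = {
    "agent1.example.com": {},
    "agent2.example.com": {"grinder.threads": 50},
}

# One rendered properties file per agent, ready to be copied out.
files = {
    host: render_properties({**base, **overrides})
    for host, overrides in agents.items()
}

print(files["agent2.example.com"])
```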
4.7.1 Load Runner
Not really. A subset of the complete agent functionality can be had for agents running on Linux or Solaris. Non-Windows agents run each vuser as a process rather than a thread, reducing the amount of load an agent can produce. The controller and VUGen both are Windows-only. And Load Runner is poor at measuring non-Windows server statistics.
4.7.2 JMeter
Yes. As a Java/Swing app, JMeter is platform-agnostic.
4.7.3 The Grinder
Yes. This app is based on Java, Swing, and Jython. Like JMeter, it will run anywhere you can set up a JVM.
4.8.1 Load Runner
Expect to pay in the low to mid six-figures for a license allowing any kind of robust load-generation capacity. But that's not all: there are high ongoing support costs as well. For the same kind of money I could get over 100 powerful machines to use as scale agents, as well as associated network switches, cabling, etc.
4.8.2 JMeter
Free. (Apache License)
4.8.3 The Grinder
Free. (Grinder License)
4.9.1 Load Runner
Load Runner has the widest audience of all these tools; perhaps not surprising given its maturity as a commercial product. Its browser-recording and icon-based script development give it the lowest technical barriers to entry of any of the three products. A QA engineer with modest technical background and little to no coding skills can still be productive with the tool. And its ability to load Windows .dll's and other libraries gives it a power and flexibility useful to developers and other more advanced users.
4.9.2 JMeter
JMeter does not require developer skills to perform basic tests in any of the protocols it supports out of the box. A form-driven UI allows the user to design their own scenario. This scenario is then auto-deployed to all agents during test initialization.
4.9.3 The Grinder
While it's possible that a regular QA engineer could be used to run the console and perform some testing, the tool is really more aimed at developers. This is the only tool of the three that did not include any kind of icon-based or UI-based script development. At a minimum, users will need to know how to write Python/Jython code to create simple test scripts, and the ability to write custom Java classes may be required as well, depending on the scenario.
4.10.1 Load Runner
The controller crashes occasionally under heavy load, but this is infrequent and largely manageable. Other than this, the product seems robust enough.
4.10.2 JMeter
JMeter fares poorly in this area.
4.10.3 The Grinder
I found no issues with the Grinder, other than the previously-mentioned memory issue with large file downloads.
5.1.1 how flexible on what can be passed/failed?
5.1.1.1 Load Runner
Any arbitrary criteria can be set to define if a transaction passes. This includes but is not limited to response time, contents of response body, response code, or just about anything else.
5.1.1.2 JMeter
In JMeter, samplers generate your test requests. You can add a wide variety of assertion types to any of your samplers. These will allow you to assert on response code, match regular expressions against the response body, and assert on the size or md5sum of the response.
5.1.1.3 The Grinder
As with Load Runner, pass/fail criteria merely have to be defined within the test script. Criteria can be whatever you want.
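The kinds of checks described above (response code, regular expression against the body, size, md5sum) are easy to picture as plain code. This is a generic Python sketch with a hypothetical `check_response` helper; it is not JMeter's assertion implementation or any tool's API, only an illustration of what such pass/fail criteria amount to.

```python
import hashlib
import re


def check_response(status, body, expected_status=200, pattern=None,
                   expected_size=None, expected_md5=None):
    """Return a list of failure messages; an empty list means 'pass'."""
    failures = []
    if status != expected_status:
        failures.append(f"status {status} != {expected_status}")
    if pattern is not None:
        text = body.decode("utf-8", errors="replace")
        if re.search(pattern, text) is None:
            failures.append(f"body does not match {pattern!r}")
    if expected_size is not None and len(body) != expected_size:
        failures.append(f"size {len(body)} != {expected_size}")
    if expected_md5 is not None:
        actual = hashlib.md5(body).hexdigest()
        if actual != expected_md5:
            failures.append("md5 mismatch")
    return failures


# Hypothetical response data.
body = b"<html>OK</html>"
good = check_response(200, body, pattern=r"OK", expected_size=len(body),
                      expected_md5=hashlib.md5(body).hexdigest())
bad = check_response(500, body, pattern=r"Welcome")

print(good)
print(bad)
```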
5.1.2 user-defined transaction/statistic types?
5.1.2.1 Load Runner
Yes – if you get away from the icon-based view in VUGen and go to the code level, you can wrap anything you want in a transaction to get timing information, pass/fail data, etc.
5.1.2.2 JMeter
Yes – done through plugins.
5.1.2.3 The Grinder
Yes – an API exists to easily wrap any Java or Jython method in a transaction.
5.2.1.1 Load Runner
This varies by the type of license purchased, with each protocol having a separate cost and a separate limit for the number of allowable VUsers. The potential number of protocols is extremely high, including Java, ODBC, FTP, HTTP, and others.
5.2.1.2 JMeter
Supports several protocols out of the box:
5.2.1.3 The Grinder
The Grinder only supports HTTP out of the box.
5.2.2 Can transactions wrap custom (non-http) protocols? Can transactions wrap multiple (http or other) requests to the server?
5.2.2.1 Load Runner
Yes. There are multiple ways to do this. You can implement your own protocol handler in a .dll or in Load Runner's pseudo-C. Then you can invoke this handler from any type of VUser that you have a license for. Alternately, unless your protocol is something uncommon, you can probably buy a pre-existing implementation of your protocol, and licenses for VUsers to run this protocol.
5.2.2.2 JMeter
Yes. An external Java plugin that supports your protocol must be added in to JMeter to support this.
5.2.2.3 The Grinder
Any protocol can be tested with the Grinder. An HTTP plugin is included. In other cases, you will create a separate Java class that implements a handler for your protocol. In your test script, you will wrap this Java class in a Grinder test object. Your protocol is used/invoked by calling any method you want from your Java class via the test wrapper. The wrapper will pass/fail the transaction based on response time.
This default behavior can be overridden with additional code in your Jython script. For example, after invoking your protocol method, you could inspect the state of your Java object and pass/fail the transaction based on information there.
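The wrap-then-override pattern can be illustrated in plain Python without The Grinder itself. The `Transaction` class and the `FakeProtocolClient` below are hypothetical stand-ins written for this sketch, not The Grinder's API: the wrapper times each call, applies a default pass/fail rule (no exception and within an optional time limit), and lets a validator inspect the client object's state to override the result.

```python
import time


class Transaction:
    """Time calls to `target` and record a pass/fail result for each one."""

    def __init__(self, name, target, max_seconds=float("inf"), validator=None):
        self.name = name
        self.target = target
        self.max_seconds = max_seconds
        self.validator = validator
        self.results = []  # list of (passed, elapsed_seconds)

    def __call__(self, *args, **kwargs):
        start = time.perf_counter()
        try:
            value = self.target(*args, **kwargs)
            error = False
        except Exception:
            value, error = None, True
        elapsed = time.perf_counter() - start
        # Default rule: the call returned and did so within the time limit.
        passed = not error and elapsed <= self.max_seconds
        # Optional override: extra script-defined criteria can fail the call.
        if passed and self.validator is not None:
            passed = bool(self.validator(value))
        self.results.append((passed, elapsed))
        return value


class FakeProtocolClient:
    """Hypothetical stand-in for a custom protocol handler class."""

    def __init__(self):
        self.last_status = None

    def fetch(self, key):
        self.last_status = "OK" if key == "good" else "DENIED"
        return self.last_status


client = FakeProtocolClient()
fetch = Transaction("fetch", client.fetch,
                    validator=lambda _value: client.last_status == "OK")
fetch("good")
fetch("bad")
outcomes = [passed for passed, _ in fetch.results]
print(outcomes)
```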
I have typically seen libraries like Apache's HTTPClient push the CPU to 100% when conducting high-bandwidth, large file downloads. The library supports high bandwidth use and many transactions per second just fine, but has issues with repeated large file downloads.
5.3.1 Load Runner
Per-agent load generation capacity is strong. Licensing constraints may limit actual load generated.
5.3.2 JMeter
With the exception of the high-bandwidth case, per-agent capacity is good.
5.3.3 The Grinder
Runs out of memory when repeatedly downloading large documents in many threads. Currently, there does not seem to be a workaround inside The Grinder itself. However, with my previous test harness I was able to work around this same issue by calling native code, and there is reason to believe that approach may work here as well.
Assuming a large range of valid IP addresses assigned to the agent machines, does the test harness support binding outgoing requests to arbitrary IP addresses? The ability to support this is critical for our test effort. If all broker requests come in from the same IP address, the broker thrashes unrealistically as it continually updates customer settings.
5.4.1 Load Runner
Yes. (see link in appendix 1)
5.4.2 JMeter
JMeter is weak here. There is a new mechanism (not yet released but available in nightly builds) where outbound requests can round-robin on a predetermined list of local IPs. This is not good enough for Fat Client simulation.
5.4.3 The Grinder
The local IP address to bind the outbound request to can be specified in the Jython scripts. This is just what I need.
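For readers unfamiliar with source-address binding, here is what it looks like at the socket level in standard Python. This is not a Grinder script and does not use The Grinder's HTTP plugin; it only shows the underlying idea using `socket.create_connection` and its `source_address` parameter. The `connect_from` helper is a name made up for this sketch, and the demo uses the loopback address so it can run on a single machine. In a real test the local address would be one of the IPs assigned to the agent.

```python
import socket


def connect_from(local_ip, remote_host, remote_port, timeout=5.0):
    """Open a TCP connection whose source address is `local_ip`.

    Port 0 lets the operating system pick a free local port. `local_ip`
    must be an address actually assigned to an interface on this machine.
    """
    return socket.create_connection(
        (remote_host, remote_port),
        timeout=timeout,
        source_address=(local_ip, 0),
    )


# Demo against a throwaway listener on the loopback interface.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
port = listener.getsockname()[1]

client = connect_from("127.0.0.1", "127.0.0.1", port)
server_side, peer = listener.accept()
peer_ip = peer[0]  # the source IP as seen by the server

client.close()
server_side.close()
listener.close()

print(peer_ip)
```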
5.5.1 Load Runner
Load Runner supports this out of the box.
5.5.2 JMeter
JMeter does not support this out of the box, but there is a slow socket implementation in the wild, written for the Apache HTTP Client (which JMeter uses), that should be possible to drop in fairly easily.
5.5.3 The Grinder
The Grinder does not support this. It may be possible with additional code hacking, but the path for this is not clear. Their third-party HTTP implementation means writing a custom solution may be challenging. Perhaps it would be possible using JNI and libcurl, although the author of the libcurl binding suggests there may be a memory leak in the C layer.
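Taking "slow sockets" to mean a client that deliberately reads at a limited rate to imitate a low-bandwidth connection (my reading of the term, since the article does not define it), the core of the technique is a throttled read loop. The Python sketch below is generic and tool-independent; `slow_read` and its parameters are names invented for this example, and the `sleep` argument is injectable so the pacing can be observed without actually waiting.

```python
import io
import time


def slow_read(stream, bytes_per_second, chunk_size=1024, sleep=time.sleep):
    """Read `stream` to the end, pausing after each chunk to cap throughput.

    Returns the total number of bytes read. Data is discarded as it arrives,
    so memory use stays bounded by `chunk_size`.
    """
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        total += len(chunk)
        # Pause long enough that this chunk "took" its share of a second.
        sleep(len(chunk) / bytes_per_second)
    return total


# Demo: record the requested delays instead of really sleeping.
delays = []
payload = io.BytesIO(b"x" * 4096)
total = slow_read(payload, bytes_per_second=2048, chunk_size=1024,
                  sleep=delays.append)

print(total)
print(delays)
```

With a real connection, `stream` could be a file object obtained from a socket's `makefile("rb")`, and the default `time.sleep` would provide the pacing.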
5.6.1 Load Runner
Windows .dll's may be loaded. Home-made libraries written in Load Runner's pseudo-C work fine as well. Additionally, function libraries can be embedded directly in the virtual user script.
5.6.2 JMeter
External Java libraries can be accessed via the plugin architecture.
5.6.3 The Grinder
The Grinder offers lots of flexibility for loading and executing third party libraries. With Jython, any Java code may be called, and most Python code may be run unchanged. And there is a decent collection of example scripts that comes with the Grinder distribution.
5.7.1 Load Runner
Load Runner has a powerful, UI-based scheduling tool which allows you great flexibility to schedule arbitrary amounts of load over time. Load can be incrementally stepped up and stepped down, by single threads or entire groups. There is a graphical schedule builder that can generate schedules of arbitrary complexity.
5.7.2 JMeter
JMeter has UI-based scheduling that allows per-thread startup delays, as well as runs that start in the future. JMeter tests can run forever, for a specified time interval, or for a specified number of iterations for each thread.
5.7.3 The Grinder
No per-thread ramp-in. No generic scheduling tool. Primitive per-process (instead of per-thread) scheduling is possible but use of this feature probably reduces an Agent's maximum load-generation capacity, as the overhead of running a new process is far greater than the overhead of creating a new thread.
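Per-thread ramp-in boils down to giving each worker thread its own startup delay. As a tool-independent illustration, this Python sketch computes evenly spaced delays for a linear ramp; `ramp_delays` is a hypothetical helper, and how a given tool would consume these delays (for example, a sleep at the top of each thread's script) is left as an assumption.

```python
def ramp_delays(thread_count, ramp_seconds):
    """Return one startup delay per thread, spread evenly over the ramp.

    The first thread starts immediately; each later thread starts one
    step (ramp_seconds / thread_count) after the previous one.
    """
    if thread_count <= 0:
        return []
    step = ramp_seconds / thread_count
    return [index * step for index in range(thread_count)]


# Five threads ramped in over ten seconds.
delays = ramp_delays(5, 10)
print(delays)
```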
6.1.1 Load Runner
Load Runner probably handles as much or more real-time data as any product out there, and it does so effectively. If you give the controller a beefy box to run on, you should have no problems.
6.1.2 JMeter
Limited. The number of transaction monitors you can have running is configurable. If more than one or two are going and the agents are producing a lot of transaction data, the UI takes all the CPU, bogs down, and becomes unusable.
6.1.3 The Grinder
The Grinder does very well here, probably better than Load Runner. By design, the agents only send a limited amount of real-time data back to the controller during a test run. And the sampling period is adjustable with a big friendly slider. This is a handy feature I didn't fully appreciate at first – if the network bandwidth numbers are updating too fast, it's hard to see how many digits are in the number before it updates again. But with the slider, you can lock that number down for enough time to really consider it.
6.2.1 Load Runner
Load Runner features very strong real-time monitoring in the controller. Client side graphs, such as total transactions per second, errors per second, can be displayed next to server side graphs like CPU use and disk activity. The user can drag and drop from a list of dozens of graph types.
6.2.2 JMeter
Basic, table-based monitoring similar to what is in our previous test harness works properly. Other monitors threw null pointer exceptions.
6.2.3 The Grinder
The Grinder is good here. It has simple, sliding performance graphs for all transactions in one tab. These graphs are similar to what you see in the Windows Task Manager, where performance metrics older than a given amount of time slide off the left side of the graph. In addition, as in our previous test harness or JMeter, there is numeric data that periodically updates in a table.
Sometimes while a test is in progress, you want to make adjustments. Increase the load. Decrease the load. Bring another agent online.
6.3.1 Load Runner
Load Runner wrote the book on this topic, with its highly-flexible ability to start and stop load in the middle of a test, with individual agents, groups of agents, or the entire set of agents.
6.3.2 JMeter
JMeter has the ability to interactively start and stop load on an agent-by-agent basis. It cannot interactively be done at the per-thread level, but agents and thread groups can have schedulers assigned to them.
6.3.3 The Grinder
The Grinder console does not have the ability to dynamically adjust the levels of load being generated by the agents. Coupled with its lack of a scheduler, this makes the Grinder the least flexible of the three tools when it comes to interactively setting load levels.
6.4.1 Load Runner
6.4.3 The Grinder
6.5.1 Load Runner
Load Runner comes with a powerful script-development tool, VUGen. This gives the test developer the option of developing icon-based test scripts, as well as the traditional code-view development environment. In addition, Load Runner can record web browser sessions to auto-generate scripts based on the recorded actions.
6.5.2 JMeter
Scripts are based on XML. They can be written in your preferred text editor, or created in an icon-based UI in the controller window. I found this feature to be both easy to use and surprisingly flexible. There is also a recorder feature to let you interactively create your scripts.
6.5.3 The Grinder
The Grinder is the weakest of the three here. It does have a TCP Proxy feature that can record browser sessions into Jython scripts. But there is no integrated graphical environment for script development.
I selected The Grinder due to several make-or-break issues. However, each tool has unique strengths and weaknesses. Which tool is ultimately best for you depends on a number of things, such as:
Does your budget allow for an expenditure ranging from several tens to hundreds of thousands of dollars?
Will you be testing in a Windows-only environment?
What is the technical level of your scale testers?
Both of the open source projects have merits, but neither one is ideal. My approach will be to work with the Grinder development team to resolve the most serious offenders.
8 Appendix 1 – Additional information
Load Runner system requirements (controller must be on Windows!)
Linux/Solaris server monitoring (weak)
JMeter home page
The Grinder home page
The Grinder Manual
Windows IP address multi homing
9 Appendix 2 – Distinguishing features
These are some of the distinguishing features of each product:
Cool with Load Runner
highly developed, mature product
It is complex, but feature-rich
Problems w/ Load Runner
Extreme cost, both up front and ongoing
Limited load generation capacity based on license/key.
Limited ability to monitor server stats outside of windows.
Cool w/ Grinder
Jython scripting means rapid script development
Jython simplifies coding complex tasks
Good real-time feedback in the UI in most tabs.
Sockets based agent/controller communications. Trouble-free in our testing.
Problems w/ Grinder: (Since this original article was posted, many of these issues have been addressed. See the blog entry titled "The Grinder: Addressing the Warts.")
- no scheduling; load is all-or-nothing
- no slow sockets, no prospects for easily fixing this
- Memory failures in a few large file download scenarios.
Cool w/ JMeter
- Less technical expertise required
- Overall more “slick” or “polished” feel – availability of startup scripts, more utility in the UI.
Problems w/ JMeter
- Limited feedback in the UI when the test is running
- Memory and CPU issues when downloading very large files
- The UI is buggy. Big pieces, including monitors, just don't work. Many Null Pointer Exceptions in the log, etc.