Kathleen Poulsen of Fidelity Investments gave a presentation at STAREAST 2017 sharing her experience using Hexawise to improve their software testing performance. Watch a 10-minute video with highlights of that talk:

We didn't really have what I'd call a scientific methodology to approaching the tests...

Our regression test suites were continuously expanding... We found there was a repetition of tests.

We had 3 different projects that I will talk about that I feel like combinatorial or pairwise testing was the key to answering all of those problems.

Hexawise allows you to harness the power of combinatorial software testing with test plans designed to provide thorough testing of interaction impacts on the software being tested. Hexawise provides more coverage with fewer tests.

All the teams that are using Hexawise can use that same file, they can talk to each other. [Another] thing I liked about Hexawise was the coverage chart... I go back to my business partners and say I am not running these tests. If they are important to you I add them back in with the click of a button. I love that... it was a game changer for me.

Using the Hexawise exporting options

the tests that we produced were converted into the given then when type scenarios automatically and when they are exported into Excel you can use them to drive the Selenium test automation framework. No additional work from us involved.

Using Hexawise's ability to create highly optimized test plans Fidelity was able to greatly reduce the number of tests while also greatly improving coverage.

We were able to reduce from 12,000 tests down to 600.

This type of result sounds amazing, and it is. But it is also what we find consistently from clients over and over. There are certain things people just cannot do well, and designing test plans to cover incredibly large numbers of interactions between test values and conditions is one of those things. Using highly optimized algorithms to create test plans that cover these interactions is key to reliably creating software customers will love. This also frees people to do what they do best.

Kathleen also discussed the significant improvement in communication within Fidelity that was brought about by using Hexawise.

The common language has become the test plan that comes out of Hexawise today.

Improving communication is an area many organizations see as important, but finding concrete ways to achieve better communication is often difficult. We have designed Hexawise to aid the communication between stakeholders, including software developers, software testers, product owners, help desk support staff and senior management.

The simplicity of this tool along with the way you can enter your parameters using the mind map tool, getting that coverage chart automatically out of it, having it export your data into a pretty commonly usable format - those are things that were terribly important to me. They gave me real value... I love that.

I can accommodate many different types of testing. We are testing at the class method level, at the services interface level, at the UI level...

Related: 84% of Software Defects Found in Production Could Have Been Found Using Pairwise Testing - Create a Risk-based Testing Plan With Extra Coverage on Higher Priority Areas - 2 Minute Introduction to Hexawise Software Testing Solution

By: John Hunter on Jun 9, 2017

Categories: Combinatorial Software Testing, Combinatorial Testing, Efficiency, Hexawise, Hexawise test case generating tool, Multi-variate Testing, Pairwise Software Testing, Pairwise Testing, Recommended Tool, Software Testing, Software Testing Presentations, Software Testing Efficiency, Testing Case Studies, User Experience

Coining a New Term

I'm coining a new term today, "grapefruit juice bugs."

My inspiration for this term is a blog post in the New York Times that David Pogue wrote. I was fascinated by the post and it got me thinking about a particular kind of bug in software that is more common than most people may realize. You could say that these bugs are surprisingly common. In fact, if you wanted to be more precise, you could even say that this term applies to a specific type of "surprisingly common type of surprising bugs." Let me explain.

There's something about the chemical makeup of grapefruit juice that makes it interact with our biology and a large number of different drugs in ways which result in dangerous conditions. For example, certain drugs lose their effectiveness dramatically when interacting with grapefruit juice which can have life-threatening consequences. Other times, the interactions with grapefruit juice can dramatically increase a drug's potency. This can result in "safe doses" becoming very unsafe.

Grapefruit Is a Culprit in More Drug Reactions

The 42-year-old was barely responding when her husband brought her to the emergency room. Her heart rate was slowing, and her blood pressure was falling. Doctors had to insert a breathing tube, and then a pacemaker, to revive her.

They were mystified: The patient’s husband said she suffered from migraines and was taking a blood pressure drug called verapamil to help prevent the headaches. But blood tests showed she had an alarming amount of the drug in her system, five times the safe level.

Did she overdose? Was she trying to commit suicide? It was only after she recovered that doctors were able to piece the story together.

“The culprit was grapefruit juice,” said Dr. Unni Pillai, a nephrologist in St. Louis, Mo. ...

The previous week, she had been subsisting mainly on grapefruit juice. Then she took verapamil, one of dozens of drugs whose potency is dramatically increased if taken with grapefruit. In her case, the interaction was life-threatening.

Last month, Dr. David Bailey, a Canadian researcher who first described this interaction more than two decades ago, released an updated list of medications affected by grapefruit. There are now 85 such drugs on the market, he noted, including common cholesterol-lowering drugs, new anticancer agents, and some synthetic opiates and psychiatric drugs, as well as certain immunosuppressant medications taken by organ transplant patients, some AIDS medications, and some birth control pills and estrogen treatments. ... Under normal circumstances, the drugs are metabolized in the gastrointestinal tract, and relatively little is absorbed, because an enzyme in the gut called CYP3A4 deactivates them. But grapefruit contains natural chemicals called furanocoumarins, that inhibit the enzyme, and without it the gut absorbs much more of a drug and blood levels rise dramatically.

For example, someone taking simvastatin (brand name Zocor) who also drinks a small 200-milliliter, or 6.7 ounces, glass of grapefruit juice once a day for three days could see blood levels of the drug triple, increasing the risk for rhabdomyolysis, a breakdown of muscle that can cause kidney damage.

 

So what do interactions between grapefruit juice and drugs have to do with software testing?

Like grapefruit juice's impact on prescription drugs, software testing involves critical interactions between different parts of the system. And risks exist when these different parts interact with one another. This is true whether you're talking about "large parts" interacting in System Testing or "small parts" interacting in Unit Testing.

Interactions between things are a very rich source of bugs in software. As anyone who has heard the infernal phrase "works on my machine" can tell you, software features and functions often work perfectly fine in many usage scenarios, hardware and software configurations, etc. - only to fail to work in ever-so-slightly different situations.

 

The difference between plain old every-day "Dual-Mode Faults" and "Grapefruit Juice Bugs"

A dual-mode fault occurs whenever two test inputs must both be present to trigger a defect. Most software testers start encountering them quite frequently within days of starting their jobs. Some examples:

  • This "buy" button works fine. Except when the customer is a "new user." (First, action = "click on the buy button" and Second, customer = "new user")

  • Transaction prices for share purchases are calculated correctly. Except when denominated in Japanese Yen. (First, Action = "sell shares" and Second, Currency = "Japanese Yen")
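A dual-mode fault can be illustrated with a tiny sketch. The function below is hypothetical (it is not taken from any real system); it stands in for a pricing routine with a planted defect that fires only when both inputs take specific values at the same time.

```python
def price_is_correct(action: str, currency: str) -> bool:
    """Hypothetical stand-in for checking a calculated transaction price.

    The planted defect needs BOTH conditions to be present at once:
    action == "sell" and currency == "JPY".
    """
    return not (action == "sell" and currency == "JPY")


# Each input is exercised on its own, and every one of these tests passes.
one_factor_tests = [("sell", "USD"), ("buy", "JPY"), ("buy", "USD")]
single_factor_results = [price_is_correct(a, c) for a, c in one_factor_tests]

# The defect shows up only when the two inputs appear in the same test.
combined_result = price_is_correct("sell", "JPY")

print(single_factor_results, combined_result)
```

Testing "sell" and testing "Japanese Yen" separately gives no hint of the problem; only a test that contains both values together can expose it.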

While all grapefruit juice bugs are dual-mode faults, not all dual-mode faults are Grapefruit Juice Bugs:

  • Grapefruit juice bugs have got to have a little of the element of surprise in them. When you explain them to a developer, their first reaction should be "Huh? How is that even possible?" or at least "Hmmm... That's odd. Let me investigate."

  • Anything along the lines of "This feature usually works, except in IE6, when..." is almost definitely not a grapefruit juice bug. Problematic interactions with IE6 are an incredibly common type of dual-mode fault, not a surprising one.

Whenever you hear "works on my machine" replies to your bug reports, and it takes a while for the issue to be replicated, odds are pretty good that a grapefruit juice bug might be involved.

Here's an example of an especially surprising grapefruit juice bug. This excerpt is from Apple's online help files, which the company posted after users of the original iPad complained about problems with Wi-Fi connectivity. Certain screen brightness settings were causing problems with the Wi-Fi signals. I'm not even going to begin to guess how one would have anything to do with the other.

[Image: excerpt from Apple's online help files on iPad Wi-Fi connectivity and screen brightness]

How to identify grapefruit juice bugs during your testing?

What is a tester to do when faced with more potential grapefruit juice bugs than he can handle using traditional methods?

If you're a software tester trying to do your best to determine whether a feature or function in your System Under Test will work "on everyone's machine," you've got a nightmare on your hands. Really nasty combinatorial explosions arise when you consider all of the possible combinations that would be required to test multiple hardware options, multiple software options, multiple usage scenarios, multiple test data inputs (and multiple combinations of the test data itself), multiple ways in which users enter data, and all of the rest of the "stuff that could vary" when people use your application. If you take the time to think expansively about the possible variations in a medium-sized application, quadrillions of possible tests often result.

While not eating grapefruit and not drinking grapefruit juice might be wise if you are taking drugs, there is rarely, if ever, such an easy method for eliminating the possibility of negative results due to software interactions. Refusing to support IE6 in order to avoid the disproportionate number of grapefruit juice-like problematic interactions associated with IE6 would be as close as you could come in the world of software.

Design of Experiments-based test design methods can help testers come to grips with this challenge. Orthogonal array software testing (often referred to as OATS or simply OA testing) is a test design strategy that allows us to efficiently detect bugs created by interactions within the system. Orthogonal array software testing is based on the principles of multifactor designed experiments as first explored by Sir RA Fisher.

Design of Experiments-based test design methods are very closely related to pairwise testing (AKA allpairs testing, all pairs testing, and pairwise-testing). Any of these test design strategies will allow a software tester to quickly generate a set of tests that includes tests for every single pair of test inputs.

This approach to test design often has multiple advantages, including faster test creation, more varied test scenarios, 100% coverage of all potential dual-mode faults (including hard-to-predict grapefruit juice bugs), and often a smaller resulting set of tests that will be quicker to execute. Having said that, it is by no means a magical silver bullet. This approach to test design requires test designers with above average analytical abilities to identify the appropriate Parameters and Values for their system under test; this is sometimes easier said than done because it requires a new mindset from test designers.

Software testers can take solace that the challenges of software testing, while significant, are simple when compared to trying to understand the effects of drug interactions in people.

Combinatorial testing can look at bugs created by the interaction between multiple (3, 4, 5, 6...) variables. So if there was a bug that didn't get triggered just by using Chrome on Windows but would get triggered if you also tried to replace an existing profile photo with a new one (test idea number 3), then pairwise testing might not catch it. Pairwise test design would create a set of tests that would include at least one test for each of these pairs:

  • Chrome & Windows and

  • Chrome & replace photo and

  • Windows & replace photo, but...

A set of pairwise tests might fail to test the specific combination of all three of those test inputs in the same test. With combinatorial test design approaches, you could create test plans with 100% coverage of 3-way interactions, or even 4-way interactions, and be sure that all of them are covered. When you create sets of 3-way tests, 4-way tests, 5-way tests, and 6-way tests though, you'll quickly discover that the number of tests required starts to balloon.
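The gap between pairwise and 3-way coverage can be checked mechanically. The sketch below uses a hypothetical model with two values per parameter (the "Firefox", "macOS" and "add photo" values are assumptions added only to round out the example) and a hand-picked set of four tests.

```python
from itertools import combinations, product

# Hypothetical model: only Chrome, Windows and "replace photo" come from the text.
PARAMETERS = {
    "browser": ["Chrome", "Firefox"],
    "os": ["Windows", "macOS"],
    "action": ["replace photo", "add photo"],
}

# Four hand-picked tests that together contain every pair of values.
TESTS = [
    {"browser": "Chrome", "os": "Windows", "action": "add photo"},
    {"browser": "Chrome", "os": "macOS", "action": "replace photo"},
    {"browser": "Firefox", "os": "Windows", "action": "replace photo"},
    {"browser": "Firefox", "os": "macOS", "action": "add photo"},
]


def uncovered(tests, parameters, strength):
    """Return every value combination of the given strength that no test contains."""
    missing = []
    for names in combinations(parameters, strength):
        for values in product(*(parameters[n] for n in names)):
            wanted = dict(zip(names, values))
            if not any(all(t[n] == v for n, v in wanted.items()) for t in tests):
                missing.append(wanted)
    return missing


missing_pairs = uncovered(TESTS, PARAMETERS, 2)
missing_triples = uncovered(TESTS, PARAMETERS, 3)
print(len(missing_pairs), len(missing_triples))
```

Every pair is covered, yet the triple Chrome + Windows + replace photo never appears in a single test, so a defect that needs all three would slip through this particular pairwise set.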

Hexawise allows you to create test plans with the coverage interactions you desire. This allows you to create sets of tests from 2-way all the way up to phenomenally-thorough 6-way sets of tests. In fact, it even lets you generate clever sets of risk-based tests that will, say, prioritize comprehensive 4-way coverage on 4 sets of Parameter Values while ensuring only pairwise coverage of the other, lower-priority, interactions in your system under test. Hexawise also lets you create mixed strength test plans, so if you have certain factors that you are very concerned about and want to provide coverage for more possible interactions, you can set the interaction levels for those at a higher level.

 

Related: Hexawise Tip: Using Value Expansions and Value Pairs to Handle Dependent Values - Maximize Test Coverage Efficiency And Minimize the Number of Tests Needed - How to Model and Test CRUD Functionality - 25 Great Quotes for Software Testers

By: Justin Hunter on Feb 11, 2014

Categories: Bugs, Combinatorial Testing, Design of Experiments, Multi-variate Testing, Pairwise Software Testing, Software Testing, Testing Strategies

We have created a new site to highlight Hexawise videos on combinatorial, pairwise + orthogonal array software testing. We have posted videos on a variety of software testing topics including: selecting appropriate test inputs for pairwise and combinatorial software test design, how to select the proper inputs to create a pairwise test plan, using value expansions for values in the same equivalence classes.

Here is a video with an introduction to Hexawise:

 

 

Subscribe to the Hexawise TV blog. And if you haven't subscribed to the RSS feed for the main Hexawise blog, do so now.

By: John Hunter on Nov 20, 2013

Categories: Combinatorial Testing, Hexawise test case generating tool, Multi-variate Testing, Pairwise Testing, Software Testing Presentations, Testing Case Studies, Testing Strategies, Training, Hexawise tips

Recently, the following defect made the news and was one of the most widely-shared articles on the New York Times web site. Here's what the article, Computer Snag Limits Insurance Penalties on Smokers said:

A computer glitch involving the new health care law may mean that some smokers won’t bear the full brunt of tobacco-user penalties that would have made their premiums much higher — at least, not for next year.

The Obama administration has quietly notified insurers that a computer system problem will limit penalties that the law says the companies may charge smokers, The Associated Press reported Tuesday. A fix will take at least a year.

 

Tip of the Iceberg

This defect was entirely avoidable and predictable. It's safe to expect that hundreds (if not thousands) of similar defects related to Obamacare IT projects will emerge in the weeks and months to come. Had testers used straightforward software test design prioritization techniques, bugs like these would have been easily found. Let me explain.

 

There's no Way to Test Everything

If the developers and/or testers were asked how this bug could sneak past testing, they might at first say something defensive, along the lines of: "We can't test everything! Do you know how many possible combinations there are?" If you include 40 variables (demographic information, pre-existing conditions, etc.) in the scope of this software application, there would be:

41,231,686,041,600,000

possible scenarios to test. That's not a typo: 41 QUADRILLION possible combinations. As in it would take 13 million years to execute those tests if we could execute 100 tests every second. There's no way we can test all possible combinations. So bugs like these are inevitably going to sneak through testing undetected.
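The 13 million year figure follows directly from the scenario count and the assumed execution rate. A quick back-of-the-envelope check (the 365.25-day year is an assumption made here for the conversion):

```python
TOTAL_SCENARIOS = 41_231_686_041_600_000  # figure quoted above
TESTS_PER_SECOND = 100                    # execution rate assumed in the text
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60  # assumption: average calendar year

seconds_needed = TOTAL_SCENARIOS / TESTS_PER_SECOND
years_needed = seconds_needed / SECONDS_PER_YEAR

print(f"{years_needed / 1e6:.1f} million years")
```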

 

The Wrong Question

When the developers and testers of a system say there is no way they could realistically test all the possible scenarios, they're addressing the wrong challenge. "How long would it take to execute every test we can think of?" is the wrong question. It is interesting but ultimately irrelevant that it would take 13 million years to execute those tests.

 

The Right Question

A much more important question is "Given the limited time and resources we have available for testing, how can we test this system as thoroughly as possible?" Most teams of developers and software testers are extremely bad at addressing this question. And they don't realize nearly how bad they are. The Dunning-Kruger effect often prevents people from understanding the extent of their incompetence; that's a different post for a different day. After documenting a few thousand tests designed to cover all of the significant business rules and requirements they can think of, testers will run out of ideas, shrug their shoulders in the face of the overwhelming number of total possible scenarios and declare their testing strategy to be sufficiently comprehensive. Whenever you're talking about hundreds or thousands of tests, that test selection strategy is a recipe for incredibly inefficient testing that both misses large numbers of easily avoidable defects and wastes time by testing certain things again and again. There's a better way.

 

The Straightforward, Effective Solution to this Common Testing Challenge: Testers Should Use Intelligent Test Prioritization Strategies

If you create a well-designed test plan using scientific prioritization approaches, you can tremendously reduce the number of individual tests you need to execute. It comes down to testing the system as thoroughly as possible in the time that's available for testing. There are well-proven methods for doing just that.

 

There are Two Kinds of Software Bugs in the World

Bugs that don't get found by testers sneak into production for one of two main reasons, namely:

  • "We never thought about testing that" - An example that illustrates this type of defect is one James Bach told me about. Faulty calculations were being caused by an overheated server that got that way because of a blocked vent. You can't really blame a tester who doesn't think of including a test involving a scenario with a blocked vent.

  • "We tested A; it worked. We tested B; it worked too.... But we never tested A and B together." This type of bug sneaks by testers all too often. Bugs like this should not sneak past testers. They are often very quick and easy to find. And they're so common as to be highly predictable.

 

Let's revisit the high-profile Obamacare bug that will impact millions of people and take more than a year to fix. Here's all that would have been required to find it:

  • Include an applicant with a relatively high (pre-Medicare) age. Oh, and they smoke.

 

Was the system tested with a scenario involving an applicant who had a relatively high age? I'm assuming it must have been.

Was the system tested with a scenario involving an applicant who smoked? Again, I'm assuming it must have been.

Was the system tested with a scenario involving an applicant who had a relatively high age who also smoked? That's what triggers this important bug; apparently it wasn't found during testing (or found early enough).

 

If You Have Limited Time, Test All Pairs

Let's revisit the claim of "We can't execute all 13 million-years-worth of tests. Combinations like these are bound to sneak through, untested. How could we be expected to test all 13 million-years-worth of tests?" The last two sentences are preposterous.

  • "Combinations like these are bound to sneak through, untested." Nonsense. In a system like this, at a minimum, every pair of test inputs should be tested together. Why? The vast majority of defects in production today would be found simply by testing every possible pair of test inputs together at least once.

  • "How could we be expected to test all 13 million-years-worth of tests?" Wrong question. Start by testing all possible pairs of test inputs you've identified. Time-wise, that's easily achievable; it's also a proven way to cover a system quite thoroughly in a very limited amount of time.

 

Design of Experiments is an Established Field that was Created to Solve Problems Exactly Like This; Testers are Crazy Not to Use Design of Experiments-Based Prioritization Approaches

The almost 100-year-old field of Design of Experiments is focused on finding out as much actionable information as possible in as few experiments as possible. These prioritization approaches have been very widely used with great success in many industries, including advertising, manufacturing, drug development, agriculture, and many more. Design of Experiments test design techniques (such as pairwise testing and orthogonal array testing / OA testing) are increasingly being used by software testing teams, but far more teams could benefit from using these smart test prioritization approaches. We've written posts about how Design of Experiments methods are highly applicable to software testing here and here, and put an "Intro to Pairwise Testing" video here. Perhaps the reason this powerful and practical test prioritization strategy remains woefully underutilized by the software testing industry at large is that there are too few real-world examples explaining "this is what inevitably happens when this approach is not used... And here's how easy it would be to avoid this from happening to you in your next project." Hopefully this post helps raise awareness.

 

Let's Imagine We've Got One Second for Testing, Not 13 Million Years; Which Tests Should We Execute?

Remember how we said it would take 13 million years to execute all of the 41 quadrillion possible tests? That calculation assumed we could execute 100 tests a second. Let's assume we only have one second to execute tests from those 13 million years worth of tests. How should we use that second? Which 100 tests should we execute if our goal is to find as many defects as possible?

If you have a Hexawise account, you can log in to your Hexawise account to view the test plan details and follow along in this worked example. To create a new account in a few seconds for free, go to hexawise.com/free.

By setting the 40 different parameter values intelligently, we can maximize the testing coverage achieved in a very small number of tests. In fact, in our example, you would need to execute only 90 tests to cover every single pairwise combination.

The number of total possible combinations (or "tests") that are generated will depend on how many parameters (items/factors) and how many options (parameter values) there are for each parameter. In this case, the number of total possible combinations of parameters and values equals 41 quadrillion.
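To give a feel for how a handful of tests can cover every pair, here is a minimal greedy sketch in Python. It is not Hexawise's algorithm, and the seven-parameter model is a hypothetical, heavily simplified stand-in for the 40-parameter plan (parameter names such as "Smoker" and "Age Band" and most values are assumptions). Each new test is seeded with a pair that is still untested and then filled in with the values that add the most untested pairs.

```python
import math
from itertools import combinations

# Hypothetical, simplified insurance model; not the 40-parameter plan above.
PARAMETERS = {
    "Plan Type": ["A", "B", "C"],
    "Deductible Amount": ["High", "Medium", "Low"],
    "Gender": ["Male", "Female"],
    "Spouse": ["Yes", "No"],
    "Smoker": ["Yes", "No"],
    "State": ["California", "Texas", "New York"],
    "Age Band": ["18-29", "30-44", "45-54", "55-64"],
}


def greedy_pairwise(parameters):
    """Build tests until every pair of values appears together at least once."""
    names = list(parameters)
    all_pairs = [
        frozenset({(a, va), (b, vb)})
        for a, b in combinations(names, 2)
        for va in parameters[a]
        for vb in parameters[b]
    ]
    uncovered = set(all_pairs)
    tests = []
    while uncovered:
        # Seed with a still-untested pair so every new test makes progress.
        seed = next(p for p in all_pairs if p in uncovered)
        test = dict(seed)
        for name in names:
            if name in test:
                continue
            # Pick the value that covers the most untested pairs so far.
            test[name] = max(
                parameters[name],
                key=lambda value: sum(
                    frozenset({(name, value), item}) in uncovered
                    for item in test.items()
                ),
            )
        for pair in combinations(test.items(), 2):
            uncovered.discard(frozenset(pair))
        tests.append(test)
    return tests


exhaustive = math.prod(len(values) for values in PARAMETERS.values())
tests = greedy_pairwise(PARAMETERS)
print(f"{exhaustive} exhaustive combinations, {len(tests)} pairwise tests")
```

Even this naive greedy approach needs only a small fraction of the exhaustive combinations; purpose-built generators typically produce smaller plans still.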

 

[Screenshot: insurance-bug-1]

This screen shot shows a portion of the test conditions that would be included in the first 4 tests of the 90 tests that are needed to provide full pairwise coverage. Sometimes people are not clear about what "test every pair" means. To make this more concrete, by way of a few specific examples, pairs of values tested together in the first part of test number 1 include:

  • Plan Type = A tested together with Deductible Amount = High

  • Plan Type = A tested together with Gender = Male

  • Plan Type = A tested together with Spouse = Yes

  • Gender = Male tested together with State = California

  • Spouse = Yes tested together with Yes (and over 5 years)

  • And lots of other pairs not listed here

 

[Screenshot: insurance-bug-2]

This screen shot shows a portion of the later tests. You'll notice that some values are shown in purple italics. Those values listed in purple italics are not providing new pairwise coverage. You will note that in the first tests every single parameter value provides new pairwise coverage, while toward the end few parameter value settings provide new pairwise coverage. Once a specific pair has been tested, retesting it doesn't provide additional pairwise coverage. Sets of Hexawise tests are "front loaded for coverage." In other words, if you need to stop testing at any point before the end of the complete set of tests, you will have achieved as much coverage as possible in the limited time you have to execute your tests (whether that is 10 tests or 30 tests or 83). The pairwise coverage chart below makes this point visually; the decreasing number of newly tested pairs of values that appear in each test accounts for the diminishing marginal returns per test.

 

You Can Even Prioritize Your First "Half Second" of Tests To Cover As Much As Possible!

[Chart: insurance-bug-3]

This graph shows how Hexawise orders the test plan to provide the greatest coverage quickly. So if you get through 37 of the 90 tests needed for full pairwise coverage, you have already covered over 90% of all the pairs. The implication? Even if just 37 tests were executed, there would be a 90% chance that any given pair of values that you might select at random would be tested together in the same test case by that point.
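The shape of such a coverage curve can be reproduced on a toy scale. The sketch below uses a hypothetical three-parameter model (3 x 2 x 2 values, with placeholder value names) and a hand-ordered set of six tests; it counts how many new pairs each test contributes and the cumulative percentage of pairs covered.

```python
from itertools import combinations

# Hypothetical ordered test set for parameters A (3 values), B (2), C (2).
TESTS = [
    ("a1", "b1", "c1"),
    ("a2", "b2", "c2"),
    ("a3", "b1", "c2"),
    ("a3", "b2", "c1"),
    ("a1", "b2", "c2"),
    ("a2", "b1", "c1"),
]
TOTAL_PAIRS = 3 * 2 + 3 * 2 + 2 * 2  # A-B, A-C and B-C pairs: 16 in total


def new_pairs_per_test(tests):
    """Count, for each test in order, the pairs it covers for the first time."""
    seen = set()
    gains = []
    for test in tests:
        pairs = {
            ((i, test[i]), (j, test[j]))
            for i, j in combinations(range(len(test)), 2)
        }
        gains.append(len(pairs - seen))
        seen |= pairs
    return gains


gains = new_pairs_per_test(TESTS)
cumulative = []
running = 0
for gain in gains:
    running += gain
    cumulative.append(100 * running / TOTAL_PAIRS)

print(gains)       # new pairs contributed by each test
print(cumulative)  # cumulative percentage of all pairs covered
```

The early tests each add three new pairs while the last two add only two, which is the same diminishing-returns pattern the chart shows on a larger scale.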

 

Was Missing This Serious Defect an Understandable Oversight (Because Quadrillions of Possible Combinations Exist) or was it Negligent (Because Only 90 Intelligently Selected Tests Would Have Detected it)?

A generous interpretation of this situation would be that it was "unwise" for testers to fail to execute the 90 tests that would have uncovered this defect.

A less generous interpretation would be that it was idiotic not to conduct this kind of testing.

The health care reform act will introduce many such changes as this. At an absolute minimum, health insurance firms should be conducting pairwise tests of their systems. Given the defect finding effectiveness of pairwise testing coverage, testing systems any less thoroughly is patently irresponsible. And for health insurance software testing it is often wiser to expand to test all triples or all quadruples given the interaction between many variables in health insurance software.

Incidentally, getting full 3-way test coverage (using the same example as above) would require 2,090 tests.

 

Related: Getting Started with a Test Plan When Faced with a Combinatorial Explosion - How Not to Design Pairwise Software Tests - Efficient and Effective Test Design

By: Justin Hunter on Sep 26, 2013

Categories: Combinatorial Software Testing, Hexawise test case generating tool, Multi-variate Testing, Pairwise Software Testing, Software Testing, Testing Strategies

Many teams are trying to generate unusually powerful and varied sets of software tests by using Design of Experiments-based methods to generate many or most of their tests. The two most popular software test design methods are orthogonal array testing and pairwise testing. This article describes how these two approaches are similar but different and suggests that in most cases, pairwise testing is preferable.

Before advancing, it may be worth pointing out that Orthogonal Array Testing is also known as OA or OATS. Similarly, pairwise testing is sometimes referred to as all pairs testing, allpairs testing, pair testing, pair-wise testing, or simply 2-way testing. The difference between these two very similar approaches of pairwise vs. orthogonal array is that orthogonal array-based solutions require the same coverage goal that pairwise solutions do (i.e., that every pair of inputs is tested at least once) plus an additional hurdle/characteristic, that there be a uniform distribution throughout the domain.
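The "uniform distribution" requirement can be made concrete with a small check. The sketch below is illustrative only: it treats a test set as orthogonal (strength 2) when, for every pair of parameters, each combination of values appears the same number of times, and as pairwise-covering when each combination appears at least once. The parameter values are hypothetical placeholders.

```python
from collections import Counter
from itertools import combinations, product

LEVELS = [("a1", "a2", "a3"), ("b1", "b2"), ("c1", "c2")]  # hypothetical model

# Six tests that cover every pair of values at least once.
PAIRWISE_TESTS = [
    ("a1", "b1", "c1"), ("a2", "b2", "c2"), ("a3", "b1", "c2"),
    ("a3", "b2", "c1"), ("a1", "b2", "c2"), ("a2", "b1", "c1"),
]

# The full factorial (12 tests) is trivially balanced.
FULL_FACTORIAL = list(product(*LEVELS))


def pair_counts(tests, levels):
    """Yield, per pair of columns, how often each value combination occurs."""
    for a, b in combinations(range(len(levels)), 2):
        counts = Counter((t[a], t[b]) for t in tests)
        yield [counts[combo] for combo in product(levels[a], levels[b])]


def covers_all_pairs(tests, levels):
    return all(min(counts) >= 1 for counts in pair_counts(tests, levels))


def is_orthogonal(tests, levels):
    return all(
        min(counts) >= 1 and len(set(counts)) == 1
        for counts in pair_counts(tests, levels)
    )


print(covers_all_pairs(PAIRWISE_TESTS, LEVELS), is_orthogonal(PAIRWISE_TESTS, LEVELS))
print(covers_all_pairs(FULL_FACTORIAL, LEVELS), is_orthogonal(FULL_FACTORIAL, LEVELS))
```

The six-test set meets the pairwise goal but is not balanced: the B and C combinations do not all occur equally often. For this 3 x 2 x 2 model a balanced design needs a test count divisible by both 6 and 4, so at least 12 tests, twice what the pairwise set uses.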

I have studied the question of how software testing inputs can be combined most efficiently and effectively pretty steadily for the last 7 years. I started by searching the web for "Design of Experiments" and "software testing" and found references to Dr. Madhav Phadke (who, by coincidence, turns out to be a former student of my father).

  • I discovered that Dr. Phadke had designed RDExpert which, although it had been primarily created to help with Research & Design projects in manufacturing settings, could also be used to select small, powerful test sets in software testing projects, using the Orthogonal Array-based test selection criteria.

  • I used RDExpert to create test sets (and compared those test sets against sets of tests that had been selected manually by software testers)

  • I gathered results by asking one tester to execute the manually selected tests and another tester to execute the Orthogonal Array-based tests; the OA-based tests dramatically outperformed the manually-selected ones in terms of defects found per tester hour and defects found overall.

So, in short, I had confirmed to my satisfaction that an OA-based test data combination strategy was far more effective than manually selecting combinations for the kinds of projects I was working on, but I was curious if other techniques worked better.

 

After more study I have concluded that:

  • Pairwise is more efficient and effective than orthogonal arrays for software testing.

  • Orthogonal Arrays are more efficient and effective for manufacturing, and agriculture, and advertising, and many other settings.

 

And we have built Hexawise as a software tool to help software producers test their software, based on what I have learned from my experience. We take full advantage of the greatly increased efficiency and effectiveness of letting testers determine what needs to be tested and letting software algorithms quickly create comprehensive test plans that provide more coverage with dramatically fewer tests.

But we also go well beyond this to create a software as a service solution that aids the software testing team with many huge advantages such as: automatically generating Expected Results in test scripts, automated importing of data from Excel or mind maps, exporting tests into other tools, preventing impossible to test for values from appearing together, and much more.

 

Why is a pairwise testing strategy better than an orthogonal array strategy?

  • Pairwise testing almost always requires fewer tests than orthogonal array-based solutions (it is possible, in some situations, for them to have an equal number of tests).

  • Remember, the reason that orthogonal array-based solutions require more tests than a pairwise solution to reach the coverage goal of testing all pairs of test conditions together in at least one test is the additional hurdle/characteristic that orthogonal array testing has, namely that there be a uniform distribution throughout the domain.

  • The "cost" of the extra tests (AKA experiments) is worth paying in many settings outside of the software testing industry because the results are non-binary in those tests. Someone seeking a desired darkness and gloss and luminosity and luster for a particular shade of green in the processing of film, for example, would benefit from the additional information gathered from orthogonal arrays.

  • In software testing, however, the added costs imposed by the extra tests are not worth it. You're generally not seeking some ideal point in a continuum; you're looking to see what two specific pieces of data will trigger a defect when they appear in the same transaction. To identify that binary outcome most efficiently and effectively, what you want is a pairwise solution (with fewer tests), not a longer list of orthogonal array-based tests.
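To make the distinction above concrete, here is a minimal sketch in Python. It assumes four parameters with two values each (a made-up case, not one from any project mentioned here) and compares a 5-test set that covers every pair at least once with a standard 8-test orthogonal array in which every pair appears equally often. The function names are illustrative only.

```python
from collections import Counter
from itertools import combinations, product


def pair_counts(tests, col_a, col_b):
    """Count how often each value pair occurs in two columns of a test set."""
    return Counter((row[col_a], row[col_b]) for row in tests)


def covers_all_pairs(tests, levels=(0, 1)):
    """True if every pair of values appears together in at least one test."""
    n_cols = len(tests[0])
    return all(
        set(product(levels, repeat=2)) <= set(pair_counts(tests, a, b))
        for a, b in combinations(range(n_cols), 2)
    )


def is_balanced(tests, levels=(0, 1)):
    """True if, for every pair of columns, each value pair appears equally often."""
    n_cols = len(tests[0])
    for a, b in combinations(range(n_cols), 2):
        counts = pair_counts(tests, a, b)
        if len({counts[p] for p in product(levels, repeat=2)}) != 1:
            return False
    return True


# Five tests for four two-valued parameters: every pair covered at least once.
pairwise_tests = [
    (0, 0, 0, 0),
    (0, 1, 1, 1),
    (1, 0, 1, 1),
    (1, 1, 0, 1),
    (1, 1, 1, 0),
]

# Eight tests: all combinations of the first three parameters, with the
# fourth set to their XOR, which makes every pair appear exactly twice.
orthogonal_tests = [(a, b, c, a ^ b ^ c) for a, b, c in product((0, 1), repeat=3)]

print(len(pairwise_tests), covers_all_pairs(pairwise_tests), is_balanced(pairwise_tests))
print(len(orthogonal_tests), covers_all_pairs(orthogonal_tests), is_balanced(orthogonal_tests))
```

Both sets cover all pairs, but only the 8-test set is balanced; the extra three tests are the price of the uniform distribution requirement.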

 

Let me also add these points.

  • First, unlike some of my other views on combinatorial test design, my opinion on this narrow subject is not based on multiple empirical studies; it is based on (a) the reasoning I laid out above, and (b) a dozen or so conversations I've had with PhD's who specialize in the intersection of "Design of Experiments" and software test design, and (c) anecdotal evidence from using both methods.

  • Secondly, to my knowledge, very few, if any, studies have gathered empirical data showing benefits of pairwise solutions vs. orthogonal array-based solutions in software testing scenarios.

  • Thirdly, I strongly suspect that if you asked Dr. Phadke, he would give you his reasons for why orthogonal array-based solutions are appropriate (and even preferable) to pairwise test case selection methods for certain kinds of software projects. I have a huge amount of respect for both him and his son.

 

Time doesn't allow me to get into this last point much now, but "mixed strength" tests are another even more powerful test design approach for you to be aware of as well. With mixed strength testing solutions, the test designer is able to select a default coverage strength for the entire plan (e.g., pairwise / AKA 2-way coverage) and, in the same set of tests, select certain high priority values to receive higher coverage strength (e.g., 4-way coverage strength selected for each "Credit Rating" and "Income" and "Loan Amount" and "Loan to Value Ratio" would give you a plan that achieved pairwise coverage for everything in the plan plus comprehensive coverage for every imaginable combination of values from those four high priority parameters). This approach allows you to focus on risk-based testing considerations.
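As a rough illustration of what a mixed strength plan has to cover, the sketch below lists the coverage targets for a plan like the one just described. The parameter values, the two extra parameters ("Property Type" and "Region"), and the function name are all assumptions made up for the example; this only counts the targets and does not generate tests.

```python
from itertools import combinations, product

# Hypothetical parameters and values, for illustration only.
parameters = {
    "Credit Rating": ["Poor", "Fair", "Good"],
    "Income": ["Low", "Medium", "High"],
    "Loan Amount": ["Small", "Large"],
    "Loan to Value Ratio": ["<80%", ">=80%"],
    "Property Type": ["House", "Condo"],
    "Region": ["East", "West"],
}
high_priority = ["Credit Rating", "Income", "Loan Amount", "Loan to Value Ratio"]


def required_tuples(params, names, strength):
    """All value combinations of the given strength among the named parameters."""
    targets = set()
    for group in combinations(names, strength):
        for values in product(*(params[name] for name in group)):
            targets.add(tuple(zip(group, values)))
    return targets


# Default strength: every pair of values across the whole plan.
pair_targets = required_tuples(parameters, list(parameters), 2)
# Higher strength: every 4-way combination of the high priority parameters.
four_way_targets = required_tuples(parameters, high_priority, 4)

print(len(pair_targets))      # 81 pairs to cover
print(len(four_way_targets))  # 36 four-way combinations to cover
```

With these made-up values, any mixed strength plan needs at least 36 tests (one per 4-way combination), compared with 144 tests to try every combination of all six parameters.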

 

Sorry if I got a bit long-winded. It's a topic I'm passionate about.

Originally posted on Stack Exchange. Additional note added after the first 3 comments were submitted:

@Hannibal, @Peter K., and @MichaelF, Thanks for your comments! If you'd like to read more about this stuff, I recommend the multiple links available through this "bundle of links" about pairwise testing and combinatorial testing. In particular, Michael Bolton's article on pairwise testing is directly relevant and very clearly written. It is one of the few introductory articles around that accurately describes the difference between orthogonal array-based solutions and pairwise solutions. If I remember correctly though, the example Michael uses is a rare exception to the rule; the OA solution has the same number of tests as an optimal pairwise solution does.

Related: The Empirical Evidence for Using Pairwise and Combinatorial Software Testing - 3 Strategies to Maximize Effectiveness of Your Tests - Hexawise TV

More than 100 Fortune 500 firms use Hexawise to design their software tests. While large companies pay six figures per year for enterprise licenses, Hexawise is available for free to schools, open source projects, other non-profits, and teams of up to 5 users from any kind of company. Sign up for your Hexawise account.

By: John Hunter and Justin Hunter on Jun 11, 2013

Categories: Combinatorial Testing, Design of Experiments, Efficiency, Multi-variate Testing, Pairwise Software Testing, Software Testing, Testing Strategies, Experimenting

A client informed us that they had created (and used) approximately 3,500 test cases to test the search functionality of their application. They had strong suspicions that (a) they should be able to test the search functionality of their application with fewer tests, (b) their tests had accidentally omitted many hundreds of plausible combinations of values that would be useful to test (but they did not know how to precisely identify where those gaps were without a huge amount of work), and that (c) many of these tests were quite inefficient in that they repeated many steps that had already been tested in other tests in the plan (even if they were not 100% duplicative of any other single test in the plan).

This client should have spoken to Lanette Creamer before they got into that situation. Lanette is a testing expert and blogger with ideas worth paying attention to. For example, her paper, Reducing Test Case Bloat, is well worth reading as is her blog.

There are times when what you cut may not be bloat. There are some situations where the decisions are the equivalent of “Do we cut off the arm or the head?” Well, a person can live without an arm. If you are in a situation where you are so time constrained that critical areas will be untested, you can still communicate the risk, be transparent and use a strategy to test the most important areas first. It is possible to plan for and do testing for a very time constrained project.

 

Of course avoiding this situation is best. Improving testing processes to use the best thoughts and tools is a better option. Cutting the bloat can allow resources to be applied to those areas where they are really needed. Often though, people are scared of trying new ideas and cling to old methods, even if that results in the organization having to take increased risks by failing to test critical areas sufficiently. They are just more scared of trying new ideas than of getting away with saying we need more funding if you want more testing.

Lanette's article provides 8 specific suggestions for process improvements to reduce bloat. The first suggestion is to use combinatorial testing tools to greatly improve coverage while reducing workload. Another suggestion is to run the bloat reduction ideas by the stakeholders.

As part of your plan to reduce bloat, it can be helpful to state your assumptions about who is important and where you are placing testing priority and why. When reducing test case bloat you are taking a calculated risk. You are weighing the risk of being unable to test new features by insisting on testing every legacy case against the risk of purposefully not running some tests. When you share your starting assumptions with your stakeholders you offer them the chance to counter with their own assumptions and often you can clarify the boundaries of your testing this way to avoid gaps in testing or duplication.

 

See the full article for more good ideas on how to get better results for the existing testing resources available to your organization.

 

Related: Design of Experiments is about Learning ASAP and, in Software Testing, Finding Bugs ASAP - Pairwise and Combinatorial Software Testing in Agile Projects - Cem Kaner: Testing Checklists = Good / Testing Scripts = Bad?

By: John Hunter and Justin Hunter on Apr 24, 2013

Categories: Combinatorial Software Testing, Efficiency, Multi-variate Testing, Software Testing, Testing Strategies


Design of Experiments in Software Testing - Pairwise and Combinatorial - Hexawise

Justin Hunter @Hexawise:

 

Removing inefficiency is good, sure, but it is not why Design of Experiments is so friggin' powerful. Saying DoE is interesting to know about because it can help identify and remove specific inefficiencies is a bit like saying Canada is a good country to visit because you can sometimes find a good cup of coffee there. To my mind, saying DoE is primarily about removing inefficiency misses the main point.

Design of Experiments is so powerful because it allows practitioners to predictably, systematically, and consistently find out more useful, actionable information in much less time than they would otherwise take to obtain this information (if they could find it at all with their less-structured approaches).

In manufacturing circles (e.g., when engineers produce new prototypes), DoE's ability to do this is no longer questioned. This is because leaders like George Box taught people in industry how to apply DoE and they gathered conclusive evidence that DoE allowed manufacturers to learn much faster through techniques like applying factorial designs. Box and other DoE experts (Taguchi, Montgomery, my dad, etc.) dealt with skeptical manufacturing engineers for four decades by showing them the facts and using DoE on the skeptics' own projects right under their noses. The evidence that DoE allows manufacturers to learn much faster (about a wide variety of learning goals) than the other methods they used prior to 1960 is incontrovertible.

In 2010, in the gradually maturing field of software testing, Design of Experiments-based methods of test case design have not caught on much at all yet. As an industry, its adoption of DoE-based approaches is roughly where manufacturing was in 1960. Most software testers, even very good ones, don't know anything at all about how DoE can help them. Many other software testers have heard a bit about pairwise but mistakenly think pairwise and related, structured, DoE-based, test case selection methods can't help them.

Even some of the best testers in the world who have written some of the most clearly-written and well-reasoned articles about pairwise approaches do not (in my view) seem to fully understand: (a) how powerful the benefits are, (b) how often the approach can be applied / in how many diverse kinds of testing situations they can be utilized, and/or (c) how consistently the efficiency and effectiveness benefits are generated when they are used properly. DoE methods, including pairwise and n-wise and mixed strength automatic test condition generation (made possible by tools like our Hexawise tool and also, to a great extent by James Bach's free AllPairs tool) allow software testers to learn much faster about critically important questions like: (1) where are the bugs?, (2) what is causing the bugs to appear?, (3) am I confident I have efficiently tested for a huge range of combinations of values in the System Under Test that might trigger defects?, (4) am I succeeding in avoiding redundant repetition of steps in many test cases?, (5) how many bugs would we be likely to find if we were to continue to run the next 100 tests?, etc.

In summary, the reason for the existence of Design of Experiments methods (whether we're talking about their applicability to testing software as efficiently and effectively as possible, or DoE methods' applicability to a huge variety of other objectives) - and, for that matter, the reason that they have been continuously refined and improved for 40+ years - is that DoE methods consistently and predictably allow users to learn actionable results as quickly as possible.

 

Related: Maximize Test Coverage Efficiency And Minimize the Number of Tests Needed - Pairwise and Combinatorial Software Testing in Agile Projects - Video Highlight Reel of Hexawise

By: Justin Hunter on Feb 11, 2013

Categories: Design of Experiments, Multi-variate Testing, Recommended Tool, Testing Strategies

A combinatorial explosion is when the configuration settings and user actions and data entered etc. make it impossible to test everything. The number of tests required to individually test every single possibility is many thousands of times greater than could realistically be tested.
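The arithmetic behind a combinatorial explosion is simple multiplication. The sketch below uses made-up parameters and value counts (they are assumptions for illustration, not taken from any real test plan) to show how quickly the exhaustive count grows, and compares it with the smallest number of tests a pairwise plan could possibly need.

```python
from math import prod

# Hypothetical value counts for each parameter of a system under test.
value_counts = {
    "Browser": 4,
    "Operating system": 3,
    "User type": 5,
    "Payment method": 6,
    "Language": 10,
    "Shipping option": 4,
    "Coupon state": 3,
}

# Every combination of every value: multiply the counts together.
exhaustive = prod(value_counts.values())

# A pairwise plan can never need fewer tests than the product of the two
# largest value counts, because each of those pairs needs its own test.
largest, second_largest = sorted(value_counts.values(), reverse=True)[:2]
pairwise_lower_bound = largest * second_largest

print(exhaustive)            # 43200
print(pairwise_lower_bound)  # 60
```

The lower bound is not what any particular tool would produce; it only shows the scale of the gap between exhaustive and pairwise coverage.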

Taking over an existing software application without a good test suite (or any test plan) is often daunting, and combinatorial explosion means you face the problem of creating an unfathomable number of tests. Hexawise is a software as a service tool that helps software testers deal with this dilemma. It creates software test plans that provide far better coverage than is typically seen in practice, with a tiny fraction of the tests required for complete combinatorial coverage (that is, testing every possible combination [pairwise or 3, 4, 5... way] individually).

The Google Maps test plan provides a good example of combinatorial explosion faced by the testers (in this case, those who tested Google Maps). Take a look at the Google Maps test plan by logging in to your Hexawise account (creating a demo account is free and simple). The Google Maps test plan is one of 9 samples currently provided in Hexawise.

For creating your own test plan, while you are exploring the software application and testing it out to find "where the weak points are," you will probably find it useful to vary things as much as possible and repeat your actions as little as possible. Those points are true whether you're doing relatively informal lightly documented exploratory testing or more heavily documented test scripts. In addition, since a large percentage of defects can be triggered by the interaction of just two test inputs, it would be nice, if you had time, to test every single possible combination involving two test inputs; that's the rationale behind allpairs, pairwise and orthogonal array-based test case prioritization methods.

To recreate a similar - very early draft - plan for yourself, I'd suggest going through the following steps to put together a relatively small number of highly informative end-to-end-ish tests:

Ask what can change as users go through the system? Think about configuration settings, user actions, data formats, data ranges, etc. even throw in more "creative" ideas like user personas. Let your creativity and common sense guide you. Enter those in as parameters.

Ask how those parameters can change? (for the parameter "Browser" enter IE7, IE8, FF, etc.) Put those in as values under each parameter (entering constraints as required)

Ask does that variation matter? When possible (when it doesn't matter as much) use equivalence classes and be biased towards fewer values - at least for your early draft tests.

Ask what special paths through the system do you want to be sure to include? (Most common happy path, paths to trigger certain business rules, etc.)

Click the Create Tests button in Hexawise and you'll instantly get a very nice draft starter set of highly varied tests. If they look like they're relatively interesting and don't miss hugely important things, start informally executing them and you'll be sure to learn some more things as you do about the system's weak points that would result in you going back to those draft tests and iterating them to make them stronger and cover more.
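The modeling steps above can be sketched as plain data. The example below is a minimal illustration, not how Hexawise stores plans: the parameters, values, and the constraint (that IE7 and IE8 cannot be paired with Mac) are all assumptions chosen for the example. It lists the valid pairs a set of tests would need to cover once the constraint is applied.

```python
from itertools import combinations, product

# Steps 1 and 2: parameters and their values (hypothetical choices).
parameters = {
    "Browser": ["IE7", "IE8", "FF"],
    "Operating System": ["Windows", "Mac"],
    "User Persona": ["New visitor", "Power user"],
}

# Constraint: value pairs that can never appear together in one test.
invalid_pairs = {
    frozenset({("Browser", "IE7"), ("Operating System", "Mac")}),
    frozenset({("Browser", "IE8"), ("Operating System", "Mac")}),
}


def valid_pairs(params, invalid):
    """List every pair of values that a set of tests should cover."""
    pairs = []
    for (name_a, values_a), (name_b, values_b) in combinations(params.items(), 2):
        for a, b in product(values_a, values_b):
            pair = frozenset({(name_a, a), (name_b, b)})
            if pair not in invalid:
                pairs.append(((name_a, a), (name_b, b)))
    return pairs


targets = valid_pairs(parameters, invalid_pairs)
print(len(targets))  # 14 of the 16 possible pairs remain
```

Using equivalence classes (step 3) simply means keeping the value lists short, which shrinks the list of pairs to cover.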

To get a bit more on using this approach see our case studies. Hexawise TV provides narrated videos online showing how to make your life easier as a software tester.

 

Related: Hexawise Tip: Using Value Expansions and Value Pairs to Handle Dependent Values - 3 Strategies to Maximize Effectiveness of Your Tests - Pairwise and Combinatorial Software Testing in Agile Projects

By: John Hunter and Justin Hunter on Jan 2, 2013

Categories: Combinatorial Software Testing, Exploratory Testing, Multi-variate Testing, Pairwise Software Testing, Software Testing, Testing Strategies

84 percent coverage in 20 tests

Hexawise test coverage graph showing 83.9% coverage in just 20 tests

 

Among the many benefits Hexawise provides is creating a test plan that maximizes test coverage with each new scenario tested. The graph above shows that after just 20 tests 83.9% of the test combinations have been tested. Read more about this in our case study of a mortgage application software test plan. Just 48 test combinations are needed to test for every valid pair (3.7 million possible tests combinations exist in this case). If you are lost now, this video may help.

The coverage achieved by the first few tests in the plan will be quite high (and the graph line will point up sharply) then the slope will decrease in the middle of the plan (because each new test will tend to test fewer net new pairs of values for the first time) and then at the end of the plan the line will flatten out quite a lot (because by the end, relatively few pairs of values will be tested together for the first time).

One of the benefits Hexawise provides is making that slope as steep as possible. The steeper the slope the more efficient your test plan is. If you repeat the same tests of pairs and triples and so on, while not taking advantage of the chance to test untested pairs and triples, you will have to create and run far more tests than if you intelligently create a test plan. With many interactions to test it is far too complex to manually derive an intelligent test plan. A combinatorial testing tool, like Hexawise, that maximizes test plan efficiency is needed.

For any set of test inputs, there is a finite number of pairs of values that could be tested together (that can be quite a large number). The coverage chart answers, after each test, what percentage of the total number of pairs (or triples, etc.) that could be tested together have been tested together so far?
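The calculation behind a coverage chart like this can be sketched in a few lines. The example below is an illustration under assumptions (four made-up parameters with two values each and a hand-written set of five tests), not the Hexawise implementation; `coverage_curve` is a hypothetical name.

```python
from itertools import combinations


def coverage_curve(tests, value_lists):
    """Percentage of all value pairs covered after each test, in order."""
    n = len(value_lists)
    total = sum(
        len(value_lists[a]) * len(value_lists[b])
        for a, b in combinations(range(n), 2)
    )
    seen = set()
    curve = []
    for test in tests:
        # Record every pair of values this test exercises together.
        for a, b in combinations(range(n), 2):
            seen.add((a, test[a], b, test[b]))
        curve.append(round(100 * len(seen) / total, 1))
    return curve


value_lists = [[0, 1], [0, 1], [0, 1], [0, 1]]
tests = [
    (0, 0, 0, 0),
    (0, 1, 1, 1),
    (1, 0, 1, 1),
    (1, 1, 0, 1),
    (1, 1, 1, 0),
]

print(coverage_curve(tests, value_lists))  # [25.0, 50.0, 70.8, 87.5, 100.0]
```

Each test here exercises six pairs, but the number of pairs covered for the first time drops from 6 to 6, 5, 4 and then 3, which is why the line rises steeply at first and then flattens.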

The Hexawise algorithms achieve the following objectives that help testers find as many defects as possible in as few tests as possible. In each and every step of each and every test case, the algorithm chooses a test condition that will maximize the number of pairs that can be covered for the first time in the test case. (Or, the maximum number of triplets or quadruplets, etc. based on the thoroughness setting defined by the user). Allpairs (AKA pairwise) is a well known and easy to understand test design strategy. Hexawise lets users create pairwise sets of tests that cover every pair, and it also allows test designers to generate far more thorough sets of tests (3-way to 6-way coverage). This allows users to "turn up the coverage dial" and generate tests that cover every single possible triplet of test inputs together at least once (or every 4-way combination or 5-way combination or 6-way combination).
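The general idea of choosing, at each step, the value that covers the most new pairs can be sketched as a simple greedy generator. This is a textbook-style simplification written for illustration, not the actual Hexawise algorithm; it handles pairs only, ignores constraints, and all names in it are hypothetical.

```python
from itertools import combinations, product


def pair_key(col_a, val_a, col_b, val_b):
    """Canonical form of a pair of (column, value) choices, lower column first."""
    if col_a < col_b:
        return (col_a, val_a, col_b, val_b)
    return (col_b, val_b, col_a, val_a)


def greedy_pairwise(value_lists):
    """Build tests one at a time, each chosen to cover many uncovered pairs."""
    n = len(value_lists)
    uncovered = {
        (a, va, b, vb)
        for a, b in combinations(range(n), 2)
        for va, vb in product(value_lists[a], value_lists[b])
    }
    tests = []
    while uncovered:
        # Seed the test with one uncovered pair so every test makes progress.
        a, va, b, vb = min(uncovered)
        chosen = {a: va, b: vb}
        for col in range(n):
            if col in chosen:
                continue

            def new_pairs(value):
                # Uncovered pairs this value would form with values already placed.
                return sum(
                    pair_key(col, value, other, other_value) in uncovered
                    for other, other_value in chosen.items()
                )

            chosen[col] = max(value_lists[col], key=new_pairs)
        test = tuple(chosen[col] for col in range(n))
        tests.append(test)
        for a, b in combinations(range(n), 2):
            uncovered.discard((a, test[a], b, test[b]))
    return tests


# Four hypothetical parameters with three values each (81 exhaustive tests).
value_lists = [
    ["a1", "a2", "a3"],
    ["b1", "b2", "b3"],
    ["c1", "c2", "c3"],
    ["d1", "d2", "d3"],
]
tests = greedy_pairwise(value_lists)
print(len(tests), "tests instead of", 3 ** 4)
```

A simple greedy pass like this is not guaranteed to find the smallest possible set, but it always covers every pair and stays far below the exhaustive count.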

Note that the coverage ratio Hexawise shows is based on the factors entered as items to be tested: not a code coverage percentage. Hexawise sorts the test plan to front load the coverage of the tuple pairs, not the coverage of the code paths. Coverage of code paths ultimately depends on how good a job the test designer did at extracting the relevant parameters and values of the system under test. You would expect there to be some loose correlation between coverage of identified tuple pairs and coverage of code paths in most typical systems.

If you want to learn more about these concepts, I would recommend Scott Sehlhorst's articles on pairwise and combinatorial test design. They are some of the clearest introductory articles about pairwise and combinatorial testing that I have seen. They also contain some interesting data points related to the correlation between 2-way / allpairs / pairwise / n-way coverage (in Hexawise) and the white box metrics of branch coverage, block coverage and code coverage (not measurable by Hexawise).

In Software testing series: Pairwise testing, for example, Scott includes these data points:

 

  • We measured the coverage of combinatorial design test sets for 10 Unix commands: basename, cb, comm, crypt, sleep, sort, touch, tty, uniq, and wc... The pairwise tests gave over 90 percent block coverage.

 

  • Our initial trial of this was on a subset Nortel’s internal e-mail system where we able cover 97% of branches with less than 100 valid and invalid testcases, as opposed to 27 trillion exhaustive testcases.

 

  • A set of 29 pair-wise... tests gave 90% block coverage for the UNIX sort command. We also compared pair-wise testing with random input testing and found that pair-wise testing gave better coverage.

 

Related: Why isn't Software Testing Performed as Efficiently and Effecively as it could be? - Video Highlight Reel of Hexawise – a pairwise testing tool and combinatorial testing tool - Combinatorial Testing, The Quadrant of Massive Efficiency Gains

Specific guidance on how to view the percentage of coverage graph for the test plan in Hexawise:

 

When working on your test plan in Hexawise, to get the checklist to be visible, click on the two downward arrows shown in the image:

How-To Progress Checklists-2 inline

Then you'll want to open up the "Advanced" list. So you might need to click here:

Advanced How-To Progress Checklist inline

Then the detailed explanation will begin when you click on "Analyze Tests"

Decreasing Marginal Returns inline

 

This post is adapted (and some new content added) from comments posted by Justin Hunter and Sean Johnson.

By: John Hunter on Feb 3, 2012

Categories: Combinatorial Software Testing, Combinatorial Testing, Efficiency, Multi-variate Testing, Pairwise Software Testing, Pairwise Testing, Scripted Software Testing, Software Testing, Software Testing Efficiency

Combinatorial Software Test Design - Beyond Pairwise Testing

 

I put this together to explain combinatorial software test design methods in an accessible manner. I hope you enjoy it and that, if you do, that you'll consider trying to create test cases for your next testing project (whether you choose our Hexawise test case generator or some other test design tool).

 

Where I'm Coming From

As those of you know who read my posts, read my articles, and/or have attended my testing conference presentations, I am a passionate proponent of these approaches to software test design that maximize variation from test case to test case and minimize repetition. It's not much of an exaggeration to say I hardly write or talk publicly about any other software testing-related topics. My own consistent experiences and formal studies indicate that pairwise, orthogonal array-based, and combinatorial test design approaches often lead to a doubling of tester productivity (as measured in defects found per tester hour) as compared to the far more prevalent practice in the software testing industry of selecting and documenting test cases by hand. How is it possible that this approach generates such a dramatic increase in productivity? What is so different between the manually-selected test cases and the pair-wise or combinatorial testing cases? Why isn't this test design technique far more broadly adopted than it is?

 

A Common Challenge to Understanding: Complicated, Wonky Explanation

My suspicion is that a significant reason that combinatorial software testing methods are not much more widely adopted is that many of the articles describing it are simply too complex and/or too abstract for many testers to understand and apply. Such articles say things like:

A. Mathematical Model

 

A pairwise test suite is a t-way interaction test suite where t = 2. A t-way interaction test suite is a mathematical structure, called a covering array.

Definition 1 A covering array, CA(N; t, k, |v|), is an N × k array from a set, v, of values (symbols) such that every N × t subarray contains all tuples of size t (t-tuples) from the |v| values at least once [8].

The strength of a covering array is t, which defines, for example, 2-way (pairwise) or 3-way interaction test suite. The k columns of this array are called factors, where each factor has |v| values. In general, most software systems do not have the same number of values for each factor. A more general structure can be defined that allows variability of |v|.

Definition 2 A mixed level covering array, MCA (N; t, k, (|v1|,|v2|,..., |vk|)), is an N × k array on |v| values, where $|v| = \sum_{i=1}^{k} |v_i|$, with the following properties: (1) Each column i (1 ≤ i ≤ k) contains only elements from a set Si of size |vi|. (2) The rows of each N × t subarray cover all t-tuples of values from the t columns at least once.

  • "Construct Pairwise Test Suites Based on the Bak-Sneppen Model of Biological Evolution" World Academy of Science, Engineering and Technology 59 2009 - Jianjun Yuan, Changjun Jiang
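For practitioners, the two properties in Definition 2 boil down to a check that is easy to write down. The sketch below is an illustration only (the function name and the small example array are assumptions, not from the quoted paper): it verifies that each column uses only its own allowed values and that every choice of t columns shows all of their value combinations at least once.

```python
from itertools import combinations, product


def is_covering_array(rows, value_sets, t):
    """Check the two mixed level covering array properties for strength t."""
    k = len(value_sets)
    # Property 1: column i contains only elements of its own value set.
    for row in rows:
        if len(row) != k or any(row[i] not in value_sets[i] for i in range(k)):
            return False
    # Property 2: every choice of t columns shows all t-tuples at least once.
    for cols in combinations(range(k), t):
        seen = {tuple(row[c] for c in cols) for row in rows}
        needed = set(product(*(value_sets[c] for c in cols)))
        if not needed <= seen:
            return False
    return True


# One three-valued factor and two two-valued factors.
value_sets = [{0, 1, 2}, {0, 1}, {0, 1}]
rows = [
    (0, 0, 0),
    (0, 1, 1),
    (1, 0, 1),
    (1, 1, 0),
    (2, 0, 0),
    (2, 1, 1),
]

print(is_covering_array(rows, value_sets, 2))  # True: all pairs are covered
print(is_covering_array(rows, value_sets, 3))  # False: 6 rows cannot hold 12 triples
```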

 

If you're a typical software tester, even one motivated to try new methods to improve your skills, you could be forgiven for not mustering up the enthusiasm to read such articles. The relevancy, the power, and the applicability of combinatorial testing - not to mention that this test design method can often double your software testing efficiency and increase the thoroughness of your software testing - all tend to get lost in the abstract, academic, wonky explanations that are typically used to describe combinatorial testing. Unfortunately for pragmatic, action-oriented software testing practitioners, many of the readily accessible articles on pairwise testing and combinatorial testing tend to be on the wonky end of the spectrum; an exception to that general rule are the good, practitioner-oriented introductory articles available at combinatorialtesting.com.

 

A Different Approach to Explaining Combinatorial Testing and Pairwise Testing

In the photograph-rich, numbers-light, presentation embedded above, I've tried to explain what combinatorial testing is all about without the wonky-ness. The benefits of structured variation and of using combinatorial test design are, in my view, wildly under-appreciated. This approach has the following extremely important benefits:

  • Less repetition from test case to test case

    • In the context of discussing testing's "pesticide paradox" James Bach, I believe, used the analogy that following in someone's footsteps is a very good way to survive traversing through a mine field but a generally lousy way to find software defects efficiently.
    • Maximizing variation from test case to test case, as a general rule, is an absolutely spectacular way to find defects quickly.
    • There are thousands, if not trillions of relevant combinations to select from when identifying test cases to execute; computer algorithms will be able to solve the problem of "how can maximum variation be achieved?" far better than human brains can.
  • More coverage of combinations of test inputs

    • Most of the time, since awareness of pairwise and combinatorial testing methods remains low in the software testing community, combining all possible pairs of values in at least one test case is not even a conscious goal of testers.
    • Even if this were a goal of their test design strategy, testers would have a tremendous challenge in trying to achieve such a goal: with hundreds, thousands or tens of thousands of targeted combinations to cover, losing track of a significant number of them and/or forgetting to include them in software tests is virtually a foregone conclusion unless a test case generator is used.
    • More thorough coverage leads to more defects being found.
  • Efficiency (Testers can "turn the coverage dial" to achieve maximum efficiency with a minimal number of tests)

    • The efficiency and effectiveness benefits of pairwise testing have been demonstrated in testing projects in every major industry.
    • I wanted to prominently include the message that testers using test case generators have the option to dramatically increase the testing thoroughness levels of the tests they generate because it is a topic that often gets ignored in pairwise testing case studies and introductions.
  • Thoroughness - (Testers can also "turn the coverage dial" to achieve maximum thoroughness if that is their goal)

    • Too often, testers view pairwise as a technique that focuses on a very small number of curiously strong tests; that is only part of the story.
    • This can lead to the /false/ impression that combinatorial testing methods are inappropriate where high levels of testing thoroughness are required.
    • You can create very different sets of tests that are as thorough as possible (given your understanding of what you are testing) no matter whether you have 1 hour to execute tests or one month to test.

 

Other Recommended Sources of Information on Pairwise and Combinatorial Testing:

By: Justin Hunter on Oct 7, 2010

Categories: Combinatorial Software Testing, Combinatorial Testing, Design of Experiments, Hexawise test case generating tool, Multi-variate Testing, Pairwise Software Testing, Pairwise Testing, Recommended Tool, Testing Strategies, Uncategorized


 

All the quotes below are from the inside cover of Statistics for Experimenters written by George Box, Stuart Hunter, and William G. Hunter (my late father). The Design of Experiments methods expressed in the book (namely, the science of finding out as much information as possible in as few experiments as possible), were the inspiration behind our software test case generating tool. In paging through the book again today, I found it striking (but not surprising) how many of these quotes are directly relevant to efficient and effective software testing (and efficient and effective test case design strategies in particular):

  • "Discovering the unexpected is more important than confirming the known." - George Box

  • "All models are wrong; some models are useful." - George Box

  • "Don't fall in love with a model."

  • How, with a minimum of effort, can you discover what does what to what? Which factors do what to which responses?

  • "Anyone who has never made a mistake has never tried anything new." - Albert Einstein

  • "Seek computer programs that allow you to do the thinking."

  • "A computer should make both calculations and graphs. Both sorts of output should be studied; each will contribute to understanding." - F. J. Anscombe

  • "The best time to plan an experiment is after you've done it." - R. A. Fisher

  • "Sometimes the only thing you can do with a poorly designed experiment is to try to find out what it died of." - R. A. Fisher

  • The experimenter who believes that only one factor at a time should be varied, is amply provided for by using a factorial experiment.

  • Only in exceptional circumstances do you need or should you attempt to answer all the questions with one experiment.

  • "The business of life is to endeavor to find out what you don't know from what you do; that's what I called 'guessing what was on the other side of the hill.'" - Duke of Wellington

  • "To find out what happens when you change something, it is necessary to change it."

  • "An engineer who does not know experimental design is not an engineer." - Comment made to one of the authors by an executive of the Toyota Motor Company

  • "Among those factors to be considered there will usually be the vital few and the trivial many." - J. M. Juran

  • "The most exciting phrase to hear in science, the one that heralds discoveries, is not 'Eureka!' but 'Now that's funny...'" - Isaac Asimov

  • "Not everything that can be counted counts and not everything that counts can be counted." - Albert Einstein

  • "You can see a lot by just looking." - Yogi Berra

  • "Few things are less common than common sense."

  • "Criteria must be reconsidered at every stage of an investigation."

  • "With sequential assembly, designs can be built up so that the complexity of the design matches that of the problem."

  • "A factorial design makes every observation do double (multiple) duty." - Jack Couden

Where the quotes are not attributed, I'm assuming the quote is from one of the authors. The best known of the unattributed quotes above, "All models are wrong; some models are useful," is widely (and accurately) attributed to George Box in particular. I forgot to confirm my suspicion with him when I saw him over Christmas break, but I suspect most of the others are from George as well (as opposed to Stu or my dad). George is 90 now and still off-the-charts smart and funny, and is probably the best storyteller I've met in my life. If he were younger and on Twitter, he'd be one of those guys who churn out highly retweetable chestnuts again and again. [Update - George Box died in 2013]

 

Related thoughts

As you know if you've read my blog before, I am a strong proponent of taking the Design of Experiments principles laid out in this book and applying them in the field of software testing to improve the efficiency and effectiveness of software test case design (e.g., by using pairwise software testing, orthogonal array software testing, and/or combinatorial software testing techniques). In fact, I decided to create my company's test case generating tool, called Hexawise, after using Design of Experiments-based test design methods during my time at Accenture in a couple dozen projects and measuring dramatic improvements in tester productivity (as well as dramatic reductions in the amount of time it took to identify and document test cases). We saw these improvements in every single pilot project when we used these methods to identify tests.
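To make the idea of pairwise test design concrete, here is a minimal sketch in Python. It is a naive greedy approach written only for illustration; it is not the algorithm Hexawise or AllPairs uses, and the parameter names and values are hypothetical. It keeps picking the combination that covers the most not-yet-covered value pairs until every pair of values appears together in at least one test.

```python
from itertools import combinations, product


def pairwise_tests(parameters):
    """Naive greedy sketch: choose tests until every value pair is covered."""
    names = list(parameters)

    # Every pair of values drawn from two different parameters must appear
    # together in at least one test.
    uncovered = set()
    for a, b in combinations(names, 2):
        for va, vb in product(parameters[a], parameters[b]):
            uncovered.add(((a, va), (b, vb)))

    # Candidate tests: the full set of combinations (fine for small examples,
    # too large for real systems, which is why dedicated tools exist).
    candidates = [dict(zip(names, values))
                  for values in product(*parameters.values())]

    def pairs_in(test):
        return {((a, test[a]), (b, test[b])) for a, b in combinations(names, 2)}

    tests = []
    while uncovered:
        # Pick the candidate that covers the most still-uncovered pairs.
        best = max(candidates, key=lambda t: len(pairs_in(t) & uncovered))
        tests.append(best)
        uncovered -= pairs_in(best)
    return tests


# Hypothetical parameters for illustration only.
parameters = {
    "browser": ["Chrome", "Firefox", "Safari"],
    "os": ["Windows", "macOS", "Linux"],
    "account": ["new", "existing"],
    "payment": ["card", "bank transfer"],
}

tests = pairwise_tests(parameters)
exhaustive = 3 * 3 * 2 * 2
print(f"{len(tests)} pairwise tests versus {exhaustive} exhaustive combinations")
for test in tests:
    print(test)
```

Even in this toy example the pairwise set is a fraction of the exhaustive set, and the gap widens quickly as parameters and values are added.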

My goal, in continuing to improve our Hexawise test case generating tool, is to help make the efficiency-enhancing Design of Experiments methods embodied in the book accessible to "regular" software testers and more broadly adopted throughout the software testing field. Some days, it feels like a shame that the approaches from the Design of Experiments field (extremely well known and broadly used in manufacturing industries across the globe, in research and development labs of all kinds, and in product development projects in chemicals, pharmaceuticals, and a wide variety of other fields) have not made much of an inroad into software testing. The irony is that it is hard to think of a field in which it is easier or quicker to prove that dramatic benefits result from adopting Design of Experiments methods than software testing. All it takes is for a testing team to decide to do a simple proof of concept pilot. It could be for as little as a half-day's testing activity for one tester. Create a set of pairwise tests with Hexawise or another tool like James Bach's AllPairs tool. Have one tester execute the tests suggested by the test case generating tool. Have the other tester(s) test the same application in parallel. Measure four things:

  1. How long did it take to create the pairwise / DoE-based test cases?

  2. How many defects were found per hour by the tester(s) who executed the "business as usual" test cases?

  3. How many defects were found per hour by the tester who executed the pairwise / DoE-based tests?

  4. How many defects were identified overall by each plan's tests?

These four simple measurements will typically demonstrate dramatic improvements in:

  • Speed of test case identification and documentation

  • Efficiency in defects found per hour

As well as consistent improvements to:

  • Overall thoroughness of testing.
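If it helps to see the four measurements side by side, here is a small Python sketch for tabulating a pilot. The class and field names are hypothetical, and the numbers are placeholders that show the shape of the comparison, not real pilot data; substitute your own measurements.

```python
from dataclasses import dataclass


@dataclass
class PilotResult:
    label: str
    design_hours: float     # time spent identifying and documenting tests
    execution_hours: float  # time spent executing the tests
    defects_found: int      # total defects identified by this plan's tests

    @property
    def defects_per_hour(self) -> float:
        if self.execution_hours <= 0:
            raise ValueError("execution_hours must be positive")
        return self.defects_found / self.execution_hours


def summarize(results):
    """Return one summary line per test approach."""
    return [
        f"{r.label}: {r.design_hours:.1f} h to design, "
        f"{r.defects_found} defects overall, "
        f"{r.defects_per_hour:.2f} defects/hour"
        for r in results
    ]


# Placeholder values only. Replace with the numbers from your own pilot.
results = [
    PilotResult("Business as usual", design_hours=0.0,
                execution_hours=1.0, defects_found=0),
    PilotResult("Pairwise / DoE-based", design_hours=0.0,
                execution_hours=1.0, defects_found=0),
]

for line in summarize(results):
    print(line)
```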

 

A Suggestion: Experiment / Learn / Get the Data / Let the Efficiency and Effectiveness Findings Guide You

I would be thrilled if this blog post gave you the motivation to explore this testing approach and measure the results. Whether you've used similar-sounding techniques before or never heard of DoE-based software testing methods, whether you're a software testing newbie or a grizzled veteran, I suspect the experience of running a structured proof of concept pilot (and seeing the dramatic benefits I'm confident you'll see) could be a watershed moment in your testing career. Try it! If you're interested in conducting a pilot, I'd be happy to help get you started, and if you'd be willing to share the results of your pilot publicly, I'd be able to provide ongoing advice and test plan review. Send me an email or leave a comment.

To the grizzled and skeptical veterans (and yes, Mr. Shrini Kulkarni / @shrinik, who tweeted "@Hexawise With all due respect. I can't credit any technique the superpower of 2X defect finding capability. sumthng else must be goingon" before you actually conducted a proof of concept using Design of Experiments-based testing methods and analyzed your findings, I'm lookin' at you), I would (re)quote Sophocles: "One must try by doing the thing; for though you think you know it, you have no certainty until you try." For newer testers, eager to expand your testing knowledge (and perhaps gain an enormous amount of credibility by taking the initiative, while you're at it), I'd (re)quote Cole Porter: "Experiment and you'll see!"

I'd welcome your comments and questions. If you're feeling, "Sounds too good to be true, but heck, I can secure a tester for half a day to run some of these DoE-based / pairwise tests and gather some data to see whether or not it leads to a step-change improvement in efficiency and effectiveness of our testing," and you're wondering how you'd get started, I'd be happy to help you out and do so at no cost to you. All I'd ask is that you share your findings with the world (e.g., in your blog, or by letting me use your data as the firms did with their findings in the "Combinatorial Software Testing" article below).

 

Related:

By: Justin Hunter on Jan 27, 2010

Categories: Combinatorial Testing, Design of Experiments, Hexawise test case generating tool, Multi-variate Testing, Software Testing