Fast, Low and Simple: Transitioning to Open-Source

We test the Java Virtual Machine (JVM).  Recently, some of our JVM product code has been pulled out into shareable runtime components (such as the GC, JIT, RAS, Port and Thread libraries) to be used as building blocks for a multitude of programming languages besides Java, such as Ruby, Python and JavaScript (see Eclipse OMR for further details).  With this open-sourcing of some of our product code, we need to rethink, rebuild and refactor our existing test approach.  Our new approach can be summarized by this mantra: “Fast, Low and Simple”.

Fast.  Tests in open-source communities are typically not run in isolated, pristine test labs.  Often they are run on a developer’s laptop.  This means our overnight suite of tens of thousands of tests is not suitable for sharing.  We want to design our tests to give us good functional coverage with the minimal set of tests required.  Once the low-level APIs that require testing are identified, we apply combinatorial test design (CTD) principles to the new APIs to keep test numbers low and test cycles short.

Low.  We can no longer rely solely on a massive set of Java tests written to exercise the Java APIs, as this would not benefit the other language communities relying on the shared runtime components.  We need to push the tests down to the software layers below the languages that are built upon those components.  By testing at this language-agnostic layer, we avoid excess duplication and keep the test runs short.

Simple.  We’d like our tests to be structured in a standardized way.  We should aim to reduce the number of tools required by the tests, and when we do need them, look to open-source tools rather than proprietary solutions.  By moving our tests to an open-source test framework, adhering to a coding standard for test source and using a common approach for test output, we make tests much easier to maintain, triage and debug.

Let’s see how our approach looks with an example.  We’ll drill down to find some of the simplest units that help test the Just-In-Time (JIT) component: the opcodes.  Why?  Because we want low and simple.  In computing, an opcode (abbreviated from operation code) is the portion of a machine language instruction that specifies the operation to be performed.  Besides the opcode itself, instructions usually specify the data they will process, in the form of operands.  If the behaviour of an opcode is incorrect, everything built on top of it may be incorrect.  So, for the OMR JIT component (Figure 1), we test the opcodes first, catching any problems as low in the software stack as possible, as close to the root of a problem as we can get.

Figure 1: OMR JIT component


Some opcodes are unary (taking only one operand), like the return opcode.  Some are binary, like add.  Some are even ternary.  If you break them out by data type, you get an explosion of opcodes: iadd (adding 2 integers), ladd (adding 2 longs), and so on.  There are hundreds of different opcodes.
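To make this concrete, here is a toy sketch in Java of what evaluating typed add opcodes looks like.  This is purely illustrative and is not OMR’s actual implementation; the class, enum and method names are invented.  It shows why the typed variants each need their own tests: iadd and ladd differ exactly at the boundary values a tester cares about.

```java
// Toy illustration only (not OMR code): typed variants of the same
// logical "add" operation behave differently at type boundaries.
public class OpcodeSketch {

    enum Opcode { IADD, LADD }  // a tiny, hypothetical subset

    // Evaluate a binary arithmetic opcode on its two operands.
    static long eval(Opcode op, long a, long b) {
        switch (op) {
            case IADD: return (int) a + (int) b;  // 32-bit add: wraps at Integer range
            case LADD: return a + b;              // 64-bit add: wraps at Long range
            default: throw new IllegalArgumentException("unknown op: " + op);
        }
    }

    public static void main(String[] args) {
        // Same operands, different opcode, different result at the boundary:
        System.out.println(eval(Opcode.IADD, Integer.MAX_VALUE, 1)); // wraps to -2147483648
        System.out.println(eval(Opcode.LADD, Integer.MAX_VALUE, 1)); // 2147483648
    }
}
```

The boundary values used here (MAX_VALUE, MIN_VALUE) are exactly the kind of “values of interest” that a CTD model, discussed below, would enumerate for each operand.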

In an upcoming post, we’ll dig into some details on how we modeled and tested these fundamental pieces.


A Quick ‘n’ Dirty Intro to Combinatorial Test Design

Combinatorial Test Design (CTD) is a way to design a minimal set of tests that still provide good functional coverage.  This is a brief overview of CTD to introduce the concept and describe how we are finding it useful for Functional Verification (FV).  It is an approach that can be applied at every level of testing, including system testing, but for the purposes of this guide, we will use FV examples.

When we test our products, we want good functional coverage of our source code.  Functional coverage does not mean the same thing as code coverage.  100% code coverage means that our tests exercised every line of the source code.  That sounds good, but it doesn’t mean much if our tests are poor.  100% functional coverage means we exercise all lines of the source code with all possible variants or input values of the code.  To be clear, it is unrealistic to achieve 100% functional coverage, and even if we did, it would produce far too many tests to actually run and maintain.  CTD offers a better approach, one that brings us high functional coverage in as few tests as possible.

Let’s take a look at a simple example:

static Integer mathyJunkFunction(int num1, Integer num2, String num3) {
        return new Integer(num1) / (num2 + new Integer(num3));
}
If we test this function with the following:

Integer result = mathyJunkFunction(1, new Integer(1), "1");

We will have 100% code coverage, but we will not have done a very good job of testing the function.  Even in this naive example, there are multiple defects that this single test does not catch.

Defect 1: What if num3 has a value of null (throws NumberFormatException)?

Defect 2: What if num3 is “1” and num2 is -1, or “0” and 0, or any other combination that sums to 0 (throws ArithmeticException)?

Defect 3: What if num2 is null (throws NullPointerException)?
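The three defects above can be demonstrated with plain Java (no test framework assumed).  The class and helper names below (DefectDemo, thrownBy) are invented for this sketch, and it reuses the article’s mathyJunkFunction as-is (note that the Integer constructors it relies on are deprecated in modern Java, but still run).

```java
// Demonstrates the three defect scenarios against the article's function.
public class DefectDemo {

    static Integer mathyJunkFunction(int num1, Integer num2, String num3) {
        return new Integer(num1) / (num2 + new Integer(num3));
    }

    // Returns the simple name of the exception thrown, or "none".
    static String thrownBy(int num1, Integer num2, String num3) {
        try {
            mathyJunkFunction(num1, num2, num3);
            return "none";
        } catch (RuntimeException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(thrownBy(1, new Integer(1), "1"));   // the naive test: none
        System.out.println(thrownBy(1, new Integer(1), null));  // Defect 1: NumberFormatException
        System.out.println(thrownBy(1, new Integer(-1), "1"));  // Defect 2: ArithmeticException
        System.out.println(thrownBy(1, null, "1"));             // Defect 3: NullPointerException
    }
}
```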

This is a very simple example that does not take into account implicit inputs or environmental factors which may affect the behaviour of real-world source code, such as environment variables, command-line options, platform-specific behaviour or hardware configuration details.  In real-world cases, we would also include any implicit inputs and their values; let’s ignore implicit inputs for our example.  The 3 parameters of the function are the explicit inputs we need to consider in our tests.  If we want to model the test space for this function, we need to look at the “values of interest” for each of them.  If our parameters were custom data structures or objects, we would need to provide a list of ‘equivalence classes’ as the values of interest.  In this example, we are dealing with well-known data types, so our job of defining ‘values of interest’ to use in our tests is easier.

num1 (integer): {MAX_VALUE, 1, 0, -1, MIN_VALUE}

num2 (Integer): {MAX_VALUE, 1, 0, -1, MIN_VALUE, null}

num3 (String): {“MAX_VALUE”, “1”, “0”, “-1”, “MIN_VALUE”, null}

If we tested all combinations of all values of the 3 inputs (100% functional coverage), we would have 5 x 6 x 6 = 180 test cases.  But as many studies of typical source code have shown, most defects are caused by faulty logic involving 1 input or the interaction of 2 inputs.  Because of this, we know that if we limit ourselves to creating a test case for every pair of values (known as pairwise testing), we will typically achieve 80+% functional coverage (unless the code under test is atypical of most source code).  When we take every unique combination of value pairs, we arrive at 29 test cases (27 good path, 2 bad path), as shown in Tables 1 and 2.

Table 1: Pairwise testing for mathyJunkFunction() – Good Path


Table 2: Pairwise testing for mathyJunkFunction() – Bad Path


29 test cases are far more manageable than 180.  When we look at the list of test cases, we see that, of the 3 defects we flagged in our naive example:

Test Case 1 of the Bad Path list (in Table 2) would catch Defect 1 (the NumberFormatException)

Test Case 2 of the Bad Path list would catch Defect 3 (the NullPointerException)

Test Cases 3, 10 and 16 would catch Defect 2 (the ArithmeticException)

With inspection of the source code, we could decide to reduce our test case count even further, but if we were testing something more complex, or were black-box testing, we could implement the set as described and have confidence that we would catch the majority of defects.  We have applied CTD to much of the FV in the Eclipse OMR project and to designs relating to other 2015 feature requests.  Our plan is to deepen our understanding and use of CTD in 2016, including some automation experiments, and to continue to reap the many benefits from our efforts.
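For readers curious how a pairwise set can actually be constructed, here is a naive greedy generator sketched in Java.  All names are hypothetical and real CTD tools use far better algorithms plus constraint handling; in particular, this plain sketch will not reproduce the 29 cases in the tables, because without special treatment of the bad-path values, num2 and num3 each contribute 6 values of interest, each test row covers only one num2/num3 pair, and so at least 6 x 6 = 36 rows are needed.

```java
import java.util.*;

// Naive greedy pairwise generator, for illustration only.
// Parameters are modeled as value indexes: sizes[i] = number of
// values of interest for parameter i.
public class PairwiseSketch {

    // Pick rows from the full cross-product until every pair of values
    // from every two parameters appears in at least one chosen row.
    static List<int[]> generate(int[] sizes) {
        Set<String> uncovered = new HashSet<>();
        for (int a = 0; a < sizes.length; a++)
            for (int b = a + 1; b < sizes.length; b++)
                for (int va = 0; va < sizes[a]; va++)
                    for (int vb = 0; vb < sizes[b]; vb++)
                        uncovered.add(key(a, va, b, vb));

        List<int[]> tests = new ArrayList<>();
        while (!uncovered.isEmpty()) {
            int[] row = new int[sizes.length];
            int[] best = null;
            int bestGain = -1;
            // Scan the whole cross-product for the row covering the most
            // still-uncovered pairs (fine for tiny models like this one).
            do {
                int gain = 0;
                for (int a = 0; a < sizes.length; a++)
                    for (int b = a + 1; b < sizes.length; b++)
                        if (uncovered.contains(key(a, row[a], b, row[b]))) gain++;
                if (gain > bestGain) { bestGain = gain; best = row.clone(); }
            } while (next(row, sizes));
            for (int a = 0; a < sizes.length; a++)
                for (int b = a + 1; b < sizes.length; b++)
                    uncovered.remove(key(a, best[a], b, best[b]));
            tests.add(best);
        }
        return tests;
    }

    static String key(int pa, int va, int pb, int vb) {
        return pa + ":" + va + "," + pb + ":" + vb;
    }

    // Odometer-style increment over the cross-product; false once wrapped.
    static boolean next(int[] row, int[] sizes) {
        for (int i = row.length - 1; i >= 0; i--) {
            if (++row[i] < sizes[i]) return true;
            row[i] = 0;
        }
        return false;
    }

    public static void main(String[] args) {
        // mathyJunkFunction's model: num1 has 5 values, num2 and num3 have 6.
        List<int[]> tests = generate(new int[]{5, 6, 6});
        System.out.println("exhaustive = 180, pairwise = " + tests.size());
    }
}
```

The exact row count depends on the generation algorithm; the point of the sketch is only that every value pair is covered in a small fraction of the 180 exhaustive combinations.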
