Sunday, December 7, 2014

Microbenchmarks -- A Careful Step Towards Software Performance Tuning

Performance tuning of a large-scale software application is often taken up only at the very end, or when a critical issue is reported by customers or other stakeholders. As a product ages and its adoption grows, so does the likelihood that it runs slower than before. The key is always to write code with good, time-tested algorithms. And when you still have to undertake performance tuning, a meticulous approach is required. One of its building blocks is the microbenchmark.

A microbenchmark is a test designed to measure a very small unit of performance: the time to call a synchronized method versus an unsynchronized one, the overhead of creating a thread versus using a thread pool, the time taken to execute one algorithm against another, and so on. Doesn't this sound interesting? Microbenchmarks are the basic building blocks of the whole exercise of improving software application performance.

Unfortunately, it is often very difficult to write a correct microbenchmark. Consider the following code.

public void doTest()
{
    // Main Loop
    double l;
    long then = System.currentTimeMillis();
    for (int i = 0; i < nLoops; i++) {
        l = generateFibonacci(50);
    }
    long now = System.currentTimeMillis();
    System.out.println("Elapsed time: " + (now - then));
}

...

private double generateFibonacci(int n)
{
    if (n < 0) throw new IllegalArgumentException("Must be >= 0");
    if (n == 0) return 0d;
    if (n == 1) return 1d;
    return generateFibonacci(n - 2) + generateFibonacci(n - 1);
}



The Fibonacci values computed in generateFibonacci are never read. This gives the JIT compiler in Java 7 and 8 a free hand to discard the calculations as dead code. Any sufficiently smart compiler will end up executing the following code, skipping the Fibonacci computation altogether.


public void doTest()
{
    // Main Loop
    double l;
    long then = System.currentTimeMillis();

    long now = System.currentTimeMillis();
    System.out.println("Elapsed time: " + (now - then));
}


The program will always print an elapsed time of a few milliseconds, irrespective of the implementation of the Fibonacci method or the number of loop iterations. This defeats the very purpose of the test. So, what can we do to get around this problem?

Make sure that every computed value is read in some way. One easy way to do that is to write each result into an instance field, preferably declared volatile, so the compiler cannot prove that the value is unused. The need for a volatile field applies even when the microbenchmark is single-threaded.
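As a sketch of this fix, the earlier example can be rewritten to store each result in a volatile field (the class name, the `sink` field, and the reduced argument of 30 are illustrative choices, not part of the original listing):

```java
public class FibMicrobenchmark {
    // Each result is written to a volatile field, so the JIT compiler
    // must treat the computation as live and cannot eliminate it.
    volatile double sink;

    public void doTest(int nLoops) {
        long then = System.currentTimeMillis();
        for (int i = 0; i < nLoops; i++) {
            sink = generateFibonacci(30);
        }
        long now = System.currentTimeMillis();
        System.out.println("Elapsed time: " + (now - then));
    }

    double generateFibonacci(int n) {
        if (n < 0) throw new IllegalArgumentException("Must be >= 0");
        if (n == 0) return 0d;
        if (n == 1) return 1d;
        return generateFibonacci(n - 2) + generateFibonacci(n - 1);
    }
}
```

Because the volatile write is an observable side effect, the loop body can no longer be discarded, and the printed elapsed time now reflects the actual Fibonacci computation.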

Be especially wary when thinking of writing a threaded microbenchmark. When several threads are executing small bits of code, the potential for synchronization bottlenecks (and other thread artifacts) is quite large. Results from threaded microbenchmarks often lead to spending a lot of time optimizing away synchronization bottlenecks that will rarely appear in real code—at a cost of addressing more pressing performance needs.


Consider the case of two threads calling a synchronized method in a microbenchmark. Because the benchmark code is small, most of it will execute within that synchronized method. Even if only 50% of the total microbenchmark is within the synchronized method, the odds that as few as two threads will attempt to execute the synchronized method at the same time are quite high. The benchmark will run quite slowly as a result, and as additional threads are added, the performance issues caused by the increased contention will get even worse. The net result is that the test ends up measuring how the JVM handles contention rather than what the microbenchmark set out to measure.
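A minimal sketch of this pitfall (class and method names are hypothetical): nearly the entire per-iteration body sits inside a synchronized method, so once two or more threads run, the reported time is dominated by lock contention rather than by the work being benchmarked.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ContentionDemo {
    long counter;

    // Nearly the entire benchmark body sits inside this synchronized
    // method, so with two or more threads the measured time is dominated
    // by lock contention rather than by the increment itself.
    synchronized void work() {
        counter++;
    }

    public long runThreads(int nThreads, int callsPerThread) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(nThreads);
        long then = System.currentTimeMillis();
        for (int t = 0; t < nThreads; t++) {
            pool.execute(() -> {
                for (int i = 0; i < callsPerThread; i++) work();
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return System.currentTimeMillis() - then;
    }
}
```

Comparing the elapsed time for one thread against two or more shows the contention overhead directly; the comparison says little about how the same code would behave inside a real application, where the synchronized section is a far smaller fraction of the work.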

In a microbenchmark, the inputs must be precalculated; the test must not spend measured time preparing them. Further, one of the performance characteristics of Java is that code performs better the more it is executed, a consequence of just-in-time compilation. For that reason, a microbenchmark must include a warm-up period, which gives the compiler a chance to produce optimal code. Without a warm-up period, the microbenchmark measures the performance of compilation rather than of the code it is attempting to measure.
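Both points can be sketched in one harness (the class name, the `run` signature, and the loop counts are illustrative assumptions): the input is fixed before any timing begins, and an untimed warm-up phase runs first so the JIT compiler can compile the method under test.

```java
public class WarmedUpBenchmark {
    // Volatile sink keeps the computation from being eliminated as dead code.
    volatile double sink;

    double fib(int n) {
        return n <= 1 ? n : fib(n - 1) + fib(n - 2);
    }

    public long run(int warmupLoops, int measuredLoops, int n) {
        // Warm-up phase: executed before timing starts, giving the JIT
        // compiler a chance to compile fib() to optimized code.
        for (int i = 0; i < warmupLoops; i++) {
            sink = fib(n);
        }
        // Measured phase: the input n was fixed up front, so no measured
        // time is spent preparing inputs.
        long then = System.currentTimeMillis();
        for (int i = 0; i < measuredLoops; i++) {
            sink = fib(n);
        }
        return System.currentTimeMillis() - then;
    }
}
```

With a warm-up of zero loops, the measured phase includes interpretation and compilation time; with a generous warm-up, it measures the steady-state compiled code, which is usually what a microbenchmark is after.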

Writing a microbenchmark is hard, and there are only limited situations in which one is useful. Be aware of the pitfalls involved, and determine whether the work involved in getting a reasonable microbenchmark is worth the benefit—or whether it would be better to concentrate on more macro-level tests.

 
References

Java Performance: The Definitive Guide
The Java Virtual Machine Specification
