On Unit Testing Bentley's Binary Search Overflow Condition

One of the few unit tests one would write for a simple arithmetic operation such as addition is a check for overflow, i.e. whether the result still fits in the operand type. More often than not, programmers mistakenly assume the runtime will take care of this for them in managed-code languages (C#, Java). The examples below are in C#, adapted from the Java implementations in Joshua Bloch's post.

A classic case of an uncaught overflow that remained undiscovered for 20 years is in the binary search implementation in Jon Bentley's amazing Programming Pearls. (The book, by the way, is highly recommended reading for every engineer!) The erroneous code follows.

 

public static int BinarySearch(int[] a, int key)
{
    int low = 0;
    int high = a.Length - 1;

    while (low <= high)
    {
        int mid = (low + high) / 2;
        int midVal = a[mid];

        if (midVal < key)
            low = mid + 1;
        else if (midVal > key)
            high = mid - 1;
        else
            return mid; // key found
    }
    return -(low + 1); // key not found.
}

The bug resides in the midpoint computation, int mid = (low + high) / 2;. If low + high exceeds int.MaxValue, the sum silently wraps around to a negative number and the division yields a negative array index. For constant expressions, modern compilers are intelligent enough to detect these errors at compile time.

So, for example, all three of the following statements fail to compile with an "overflow in constant value computation" error:

int a = int.MaxValue + 1;
var a = int.MaxValue + 1;
dynamic a = int.MaxValue + 1;

However, the compiler will not protect you from shooting yourself in the foot if you insist on doing things like

int max = int.MaxValue;
int a = max + 1;

In this case, the value of a is -2147483648, which is int.MinValue. If you think var would protect you, think again: int + int is defined to be int. Therefore, the following result in int.MinValue as well.

int max = int.MaxValue;
var a = max + 1;

as well as

dynamic a = max + 1;
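Java, the language of Bloch's original examples, exhibits the same silent wraparound, since its integer arithmetic is always unchecked. A minimal sketch:

```java
public class WrapAround {
    public static void main(String[] args) {
        int max = Integer.MAX_VALUE;                 // 2147483647
        int a = max + 1;                             // wraps silently; no exception
        System.out.println(a);                       // prints -2147483648
        System.out.println(a == Integer.MIN_VALUE);  // prints true
    }
}
```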

Now, back to the original topic of binary search overflow. A simple test to trigger this condition would be

BinarySearch(Enumerable.Range(0, int.MaxValue - 1).ToArray(), new Random().Next());

where you create a large sorted array and search for a pseudo-random key. However, an attempt to create an array of size int.MaxValue results in an OutOfMemoryException. A test method stub looks as follows.

[TestMethod()]
public void BinarySearchTest()
{
    int[] a = Enumerable.Range(0, int.MaxValue - 1).ToArray();
    int key = 0; // TODO: Initialize to an appropriate value
    int expected = 0; // TODO: Initialize to an appropriate value
    int actual;
    actual = Program.BinarySearch(a, key);
    Assert.AreEqual(expected, actual);
    Assert.Inconclusive("Verify the correctness of this test method.");
}
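Since allocating an array anywhere near int.MaxValue elements is impractical, one pragmatic alternative is to extract the midpoint arithmetic and unit test it in isolation at extreme indices. The sketch below is in Java, mirroring Bloch's originals; the helper names buggyMid and fixedMid are hypothetical, not part of any listing above:

```java
public class MidpointTest {
    // Midpoint computations extracted so they can be tested without a real array
    static int buggyMid(int low, int high) { return (low + high) / 2; }
    static int fixedMid(int low, int high) { return low + (high - low) / 2; }

    public static void main(String[] args) {
        int low = 2, high = Integer.MAX_VALUE - 1;

        // The buggy form wraps past Integer.MAX_VALUE and goes negative...
        assert buggyMid(low, high) == -1073741824;

        // ...while the fixed form always stays within [low, high]
        int mid = fixedMid(low, high);
        assert low <= mid && mid <= high;
    }
}
```

Testing the arithmetic directly sidesteps the OutOfMemoryException while still exercising the exact index values that trigger the bug.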

Trying to simulate this with shorts is still hard because, counterintuitively, the sum of two shorts is an int. Therefore, the following results in a compile-time error.

short c = 10, d = 10;
short e = c + d;
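Java applies the same rule: operands of type short are promoted to int before the addition, so assigning the sum back to a short needs an explicit narrowing cast. A minimal sketch:

```java
public class ShortPromotion {
    public static void main(String[] args) {
        short c = 10, d = 10;
        // short e = c + d;         // does not compile: c + d has type int
        short e = (short) (c + d);  // explicit narrowing cast required
        System.out.println(e);      // prints 20
    }
}
```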

Therefore, you cannot simulate the condition unless you cast the result back:

short mid = (short)((low + high)/2);

which rather defeats the purpose, since the intermediate sum is computed as an int and never wraps. And this is why the bug manifested itself without resolution for such a long time in an algorithm as popular as binary search. The fixed version substitutes

int mid = (low + high) / 2;

with

int mid = low + ((high - low) / 2);

which are functionally equivalent, but the second statement cannot overflow: high - low never exceeds int.MaxValue, and low + ((high - low) / 2) never exceeds high.

public static int BinarySearchFixed(int[] a, int key)
{
    int low = 0;
    int high = a.Length - 1;

    while (low <= high)
    {
        int mid = low + ((high - low) / 2);
        int midVal = a[mid];

        if (midVal < key)
            low = mid + 1;
        else if (midVal > key)
            high = mid - 1;
        else
            return mid; // key found
    }
    return -(low + 1); // key not found.
}

It can be optimized by using the bitwise right-shift operator >> to divide by 2. Since C# has no unsigned shift operator, some casting to uint is needed, which essentially widens the range of the intermediate sum.

 
public static int BinarySearchOptimized(int[] a, int key)
{
    int low = 0;
    int high = a.Length - 1;

    while (low <= high)
    {
        int mid = (int)(((uint)high + (uint)low) >> 1);
        int midVal = a[mid];

        if (midVal < key)
            low = mid + 1;
        else if (midVal > key)
            high = mid - 1;
        else
            return mid; // key found
    }
    return -(low + 1); // key not found.
}
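For comparison, Java provides an unsigned right-shift operator, >>>, which is how Bloch's fix achieves the same effect without any casts. A sketch:

```java
public class UnsignedShiftMid {
    static int mid(int low, int high) {
        // >>> shifts a zero into the sign bit, so the wrapped sum
        // is reinterpreted as an unsigned halfway point
        return (low + high) >>> 1;
    }

    public static void main(String[] args) {
        System.out.println(mid(2, Integer.MAX_VALUE - 1)); // prints 1073741824
    }
}
```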

Is this perfect? Maybe. What about the uint-to-int cast here: will the narrowing conversion always work? In most cases it will, since the midpoint of two valid array indices always fits in an int. Empirical testing and formal methods help, but nothing is guaranteed, hence the iterative process of static code analysis and peer code reviews.
BinarySearch VS.NET 2010 Solution Download
