3-sum nzp algorithm

We introduce 3-sum nzp, an algorithm that solves 3-sum in quadratic time but about 300% faster than the fastest presently known algorithm found in the literature (as of September 2012). This post can also be used as a tutorial on how algorithm complexity affects performance. You can download the code discussed in this post, including the running time tests, here:

Introduction

I first learned about this problem recently, while auditing Coursera’s class on algorithms by Sedgewick and Wayne (Sep. 2012). Sedgewick and Wayne use different versions of 3-sum to illustrate that the complexity of an algorithm is the most important factor to take into account when attacking a problem (versus the very common practice of using a slow algorithm and just throwing money at it via expensive hardware).

The 3-sum problem is so basic that I kept thinking that there had to be a better way to solve it than the presently accepted ‘best solution’ and, indeed, there was: the 3-sum-nzp algorithm presented here, about 300-400% faster than the current ‘fastest’ solution. On Sep. 9, 2012, I posted in the class’s forum a subset of the timings presented here (unfortunately, the forum is no longer public).

We now go over the problem definition and over pretty much every solution of the 3-sum problem, starting with the brute force approach and ending with the new nzp algorithm. Finally, we present timing results for all the routines, for arrays of various sizes, and close with links to related online references.

The 3-sum problem

The 3-sum problem is defined as follows:

“Given N distinct integers, what triples (A,B,C) are such that

A+B+C=0?”

Some of the variations of this problem are:

  • Is there at least one triple (A,B,C) such that A+B+C=0?
  • Is there a pair (A,B) that adds up to a third integer C, i.e., A+B=C?

In addition, there are problems that are 3-sum-hard, i.e., problems that we can linearly map to 3-sum, so we can modify the algorithms that solve 3-sum to solve the others. One such problem is 3-collinear, which is defined as:

“Given N distinct points on a plane, are there 3 points that lie in the same line?”

There are a number of algorithms that solve 3-sum in $O( N^2\cdot \log N)$ or in $O( N^2)$. Whether there is an algorithm that can solve 3-sum in sub-quadratic time is an open problem; nobody has found one for the general case – no parallelism, no specific statistical distribution of the data, no specific data type – and it is widely believed that there is none.

In this post we present a discussion of some 3-sum algorithms and introduce a new one, 3-sum-nzp, that, although also quadratic, has a small constant, so it performs about 3.0-3.5x faster than the fastest known 3-sum algorithm. This is nothing to sneeze at: other than finding a sub-quadratic algorithm, which would be a life achievement, the only thing we can do is improve on the constant of the running time. An $\sim N/2$ algorithm will not let us solve orders-of-magnitude larger problems than an $\sim N$ one would, but it definitely is the better choice.

Brute force 3-sum $- O(N^3)$

This is the naive implementation of 3-sum; it simply loops over the array 3 times. Its order is cubic so it quickly loses its usefulness for anything other than benchmarking.

def sum3_brute_force(a):
    res = []
    N = len(a)

    # Python ranges exclude the last term, i.e., range(0,3) = [0,1,2]
    for i in range(0, N-2):               # for i = 0 to N-3
        for j in range(i+1, N-1):         # for j = i+1 to N-2
            for k in range(j+1, N):       # for k = j+1 to N-1
                if a[i] + a[j] + a[k] == 0:
                    res.append( [a[i], a[j], a[k]] )
    return res

3-sum with binary search $- O(N^2 \cdot \log N)$ average

This is a slight variation of the brute force version. First, we sort the array in $O( N\cdot \log N)$ time with Merge sort or Quick sort. Then we iterate over the array twice to find the first two elements of the triple, a[i] and a[j], in the same way done by the brute force algorithm; this process is $O( N^2)$. However, instead of searching for the third element of the triple using a third loop, like in the brute force approach, we look for it using a binary search over the already sorted array. Since the complexity of the binary search is $O(\log N)$, we have a total complexity of $O(N^2\cdot \log N)$.

def binary_search(a, key):
    lo, hi = 0, len(a)-1
    while lo <= hi:
        mid = lo + (hi-lo)//2
        if key < a[mid]:   hi = mid - 1
        elif key > a[mid]: lo = mid + 1
        else:              return mid
    return None

def sum3_with_sort(a):
    res = []
    N = len(a)
    a.sort()           #  O(N log N)

    # Python ranges exclude the last term, i.e., range(0,3) = [0,1,2]
    for i in range(0, N-2):           # for i = 0 to N-3
        for j in range(i+1, N-1):     # for j = i+1 to N-2
            val = -(a[i] + a[j])
            if a[i] < a[j] < val and binary_search(a, val) is not None:
                res.append( [a[i], a[j], val] )
    return res

3-sum with hash table $- O(N^2)$ average

We sped up the brute force version by realizing that if we already have two elements of the triple we can search for the third one in $O(\log N)$ time using binary search. The hash table approach extends this: we can load the elements of the array into a hash table and then verify the existence of a third element a[k] such that a[i] + a[j] + a[k] = 0 in constant time. Again, the first two loops remain as they were in the brute force approach but the third loop is replaced, not with an $O(\log N)$ binary search but with an $O(1)$ hash table lookup:

def sum3_with_hash_table(a):
    res = []
    N = len(a)
    d = {}
    for i in range(N):
        d[a[i]] = 1
    a.sort()                        # O(N log N)

    # Python ranges exclude the last term, i.e., range(0,3) = [0,1,2]
    for i in range(0, N-2):             # for i = 0 to N-3
        for j in range(i+1, N-1):       # for j = i+1 to N-2
            val = -(a[i] + a[j])
            if a[i] < a[j] < val and val in d:
                res.append( [a[i], a[j], val] )
    return res

Python, Java and other modern languages include hash tables as standard data structures. In the sample code above, we used dictionaries, the hash tables of Python. We create a hash table d using the elements of the array as keys. We do not care about the values of the hash table, only about the hash table's ability to verify membership in constant time. Thus, all the keys of the dictionary have a dummy value of 1. Alternatively, in Python, we could use sets, which are also implemented with hash tables and thus also offer $O(1)$ membership checks.
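As a variation, here is what the same algorithm looks like using a Python set instead of a dictionary (sum3_with_set is a name chosen for this sketch, not part of the original listings):

```python
def sum3_with_set(a):
    # Same algorithm as sum3_with_hash_table, but the membership test
    # uses a set; lookups are still O(1) on average.
    res = []
    s = set(a)
    a.sort()                          # O(N log N)
    N = len(a)
    for i in range(0, N-2):
        for j in range(i+1, N-1):
            val = -(a[i] + a[j])
            if a[i] < a[j] < val and val in s:
                res.append([a[i], a[j], val])
    return res
```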

3-sum quadratic time $- O(N^2)$ for average and worst cases

This was the fastest known 3-sum algorithm. It keeps two pointers that move from the extremes of the array toward the center, bracketing possible triples. The following algorithm is that shown in Wikipedia's article about 3-sum, modified so that it will return all the triples that add up to zero (Wikipedia has the existence version of 3-sum, i.e., it returns immediately after it finds a triple that adds up to zero):

def sum3_quad(S):
    res = []
    n = len(S)
    S.sort()
    for i in range(0, n-2):  # for i=0 to n-3 do:
        a = S[i]
        k = i+1
        l = n-1
        while k < l:
            b, c = S[k], S[l]
            if a + b + c == 0:
                res.append([a, b, c])
                k = k+1
                l = l-1
            elif a + b + c > 0:
                l = l-1
            else:
                k = k+1
    return res

3-sum quadratic nzp $- O(N^2)$ for average and worst cases

Presently it is widely believed that 3-sum's lower bound for the general case is quadratic and that, in particular, the algorithm just shown is the fastest known. This does not mean that we cannot find a faster algorithm; only that this faster algorithm would also have a quadratic performance. Here we present such an algorithm that, in general, is 350% faster than the one described in Wikipedia.

We are interested in triples (A, B, C) such that $$A+B+C=0$$

Like in the previous algorithms, we start by sorting the array and adding its elements to a hash table. We then divide the problem into two parts: finding the triples that contain a zero as one of their elements and then finding those that don't. These two cases cover all the possibilities because, since all the elements of the array are distinct, we cannot have a triple with more than one zero.

The triple has a 0

We first address the case in which one of the elements of the triple (A,B,C) is zero. Necessarily, this element lies between the other two, i.e., if, say, B=0, then A = -C. Thus, to find triples of the form (A, 0, C) we iterate over the positive values of the array, in linear time, and find, in constant time, whether their negatives are in the array by checking their membership in the hash table. If there are more positive numbers than negative ones, we iterate over the negative ones instead. The running time of this process is

  • $O(1)$ in the best case, when there is a single positive (or negative) value
  • $\sim N/2$ in the worst case, when there are the same number of positive and negative entries in the array.

Let's see this with an example. We start with the following set with N=8 entries:

$$a = [30, -40, -20, -10, 40, 0, 10, 5]$$

First we sort the array and we mark 3 locations: that of the last negative value, n; of zero, z; and of the first positive value, p. Thus, if 0 is an element of the array then z = n+1 and p = z+1 = n+2; otherwise, z is undefined and p = n+1.

a = [-40, -20, -10, 0, 5, 10, 30, 40]
      ^         ^   ^  ^          ^
      0         n   z  p          N-1

In our example we have n=2, z=3 and p=4.

For purposes of both the implementation and performance, we rearrange the negative numbers so they are sorted in descending order, i.e., now both positive and negative numbers are sorted in the ascending value of their magnitudes. This change does not alter the values of n, z or p:

a = [-10, -20, -40, 0, 5, 10, 30, 40]
      ^         ^   ^  ^          ^
      0         n   z  p          N-1

Now we execute the first step, namely the discovery of the triples that contain a zero. Since there are fewer negative numbers than positive ones, we iterate over the negative values of the array, from 0 to n, where n is the largest index of a negative value. For each such negative number A we check for the membership of -A in the hash table. If it is present, then (A, 0, -A) is a triple that adds up to zero and we add it to the result. The partial result at the end of this step is:

$$res = [ [-10, 0, 10], [-40, 0, 40] ]$$
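This first step can be sketched on its own as follows (zero_triples is a hypothetical helper, not part of the final listing; it assumes all elements are distinct):

```python
def zero_triples(a):
    # Triples of the form (A, 0, -A): iterate over the smaller of the
    # two sides and check for the mirror value via the hash table.
    d = set(a)
    neg = [x for x in a if x < 0]
    pos = [x for x in a if x > 0]
    side = neg if len(neg) <= len(pos) else pos
    return [[A, 0, -A] for A in side if -A in d]
```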

The triple does not have a 0

Now we consider the case in which the triple does not have a zero as one of its elements. We are going to split this problem in two: first, finding the triples whose largest element in absolute terms is positive and then, finding those triples in which it is negative.

Hence, first consider the case in which the largest value of the triple in absolute terms is positive, i.e., $$|A|>|B|>|C|$$ and A>0. Since 0 is not in the triple, both B and C have to be negative. Now consider the possible values of C. C cannot be smaller than -A/2 because if it were, B would have to be larger than -A/2; however, this contradicts |B|>|C|. In addition, B cannot be equal to C because all the integers of the array are distinct. Hence, both B and C are negative numbers such that

$$-A \lt B \lt -A/2 \lt C \lt 0$$.

To give a numerical example, if A=5, C has to be larger than -A/2= -2.5, i.e., $C \in \{-1,-2\}$, and B has to be smaller than -A/2 but larger than -A, i.e., $B \in \{-3,-4\}$.

Since the negative numbers are sorted in ascending order of magnitudes, we know that if there is a triple (A,B,C) with A>0 then C is to be found at the beginning of the array and it will have a magnitude smaller than A/2 (i.e., C > -A/2). Thus, we iterate over such possible values of C and, in constant time, verify the membership of the remaining member of the triple, i.e., B=-(A+C).

Notice that it is here, in this second iteration, where the quadratic factor creeps in. Going over all the positive numbers has a cost of $\sim N/2$ in the worst case, when there are the same number of positive and negative numbers in the array. Going over the negative numbers, although it makes the algorithm quadratic, is inexpensive because we only iterate over the negative values larger than -A/2. This makes the search efficient.
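In isolation, this second step could look like the following sketch (positive_max_triples is a hypothetical helper; it assumes pos is sorted ascending, neg is sorted by ascending magnitude, and d holds all the values of the array):

```python
def positive_max_triples(pos, neg, d):
    # Triples (A, B, C) with A > 0 the largest-magnitude element;
    # then -A < B < -A/2 < C < 0, so C must satisfy |C| < A/2.
    res = []
    for A in pos:
        for C in neg:
            if -C < A / 2.0:          # C is within its bound
                B = -(A + C)
                if B in d:            # O(1) membership check
                    res.append([A, B, C])
            else:
                break                 # remaining negatives are too large
    return res
```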

Continuing with our example, we first check for triples in which A = 5. Now we iterate over the negative numbers, starting with C = -10 but $$-10 < -A/2 = -2.5$$ and thus no possible triple can have both 5 and -10 under the assumption that 5 is the element of the triple with the largest magnitude. Thus, instead of iterating over all the negative values of the array, we iterated over a single one.

The same thing happens when we consider the next positive value, i.e., A = 10. The first possible negative value is, again, C = -10 but $$-10 < -A/2 = -5$$ and thus no possible triple that adds to zero can have both -10 and 10 (unless, of course, one of the elements of the triple is 0, in which case the triple would have already been found in the first step).

Now we consider the next positive value, i.e., A=30. The first possible negative match is C=-10, which is larger than -A/2 = -15. Thus we look up, in constant time, whether the hash table has a key equal to $$B = -(A+C) = -(30-10) = -20$$ and, indeed, it does, so (A, B, C) = (30, -10, -20) is added to the result.

After iterating over the positive values and adding the results to those obtained when checking for triples with zeroes, we have:

$$res = [ [-10, 0, 10], [-40, 0, 40], [30, -10, -20]]$$

The third (and last) step is the dual of the second step: the case in which the number with the largest absolute value of the triple is negative. This step is exactly the same as the second step, simply exchanging the roles of the positive and the negative numbers, i.e., we iterate over the negative numbers and for each of them, iterate over the positive numbers as long as they are smaller than -A/2, and search for the remaining value using the keys of the hash table. The result after this last step is:

$$res = [ [-10, 0, 10], [-40, 0, 40], [30, -10, -20], [-40, 30, 10]]$$

This concludes the search.
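The third step, being the dual of the second, can be sketched by simply swapping the roles of the two sides (negative_max_triples is a hypothetical helper with the same ordering assumptions as before):

```python
def negative_max_triples(pos, neg, d):
    # Triples (A, B, C) with A < 0 the largest-magnitude element;
    # then B and C are positive and C < -A/2.
    res = []
    for A in neg:
        bound = -A / 2.0
        for C in pos:
            if C < bound:
                B = -(A + C)
                if B in d:
                    res.append([A, B, C])
            else:
                break                 # remaining positives are too large
    return res
```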

3-sum quadratic zp $- O(N^2)$ for average and worst cases

I wrote 3-sum nzp in 2012. Seven years later, in Sept. 2019, I was cleaning up the website and ran across this post; it was showing its age. The sample code was in Python 2.7, which is pretty much phased out by now, so I needed to update the code to 3.4. In the process, I got to thinking about why nzp was faster than other two-loop quadratic versions: it didn't waste time testing sums that could not possibly add up to zero; the impossible combinations were discarded fast using bounds, one for the positive case and one for the negative case. Wouldn't it be great to impose both bounds simultaneously? Somehow we'd have to get rid of the separate positive and negative cases and treat every value in equal terms; this would make the algorithm clearer, simpler, maybe even faster. And that is how 3-sum zp came to be. Let's start.

We will use the same array as an example:

$$a = [30, -40, -20, -10, 40, 0, 10, 5]$$

and as usual, we will start sorting it, in $O( N\cdot \log N)$ time with Merge sort or Quick sort:

$$a = [-40, -20, -10, 0, 5, 10, 30, 40]$$

If the smallest value of the array is 0 or positive, then the solution is the empty set and we are done; otherwise, the smallest value of the array is negative; let's call its absolute value $\mbox{offset}$. In this case

$$\mbox{offset} = \left|-40\right| = 40$$

Now we shift the array by $\mbox{offset}$ to make all the elements non-negative:

$$\hat{a} = [0, 20, 30, 40, 45, 50, 70, 80]$$

Our original objective is to find triples (A, B, C) such that A+B+C=0; we added $\mbox{offset}$ to each of A, B, and C, for a total of

$$\mbox{target}=3\times\mbox{offset}$$

so now our new objective is to find in the shifted array triples $(\hat{A}, \hat{B}, \hat{C})$ such that

$$\hat{A}+\hat{B}+\hat{C}=\mbox{target}$$

where $\hat{X} = X + \mbox{offset}$ for each element X of the array a. In our example,

$$\mbox{target}=3\times40 = 120$$
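For the running example, the shift works out like this (a minimal sketch of the setup):

```python
a = [30, -40, -20, -10, 40, 0, 10, 5]
a.sort()                           # [-40, -20, -10, 0, 5, 10, 30, 40]
offset = -a[0]                     # 40
target = 3 * offset                # 120
shifted = [x + offset for x in a]  # [0, 20, 30, 40, 45, 50, 70, 80]
```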

The bounds of $\hat{C}$

We will be using the condition that

$$\hat{A} \lt \hat{B} \lt \hat{C}$$

many times. $\hat{C}$ acquires its maximum value, $\hat{C}_{max}$, when both $\hat{A}$ and $\hat{B}$ assume their smallest possible values. This happens when $\hat{A} = 0$ and $\hat{B} = 1$; $\hat{B}$ can never be 0 because $\hat{A} \lt \hat{B}$. Thus,

$$0+1+\hat{C}_{max}=\mbox{target}$$

so

$$\hat{C}_{max} = \mbox{target}-1$$

For our example, in which the target is 120, the triple that maximizes the value of $\hat{C}$ is (0, 1, 119). We cannot obtain this triple from our set $\hat{a}$, though, because it contains neither 1 nor 119.

$\hat{C}$ acquires its minimum value, $\hat{C}_{min}$, when $\hat{A}+\hat{B}$ assumes its largest possible value, i.e., when both $\hat{A}$ and $\hat{B}$ are as close as possible to $\hat{C}$. $\hat{B}$ is as close as possible to $\hat{C}$ when $\hat{B} = \hat{C}-1$ and, because $\hat{A} \lt \hat{B}$, $\hat{A}$ is as close as possible to $\hat{C}$, when $\hat{A} = \hat{B}-1 = \hat{C}-2$. Hence, for $\hat{C} = \hat{C}_{min}$ we have

$$(\hat{C}_{min}-2)+(\hat{C}_{min}-1)+\hat{C}_{min}=\mbox{target}$$

so

$$\hat{C}_{min} = \mbox{offset}+1$$

For our example, in which the target is 120, the triple that minimizes the value of $\hat{C}$ is (39, 40, 41). Again, we cannot obtain this triple from our set $\hat{a}$, though, because it contains neither 39 nor 41.

The bounds of $\hat{A}$

The minimum and maximum values of $\hat{A}$ depend on the value of $\hat{C}$ but, since
we now know that the range of possible values for $\hat{C}$ is $\left[\hat{C}_{min},\hat{C}_{max} \right]$, then for each of these values we can find the $\hat{A}_{min}$ and $\hat{A}_{max}$.

For a given $\hat{C}$, $\hat{A}$ will assume its smallest value when $\hat{B}$ is as large as possible, i.e., when $\hat{B} = \hat{C}-1$. Thus,

$$\hat{A}_{min}+(\hat{C}-1)+\hat{C}=\mbox{target}$$

so

$$\hat{A}_{min} = \mbox{target}+1-2\hat{C}$$

In our example, if $\hat{C} = 50$, then the smallest that $\hat{A}$ can be is $\hat{A}_{min} = 120+1-2\times 50 = 21$.

Likewise, for a given $\hat{C}$, $\hat{A}$ will be as large as possible when $\hat{A} = \hat{B}-1$, i.e., $\hat{B} = \hat{A}_{max}+1$. Thus,

$$\hat{A}_{max}+(\hat{A}_{max}+1)+\hat{C}=\mbox{target}$$

so

$$\hat{A}_{max} = \frac{\mbox{target}-1-\hat{C}}{2}$$

In our example, if $\hat{C} = 50$, then the largest that $\hat{A}$ can be is $\hat{A}_{max} = (120-1-50)/2 = 34.5$. In our case, the values are integers, so $\hat{A}_{max} = 34$.
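Putting numbers to both pairs of bounds for the running example (offset 40, target 120, and $\hat{C} = 50$):

```python
offset, target = 40, 120
c_min, c_max = offset + 1, target - 1      # 41 and 119
C = 50
a_min = target + 1 - 2 * C                 # 21
a_max = (target - 1 - C) // 2              # 34 (integer floor of 34.5)
```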

Finishing off

Since we know that $\hat{C} \in \left[\hat{C}_{min},\hat{C}_{max} \right]$, and that for each of these $\hat{C}$ we know that $\hat{A} \in \left[\hat{A}_{min},\hat{A}_{max} \right]$, all that is left to do is to loop over these values using a hash table to determine, in $O(1)$, whether our objective is satisfied, i.e., whether $\hat{B} = \mbox{target} - \hat{A} - \hat{C}$ is in $\hat{a}$.
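Putting the pieces together, the zp search can be sketched as follows (sum3_zp is a hypothetical rendering of the steps just described, not the listing below; it assumes distinct integers and leans on the sorted order for early exits):

```python
def sum3_zp(a):
    res = []
    a.sort()
    if a[0] >= 0:                  # no negatives: no triple can sum to zero
        return res
    offset = -a[0]
    target = 3 * offset
    shifted = [x + offset for x in a]   # sorted, all non-negative
    d = set(shifted)
    c_min, c_max = offset + 1, target - 1
    for C in shifted:
        if C < c_min:
            continue
        if C > c_max:
            break                  # shifted is sorted: no more candidates
        a_min = target + 1 - 2*C
        a_max = (target - 1 - C) // 2
        for A in shifted:
            if A < a_min:
                continue
            if A > a_max:
                break
            B = target - A - C     # the remaining member, checked in O(1)
            if B in d:             # bounds guarantee A < B < C
                res.append([A - offset, B - offset, C - offset])
    return res
```

The bounds on $\hat{A}$ guarantee $\hat{A} \lt \hat{B} \lt \hat{C}$, so each triple is found exactly once and all three members are distinct.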

Sample code

The following code, easy to port to languages like C that do not have iterators, has an average speed up over the other quadratic algorithms of about 3x. Hash tables are not a standard data type in C but they are not difficult to write.

def signum(x):
    return (x > 0) - (x < 0)

def binary_search_2(a, key):
    """ searches an array 'a' for a value in log(n) time;
    returns whether it found the item and the last array pos used"""
    lo, hi = 0, len(a)-1
    while lo <= hi:
        mid = lo + (hi-lo)//2
        if key < a[mid]:      hi = mid - 1
        elif key > a[mid]:    lo = mid + 1
        else:                 return True, mid
    return False, mid

def sum3_nzp(a):
    """ nzp without iterators (easy to port to, say, C)"""
    res = []
    N = len(a)
    a.sort()

    # fast rejection: no triple can add up to zero if all the values
    # have the same sign, or if an extreme of the sorted array is zero
    if ( a[0]*a[-1] == 0 ) or ( signum(a[0]) == signum(a[-1]) ):
        return res

    d = {}
    for i in range(N):
        d[a[i]] = 1

    zero_found, i = binary_search_2(a, 0)
    if zero_found:   n, p = i-1, i+1
    elif a[i] < 0:   n, p = i, i+1
    else:            n, p = i-1, i

    # reverse negatives so they are sorted by ascending magnitude
    for i in range((n+1)//2):
        a[i], a[n-i] = a[n-i], a[i]

    # first deal with zero case
    if zero_found:
        num_neg = n+1
        num_pos = N - num_neg - 1       # discount the zero itself

        if num_neg <= num_pos:
            l, h = 0, n+1
        else:
            l, h = p, N
        for i in range(l, h):
            if -a[i] in d:
                res.append([a[i], 0, -a[i]])

    # now assume |A| > |B| > |C| and A > 0.. then B < 0 and C < 0
    for i in range(p, N):
        A = a[i]
        bound = A/2.
        for j in range(0, n+1):
            C = a[j]
            if -C < bound:
                B = -(A + C)
                if B in d:
                    res.append([A, B, C])
            else:
                break

    # now assume that |A| > |B| > |C| and A < 0.. then B > 0 and C > 0
    for i in range(0, n+1):
        A = a[i]
        bound = -A/2.
        for j in range(p, N):
            C = a[j]
            if C < bound:
                B = -(A + C)
                if B in d:
                    res.append([A, B, C])
            else:
                break

    return res

The auxiliary signum() and binary_search_2() functions used in this version are also used in the following version that uses iterators. Languages like Java and Python that support them can do away with the indexing of the arrays and marginally improve the performance to about 3.5x.

def sum3_nzp(a):
    res = []
    a.sort()

    # fast rejection
    if ( a[0]*a[-1] == 0 ) or ( signum(a[0]) == signum(a[-1]) ):
        return res

    # using a set to check for membership (same as a hash table)
    d = set(a)

    zero_found, i = binary_search_2(a, 0)
    if zero_found:   n, p = i-1, i+1
    elif a[i] < 0:   n, p = i, i+1
    else:            n, p = i-1, i

    neg = a[:n+1]
    neg.reverse()               # negatives now sorted by ascending magnitude
    pos = a[p:]

    # first deal with zero case
    if zero_found:
        v = neg if len(neg) <= len(pos) else pos
        for A in v:
            if -A in d:
                res.append([A, 0, -A])

    # now assume |A| > |B| > |C| and A > 0.. then B < 0 and C < 0
    for A in pos:
        bound = A/2.
        for C in neg:
            if -C < bound:
                if -(A+C) in d:
                    res.append([A, -(A+C), C])
            else:
                break

    # now assume that |A| > |B| > |C| and A < 0.. then B > 0 and C > 0
    for A in neg:
        bound = -A/2.
        for C in pos:
            if C < bound:
                if -(A+C) in d:
                    res.append([A, -(A+C), C])
            else:
                break

    return res

Experimental results

The following are timings of runs with arrays of up to 320 elements; both the brute force and the binary search versions take several minutes to complete with arrays not much bigger than that. These timings were obtained on a 2.4 GHz dual core MacPro with 4 GB of RAM and 4 MB of L2 cache, averaged over 100 runs using arrays with uniformly-distributed random data.

Below the timings we have the corresponding estimate of the running time of each algorithm: that of the brute force 3-sum is cubic, i.e., $\sim N^{2.8}$, and that of the 3-sum based on binary search is slightly above quadratic, i.e., $\sim N^{2.1}$, which is what is expected because its order is $O(N^2 \cdot \log N)$. The solutions of all the algorithms are checked against those of the brute force algorithm. The running times of the quadratic algorithms are presented as reference but the size of the arrays is too small to generate confidence; we'll improve them in the following section when we use large arrays.

    N     brute f.  w/ binsort  w/ hash t.   quadrat.  quad. nzp  speedup
   10    1.332e-04   1.240e-04   6.053e-05  4.687e-05  4.069e-05     1.2x
   20    7.839e-04   5.628e-04   1.902e-04  1.696e-04  8.370e-05     2.0x
   40    5.221e-03   2.383e-03   6.823e-04  6.679e-04  2.336e-04     2.9x
   80    3.753e-02   1.031e-02   2.551e-03  2.618e-03  7.524e-04     3.5x
  160    2.804e-01   4.581e-02   9.890e-03  1.045e-02  2.703e-03     3.9x
  320    2.231e+00   2.003e-01   4.054e-02  4.281e-02  1.121e-02     3.8x

    brute f.:	T(N)=1.798e-07 x N ^ 2.813
  w/ binsort:	T(N)=9.374e-07 x N ^ 2.127
  w/ hash t.:	T(N)=7.062e-07 x N ^ 1.884
    quadrat.:	T(N)=4.766e-07 x N ^ 1.971
   quad. nzp:	T(N)=6.961e-07 x N ^ 1.636

Next we have timings with large arrays, with thousands and tens of thousands of entries. We are no longer computing the timings for the brute force and the binary-search-based versions; they take too long. However, now that we are using large arrays we can get reliable running times for the quadratic algorithms. The following timings were averaged over 20 runs. If you are running the code, toggle the flag that selects the set of tests to run.

    N   w/ hash t.   quadrat.  quad. nzp  speedup
   10    6.165e-05  4.745e-05  3.960e-05     1.2x
   20    1.884e-04  1.718e-04  8.515e-05     2.0x
   40    6.581e-04  6.553e-04  2.169e-04     3.0x
   80    2.555e-03  2.584e-03  7.543e-04     3.4x
  160    9.699e-03  1.032e-02  2.591e-03     4.0x
  320    3.938e-02  4.261e-02  1.105e-02     3.9x
  640    1.592e-01  1.732e-01  4.283e-02     4.0x
 1280    6.710e-01  7.234e-01  1.888e-01     3.8x
 2560    2.861e+00  3.018e+00  8.962e-01     3.4x
 5120    1.154e+01  1.266e+01  3.941e+00     3.2x
10240    4.897e+01  5.216e+01  1.648e+01     3.2x
20480    2.363e+02  2.329e+02  8.252e+01     2.8x

  w/ hash t.:	T(N)=4.409e-07 x N ^ 2.000
    quadrat.:	T(N)=3.819e-07 x N ^ 2.026
   quad. nzp:	T(N)=1.933e-07 x N ^ 1.957

As expected, both the version of 3-sum based on the hash table and the quadratic version from Wikipedia show a quadratic running time, i.e., $\sim N^{2.0}$. The 3-sum nzp algorithm is also quadratic but its coefficient, i.e., $\sim 2.0\times 10^{-7}$, is half of that of the other two, i.e., $\sim 4.0\times 10^{-7}$. This translates into a consistent 3x speedup over the other two quadratic versions, i.e., a 300% speedup.

In practice the situation is even better than this. Consider a usual algorithm problem, e.g., sorting: the running time of the algorithm is primarily a function of the size of the input data. We can have 10, 1000 or a trillion numbers to sort, i.e., the size of the input array is not constrained. However, 3-sum is an unusual problem because its definition constrains the size of the array. If the data type that we are using is 8-bit numbers (i.e., type char), then the maximum size of the input array is 256 (i.e., from -128 to 127) because of the constraint that all the numbers of the array have to be distinct, i.e., it is just not possible to have an array with more than 256 chars where all the values are distinct. Likewise, if the data corresponds to pixels on a screen, with a typical linear resolution of, say, about 1500 pixels, then the maximum size of the array is about 1500 elements. Moreover, these are maximums for which every element of the domain is in the array; in the average case, if we are dealing with chars we are likely to have fewer than 256 elements in the array.

With this in mind, we could consider typical cases as having arrays with hundreds to thousands of elements due to the requirement on the uniqueness of each data point. As we can see from the timings, 3-sum-nzp beats the other two quadratic algorithms for any array size, large or small. However, it is around the range of hundreds to thousands that 3-sum nzp has its best relative performance of close to 4x. In this range, the search for triples dominates the running time, striking a good balance between algorithm overhead, e.g., the creation of the hash table and the sorting, which lowers performance for very small arrays, and cache misses incurred when comparing widely separated locations of an array, which lower performance for very large ones.

Related links

Reductions, Robert Sedgewick and Kevin Wayne

Algorithms, Robert Sedgewick and Kevin Wayne, chapter 1 - Fundamentals

Visibility graphs and 3-sum, Michael Hoffman

A survey of 3-sum-hard problems, James King

3Sum, wikipedia


NZP - a new quadratic 3-sum algorithm

This post is an extended version of the following original post:

Forum of Algorithms I, by Robert Sedgewick and Kevin Wayne offered under Coursera
First publication date: Sun 9 Sep 2012 11:56:17 PM PDT
Link: https://class.coursera.org/algs4partI-2012-001/forum/thread?thread_id=2043 (this forum is no longer public)