# Odd man out

Recently I was reminded of my APL and J implementations of an algorithm that solves a problem you probably didn't have, and in a way you probably don't want to use. Using APL or J isn't just for bonus unreadability; those languages have built-in features that make implementing this algorithm easier.

You can see the APL in the image above. Here's the J:

```j
x=:(1 3 6 4 2 6 3 4 5 5 1)

echo +/ (2^ - c+1 - {. $c) * (~:/ (1+ c e. c =: i. >. 2 ^. >. / x) #: x)
```

Here's the problem: Given an array of integers, in which all but one value is duplicated, find the odd man out: the integer which appears only once in the array.

Or put another way, all the values except one in the array are literal duplicates (they appear twice). Find the singular value.

There's an obvious, terrible algorithm: Iterate over the list, and for each value, search the list to see if it appears anywhere else. That's O(N^2).
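
The terrible version is easy to sketch. Here it is in Python (not the APL/J of this post, just an illustration; the function name is mine):

```python
def odd_one_out_quadratic(values):
    """O(N^2): for each value, rescan the whole list looking for a second copy."""
    for v in values:
        if values.count(v) == 1:  # count() itself walks the entire list
            return v
    raise ValueError("every value is duplicated")
```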

There's an obvious O(N lg N) algorithm: Sort the list, then iterate over it, checking for a value that's not the same as the adjacent value. This also requires O(N) space if you're not allowed to change the list.
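
A Python sketch of the sort-based version (again, my illustration, not the original code); sorting a copy is where the O(N) extra space goes:

```python
def odd_one_out_sorted(values):
    """O(N lg N): after sorting, the duplicates sit in adjacent pairs."""
    s = sorted(values)  # a copy: O(N) space, since we may not mutate the input
    for i in range(0, len(s) - 1, 2):
        if s[i] != s[i + 1]:  # this pair is broken, so s[i] is the singleton
            return s[i]
    return s[-1]  # the singleton sorted to the very end
```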

A pretty obvious O(N) algorithm is to do, in effect, a radix sort. Iterate over the list. For each value, consult a bit vector with one bit for each possible value. If the bit is unset, set it; if it's set, clear it. At the end, the vector will have a single bit set, corresponding to the unique value. This unfortunately requires space proportional to the range of the values: at least O(N) bits here, and potentially exponential in the bit width of the values if they're sparse. And you'll probably have to make two passes over the list to determine the range you have to support.

And that suggests the good algorithm: XOR all the values together. Since addition in GF(2), i.e. exclusive-or, is self-inverting, all the duplicate values cancel out; and since it's associative and commutative, it doesn't matter what order the values are in. At the end you have the single value.
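
In Python the whole good algorithm is a single reduce, O(N) time and O(1) extra space:

```python
from functools import reduce
from operator import xor

def odd_one_out_xor(values):
    """Duplicates cancel under XOR; associativity and commutativity mean order is irrelevant."""
    return reduce(xor, values, 0)
```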

A variant of that last algorithm is what I implemented in APL and J, but rather than simply applying XOR across the list, I ... did something else.

## Artisanal bits

My algorithm, as implemented in APL and J, scorns built-in binary-arithmetic features and manipulates bits the old-fashioned way: as vectors and matrices of 0's and 1's.

For the brave (or really bored), here it is.

1. In APL and J, the {iota} monadic function, represented by "i." in J, generates a vector of the first N integers (starting from 1 in APL, or 0 in J). The parameter my code supplies to iota is the result of the next step.
2. Three operations produce that parameter:
   - The {maximum} dyadic function is applied across the input, yielding the greatest value in the input. That's the ">./x" in the code above; in J, ">." is the {maximum} or {ceiling} function, depending on context, "/" is the {apply} operator, and "x" here is the variable I assigned the input vector to.
   - I take the log base 2 of that value ("2 ^.").
   - I take the ceiling of that. The entire expression for all of step 2 then is ">. 2 ^. >./x", if anyone's still trying to follow the code.
3. So now I have a vector of the integers from 0 to one less than the ceiling of the lg (log base 2) of the highest value in the list. In this case, that's (0 1 2), because the highest value is 6 and the ceiling of lg(6) is 3. I assign this to the variable "c".
4. The {epsilon} dyadic function, "e.", finds all the elements in vector *A* which are also in vector *B*. Here ("c e. c") *A* and *B* are both c, and everything in c is in c, so they're all there. The result of epsilon is a vector with 1 where the corresponding element from *A* appears in *B*, so "c e. c" will be a vector of 1's of the same length as c. With this input it's (1 1 1).
5. Add 1 to each element of that result ("1+" applied to the epsilon result, since J evaluates right to left), giving us (2 2 2). Why? Here's a hint: this vector is long enough to represent each of the bits we need to represent any value in the input, and a bit is a base-2 value.
6. Now the {encode} (aka "antibase") dyadic function, "#:", comes into play. This produces a matrix where each row is the base-2 representation of the corresponding value from x. That (2 2 2) vector from step 5 tells encode what base to use for each digit position (here, base 2 for all of them). In J, we actually could have skipped most of the steps above, because there's a monadic {encode} which defaults to base 2; but my J version was a reimplementation of the APL version, and I was working with limited documentation when I wrote *that*, and, well, it's more chindogu-ish this way.
7. Apply the {unequal} operator "~:" across the matrix from step 6. This uses the apply operator "/" we saw in step 2 again; it's also called "reduce" and "insert". It inserts the specified dyad ({unequal} in this case) between each item in the array on the right, and evaluates the result. For a matrix, a dyad operates independently on each column of adjoining rows. Since not-equal on {0,1} is equivalent to XOR, this XORs the rows of the matrix, producing a single vector of bits representing the result.
8. Take the {shape} of c, the (0 1 2) vector from step 3. This gives us an array containing the dimensions of c. c is a vector, so this is a 1-item array and the value in it is the length of c. Use the {head} operator to convert that to an atom (an integer value): "{. $c". This gives us the number of bits required to represent any value in x, because c had one entry for each bit position we need.
9. Add 1 to each member of c, subtract the value computed in step 8, and negate the result ("- c+1 - {. $c", again reading right to left). That gives us (2 1 0): the exponent (base 2) for each bit position.
10. Raise 2 to the power of each element of the vector from step 9, giving us a vector of powers of 2: (4 2 1).
11. Multiply that vector by the answer from step 7, which is a binary representation, in the form of a vector of 1's and 0's, of the answer. This is the "*" operator. It's a piecewise multiply: the first value of the LHS times the first value of the RHS, and so on, giving a vector of the same shape. So if step 7 gave us (0 1 0), step 11 gives us (0 2 0). (Once again, there's a simpler way in APL and J to convert a bit vector to decimal representation, but this is impressively convoluted, no?)
12. Apply addition across the result from step 11 ("+/"), summing the values of the bit positions and giving the final answer.
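
For anyone who'd rather read those twelve steps in something other than J, here's a rough Python transliteration (my paraphrase, not the original code), keeping the bits as explicit lists of 0's and 1's:

```python
from math import ceil, log2

def odd_one_out_artisanal(x):
    """Hand-rolled bits: build a base-2 digit matrix, XOR its rows, reassemble."""
    width = ceil(log2(max(x)))  # steps 1-3: 3 bits for a maximum of 6 (like the J,
                                # this under-counts when max(x) is an exact power of 2)
    # step 6: one row of `width` base-2 digits per value, most significant first
    rows = [[(v >> (width - 1 - i)) & 1 for i in range(width)] for v in x]
    # step 7: {unequal} down each column is XOR, leaving a single row of bits
    folded = [sum(col) % 2 for col in zip(*rows)]
    # steps 8-10: the power-of-2 weight of each bit position: (4 2 1) here
    weights = [2 ** (width - 1 - i) for i in range(width)]
    # steps 11-12: piecewise multiply, then sum
    return sum(b * w for b, w in zip(folded, weights))
```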

A simpler way to do this in J is just "#.~:/#:x", which is: expand elements of x into binary representation, as rows of a matrix; apply XOR along the columns, giving a vector; convert the vector from binary back to decimal. (Read from right to left, with #: being expand-to-binary, etc.)

Anyway, that was my first example of a chindogu algorithm. My next few posts will be about more interesting problems and algorithms that still don't quite seem to reach the point of doing something useful, though they were conceived in response to what someone thought was a useful problem.