{Numb, Comput}erphile - Mike Pound and Tom Crawford "What's The Best Hash Function to Use"

February 01, 2024

Can you do an iterated hash function which distributes the collisions in some optimal manner? It would be a kind of tree of hash tables with a possibly different hash function operating at each level. See FM-index and Wavelet Tree. It might be interesting to consider Boolean Expression Diagrams to compute hashes: see Steve Bagley and David Brailsford on Data Compression and Weird Stuff.

Abstractly, a hash function is a mapping f from a vector space V of bits onto a binary scalar {0.1} where f(v) = 0 means "The vecor v of bits is in the table" and f(v) = 1 means "The vector v is not in the table." Another way to think about this is that the hash function is a two-coloring of a graph which, for an n dimensional vector space is the set of vertices of an n dimensional (hyper) cube. Think of the coordinates of the vertices on n orthogonal axes: if it's a unit cube with one vertex at the origin then the vertices are each either at 0 or 1 on each of the n axes. Then you can think of the hash table as being a relation which is an adjacency matrix, which specifies exactly those vertices which have edges between them. This is a diagonally symmetric n by n matrix with one entry for each row and column, and that entry at i,j is 1 if the vertex number i is connected to the vertex number j and 0 otherwise. Then there are special colorings of the vertices of the graph called bipartite which are those colorings where every edge in the adjacency matrix connects two vertices which are different: one is in the hash and the other is not. In these cases I conjecture that the relation has a form which can be split into two functional relations: one which maps the vertices in the hash to the ones outside, and the other which maps the ones outside to the ones inside. See Galois connection. Now it may turn out that there's a theorem in combinatorics which says something about these bipartite graphs: that there is some way that they can be used to construct the dominating cycles that Sophie Maclean talked about in Numberphile - Sophie Maclean on the Catalan Numbers. To prove this you would need to define some (pair of) partial orderings on the sets of elements that are in the hash and those that are not. See my tweet about Eugenia Cheng's talk.