Once has received the list of checksums of the
blocks of *B*, it must search *A* for any blocks at any offset that
match the checksum of some block of *B*. The basic strategy is to compute
the 32-bit rolling checksum for a block of length *S* starting at each byte
of *A* in turn, and for each checksum, search the list for a match. To do
this our implementation uses a simple 3 level searching scheme.

The first level uses a 16-bit hash of the 32-bit rolling checksum and a
2^{16} entry hash table. The list of checksum values (i.e., the
checksums from the blocks of *B*) is sorted according to the 16-bit hash of
the 32-bit rolling checksum. Each entry in the hash table points to the first
element of the list for that hash value, or contains a null value if no element
of the list has that hash value.

At each offset in the file the 32-bit rolling checksum and its 16-bit hash are calculated. If the hash table entry for that hash value is not a null value, the second level check is invoked.

The second level check involves scanning the sorted checksum list starting with the entry pointed to by the hash table entry, looking for an entry whose 32-bit rolling checksum matches the current value. The scan terminates when it reaches an entry whose 16-bit hash differs. If this search finds a match, the third level check is invoked.

The third level check involves calculating the strong checksum for the
current offset in the file and comparing it with the strong checksum value in
the current list entry. If the two strong checksums match, we assume that we
have found a block of *A* which matches a block of *B*. In fact the
blocks could be different, but the probability of this is microscopic, and in
practice this is a reasonable assumption.

When a match is found, sends the data in *A*
between the current file offset and the end of the previous match, followed by
the index of the block in *B* that matched. This data is sent immediately a
match is found, which allows us to overlap the communication with further
computation.

If no match is found at a given offset in the file, the rolling checksum is
updated to the next offset and the search proceeds. If a match is found, the
search is restarted at the end of the matched block. This strategy saves a
considerable amount of computation for the common case where the two files are
nearly identical. In addition, it would be a simple matter to encode the block
indexes as runs, for the common case where a portion of *A* matches a
series of blocks of *B* in order.