The Shift-AND algorithm is similar to KMP and Z-algorithm, it inputs a pattern P and a long text T, outputs in linear time all occurrences of P in T. While KMP only maintains the longest prefix of P that is also a suffix of the examining prefix of T, Shift-AND keeps track of all prefixes of P that match any suffixes of the examining prefix of T. The algorithm is structured as below.
Problem Input:
As an example, we take:
- a pattern
P = "nina"
of length m = 4, - and a text
T = "ninjaninan"
of length n = 10.
Shift-AND:
Initialization:
Operation:
We iterate through each character of T, from position 1 to position n. At each step i (), we maintain the invariant of bitset A in time O(1):
A = [(A << 1) OR 1 ] AND M[i]
.
Step i=1:
Examining prefix of T: “n”
A = [(A << 1) OR 1 ] AND M[i] = "0001" AND "0101" = "0001"
.
Only the last bit of A is 1, meaning only the length-1 prefix of P matches a suffix of the current prefix of T.
Step i=2:
Examining prefix of T: “ni”
A = [(A OR 1)] AND M[i] = "0011" AND "0010" = "0010"
.
The second-to-last bit of A is 1, meaning the length-2 prefix of P matches a suffix of the current prefix of T.
Step i=3:
Examining prefix of T: “nin”
A = [(A << 1) OR 1 ] AND M[i] = "0101" AND "0101" = "0101"
.
The length-1 and length-3 prefixes of P each matches a suffix of the current prefix of T.
Step i=4:
Examining prefix of T: “ninj”
A = [(A << 1) OR 1 ] AND M[i] = "1011" AND "0000" = "0000"
.
Since the character "j"
does not appear in P, its corresponding bitset defaults to all 0. At this step, no prefix of P matches any suffix of the current prefix of T.
…
Step i=9:
Examining prefix of T: “ninjanina”
A = [(A << 1) OR 1 ] AND M[i] = "1011" AND "1000" = "1000"
.
At this step, since the first bit of A is set to 1, we have found an occurrence of P inside T.
Step i=10:
Examining prefix of T: “ninjaninan”
A = [(A << 1) OR 1 ] AND M[i] = "0001" AND "0101" = "0001"
.
The last bit of A is 1, meaning the length-1 prefix of P matches a suffix of the current prefix of T.
Algorithm terminates.
Output:
After each step, we check bitset A, if the first bit of A is 1 (implying that the prefix with length m of P matches a suffix of the current prefix of T) then output a match.
For this example, we found 1 occurrence of P in T (the matching ends at position 9 of T).
References:
- Shift-and visualization, Alvaro Videla, blog.
great