From HSC Technical Wiki

Main: Pattern Matching using Rabin-Karp algorithm

 Rabin-Karp Algorithm:

 



 Algorithm Analysis:

Role of Hashing in Rabin-Karp:

The key idea in Rabin-Karp algorithm is to develop a classification of patterns into possible matches and not possible matches. It is based the nature of the hashing mechanism. At simplest level a hash function is a modulo function. which is used to map a number into another smaller set of numbers. A modulo function (not "%" of C-language, which is actually a remainder operation). is defined as: a mod b = a - b* floor((float)a/(float)b where a and b are integers. We use the definition of modulo as implemented in Microsoft Excel (which use INT() instead of floor function), note that modulo is negative numbers is undefined, though MS-Excel won't tell you that! For serious fans of Modulo function do check out the debate at dr. math.

The Rabin-Karp algorithm uses the hash function to generation a hash value for each m letters in the text and then compares these one by one with the hash value generated from the Pattern. A match of hash values indicates a "potential" match and then a complete match is attempted. The spurious matches are expected and depend on the nature of the hash function. Thumb rule is to select a prime number much larger than m such that the chances of spurious matches is reduced.

An improvement on the traditional mod function for a large Text is to reuse the hash value obtained of 1..m letters in 2..m+1 letters. Thereby one reduces the computational complexity of finding mod of large numbers.

So two techniques are combined to obtain modulo of large numbers using Horner's method and rolling Hash i.e. obtaining hash value of 2..m+1 digits from 1..m digits is a series of digits/letters.

maintainer: rakesh.mawa@hsc.com

Categories: PatternMatching Software

<< Main.PatternMatching-Knuth-Morris-Prat | PatternMatching-Trail | Main.PatternMatching-Boyer-Moore >>

Retrieved from http://wiki.hsc.com/wiki/Main/PatternMatching-Rabin-Karp
Page last modified on September 17, 2009, at 07:56 AM