loading
koncpt-img

String-matching, also known as pattern matching, is a fundamental aspect of computer science and is used in various applications like data validation, data scraping, data transformation, and more. In this article, we will explore three popular string-matching algorithms and illustrate how to implement them in Python.

1. Naive String-Matching Algorithm

The Naive string-matching algorithm is a straightforward method where we slide the pattern over the text one by one and check for a match. If a match is found, the process is continued until all characters match.

Python code:

def naive_search(text, pattern):
    n = len(text)
    m = len(pattern)
    
    # Slide pattern over text one by one
    for i in range(n - m + 1):
        j = 0
        while(j < m):
            if (text[i + j] != pattern[j]):
                break
            j += 1
 
        if (j == m):
            print("Pattern found at index", i)
text = "ABCCDDAEFG"
pattern = "CDD"
naive_search(text, pattern)

2. Rabin-Karp Algorithm

The Rabin-Karp Algorithm is an efficient string-matching method that uses hashing. It hashes the pattern and the text’s substrings of the same length, comparing the hashes instead of the actual strings. This approach reduces the time complexity, especially for longer patterns and texts.

Python code:

# Following program is the python implementation of
# Rabin Karp Algorithm given in CLRS book
 
# d is the number of characters in the input alphabet
d = 256
  
def rabin_karp_search(pattern, text, q):
    m = len(pattern)
    n = len(text)
    p = 0    # hash value for pattern
    t = 0    # hash value for text
    h = 1
    i = 0
    j = 0
  
    # The value of h would be "pow(d, m-1)%q"
    for i in range(m-1):
        h = (h * d) % q
  
    # Calculate the hash value of pattern and first window
    # of text
    for i in range(m):
        p = (d * p + ord(pattern[i])) % q
        t = (d * t + ord(text[i])) % q
  
    # Slide the pattern over text one by one
    for i in range(n - m + 1):
        if p == t:
            for j in range(m):
                if text[i + j] != pattern[j]:
                    break
                else: j += 1
 
            if j == m:
                print("Pattern found at index " + str(i))
 
        if i < n - m:
            t = (d*(t - ord(text[i]) * h) + ord(text[i + m])) % q
 
            # We might get negative value of t, converting it
            # to positive
            if t < 0:
                t = t + q
 
text = "ABCCDDAEFG"
pattern = "CDD"
q = 101 # A prime number
rabin_karp_search(pattern, text, q)

3. Knuth-Morris-Pratt (KMP) Algorithm

The KMP algorithm is an improvement over the naive pattern search. It uses the observation that when a mismatch happens, the initial part of the pattern may still match a part of the text. It uses a “lps” array (longest proper prefix which is also suffix) to skip unnecessary comparisons.

Python code

def compute_lps_array(pattern, m, lps):
    length = 0  # length of the previous longest prefix suffix
    lps[0] = 0  # lps[0] is always 0
    i = 1
 
    while i < m:
        if pattern[i] == pattern[length]:
            length += 1
            lps[i] = length
            i += 1
        else:
            if length != 0:
                length = lps[length - 1]
            else:
                lps[i] = 0
                i += 1
 
def kmp_search(pattern, text):
    m = len(pattern)
    n = len(text)
 
    lps = [0]*m
    j = 0  # index for pattern[]
 
    compute_lps_array(pattern, m, lps)
 
    i = 0  # index for text[]
    while i < n:
        if pattern[j] == text[i]:
            i += 1
            j += 1
 
        if j == m:
            print("Pattern found at index " + str(i - j))
            j = lps[j - 1]
 
        elif i < n and pattern[j] != text[i]:
            if j != 0:
                j = lps[j - 1]
            else:
                i += 1
text = "ABCCDDAEFG"
pattern = "CDD"
kmp_search(pattern, text)

The KMP algorithm offers a significantly improved time complexity of O(n) in comparison to the naive approach, making it a popular choice for pattern searching tasks.

In summary, string-matching is a powerful technique with wide-ranging applications. Choosing the right string-matching algorithm for your task depends on the specific needs and constraints of your project. The naive approach, Rabin-Karp, and KMP algorithms represent different trade-offs between implementation complexity and runtime efficiency. By understanding these different options, you can choose the best approach for your specific needs.

ABOUT LONDON DATA CONSULTING (LDC)

We, at London Data Consulting (LDC), provide all sorts of Data Solutions. This includes Data Science (AI/ML/NLP), Data Engineer, Data Architecture, Data Analysis, CRM & Leads Generation, Business Intelligence and Cloud solutions (AWS/GCP/Azure).

For more information about our range of services, please visit: https://london-data-consulting.com/services

Interested in working for London Data Consulting, please visit our careers page on https://london-data-consulting.com/careers

More info on: https://london-data-consulting.com

Write a Reply or Comment

Your email address will not be published. Required fields are marked *

    wpChatIcon