Finding the Longest Common Substring Between Two Strings in Java

Finding the longest common substring between two strings is a common problem in programming, especially when dealing with text analysis, bioinformatics, or even in everyday applications like plagiarism detection. A common substring is a sequence of characters that appears in the same order in both strings, and the longest common substring is simply the one with the greatest length. In this tip, we’ll explore a couple of ways to achieve this in Java.

Method 1: Dynamic Programming Approach

The dynamic programming approach is one of the most efficient ways to solve this problem. It uses a 2D array to keep track of the lengths of common substrings between two strings as they are built up character by character.

Here’s how it works:

We create a 2D array dp where dp[i][j] represents the length of the longest common substring ending at indices i-1 in the first string and j-1 in the second string.
If the characters at these indices are the same, we extend the length of the common substring found so far.
We update the maximum length whenever a longer common substring is found.

Here’s the implementation:

public class LongestCommonSubstring {
    public static String findLongestCommonSubstring(String s1, String s2) {
        int maxLength = 0; // To store the maximum length of common substring
        int endIndex = 0;  // To store the end index of the longest common substring in s1
        int[][] dp = new int[s1.length() + 1][s2.length() + 1];

        for (int i = 1; i <= s1.length(); i++) {
            for (int j = 1; j <= s2.length(); j++) {
                if (s1.charAt(i - 1) == s2.charAt(j - 1)) {
                    dp[i][j] = dp[i - 1][j - 1] + 1;
                    if (dp[i][j] > maxLength) {
                        maxLength = dp[i][j];
                        endIndex = i; // Update the end index of the substring in s1
                    }
                }
            }
        }
        return s1.substring(endIndex - maxLength, endIndex);
    }

    public static void main(String[] args) {
        String s1 = "abcdfghijk";
        String s2 = "abedfgh";
        System.out.println("Longest Common Substring: " + findLongestCommonSubstring(s1, s2));
    }
}

Output:

Longest Common Substring: fgh

Explanation:

The code initializes a 2D array dp where each cell keeps track of the length of the common substring found so far.
Whenever two characters match (s1.charAt(i - 1) == s2.charAt(j - 1)), the length of the substring is incremented.
If a longer substring is found, the code updates maxLength and endIndex to keep track of where this substring ends in s1.
Finally, the longest common substring is extracted from s1 using the substring method.

Method 2: Sliding Window with Hashing (Optimized for Small Alphabets)

While dynamic programming is efficient, there’s another interesting approach using sliding windows and hashing, particularly useful when the strings consist of a small set of characters (like lowercase letters). This approach involves checking substrings of varying lengths and using a hash set to detect common substrings.

import java.util.HashSet;
import java.util.Set;

public class LongestCommonSubstringHashing {

    public static String findLongestCommonSubstring(String s1, String s2) {
        String result = "";
        int maxLength = 0;

        for (int len = 1; len <= Math.min(s1.length(), s2.length()); len++) {
            Set<String> set = new HashSet<>();

            // Add all substrings of length 'len' from the first string to the set
            for (int i = 0; i <= s1.length() - len; i++) {
                set.add(s1.substring(i, i + len));
            }

            // Check substrings of length 'len' from the second string
            for (int j = 0; j <= s2.length() - len; j++) {
                String substring = s2.substring(j, j + len);
                if (set.contains(substring) && len > maxLength) {
                    maxLength = len;
                    result = substring;
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String s1 = "abracadabra";
        String s2 = "ecadabrak";
        System.out.println("Longest Common Substring: " + findLongestCommonSubstring(s1, s2));
    }
}

Output:

Longest Common Substring: cadabra

Explanation:

This method uses a sliding window technique to compare substrings of increasing lengths between the two strings.
For each length, it creates a hash set of substrings from the first string and then checks if any substrings of the same length from the second string exist in the set.
This approach can be particularly efficient for small alphabets where hashing and substring checks are very fast.

Finding the longest common substring between two strings can be tackled in various ways depending on the constraints and requirements. The dynamic programming approach is a go-to method for its clarity and efficiency in most cases, while hashing can offer a neat alternative under certain conditions. Exploring these methods not only helps solve the problem but also deepens the understanding of string manipulation techniques in Java. So, the next time you need to compare strings, you’ll have multiple tools at your disposal!