Skip to content

Manu577228/Advanced-String-Algorithms-Java

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation


⚡ Advanced String Algorithms — Java

"Strings are the DNA of computation — master them, and you master the machine."


🧬 What Is This?

A curated, production-quality Java library of the most sophisticated string algorithms ever devised — algorithms that power search engines, compilers, bioinformatics tools, and beyond.

This repository goes far beyond the basics. Every algorithm here is a mathematical masterpiece with deep theoretical roots, implemented cleanly and efficiently in Java.


🗂️ Algorithms Included

Algorithm Category Complexity What It Does
🔁 Booth's Algorithm Canonical Form O(n) Finds the lexicographically smallest rotation of a string
🔄 Duval's Algorithm Lyndon Words O(n) Decomposes a string into Lyndon words in linear time
🌳 EER Tree Palindromes O(n) Builds a palindrome tree — tracks all distinct palindromic substrings
🧬 Farach's Algorithm Suffix Trees O(n) Optimal suffix tree construction for integer alphabets
🤖 Suffix Automaton Automata O(n) The smallest DFA that accepts all suffixes of a string
🗜️ Ukkonen's Algorithm Suffix Trees O(n) Real-time, online suffix tree construction — a true classic

📁 Project Structure

Advanced-String-Algorithms-Java/
└── src/
    └── main/
        └── java/
            └── org/bharadwaj/t/
                ├── BoothsAlgorithm/
                ├── DuvalsAlgorithm/
                ├── EERTreeAlgorithm/
                ├── FarachsAlgorithm/
                ├── SuffixAutomatonAlgorithm/
                └── UkkonensAlgorithm/

🚀 Getting Started

Prerequisites

  • Java 11 or higher
  • Maven or Gradle (optional)

Clone the Repository

git clone https://github.com/Manu577228/Advanced-String-Algorithms-Java.git
cd Advanced-String-Algorithms-Java

Compile & Run (Example — Ukkonen's Algorithm)

javac src/main/java/org/bharadwaj/t/UkkonensAlgorithm/*.java
java -cp src/main/java org.bharadwaj.t.UkkonensAlgorithm.Main

🧠 Deep Dives

🔁 Booth's Algorithm — Lexicographic Rotation

Booth's algorithm finds the starting index of the lexicographically least rotation of a string in O(n) time. It's a cornerstone of string normalization and is used in canonical form detection, graph isomorphism, and data compression.

Key insight: Uses a failure function similar to KMP to avoid recomparison.

🔄 Duval's Algorithm — Lyndon Factorization

Duval's algorithm decomposes any string into a sequence of Lyndon words — strings that are strictly smaller than all of their rotations. The factorization is unique (Chen–Fox–Lyndon theorem) and runs in O(n) time with O(1) space.

Applications: String sorting, suffix array construction, algebraic combinatorics.

🌳 EER Tree (Palindromic Tree)

The EER Tree (Eertree / Palindrome Tree) is a data structure that encodes all distinct palindromic substrings of a string in a compact tree. It has at most n+2 nodes for a string of length n.

Applications: Palindrome counting, eertree-based DP, genome analysis.

🧬 Farach's Algorithm — Suffix Trees for Integer Alphabets

Farach's algorithm is theoretically optimal — it constructs suffix trees for integer alphabets in O(n) time, improving on earlier algorithms. It uses a clever divide-and-conquer approach based on odd/even position merging.

Applications: Full-text indexing, bioinformatics, pattern matching at scale.

🤖 Suffix Automaton (SAM)

The Suffix Automaton is the minimal DAWG (Directed Acyclic Word Graph) that recognizes all suffixes of a given string. It has O(n) states and transitions and is one of the most powerful string data structures ever developed.

Applications: Substring search, LCS computation, string counting.

🗜️ Ukkonen's Algorithm — Online Suffix Tree

Ukkonen's is perhaps the most celebrated string algorithm — it builds a suffix tree online, one character at a time, in O(n) total time. The use of suffix links and active points makes it beautifully efficient.

Applications: Pattern matching, data compression, plagiarism detection.


📊 Complexity Overview

Algorithm          | Time      | Space     | Online?
-------------------|-----------|-----------|--------
Booth's            | O(n)      | O(n)      | No
Duval's            | O(n)      | O(1)      | Yes ✅
EER Tree           | O(n)      | O(n)      | Yes ✅
Farach's           | O(n)      | O(n)      | No
Suffix Automaton   | O(n)      | O(n)      | Yes ✅
Ukkonen's          | O(n)      | O(n)      | Yes ✅

💡 Why These Algorithms?

Most string algorithm repositories stop at KMP, Rabin-Karp, or Z-algorithm. This repo starts where others end. These are:

  • 📐 Mathematically deep — rooted in automata theory, combinatorics on words, and formal language theory
  • ⚙️ Practically powerful — used in real-world systems from Linux grep to bioinformatics pipelines
  • 🧩 Interview gold — rare enough to impress, important enough to matter

🤝 Contributing

Contributions are warmly welcome! If you'd like to:

  • 🐛 Fix a bug → open an Issue first
  • ➕ Add an algorithm → open a Pull Request with tests
  • 📖 Improve docs → direct PRs welcome
# Fork → Clone → Branch → Code → PR
git checkout -b feature/your-algorithm-name

📜 License

This project is licensed under the MIT License — use it freely, attribution appreciated.


Made with 🧠 + ☕ by Manu577228

If this helped you, consider leaving a ⭐ — it means a lot!

About

A collection of advanced String algorithms with efficient implementations and clear explanations.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages