A curated, production-quality Java library of the most sophisticated string algorithms ever devised — algorithms that power search engines, compilers, bioinformatics tools, and beyond.
This repository goes far beyond the basics. Every algorithm here is a mathematical masterpiece with deep theoretical roots, implemented cleanly and efficiently in Java.
| Algorithm | Category | Complexity | What It Does |
|---|---|---|---|
| 🔁 Booth's Algorithm | Canonical Form | O(n) |
Finds the lexicographically smallest rotation of a string |
| 🔄 Duval's Algorithm | Lyndon Words | O(n) |
Decomposes a string into Lyndon words in linear time |
| 🌳 EER Tree | Palindromes | O(n) |
Builds a palindrome tree — tracks all distinct palindromic substrings |
| 🧬 Farach's Algorithm | Suffix Trees | O(n) |
Optimal suffix tree construction for integer alphabets |
| 🤖 Suffix Automaton | Automata | O(n) |
The smallest DFA that accepts all suffixes of a string |
| 🗜️ Ukkonen's Algorithm | Suffix Trees | O(n) |
Real-time, online suffix tree construction — a true classic |
Advanced-String-Algorithms-Java/
└── src/
└── main/
└── java/
└── org/bharadwaj/t/
├── BoothsAlgorithm/
├── DuvalsAlgorithm/
├── EERTreeAlgorithm/
├── FarachsAlgorithm/
├── SuffixAutomatonAlgorithm/
└── UkkonensAlgorithm/
- Java 11 or higher
- Maven or Gradle (optional)
git clone https://github.com/Manu577228/Advanced-String-Algorithms-Java.git
cd Advanced-String-Algorithms-Javajavac src/main/java/org/bharadwaj/t/UkkonensAlgorithm/*.java
java -cp src/main/java org.bharadwaj.t.UkkonensAlgorithm.Main🔁 Booth's Algorithm — Lexicographic Rotation
Booth's algorithm finds the starting index of the lexicographically least rotation of a string in O(n) time. It's a cornerstone of string normalization and is used in canonical form detection, graph isomorphism, and data compression.
Key insight: Uses a failure function similar to KMP to avoid recomparison.
🔄 Duval's Algorithm — Lyndon Factorization
Duval's algorithm decomposes any string into a sequence of Lyndon words — strings that are strictly smaller than all of their rotations. The factorization is unique (Chen–Fox–Lyndon theorem) and runs in O(n) time with O(1) space.
Applications: String sorting, suffix array construction, algebraic combinatorics.
🌳 EER Tree (Palindromic Tree)
The EER Tree (Eertree / Palindrome Tree) is a data structure that encodes all distinct palindromic substrings of a string in a compact tree. It has at most n+2 nodes for a string of length n.
Applications: Palindrome counting, eertree-based DP, genome analysis.
🧬 Farach's Algorithm — Suffix Trees for Integer Alphabets
Farach's algorithm is theoretically optimal — it constructs suffix trees for integer alphabets in O(n) time, improving on earlier algorithms. It uses a clever divide-and-conquer approach based on odd/even position merging.
Applications: Full-text indexing, bioinformatics, pattern matching at scale.
🤖 Suffix Automaton (SAM)
The Suffix Automaton is the minimal DAWG (Directed Acyclic Word Graph) that recognizes all suffixes of a given string. It has O(n) states and transitions and is one of the most powerful string data structures ever developed.
Applications: Substring search, LCS computation, string counting.
🗜️ Ukkonen's Algorithm — Online Suffix Tree
Ukkonen's is perhaps the most celebrated string algorithm — it builds a suffix tree online, one character at a time, in O(n) total time. The use of suffix links and active points makes it beautifully efficient.
Applications: Pattern matching, data compression, plagiarism detection.
Algorithm | Time | Space | Online?
-------------------|-----------|-----------|--------
Booth's | O(n) | O(n) | No
Duval's | O(n) | O(1) | Yes ✅
EER Tree | O(n) | O(n) | Yes ✅
Farach's | O(n) | O(n) | No
Suffix Automaton | O(n) | O(n) | Yes ✅
Ukkonen's | O(n) | O(n) | Yes ✅
Most string algorithm repositories stop at KMP, Rabin-Karp, or Z-algorithm. This repo starts where others end. These are:
- 📐 Mathematically deep — rooted in automata theory, combinatorics on words, and formal language theory
- ⚙️ Practically powerful — used in real-world systems from Linux
grepto bioinformatics pipelines - 🧩 Interview gold — rare enough to impress, important enough to matter
Contributions are warmly welcome! If you'd like to:
- 🐛 Fix a bug → open an Issue first
- ➕ Add an algorithm → open a Pull Request with tests
- 📖 Improve docs → direct PRs welcome
# Fork → Clone → Branch → Code → PR
git checkout -b feature/your-algorithm-nameThis project is licensed under the MIT License — use it freely, attribution appreciated.
Made with 🧠 + ☕ by Manu577228
If this helped you, consider leaving a ⭐ — it means a lot!