Skip to content

feat(algorithms): add bubble, insertion, counting, and quicksort#570

Open
Robertboy18 wants to merge 4 commits into
leanprover:mainfrom
Robertboy18:algorithms-sorting-timem
Open

feat(algorithms): add bubble, insertion, counting, and quicksort#570
Robertboy18 wants to merge 4 commits into
leanprover:mainfrom
Robertboy18:algorithms-sorting-timem

Conversation

@Robertboy18
Copy link
Copy Markdown

I wanted to contribute more of the standard sorting algorithms alongside the existing MergeSort development. This PR adds stable bubble sort, stable insertion sort, counting sort for natural-number lists, and a duplicate-safe randomized quicksort, all using the existing TimeM style so the algorithms come with executable definitions, correctness theorems, and comparison-cost bounds.

For stability, I added a shared StableByValue predicate. The idea is that if we filter the input and output by any fixed value, we should get the same subsequence. This captures the usual list-level meaning of stability: equal elements appear in the same relative order after sorting. Bubble sort is stable because it swaps only when the next element is strictly smaller, never when two elements are equal. Insertion sort is stable because it processes the input from right to left and inserts a new element before equal elements already in the sorted tail. Counting sort is stable for List ℕ because each value’s filtered subsequence is completely determined by how many copies of that value occur.

The randomized quicksort part adds a small RandomTimeM abstraction: a PMF over timed computations. I used this because randomized algorithms do not produce one timed result; they produce a distribution over possible timed runs. The quicksort implementation samples a pivot index uniformly, removes that pivot occurrence, and partitions the rest into values below, equal to, and above the pivot. The equal bucket is not recursively sorted, so duplicates are handled directly instead of relying on a distinctness assumption.

For time complexity, the deterministic algorithms prove their expected comparison bounds directly in TimeM. For randomized quicksort, this PR proves sortedness and permutation correctness for every supported run, a quadratic worst-case comparison bound for every supported run, and the corresponding expected-time bound. It also adds reusable expected-time algebra for RandomTimeM, including the monad law for ENNReal expectations and a formal one-step expected-time recurrence for the actual quicksort implementation. I included the standard harmonic O(n log n) bound for the all-distinct rank recurrence as the next bridge toward the sharper average-case theorem.

AI usage: I used GPT-5.5 as an assistant while working on this PR, mainly for help navigating longer Lean proofs, trying proof refactors using some of the more advanced automation tactics. I reviewed the generated suggestions, ran the relevant CSLib checks locally, and take responsibility for the final code. Let me know if something looks bad or unnecessary.

Concretely, this PR adds:

  • StableByValue as a shared stability predicate for list sorting.
  • Stable bubble sort with sortedness, permutation, stability, and quadratic time proofs.
  • Stable insertion sort with sortedness, permutation, stability, and quadratic time proofs.
  • Counting sort on List ℕ with count preservation, sortedness, permutation, stability, and exact time proofs.
  • RandomTimeM, with expected-value and expected-time lemmas for randomized timed computations.
  • Randomized quicksort with PMF semantics, duplicate-safe three-way partitioning, correctness for every supported run, quadratic worst-case time, expected-time bounds, and an expected-time recurrence lemma.

Validation run locally:

  • lake build Cslib
  • lake test
  • lake lint
  • lake exe checkInitImports
  • lake exe lint-style
  • lake exe mk_all --module

Copilot AI review requested due to automatic review settings May 17, 2026 19:57
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds foundational sorting infrastructure and several list-sorting algorithms (with correctness/stability and comparison-count cost models) to the Lean portion of the library, including support for randomized timed computations.

Changes:

  • Introduce StableByValue as a general stability predicate for list sorting results.
  • Add deterministic stable sorts (insertion sort, bubble sort), counting sort for , and randomized quicksort with expected-time results.
  • Add RandomTimeM utilities for expectations over PMF (TimeM ...), and re-export new modules from Cslib.lean.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
Cslib/Algorithms/Lean/Sorting.lean Adds a stability predicate (StableByValue) used by stable sort proofs.
Cslib/Algorithms/Lean/RandomTimeM.lean Introduces RandomTimeM plus expected value/time definitions and lemmas.
Cslib/Algorithms/Lean/QuickSort/QuickSort.lean Adds randomized quicksort, correctness proofs, and worst/expected-time bounds.
Cslib/Algorithms/Lean/InsertionSort/InsertionSort.lean Adds stable insertion sort with correctness/stability and time bounds.
Cslib/Algorithms/Lean/CountingSort/CountingSort.lean Adds counting sort for with correctness/stability and exact time formula.
Cslib/Algorithms/Lean/BubbleSort/BubbleSort.lean Adds stable bubble sort with correctness/stability and time bounds.
Cslib.lean Re-exports the new sorting and randomized-time modules from the umbrella module.

pointwise worst-case theorem.
-/
theorem randomizedQuickSort_expectedTime_le_quadratic (xs : List α) :
RandomTimeM.expectedTime (randomizedQuickSort xs) ≤ (2 * xs.length * xs.length : ℕ) := by
Comment thread Cslib/Algorithms/Lean/RandomTimeM.lean Outdated
(base : ENNReal) + p.expectedTimeENNReal := by
rw [expectedTimeENNReal_bind]
simp only [expectedTimeENNReal, expectedNat_pure]
rw [← expectedNat]
⟨sortedPrefix.ret ++ [pass.ret.2], pass.time + sortedPrefix.time⟩
termination_by xs => xs.length
decreasing_by
simp_wf
Comment thread Cslib/Algorithms/Lean/RandomTimeM.lean Outdated
Comment on lines +156 to +173
/-- Expected running time is the weighted average of the time component of each result. -/
noncomputable def expectedTime (p : RandomTimeM ℕ α) : ℝ :=
expectedValue p fun result => result.time

/-- Expected running time as an extended nonnegative real. -/
noncomputable def expectedTimeENNReal (p : RandomTimeM ℕ α) : ENNReal :=
expectedNat p fun result => result.time

/-- The real and `ENNReal` expected-time views agree after converting back to `ℝ`. -/
theorem expectedTime_eq_toReal_expectedTimeENNReal (p : RandomTimeM ℕ α) :
p.expectedTime = p.expectedTimeENNReal.toReal := by
unfold expectedTime expectedTimeENNReal expectedValue expectedNat
rw [ENNReal.tsum_toReal_eq]
· simp [ENNReal.toReal_mul]
· intro result
exact ENNReal.mul_ne_top
(ne_of_lt ((p.coe_le_one result).trans_lt ENNReal.one_lt_top))
(by exact_mod_cast (WithTop.coe_ne_top : (result.time : ENNReal) ≠ ⊤))
Comment thread Cslib/Algorithms/Lean/Sorting.lean Outdated
namespace Cslib.Algorithms.Lean

/-- `ys` preserves the order of equal values from `xs`. -/
abbrev StableByValue {α : Type} [DecidableEq α] (xs ys : List α) : Prop :=
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants