
[Feature #21264] Replace C extension with pure Ruby implementation for Ruby >= 3.3#155

Open: jinroq wants to merge 35 commits into ruby:master from jinroq:replace_c_to_ruby

jinroq commented Feb 15, 2026

Summary

https://bugs.ruby-lang.org/issues/21264

Rewrite Date and DateTime C extension as pure Ruby, targeting Ruby 3.3+.
Ruby < 3.3 continues to use the existing C extension as before.

  • Ruby >= 3.3: Pure Ruby implementation (~7,900 lines across 10 files in lib/date/)
  • Ruby < 3.3: Existing C extension (ext/date/) compiled via rake-compiler

All 143 tests pass with 162,593 assertions on both paths.

Motivation

  • Improves portability: no C compiler required for Ruby 3.3+
  • Makes the codebase easier to read, debug, and contribute to
  • Enables Ractor compatibility without C-level thread safety concerns
  • Aligns with the broader Ruby ecosystem trend toward pure Ruby default gems

Architecture

The version branch (`RUBY_VERSION >= "3.3"`) is applied at three layers:

| Layer | Ruby >= 3.3 | Ruby < 3.3 |
| :--- | :--- | :--- |
| lib/date.rb | require_relative pure Ruby files | require 'date_core' (C ext) |
| ext/date/extconf.rb | Generates dummy Makefile (no-op) | create_makefile('date_core') |
| Rakefile | task :compile is a no-op | Rake::ExtensionTask compiles C ext |

C compile-time options and their pure Ruby counterparts:

| C option | Purpose | Pure Ruby |
| :--- | :--- | :--- |
| USE_PACK | Bit-pack mon/mday/hour/min/sec into a single integer for memory efficiency | Not needed; uses standard instance variables (@nth, @jd, @df, @sf, @of, @sg) |
| TIGHT_PARSER | Stricter Date._parse (disabled by default in C via `/* #define TIGHT_PARSER */`) | Matches C default behavior (loose parser); TIGHT_PARSER logic is not implemented |

Timezone table auto-generation

lib/date/zonetab.rb is now auto-generated from ext/date/zonetab.list (the same source used by gperf to generate zonetab.h for the C extension). This ensures the Ruby and C timezone tables stay in sync automatically.

| Component | Description |
| :--- | :--- |
| ext/date/generate-zonetab-rb | Ruby script that reads zonetab.list and produces the Ruby hash table |
| ext/date/prereq.mk | update-zonetab target now runs generate-zonetab-rb alongside gperf |
| .github/workflows/update.yml | CI commits regenerated lib/date/zonetab.rb alongside ext/date changes |

Pure Ruby file structure

| File | Description |
| :--- | :--- |
| lib/date/core.rb | Date class (civil, ordinal, commercial, JD, arithmetic, comparison) |
| lib/date/parse.rb | Date._parse, _iso8601, _rfc3339, _rfc2822, _xmlschema, _jisx0301 |
| lib/date/datetime.rb | DateTime subclass (hour, min, sec, offset) |
| lib/date/strptime.rb | strptime parsing |
| lib/date/strftime.rb | strftime formatting |
| lib/date/zonetab.rb | Timezone offset table (auto-generated from zonetab.list) |
| lib/date/patterns.rb | Regex patterns for parsing |
| lib/date/constants.rb | Calendar reform constants (ITALY, ENGLAND, GREGORIAN, etc.) |
| lib/date/time.rb | Date#to_time, Time#to_date, Time#to_datetime |
| lib/date/version.rb | Date::VERSION |

Performance note

DateTime is deprecated and no performance optimization has been done for it. The benchmark numbers for DateTime are provided for reference only.

Changes

  • Rakefile: Branch on RUBY_VERSION for compile/test task setup; test depends on compile for Ruby < 3.3
  • date.gemspec: Include both lib/**/*.rb and ext/date/* files; set extensions
  • ext/date/extconf.rb: Generate dummy Makefile on Ruby >= 3.3, build C ext otherwise
  • ext/date/generate-zonetab-rb (new): Script to auto-generate lib/date/zonetab.rb from ext/date/zonetab.list
  • ext/date/prereq.mk: Add generate-zonetab-rb to update-zonetab target
  • .github/workflows/update.yml: Include lib/date/zonetab.rb in auto-commit scope
  • lib/date.rb: Branch on RUBY_VERSION for require path; add require 'timeout' for parse timeout support
  • lib/date/*.rb (new): Pure Ruby implementation (10 files, ~7,900 lines)

Benchmark: C extension vs Pure Ruby (Ruby 4.0.1)

Date class methods

| Method | C ext (i/s) | Pure Ruby (i/s) | Ratio (Ruby/C) |
| :--- | ---: | ---: | ---: |
| Date.new | 3,974,217 | 1,531,119 | 38.5% |
| Date.new(no args) | 4,509,731 | 1,415,434 | 31.4% |
| Date.civil | 4,631,684 | 2,137,457 | 46.1% |
| Date.civil(sg) | 4,372,123 | 2,026,949 | 46.4% |
| Date.civil(-1) | 4,587,855 | 1,872,146 | 40.8% |
| Date.civil(neg) | 4,477,827 | 2,084,786 | 46.6% |
| Date.jd | 4,807,135 | 3,973,489 | 82.7% |
| Date.ordinal | 2,982,925 | 2,208,698 | 74.0% |
| Date.commercial | 2,476,732 | 1,799,799 | 72.7% |
| Date.today | 175,435 | 443,836 | 253.0% |
| Date.valid_civil? | 10,746,810 | 1,513,351 | 14.1% |
| Date.valid_civil?(false) | 10,823,626 | 2,051,868 | 19.0% |
| Date.valid_ordinal? | 4,160,590 | 1,625,978 | 39.1% |
| Date.valid_commercial? | 3,178,432 | 580,553 | 18.3% |
| Date.valid_jd? | 16,512,281 | 12,386,993 | 75.0% |
| Date.gregorian_leap? | 14,950,071 | 8,380,877 | 56.1% |
| Date.gregorian_leap?(1900) | 14,080,523 | 8,051,206 | 57.2% |
| Date.julian_leap? | 16,991,200 | 10,169,182 | 59.8% |

Date parse methods

| Method | C ext (i/s) | Pure Ruby (i/s) | Ratio (Ruby/C) |
| :--- | ---: | ---: | ---: |
| Date._parse(iso) | 239,001 | 1,506,581 | 630.4% |
| Date._parse(us) | 118,865 | 644,263 | 542.0% |
| Date._parse(eu) | 159,928 | 556,533 | 348.0% |
| Date._parse(rfc2822) | 82,441 | 264,068 | 320.3% |
| Date.parse(iso) | 212,655 | 720,361 | 338.7% |
| Date.parse(us) | 112,487 | 411,137 | 365.5% |
| Date.parse(eu) | 148,339 | 373,372 | 251.7% |
| Date.parse(compact) | 133,285 | 731,634 | 548.9% |
| Date._strptime | 2,737,334 | 1,412,082 | 51.6% |
| Date.strptime | 1,390,385 | 943,652 | 67.9% |
| Date.strptime(complex) | 1,116,618 | 373,480 | 33.4% |
| Date._iso8601 | 750,095 | 1,613,186 | 215.1% |
| Date._rfc3339 | 501,479 | 664,953 | 132.6% |
| Date._rfc2822 | 373,428 | 613,062 | 164.2% |
| Date._xmlschema | 787,848 | 1,595,289 | 202.5% |
| Date._httpdate | 397,646 | 688,085 | 173.0% |
| Date._jisx0301 | 733,059 | 1,425,575 | 194.5% |
| Date.iso8601 | 547,109 | 731,563 | 133.7% |
| Date.rfc3339 | 414,614 | 427,392 | 103.1% |
| Date.rfc2822 | 310,970 | 412,405 | 132.6% |
| Date.xmlschema | 574,804 | 732,143 | 127.4% |
| Date.httpdate | 302,846 | 439,218 | 145.0% |
| Date.jisx0301 | 540,940 | 687,365 | 127.1% |

Date instance methods

| Method | C ext (i/s) | Pure Ruby (i/s) | Ratio (Ruby/C) |
| :--- | ---: | ---: | ---: |
| Date#year | 20,593,327 | 15,021,287 | 72.9% |
| Date#month | 21,446,329 | 15,000,410 | 69.9% |
| Date#day | 18,289,725 | 14,989,635 | 82.0% |
| Date#wday | 19,708,683 | 15,560,398 | 79.0% |
| Date#yday | 15,578,182 | 18,117,325 | 116.3% |
| Date#jd | 18,533,155 | 20,384,518 | 110.0% |
| Date#ajd | 7,244,720 | 5,030,639 | 69.4% |
| Date#mjd | 11,192,998 | 19,249,248 | 172.0% |
| Date#amjd | 9,935,512 | 2,221,272 | 22.4% |
| Date#ld | 11,415,318 | 19,199,854 | 168.2% |
| Date#start | 20,057,435 | 20,487,414 | 102.1% |
| Date#cwyear | 4,475,221 | 18,366,865 | 410.4% |
| Date#cweek | 4,523,128 | 18,683,158 | 413.1% |
| Date#cwday | 20,551,980 | 11,273,834 | 54.9% |
| Date#leap? | 18,147,973 | 9,743,760 | 53.7% |
| Date#julian? | 17,546,611 | 10,199,752 | 58.1% |
| Date#gregorian? | 17,932,339 | 13,141,146 | 73.3% |
| Date#sunday? | 20,750,735 | 12,003,952 | 57.8% |
| Date#monday? | 20,790,772 | 11,854,532 | 57.0% |
| Date#saturday? | 20,400,215 | 11,924,834 | 58.5% |

Date arithmetic / comparison

| Method | C ext (i/s) | Pure Ruby (i/s) | Ratio (Ruby/C) |
| :--- | ---: | ---: | ---: |
| Date#+1 | 5,922,467 | 3,491,742 | 59.0% |
| Date#+100 | 5,719,971 | 3,531,438 | 61.7% |
| Date#-1 | 4,050,015 | 3,223,107 | 79.6% |
| Date#-Date | 1,900,127 | 2,379,705 | 125.2% |
| Date#>>1 | 3,133,030 | 1,989,987 | 63.5% |
| Date#>>12 | 3,018,894 | 1,938,665 | 64.2% |
| Date#<<1 | 2,206,690 | 1,877,250 | 85.1% |
| Date#next_day | 4,710,869 | 3,260,116 | 69.2% |
| Date#prev_day | 4,041,438 | 3,046,170 | 75.4% |
| Date#next_month | 2,742,850 | 1,900,557 | 69.3% |
| Date#prev_month | 2,102,104 | 1,826,171 | 86.9% |
| Date#next_year | 2,629,005 | 1,749,549 | 66.5% |
| Date#prev_year | 1,836,359 | 1,687,470 | 91.9% |
| Date#succ | 5,421,671 | 3,270,318 | 60.3% |
| Date#<=> | 11,416,636 | 7,788,569 | 68.2% |
| Date#=== | 12,411,827 | 7,174,659 | 57.8% |
| Date#== | 2,702,128 | 9,236,888 | 341.8% |
| Date#< | 7,324,156 | 7,443,056 | 101.6% |
| Date#> | 7,559,944 | 7,485,990 | 99.0% |
| Date#eql? | 10,519,187 | 10,476,624 | 99.6% |
| Date#hash | 13,768,431 | 10,380,449 | 75.4% |

Date iteration / formatting / conversion

| Method | C ext (i/s) | Pure Ruby (i/s) | Ratio (Ruby/C) |
| :--- | ---: | ---: | ---: |
| Date#upto(+30) | 140,859 | 163,117 | 115.8% |
| Date#downto(-30) | 114,805 | 161,888 | 141.0% |
| Date#step(+30,7) | 711,797 | 834,724 | 117.3% |
| Date#to_s | 3,751,091 | 4,260,907 | 113.6% |
| Date#inspect | 560,483 | 1,357,800 | 242.3% |
| Date#asctime | 2,444,606 | 1,676,951 | 68.6% |
| Date#strftime | 2,704,672 | 3,867,297 | 143.0% |
| Date#strftime(%Y-%m-%d) | 3,051,536 | 3,252,162 | 106.6% |
| Date#strftime(%A %B) | 2,911,007 | 1,725,704 | 59.3% |
| Date#strftime(%c) | 1,963,184 | 1,522,719 | 77.6% |
| Date#strftime(%x) | 2,585,481 | 2,066,510 | 79.9% |
| Date#strftime(composite) | 1,581,662 | 1,880,930 | 118.9% |
| Date#iso8601 | 3,871,147 | 3,840,760 | 99.2% |
| Date#rfc3339 | 1,787,396 | 2,382,235 | 133.3% |
| Date#rfc2822 | 1,939,882 | 1,720,221 | 88.7% |
| Date#xmlschema | 3,936,023 | 3,722,464 | 94.6% |
| Date#httpdate | 1,601,616 | 1,925,300 | 120.2% |
| Date#jisx0301 | 2,813,683 | 2,420,539 | 86.0% |
| Date#to_date | 22,264,202 | 21,011,318 | 94.4% |
| Date#to_datetime | 6,826,666 | 349,452 | 5.1% |
| Date#to_time | 1,908,133 | 1,452,386 | 76.1% |
| Date#new_start | 5,596,668 | 4,097,288 | 73.2% |
| Date#julian | 5,827,327 | 3,818,120 | 65.5% |
| Date#gregorian | 5,857,213 | 3,832,377 | 65.4% |
| Date#italy | 5,237,481 | 3,860,512 | 73.7% |
| Date#england | 5,504,857 | 3,846,989 | 69.9% |
| Date Marshal.dump | 524,047 | 586,972 | 112.0% |
| Date Marshal.load | 562,881 | 607,496 | 107.9% |
| Date#deconstruct_keys(nil) | 3,478,812 | 3,878,040 | 111.5% |
| Date#deconstruct_keys(year) | 5,298,346 | 4,469,402 | 84.4% |
| Date#deconstruct_keys(y/m/d) | 3,919,147 | 1,841,221 | 47.0% |

DateTime class methods (deprecated, not optimized)

| Method | C ext (i/s) | Pure Ruby (i/s) | Ratio (Ruby/C) |
| :--- | ---: | ---: | ---: |
| DateTime.civil | 1,875,912 | 351,426 | 18.7% |
| DateTime.jd | 1,808,918 | 637,926 | 35.3% |
| DateTime.ordinal | 1,526,460 | 601,551 | 39.4% |
| DateTime.commercial | 1,358,985 | 352,835 | 26.0% |
| DateTime.now | 138,945 | 314,764 | 226.5% |
| DateTime.parse(iso) | 86,155 | 35,336 | 41.0% |
| DateTime.parse(rfc2822) | 75,692 | 137,568 | 181.7% |
| DateTime.strptime | 390,524 | 94,598 | 24.2% |
| DateTime.iso8601 | 332,657 | 142,625 | 42.9% |
| DateTime.rfc3339 | 360,583 | 226,296 | 62.8% |
| DateTime.rfc2822 | 284,958 | 201,992 | 70.9% |
| DateTime.xmlschema | 354,039 | 155,424 | 43.9% |
| DateTime.httpdate | 293,315 | 214,761 | 73.2% |
| DateTime.jisx0301 | 347,867 | 150,365 | 43.2% |

DateTime instance methods (deprecated, not optimized)

| Method | C ext (i/s) | Pure Ruby (i/s) | Ratio (Ruby/C) |
| :--- | ---: | ---: | ---: |
| DateTime#year | 20,098,127 | 15,042,689 | 74.8% |
| DateTime#month | 21,237,840 | 16,111,993 | 75.9% |
| DateTime#day | 18,548,994 | 16,062,127 | 86.6% |
| DateTime#hour | 21,333,617 | 17,826,341 | 83.6% |
| DateTime#min | 18,521,195 | 17,755,413 | 95.9% |
| DateTime#sec | 21,045,565 | 17,879,183 | 85.0% |
| DateTime#sec_fraction | 10,109,921 | 17,794,083 | 176.0% |
| DateTime#offset | 9,615,641 | 6,137,640 | 63.8% |
| DateTime#zone | 3,181,535 | 1,334,388 | 41.9% |
| DateTime#wday | 20,440,568 | 16,914,214 | 82.7% |
| DateTime#yday | 14,260,455 | 17,826,177 | 125.0% |
| DateTime#jd | 19,268,548 | 20,221,226 | 104.9% |
| DateTime#ajd | 2,085,063 | 886,014 | 42.5% |
| DateTime#+1 | 5,377,767 | 1,662,382 | 30.9% |
| DateTime#+frac | 281,200 | 477,164 | 169.7% |
| DateTime#-1 | 3,983,009 | 1,594,373 | 40.0% |
| DateTime#-DT | 1,316,729 | 396,869 | 30.1% |
| DateTime#>>1 | 2,909,249 | 1,935,724 | 66.5% |
| DateTime#<<1 | 2,128,352 | 1,829,635 | 86.0% |
| DateTime#next_day | 5,325,907 | 1,598,982 | 30.0% |
| DateTime#prev_day | 3,787,627 | 1,527,316 | 40.3% |
| DateTime#next_month | 2,838,329 | 1,825,383 | 64.3% |
| DateTime#prev_month | 2,096,188 | 1,726,255 | 82.4% |
| DateTime#next_year | 2,627,272 | 1,701,474 | 64.8% |
| DateTime#prev_year | 1,963,389 | 1,616,029 | 82.3% |
| DateTime#<=> | 12,096,597 | 7,730,511 | 63.9% |
| DateTime#=== | 11,192,664 | 6,077,881 | 54.3% |
| DateTime#== | 2,707,665 | 9,205,819 | 340.0% |
| DateTime#eql? | 10,865,195 | 10,341,655 | 95.2% |
| DateTime#hash | 13,709,136 | 7,672,208 | 56.0% |

DateTime formatting / conversion (deprecated, not optimized)

| Method | C ext (i/s) | Pure Ruby (i/s) | Ratio (Ruby/C) |
| :--- | ---: | ---: | ---: |
| DateTime#to_s | 1,892,182 | 212,119 | 11.2% |
| DateTime#inspect | 468,910 | 189,554 | 40.4% |
| DateTime#strftime | 1,546,127 | 298,049 | 19.3% |
| DateTime#strftime(%Y%m%d%z) | 1,680,875 | 1,006,879 | 59.9% |
| DateTime#strftime(%c) | 2,063,165 | 1,035,746 | 50.2% |
| DateTime#strftime(%s) | 3,244,980 | 999,168 | 30.8% |
| DateTime#iso8601 | 1,414,142 | 208,727 | 14.8% |
| DateTime#rfc3339 | 1,308,541 | 210,804 | 16.1% |
| DateTime#rfc2822 | 1,780,011 | 1,200,135 | 67.4% |
| DateTime#xmlschema | 1,415,768 | 214,005 | 15.1% |
| DateTime#httpdate | 1,656,273 | 1,279,407 | 77.2% |
| DateTime#jisx0301 | 1,212,199 | 467,964 | 38.6% |
| DateTime#new_offset(0) | 5,246,751 | 1,157,741 | 22.1% |
| DateTime#new_offset(str) | 3,408,551 | 487,457 | 14.3% |
| DateTime#new_offset(rat) | 2,033,443 | 1,151,965 | 56.7% |
| DateTime#to_date | 6,530,329 | 3,094,153 | 47.4% |
| DateTime#to_datetime | 22,361,492 | 21,171,076 | 94.7% |
| DateTime#to_time | 917,658 | 680,739 | 74.2% |
| DateTime Marshal.dump | 532,858 | 361,435 | 67.8% |
| DateTime Marshal.load | 551,158 | 336,273 | 61.0% |
| DateTime#deconstruct_keys(nil) | 879,547 | 721,487 | 82.0% |
| DateTime#deconstruct_keys(y/h) | 4,462,768 | 2,341,063 | 52.5% |

Time conversion

| Method | C ext (i/s) | Pure Ruby (i/s) | Ratio (Ruby/C) |
| :--- | ---: | ---: | ---: |
| Time#to_date | 3,891,482 | 1,677,908 | 43.1% |
| Time#to_datetime | 1,245,844 | 1,049,600 | 84.2% |

C implementation has been rewritten as faithfully as possible in pure Ruby.

[Feature #21264]

https://bugs.ruby-lang.org/issues/21264
jinroq changed the title from "Replace C extension with pure Ruby implementation for Ruby >= 3.3" to "[Feature #21264] Replace C extension with pure Ruby implementation for Ruby >= 3.3" on Feb 15, 2026
jeremyevans (Contributor)

Date was originally written in Ruby prior to Ruby 1.9.3. It was rewritten in C to significantly increase performance. When Date was written in Ruby, its low performance made it a common bottleneck in Ruby applications. I think for this to be considered, you need to provide comprehensive benchmarks showing that performance does not decrease significantly.

Review comment on lib/date/constants.rb (outdated):

```ruby
MONTHNAMES = [nil, "January", "February", "March", "April", "May", "June",
              "July", "August", "September", "October", "November", "December"]
  .map { |s| s&.encode(Encoding::US_ASCII)&.freeze }.freeze
```

Review comment (Member):

Put `# encoding: US-ASCII` at the beginning of the file.
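A minimal sketch of what the suggestion buys, assuming its aim is to drop the per-element `encode` call: in a file whose first line is `# encoding: US-ASCII`, ASCII-only string literals are already US-ASCII. The sketch additionally assumes a `# frozen_string_literal: true` comment to replace the per-element `freeze`. Because the encoding comment only takes effect at the top of a file, the sketch writes a hypothetical stand-in for `lib/date/constants.rb` to a temporary file and loads it; `MONTHNAMES_SKETCH` is a made-up name.

```ruby
require "tempfile"

# Hypothetical stand-in for lib/date/constants.rb. The encoding magic comment
# must be the first line of a file, so the source is written out and loaded.
SOURCE = <<~'RUBY'
  # encoding: US-ASCII
  # frozen_string_literal: true

  # ASCII-only literals here are US-ASCII and frozen, so only the Array
  # itself needs an explicit freeze.
  MONTHNAMES_SKETCH = [nil, "January", "February", "March", "April", "May", "June",
                       "July", "August", "September", "October", "November", "December"].freeze
RUBY

Tempfile.create(["constants_sketch", ".rb"]) do |file|
  file.write(SOURCE)
  file.flush
  load file.path
end

names = MONTHNAMES_SKETCH.compact
p names.all? { |s| s.encoding == Encoding::US_ASCII } # prints true
p names.all?(&:frozen?)                              # prints true
```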

nobu (Member) commented Feb 15, 2026

A simple benchmark to just create objects:

```ruby
require 'benchmark'
require 'date'

N = 10000
Benchmark.bm do |bm|
  bm.report("Time") {N.times {Time.now}}
  bm.report("Date") {N.times {Date.today}}
end
```

With ruby 4.1.0dev (2026-02-14T07:03:18Z master 2065b55980) +PRISM [arm64-darwin25], and master:

```
$ ruby -I./lib bench.rb
          user     system      total        real
Time  0.001656   0.000023   0.001679 (  0.001675)
Date  0.002735   0.000062   0.002797 (  0.002827)
```

This PR:

```
$ ruby -I./lib bench.rb
          user     system      total        real
Time  0.001018   0.000013   0.001031 (  0.001031)
Date  0.007624   0.000151   0.007775 (  0.007776)
```

Interestingly, this PR makes Time.now faster.

jeremyevans (Contributor)

@nobu you should probably benchmark with benchmark-driver or benchmark-ips. With a runtime of only ~1ms, it's hard to get statistically valid results. Since I don't think date modifies the implementation of Time.now, a significant performance difference there seems unlikely.

A benchmark should include most of the methods in the library. When I was working on home_run, I had a set of comprehensive benchmarks to see the differences in performance compared to the (at the time) Ruby implementation. It included a decent set of benchmarks (https://github.com/jeremyevans/home_run/blob/master/bench/cpu_bench.rb), though I would certainly switch the backend to use benchmark-driver or benchmark-ips for this.
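Here is a minimal, dependency-free sketch of why a ~1ms run is too short. It follows the same basic idea as tools like benchmark-ips: run the block repeatedly for a fixed wall-clock budget, repeat that several times, and report the mean iterations per second along with the spread. The helper names (`ips_samples`, `summarize`) and the batch size are hypothetical. This is not benchmark-ips's implementation, and the real gem should be preferred for actual measurements.

```ruby
require "date"

# Hypothetical helper: runs the block in batches until `duration` seconds of
# monotonic time have passed, and records iterations per second. Repeating
# this `samples` times shows how much results vary between runs.
def ips_samples(samples: 5, duration: 0.2, batch: 100)
  Array.new(samples) do
    iterations = 0
    start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    now = start
    while now - start < duration
      batch.times { yield }
      iterations += batch
      now = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    end
    iterations / (now - start) # elapsed is >= duration here, never zero
  end
end

# Mean i/s and relative standard deviation (percent) across samples.
def summarize(rates)
  mean = rates.sum / rates.size
  variance = rates.sum { |r| (r - mean)**2 } / rates.size
  { mean: mean, stddev_pct: Math.sqrt(variance) / mean * 100 }
end

{ "Time.now" => -> { Time.now }, "Date.today" => -> { Date.today } }.each do |label, fn|
  stats = summarize(ips_samples(&fn))
  printf("%-12s %14.1f i/s (± %.1f%%)\n", label, stats[:mean], stats[:stddev_pct])
end
```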

nobu (Member) commented Feb 15, 2026

In the meantime, I just tried Benchmark.ips.

master:

```
Warming up --------------------------------------
            Time.now   206.000 i/100ms
          Date.today    46.000 i/100ms
Calculating -------------------------------------
            Time.now      2.096k (± 0.1%) i/s  (477.11 μs/i) -     10.506k in   5.012541s
          Date.today    459.375 (± 0.7%) i/s    (2.18 ms/i) -      2.300k in   5.006967s
```

This PR:

```
Warming up --------------------------------------
            Time.now    206.000 i/100ms
          Date.today    16.000 i/100ms
Calculating -------------------------------------
            Time.now      2.143k (± 0.6%) i/s  (466.72 μs/i) -     10.918k in   5.095787s
          Date.today    166.713 (± 0.0%) i/s    (6.00 ms/i) -    848.000 in   5.086612s
```

Agreed, there seems to be a lot of room for optimization.
The current extension is a line-by-line translation from Ruby to C and is not optimized for C.
This PR also looks like a line-by-line translation in the reverse direction, so it is doubly non-optimal.

jeremyevans (Contributor)

> The current extension is line-by-line translation from Ruby to C and not optimized for C.

I don't believe the line-by-line translation part is 100% accurate, though it may be true for large portions of the library. The primary implementation difference between the current C implementation and the previous (pre Ruby 1.9.3) Ruby implementation was that the previous Ruby implementation always eagerly converted from whatever the input format was to ajd (e.g. https://github.com/ruby/ruby/blob/ruby_1_9_2/lib/date.rb#L1621-L1629). That's the primary reason it was so slow. home_run pioneered the idea of not converting eagerly to ajd, only doing the conversion later when it was actually needed. That same basic approach was used by tadf when he rewrote date from Ruby to C. See https://bugs.ruby-lang.org/issues/4068 for background on that change.
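The difference between eager and lazy conversion can be sketched in a few lines. Both hypothetical classes below take civil (year, month, day) input. `EagerDate` converts to a Julian Day Number in its constructor, while `LazyDate` keeps the civil fields and computes the JDN only on first request, then memoizes it. The JDN formula is a common integer algorithm for the proleptic Gregorian calendar and is not necessarily the one used by date; neither class handles calendar reform.

```ruby
# Proleptic Gregorian civil date -> Julian Day Number (common integer formula).
def civil_to_jdn(year, month, day)
  a = (14 - month) / 12
  y = year + 4800 - a
  m = month + 12 * a - 3
  day + (153 * m + 2) / 5 + 365 * y + y / 4 - y / 100 + y / 400 - 32045
end

# Hypothetical: pays the conversion cost up front, whether or not it is needed.
class EagerDate
  attr_reader :jd

  def initialize(year, month, day)
    @jd = civil_to_jdn(year, month, day)
  end
end

# Hypothetical: keeps the input form and converts only on demand.
class LazyDate
  attr_reader :year, :month, :day

  def initialize(year, month, day)
    @year, @month, @day = year, month, day
  end

  def jd
    @jd ||= civil_to_jdn(@year, @month, @day) # computed once, then cached
  end
end

d = LazyDate.new(2000, 1, 1)
d.year                             # answered from the stored civil fields
d.instance_variable_defined?(:@jd) # => false (no conversion has happened yet)
d.jd                               # => 2451545
```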

I think we'd be willing to accept a small performance decrease to switch the C implementation with a Ruby implementation. However, a ~3x performance decrease is way too much to consider switching, IMO. As I mentioned earlier, Date was often a bottleneck in application code before Ruby 1.9.3, that's the reason I worked on home_run. So performance should be a primary consideration when deciding whether to switch to an alternative implementation.

| Implementation | i/s | μs/i |
| :--- | :--- | :--- |
| System (C ext) | 347.5k | 2.88 |
| Pre-optimization (pure Ruby) | 313.5k | 3.19 |
| Post-optimization (pure Ruby) | 380.0k | 2.63 |

| Implementation | i/s | μs/i |
| :--- | :--- | :--- |
| System (C ext) | 4.32M | 0.23 |
| Pre-optimization (pure Ruby) | 312k | 3.20 |
| Post-optimization (pure Ruby) | 1.67M | 0.60 |

**5.4x speedup** (312k → 1.67M i/s). Reached approximately **39%** of the C extension's performance.

| Implementation | i/s |
| :--- | :--- |
| System (C ext) | 4.50M |
| Pre-optimization (pure Ruby) | 311k |
| Post-optimization (pure Ruby) | 1.63M |

For cases where the fast path is not applicable (e.g., Julian calendar or BCE years), performance remains equivalent to the previous implementation (no changes).

The fast path is applied when all of the following conditions are met:

1. `year`, `month`, and `day` are all `Integer`.
2. The date is determined to be strictly Gregorian (e.g., `start` is `GREGORIAN`, or a reform date like `ITALY` with `year > 1930`).

By satisfying these conditions, the implementation skips six `self.class.send` calls, `Hash` allocations, redundant `decode_year` calls, and repetitive array generation.
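The two fast-path conditions can be sketched as follows. The sketch assumes, as the condition above states, that once `year > 1930` a date with a reform-date `start` can be treated as Gregorian (it excludes `Date::JULIAN`, which never switches). `fast_civil_jd` and `REFORM_END_YEAR_SKETCH` are hypothetical names. Returning `nil` signals "fall back to the slow path". Month/day range validation is left to the caller, and the JD uses a common integer formula for the proleptic Gregorian calendar.

```ruby
require "date"

# Hypothetical constant: the last year that may still fall inside a reform window.
REFORM_END_YEAR_SKETCH = 1930

# Proleptic Gregorian civil date -> JD (common integer formula).
def gregorian_civil_to_jd(year, month, day)
  a = (14 - month) / 12
  y = year + 4800 - a
  m = month + 12 * a - 3
  day + (153 * m + 2) / 5 + 365 * y + y / 4 - y / 100 + y / 400 - 32045
end

# Hypothetical fast path: returns the JD, or nil to request the slow path.
# Assumes month and day have already been range-checked.
def fast_civil_jd(year, month, day, start = Date::ITALY)
  # Condition 1: all three arguments are Integers.
  return nil unless year.is_a?(Integer) && month.is_a?(Integer) && day.is_a?(Integer)

  # Condition 2: the date is unambiguously Gregorian.
  gregorian = start == Date::GREGORIAN ||
              (start != Date::JULIAN && year > REFORM_END_YEAR_SKETCH)
  return nil unless gregorian

  gregorian_civil_to_jd(year, month, day)
end

fast_civil_jd(2024, 2, 29)               # => 2460370
fast_civil_jd(1500, 1, 1)                # => nil (reform-aware slow path)
fast_civil_jd(2024, 2, 29, Date::JULIAN) # => nil
```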

| Implementation | i/s |
| :--- | :--- |
| System (C ext) | 9.58M |
| Pre-optimization (pure Ruby) | 458k |
| Post-optimization (pure Ruby) | 2.51M |

**5.5x speedup** (458k → 2.51M i/s). Reached approximately **26%** of the C extension's performance.

| Implementation | i/s |
| :--- | :--- |
| System (C ext) | 9.59M |
| Pre-optimization (pure Ruby) | 574k |
| Post-optimization (pure Ruby) | 2.53M |

**4.4x speedup.**

1. **Added a Fast Path** — For `Integer` arguments and Gregorian calendar cases, the entire method chain of `numeric?` (called 3 times) and `valid_civil_sub` is skipped. Instead, month and day range checks are performed inline.
2. **Eliminated Repeated Array Allocation in `valid_civil_sub`** — Changed the implementation to reference a `MONTH_DAYS` constant instead of creating a new array `[nil, 31, 28, ...]` on every call.
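Both changes can be sketched as follows, assuming a Gregorian-only fast path and positive month/day arguments. The real method also accepts negative values that count from the end; this sketch returns `nil` for those so the caller falls back. `MONTH_DAYS_SKETCH` and `valid_civil_fast?` are hypothetical names. The point is that the lookup table is a frozen constant built once at load time, rather than a new array on every call.

```ruby
# Built once when the file is loaded; index 0 is unused so months are 1-based.
MONTH_DAYS_SKETCH = [nil, 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31].freeze

def gregorian_leap?(year)
  (year % 4 == 0 && year % 100 != 0) || year % 400 == 0
end

# Hypothetical fast path: true/false, or nil when the slow path must decide.
def valid_civil_fast?(year, month, day)
  return nil unless year.is_a?(Integer) && month.is_a?(Integer) && day.is_a?(Integer)
  return nil if month < 1 || day < 1 # negative (from-the-end) forms: slow path
  return false if month > 12

  # Inline range check: no per-call Array allocation.
  last = month == 2 && gregorian_leap?(year) ? 29 : MONTH_DAYS_SKETCH[month]
  day <= last
end
```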

| Case | System (C ext) | Pre-optimization | Post-optimization |
| :--- | :--- | :--- | :--- |
| Date.jd | 4.12M | 462k | 1.18M |
| Date.jd(0) | 4.20M | 467k | 1.19M |
| Date.jd(JULIAN) | 4.09M | 468k | 1.22M |
| Date.jd(GREG) | 4.07M | 467k | 1.21M |

**Approximately 2.6x speedup** (462k → 1.18M i/s). Reached approximately **29%** of the C extension's performance.

The fast path is effective across all `start` patterns (`ITALY` / `JULIAN` / `GREGORIAN`). The following processes are now skipped:

- `valid_sg` + `c_valid_start_p` (numerous type checks)
- `value_trunc` (array allocation for `Integer`)
- `decode_jd` (array allocation for standard Julian Days)
- `d_simple_new_internal` (`canon` + flag operations + method call overhead)

| Case | System (C ext) | Pre-optimization | Post-optimization | Improvement |
| :--- | :--- | :--- | :--- | :--- |
| Date.ordinal | 2.66M | 170k | 645k | 3.8x |
| Date.ordinal(-1) | 1.87M | 119k | 639k | 5.4x |
| Date.ordinal(neg) | 3.08M | 107k | 106k | (Slow path) |

**3.8x to 5.4x speedup** in cases where the fast path is applicable. Reached approximately **24% to 34%** of the C extension's performance.

`Date.ordinal(neg)` remains on the slow path (equivalent to previous performance) because the year -4712 does not meet the fast path condition (`year > REFORM_END_YEAR`).

| Case | System (C ext) | Pre-optimization | Post-optimization | Improvement |
| :--- | :--- | :--- | :--- | :--- |
| Date.commercial | 2.18M | 126k | 574k | 4.5x |
| Date.commercial(-1) | 1.45M | 85k | 560k | 6.6x |
| Date.commercial(neg) | 2.84M | 93k | 90k | (Slow path) |

**4.5x to 6.6x speedup** in cases where the fast path is applicable. Reached approximately **26% to 39%** of the C extension's performance.

Inlined the ISO week-to-JD conversion:

1. Obtain the JD for Jan 1 using `c_gregorian_civil_to_jd(year, 1, 1)` (requires only one method call).
2. Directly calculate `max_weeks` (52 or 53) from the ISO weekday to perform a week range check.
3. Calculate the Monday of Week 1 using: `base = (jd_jan1 + 3) - ((jd_jan1 + 3) % 7)`.
4. Directly calculate the JD using: `rjd = base + 7*(week-1) + (day-1)`.

This bypasses the entire previous chain of `valid_commercial_p` → `c_valid_commercial_p` → `c_commercial_to_jd` → `c_jd_to_commercial` (verification via inverse conversion).
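The four steps can be sketched in Ruby as follows. For step 1 the sketch computes the Jan 1 JD with a common integer formula for the proleptic Gregorian calendar, standing in for `c_gregorian_civil_to_jd`. For step 2 it derives `max_weeks` from the usual ISO rule: a year has 53 weeks when Jan 1 is a Thursday, or a Wednesday in a leap year. Only positive week and day arguments are handled. `commercial_to_jd_sketch` is a hypothetical name and returns `nil` for out-of-range input.

```ruby
def gregorian_civil_to_jd(year, month, day)
  a = (14 - month) / 12
  y = year + 4800 - a
  m = month + 12 * a - 3
  day + (153 * m + 2) / 5 + 365 * y + y / 4 - y / 100 + y / 400 - 32045
end

def gregorian_leap?(year)
  (year % 4 == 0 && year % 100 != 0) || year % 400 == 0
end

# Hypothetical inlined ISO week date -> JD conversion.
def commercial_to_jd_sketch(year, week, cwday)
  return nil unless cwday.between?(1, 7)

  # Step 1: a single call for Jan 1.
  jd_jan1 = gregorian_civil_to_jd(year, 1, 1)

  # Step 2: ISO weekday of Jan 1 (Monday = 1 ... Sunday = 7); jd % 7 == 0 is a Monday.
  wday_jan1 = jd_jan1 % 7 + 1
  max_weeks = wday_jan1 == 4 || (wday_jan1 == 3 && gregorian_leap?(year)) ? 53 : 52
  return nil unless week.between?(1, max_weeks)

  # Step 3: the Monday of week 1 is the Monday on or before Jan 4 (jd_jan1 + 3).
  base = (jd_jan1 + 3) - ((jd_jan1 + 3) % 7)

  # Step 4: whole weeks plus the day within the week.
  base + 7 * (week - 1) + (cwday - 1)
end

commercial_to_jd_sketch(2020, 53, 5) # => 2459216 (2021-01-01)
```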

| Case | System (C ext) | Pre-optimization | Post-optimization | Improvement |
| :--- | :--- | :--- | :--- | :--- |
| valid_ordinal? (true) | 3.76M | 221k | 3.38M | 15.3x |
| valid_ordinal? (false) | 3.77M | 250k | 3.39M | 13.6x |
| valid_ordinal? (-1) | 2.37M | 148k | 2.67M | 18.0x |

**13.6x to 18x speedup.** Performance reached **90% to 113%** of the C extension, making it nearly equivalent or even slightly faster.

Since `valid_ordinal?` does not require object instantiation and only involves leap year determination and day-of-year range checks, the inline cost of the fast path is extremely low, allowing it to rival the performance of the C extension.
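Here is a sketch of why the check is cheap. It assumes the Gregorian fast path and the library's convention that a negative day-of-year counts back from Dec 31. What remains is a leap-year test and an integer range check. `valid_ordinal_fast?` is a hypothetical name.

```ruby
def gregorian_leap?(year)
  (year % 4 == 0 && year % 100 != 0) || year % 400 == 0
end

# Hypothetical fast path for a Gregorian year with an Integer day-of-year.
def valid_ordinal_fast?(year, yday)
  days_in_year = gregorian_leap?(year) ? 366 : 365
  yday += days_in_year + 1 if yday < 0 # -1 means the last day of the year
  yday.between?(1, days_in_year)
end

valid_ordinal_fast?(2024, 366) # => true
valid_ordinal_fast?(2023, 366) # => false
valid_ordinal_fast?(2023, -1)  # => true
```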

| Case | System (C ext) | Pre-optimization | Post-optimization | Improvement |
| :--- | :--- | :--- | :--- | :--- |
| valid_commercial? (true) | 2.94M | 167k | 1.09M | 6.5x |
| valid_commercial? (false) | 3.56M | 218k | 1.08M | 5.0x |
| valid_commercial? (-1) | 1.79M | 104k | 1.07M | 10.3x |

**5x to 10x speedup.** Performance reached approximately **30% to 60%** of the C extension.

The same ISO week validation logic used in the `Date.commercial` fast path (calculating `max_weeks` from the JD of Jan 1 and performing `cwday`/`cweek` range checks) has been inlined. The reason it does not rival the C extension as closely as `valid_ordinal?` is due to the remaining overhead of a single method call to `c_gregorian_civil_to_jd(year, 1, 1)`.

| Method | i/s |
| :--- | :--- |
| Date.valid_jd? | 9.29M |
| Date.valid_jd?(false) | 9.68M |

It is approximately **3.3x faster** compared to the C extension benchmarks (Reference values: 2.93M / 2.80M). The simplification to only perform type checks has had a significant impact on performance.

| Method | Pre-optimization | Post-optimization | Improvement |
| :--- | :--- | :--- | :--- |
| Date.gregorian_leap?(2000) | 1.40M | 7.39M | 5.3x |
| Date.gregorian_leap?(1900) | 1.39M | 7.48M | 5.4x |

It is approximately **4.5x faster** even when compared to the C extension reference values (1.69M / 1.66M).

For `Integer` arguments, the implementation now performs the leap year determination inline, skipping three method calls: the `numeric?` check, `decode_year`, and `c_gregorian_leap_p?`. Non-`Integer` arguments (such as `Rational`) will fall back to the conventional path.

| Method | Pre-optimization | Post-optimization | Improvement |
| :--- | :--- | :--- | :--- |
| Date.julian_leap? | 2.27M | 8.98M | 4.0x |

It is approximately **3.2x faster** even when compared to the C extension reference value (2.80M).

For `Integer` arguments, the implementation now skips calls to `numeric?`, `decode_year`, and `c_julian_leap_p?`, returning the result directly via an inline `year % 4 == 0` check.
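The `Integer` fast paths for both leap-year predicates can be sketched like this. The `else` branches are hypothetical stand-ins for the original `numeric?` / `decode_year` chain. They simply delegate to `Date` so the sketch stays runnable. Ruby's `%` is floored, so the inline rules also hold for negative (proleptic) years.

```ruby
require "date"

# Hypothetical sketch of Date.gregorian_leap? with an Integer fast path.
def gregorian_leap_sketch?(year)
  if year.is_a?(Integer)
    # Divisible by 4, except centuries that are not divisible by 400.
    (year % 4 == 0 && year % 100 != 0) || year % 400 == 0
  else
    Date.gregorian_leap?(year) # slow path (e.g. Rational)
  end
end

# Hypothetical sketch of Date.julian_leap? with an Integer fast path.
def julian_leap_sketch?(year)
  year.is_a?(Integer) ? year % 4 == 0 : Date.julian_leap?(year)
end

gregorian_leap_sketch?(1900) # => false
julian_leap_sketch?(1900)    # => true
```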

| Method | Pre-optimization | Post-optimization | Improvement |
| :--- | :--- | :--- | :--- |
| Date#year | 3.27M | 10.06M | 3.1x |

It is approximately **2.8x faster** even when compared to the C extension reference value (3.65M).

In cases where `@nth == 0 && @has_civil` (which covers almost all typical use cases), the implementation now skips the `m_year` → `simple_dat_p?` → `get_s_civil` method chain as well as `self.class.send(:f_zero_p?, nth)`, returning `@year` directly.

Add early return in `m_mon` when `@has_civil` is already true,
skipping `simple_dat_p?` check and `get_s_civil`/`get_c_civil`
method call overhead. Same pattern as `m_real_year`.

Benchmark results (Ruby 4.0.1, benchmark-ips):

  Date#month:     C 21,314,867 ips -> Ruby 14,302,144 ips (67.1%)
  DateTime#month: C 20,843,168 ips -> Ruby 14,113,170 ips (67.7%)

Add early return in `m_mday` when `@has_civil` is already true,
skipping `simple_dat_p?` check and `get_s_civil`/`get_c_civil`
method call overhead. Same pattern as `m_real_year` and `m_mon`.

Benchmark results (Ruby 4.0.1, benchmark-ips):

  Date#day:     C 18,415,779 ips -> Ruby 14,248,797 ips (77.4%)
  DateTime#day: C 18,758,870 ips -> Ruby 13,750,236 ips (73.3%)

Add early return in `m_wday` when `@has_jd` is true and `@of` is nil
(simple Date), inlining `(@jd + 1) % 7` directly. This skips
`m_local_jd`, `get_s_jd`, `c_jd_to_wday` method call overhead.

Benchmark results (Ruby 4.0.1, benchmark-ips):

  Date#wday:     C 20,923,653 ips -> Ruby 11,174,133 ips (53.4%)
  DateTime#wday: C 20,234,376 ips -> Ruby  3,721,404 ips (18.4%)

Note: DateTime#wday is not covered by this fast path since it
requires offset-aware local JD calculation.
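The inlined expression works because JD 0 fell on a Monday, so `(jd + 1) % 7` lines up with Ruby's `wday` numbering (Sunday = 0). Here is a minimal sketch with `wday_from_jd` as a hypothetical helper. It applies only to a plain Date. As noted above, a DateTime would first need to shift to its local JD.

```ruby
# Hypothetical helper: day of week (Sunday = 0 ... Saturday = 6) from a local JD.
def wday_from_jd(jd)
  (jd + 1) % 7 # Ruby's % is non-negative for a positive divisor
end

wday_from_jd(2451545) # => 6 (2000-01-01 was a Saturday)
```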

Add fast path in `m_yday` for simple Date (`@of.nil?`) with
`@has_civil` already computed. When the calendar is proleptic
Gregorian or the date is well past the reform period, compute
yday directly via `YEARTAB[month] + day`, skipping `m_local_jd`,
`m_virtual_sg`, `m_year`, `m_mon`, `m_mday`, and other method
call overhead.

Benchmark results (Ruby 4.0.1, benchmark-ips):

  Date#yday:     C 16,253,269 ips -> Ruby 1,942,757 ips (12.0%)
  DateTime#yday: C 14,927,308 ips -> Ruby   851,319 ips ( 5.7%)

Note: DateTime#yday is not covered by this fast path since it
requires offset-aware local JD calculation.
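Here is a sketch of the table lookup. Each row holds the days elapsed before each month, with one row for common years and one for leap years, so `yday` is a single index plus `day`. The name `YEARTAB_SKETCH` and its two-row layout are assumptions for illustration. Only the `YEARTAB[month] + day` idea comes from the description above.

```ruby
# Hypothetical table: days elapsed before the first of each month (index 0 unused).
YEARTAB_SKETCH = [
  [nil, 0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334].freeze, # common year
  [nil, 0, 31, 60, 91, 121, 152, 182, 213, 244, 274, 305, 335].freeze  # leap year
].freeze

def gregorian_leap?(year)
  (year % 4 == 0 && year % 100 != 0) || year % 400 == 0
end

# Day of year for a Gregorian civil date: one lookup and one addition.
def yday_sketch(year, month, day)
  YEARTAB_SKETCH[gregorian_leap?(year) ? 1 : 0][month] + day
end

yday_sketch(2024, 3, 1) # => 61
```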

Multiple optimizations to `Date#+` and its object creation path:

1. Eliminate `instance_variable_set` in `new_with_jd_and_time`:
   Replace 10 `instance_variable_set` calls with a protected
   `_init_with_jd` method using direct `@var =` assignment.
   Benefits all callers (Date#+, Date#-, Date#>>, DateTime#+, etc).

2. Avoid `self.class.send` overhead in `Date#+`:
   Replace `self.class.send(:new_with_jd, ...)` chain with direct
   `self.class.allocate` + `obj._init_with_jd(...)` (protected call).

3. Eager JD computation in `Date.civil` fast path:
   Compute JD via Neri-Schneider algorithm in `initialize` instead
   of deferring. Ensures `@has_jd = true` from creation, so `Date#+`
   always takes the fast `@has_jd` path.

4. Add `_init_simple_with_jd` with only 4 ivar assignments:
   For the simple Date fast path, skip the 7 nil assignments, since
   instance variables left unset after `allocate` already read as nil.

5. Fix fast path condition to handle `@has_civil` without `@has_jd`:
   When only civil data is available, compute JD inline via
   Neri-Schneider before addition.
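Here is a minimal sketch of the allocation pattern from items 1, 2 and 4, using a hypothetical `TinyDate` class with just two instance variables. `#+` builds its result with `allocate` and a protected initializer that assigns ivars directly, instead of using `instance_variable_set` or `self.class.send`. A protected method can be called with an explicit receiver from inside another instance of the same class, which is what lets `obj._init_with_jd(...)` work without `send`.

```ruby
# Hypothetical minimal date-like class; not the PR's Date implementation.
class TinyDate
  attr_reader :jd, :sg

  def initialize(jd, sg = 2299161) # 2299161 is the value of Date::ITALY
    @jd = jd
    @sg = sg
  end

  # Builds the result without going through #initialize.
  def +(days)
    obj = self.class.allocate
    obj._init_with_jd(@jd + days, @sg) # allowed: protected, called from a TinyDate
    obj
  end

  protected

  # Direct ivar assignment; not callable from outside the class.
  def _init_with_jd(jd, sg)
    @jd = jd
    @sg = sg
    self
  end
end

(TinyDate.new(2451545) + 100).jd # => 2451645
```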

Benchmark results (Ruby 4.0.1, benchmark-ips):

  Date#+1:   C 5,961,579 ips -> Ruby 3,150,254 ips (52.8%)
  Date#+100: C 6,054,311 ips -> Ruby 3,088,684 ips (51.0%)
  Date#-1:   C 4,077,013 ips -> Ruby 2,488,817 ips (61.0%)

  Date#+1 progression:
    Before:                 1,065,416 ips (17.9% of C)
    After ivar_set removal: 1,972,000 ips (33.1% of C)
    After send avoidance:   2,691,799 ips (45.2% of C)
    After eager JD + 4-ivar init: 3,150,254 ips (52.8% of C)

Date#-1: C 4,077,013 ips -> Ruby 2,863,047 ips (70.2%)
Date#-1 progression:
  Before:                    989,991 ips (24.3% of C)
  After Date#+ optimization: 2,488,817 ips (61.0% of C)
  After Date#- fast path:    2,863,047 ips (70.2% of C)

Date#<<1: C 2,214,936 ips -> Ruby 1,632,773 ips (73.7%)
Date#<<1 progression:
  Before:                       205,555 ips ( 9.3% of C)
  After Date#>> optimization:  1,574,551 ips (71.1% of C)
  After direct fast path:      1,632,773 ips (73.7% of C)

- Ruby version: 4.0 (Docker)
- C baseline: bench/results/20260215/4.0.1_system.tsv
- Tool: benchmark-ips

| Benchmark | C (ips) | Ruby (ips) | Ruby/C |
| :--- | ---: | ---: | ---: |
| Date#<<1 | 2.21 M | 1.62 M | 1/1.4x |
| DateTime#<<1 | 2.13 M | 177.53 K | 1/12.0x |

Changes: Replaced the slow path of `Date#<<`, which delegated to `self >> (-n)`, with an inlined version of the slow-path logic of `Date#>>`. This eliminates the extra method call, the sign negation, and redundant condition checks.

- Date#<< (Date only): reaches 71% of C performance
- DateTime#<< (with offset): remains at 1/12x due to the slow path being exercised more heavily
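For reference, the month arithmetic that `Date#>>` and `Date#<<` share can be sketched as follows, for the Gregorian case only. The steps are: shift a zero-based month index, split it back into year and month with floored `divmod`, and clamp the day to the target month's length. `shift_months` and `MONTH_DAYS_SKETCH` are hypothetical. Calling it with `-n` directly corresponds to the inlining described above, instead of routing `<<` through `>>`.

```ruby
MONTH_DAYS_SKETCH = [nil, 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31].freeze

def gregorian_leap?(year)
  (year % 4 == 0 && year % 100 != 0) || year % 400 == 0
end

def days_in_month(year, month)
  month == 2 && gregorian_leap?(year) ? 29 : MONTH_DAYS_SKETCH[month]
end

# Hypothetical helper: returns [year, month, day] shifted by n months (n may be negative).
def shift_months(year, month, day, n)
  index = year * 12 + (month - 1) + n # zero-based month count
  y, m0 = index.divmod(12)            # floored, so negative shifts work
  m = m0 + 1
  [y, m, [day, days_in_month(y, m)].min] # clamp e.g. Mar 31 -> Feb 29
end

shift_months(2024, 3, 31, -1) # => [2024, 2, 29], like Date.new(2024, 3, 31) << 1
```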

- Ruby version: 4.0 (Docker)
- C baseline: bench/results/20260215/4.0.1_system.tsv
- Tool: benchmark-ips

| Benchmark | C (ips) | Ruby before (ips) | Ruby after (ips) | after/C |
| :--- | ---: | ---: | ---: | ---: |
| Date#<=> | 11.84 M | 635.23 K | 2.99 M | 1/4.0x |
| DateTime#<=> | 12.24 M | 622.88 K | 577.00 K | 1/21.2x |

Changes: Added a fast path to `Date#<=>` for the common case where both objects are simple Date instances (`@df`, `@sf`, `@of` are all `nil`) with `@nth == 0` and `@has_jd` set. In this case, the comparison reduces to a direct `@jd <=> other.@jd` integer comparison, eliminating two `m_canonicalize_jd` calls (each of which allocates a `[nth, jd]` array via `canonicalize_jd`), redundant `simple_dat_p?` checks, and chained accessor calls for `m_nth`, `m_jd`, `m_df`, and `m_sf`.

- `Date#<=>` (Date only): 4.7x improvement over pre-optimization Ruby, reaches about 25% of C performance (1/4.0x)
- `DateTime#<=>` (with offset): unaffected — falls through to the existing slow path
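The guard-then-compare shape of these fast paths can be sketched with a hypothetical `SimpleDay` class that mirrors a few of the instance variables named above (`@nth`, `@jd`, `@df`, `@sf`, `@of`). The slow path here is a placeholder that compares `[nth, jd, df, sf]` tuples. It is not the library's canonicalizing comparison. Since `other.@jd` is shorthand rather than valid Ruby, the sketch reads the other object's state through protected methods.

```ruby
# Hypothetical class illustrating the fast-path guard; not the PR's code.
class SimpleDay
  include Comparable

  def initialize(jd, nth: 0, df: nil, sf: nil, of: nil)
    @nth, @jd, @df, @sf, @of = nth, jd, df, sf, of
  end

  def <=>(other)
    return nil unless other.is_a?(SimpleDay)
    # Fast path: both sides are plain dates, so comparing JDs is enough.
    return @jd <=> other.raw_jd if simple? && other.simple?

    slow_key <=> other.slow_key
  end

  def ==(other)
    return false unless other.is_a?(SimpleDay)
    return @jd == other.raw_jd if simple? && other.simple?

    slow_key == other.slow_key
  end

  protected

  def raw_jd
    @jd
  end

  # Conditions from the text (@has_jd is implicitly always true here).
  def simple?
    @df.nil? && @sf.nil? && @of.nil? && @nth == 0
  end

  # Placeholder slow path: missing time parts count as zero.
  def slow_key
    [@nth, @jd, @df || 0, @sf || 0]
  end
end
```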

Benchmark: Date#== optimization (pure Ruby vs C)

- Ruby version: 4.0 (Docker)
- C baseline: bench/results/20260215/4.0.1_system.tsv
- Tool: benchmark-ips

| Benchmark | C (ips) | Ruby before (ips) | Ruby after (ips) | after/C |
| :--- | ---: | ---: | ---: | ---: |
| Date#== | 2.78 M | 875.47 K | 3.24 M | 1.17x |
| DateTime#== | 2.72 M | 798.68 K | 924.96 K | 1/2.9x |

Changes: Added a fast path to `Date#==` for the common case where both objects are simple Date instances (`@df`, `@sf`, `@of` are all `nil`) with `@nth == 0` and `@has_jd` set. In this case, equality reduces to a direct `@jd == other.@jd` integer comparison. This eliminates two `m_canonicalize_jd` calls (each allocating a `[nth, jd]` array via `canonicalize_jd`), redundant `simple_dat_p?` checks, and chained accessor calls for `m_nth`, `m_jd`, `m_df`, and `m_sf`.

- `Date#==` (Date only): 3.7x improvement over pre-optimization Ruby, 17% faster than C
- `DateTime#==` (with offset): unaffected — falls through to the existing slow path

Add fast paths that skip `m_canonicalize_jd` (which allocates an array) for the common case: both objects are simple (`@df`, `@sf`, `@of` are all `nil`), `@nth == 0`, `@has_jd` is true, and `0 <= @jd < CM_PERIOD` (guaranteeing that canonicalization is a no-op).
For `Date#===`, the result always reduces to `@jd == other.@jd` under these conditions, whether or not the two dates use the same calendar, so the `m_gregorian_p?` check and both `m_canonicalize_jd` calls are eliminated.

For `Date#hash`, the same bounds guarantee that `m_nth == 0` and `m_jd == @jd` after canonicalization, so `[0, @jd, @sg].hash` is returned directly.

| Method      | Before      | After        | Speedup | C impl       |
|-------------|-------------|--------------|---------|--------------|
| `Date#===`  | ~558K ips   | ~2,940K ips  | +5.3x   | ~12,659K ips |
| `Date#hash` | ~1,990K ips | ~6,873K ips  | +3.5x   | ~13,833K ips |
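A minimal sketch of the bounds-guarded `Date#hash` fast path, assuming `CM_PERIOD` is defined as in ext/date/date_core.c (the largest multiple of lcm(7, 1461, 146097) that fits in 0xfffffff). `SketchDay` and its `slow_hash` are hypothetical. The point is that inside `0 <= jd < CM_PERIOD` both paths hash the same `[nth, jd, sg]` triple, so `Hash` lookups stay consistent; the same guard is what lets `Date#===` compare `@jd` directly.

```ruby
# Mirrors date_core.c: CM_PERIOD0 is lcm(7, 1461, 146097), and CM_PERIOD is the
# largest multiple of it that fits in 0xfffffff.
CM_PERIOD0 = 71_149_239
CM_PERIOD = 0xfffffff / CM_PERIOD0 * CM_PERIOD0

class SketchDay
  attr_reader :jd, :sg

  def initialize(jd, sg = 2_299_161.0)
    @nth = 0
    @jd = jd
    @sg = sg
    @df = @sf = @of = nil
    @has_jd = true
  end

  def hash
    # Within 0 <= jd < CM_PERIOD, canonicalization would yield nth == 0 and an
    # unchanged jd, so hash the same triple without building [nth, jd] first.
    if @df.nil? && @sf.nil? && @of.nil? && @nth == 0 && @has_jd &&
       @jd >= 0 && @jd < CM_PERIOD
      return [0, @jd, @sg].hash
    end
    slow_hash
  end

  def eql?(other)
    other.is_a?(SketchDay) && jd == other.jd && sg == other.sg
  end

  private

  # Hypothetical general path: split the day count into (nth, jd) so that
  # 0 <= jd < CM_PERIOD, then hash the canonical triple.
  def slow_hash
    nth, jd = (@nth * CM_PERIOD + @jd).divmod(CM_PERIOD)
    [nth, jd, @sg].hash
  end
end
```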

feat: Optimize `Date#<`

Add an explicit `Date#<` method with a fast path that bypasses the `Comparable` module overhead. When both objects are simple (`@df`, `@sf`, `@of` are all `nil`), `@nth == 0`, and `@has_jd` is true, `@jd < other.@jd` is returned directly without going through `<=>`. The slow path delegates to `super` (Comparable) to preserve all edge-case behavior including `ArgumentError` for incomparable types.

| Method   | Before      | After       | Speedup | C impl      |
|----------|-------------|-------------|---------|-------------|
| `Date#<` | ~2,430K ips | ~3,330K ips | +37%    | ~7,628K ips |

Add an explicit `Date#>` method with a fast path that bypasses the `Comparable` module overhead. When both objects are simple (`@df`, `@sf`, `@of` are all `nil`), `@nth == 0`, and `@has_jd` is true, `@jd > other.@jd` is returned directly without going through `<=>`. The slow path delegates to `super` (Comparable) to preserve all edge-case behavior including `ArgumentError` for incomparable types.

| Method   | Before      | After       | Speedup | C impl      |
|----------|-------------|-------------|---------|-------------|
| `Date#>` | ~2,560K ips | ~3,330K ips | +30%    | ~7,682K ips |
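Both methods follow the same shape, sketched here with a hypothetical `CmpDate` class. Because the methods are defined in the class itself and `Comparable` is included, a bare `super` reaches `Comparable#<` / `Comparable#>`, which call `<=>` and raise `ArgumentError` when it returns `nil`.

```ruby
class CmpDate
  include Comparable

  attr_reader :jd

  def initialize(jd, df: nil)
    @nth = 0
    @jd = jd
    @df = df
    @sf = nil
    @of = nil
    @has_jd = true
  end

  def <=>(other)
    return nil unless other.is_a?(CmpDate)
    [@jd, @df || 0] <=> [other.jd, other.df || 0]
  end

  # Fast path first; everything else goes through Comparable#< (and thus <=>),
  # which raises ArgumentError for incomparable operands.
  def <(other)
    return @jd < other.jd if other.is_a?(CmpDate) && simple? && other.simple?
    super
  end

  def >(other)
    return @jd > other.jd if other.is_a?(CmpDate) && simple? && other.simple?
    super
  end

  protected

  attr_reader :df

  def simple?
    @df.nil? && @sf.nil? && @of.nil? && @nth == 0 && @has_jd
  end
end
```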

@nobu nobu left a comment

Member

Can you share the benchmark?

As for zonetab.rb, it should be updated in the same way that zonetab.h is automatically updated weekly with ext/date/update-abbr.

Review comment on lib/date/parse.rb (outdated)
(#{ABBR_MONTHS_PATTERN})\s+
(-?\d{4})\s+
(\d{2}):(\d{2}):(\d{2})\s+
(gmt)\s*\z/ix

Member

Suggested change
(gmt)\s*\z/ix
(gmt)\s*\z/ixo
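For context on the suggestion: `/o` makes Ruby perform the `#{...}` interpolation only once, the first time the literal is evaluated, and reuse the resulting Regexp afterwards. This is safe here only because the interpolated value is a constant. A small illustration, with a hypothetical stand-in for `ABBR_MONTHS_PATTERN`:

```ruby
# Hypothetical stand-in for the constant interpolated in parse.rb.
ABBR_MONTHS_PATTERN = 'jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec'

def gmt_regexp_once
  # /o: interpolate once, on first evaluation; later calls return that same Regexp.
  /(#{ABBR_MONTHS_PATTERN})\s+(-?\d{4})\s+(gmt)\s*\z/ixo
end

def gmt_regexp_every_time
  # Without /o, a new Regexp object is built from the interpolated source on each call.
  /(#{ABBR_MONTHS_PATTERN})\s+(-?\d{4})\s+(gmt)\s*\z/ix
end
```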

Review comment on lib/date/strftime.rb, lines +23 to +36 (outdated)
# What to do if format string contains a "\0".
if format.include?("\0")
result = String.new
parts = format.split("\0", -1)

parts.each_with_index do |part, i|
result << strftime_format(part) unless part.empty?
result << "\0" if i < parts.length - 1
end

result.force_encoding(format.encoding)

return result
end

Member

Splitting by \0 comes from a restriction of strftime in C, which stops at the NUL terminator.
It should be unnecessary in Ruby.

Review comment on lib/date/strftime.rb, lines +176 to +179 (outdated)
result = String.new
i = 0

while i < format.length

Member

This looks like too direct a translation of the C code.
format.gsub may be faster, I guess.

    result = format.gsub(/(?:%%)+|%([-_^\#0]*)([1-9]\d*)?(?:E[cCxXyY]|O[deHkIlmMSuUVwWy]|[YCymBbhdejHkIlMSLNPpAawuUWVGgZsQntFDxTXRrcv+]|(:{0,3})z)/) do |fmt|
      flags, width, spec, colons = $~.captures
      next fmt[0, fmt.length/2] unless spec # Squeeze '%%' -> '%'
      spec = spec[-1] if spec.length > 1  # Ignore E / O modifiers
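To make the suggestion concrete, here is a minimal sketch of a gsub-driven formatter. It is hypothetical and much narrower than the pattern above: only `%Y`, `%m`, `%d`, and `%%`, with the `-` and `_` flags and an optional width. Since a Ruby String carries its own length, a `\0` in the format simply passes through, so no splitting is needed.

```ruby
# Hypothetical mini formatter; spec letters are restricted to Y, m, d.
MINI_SPEC_RE = /(?:%%)+|%([-_^\#0]*)([1-9]\d*)?([Ymd])/

def mini_strftime(fmt_str, year, mon, mday)
  fmt_str.gsub(MINI_SPEC_RE) do |match|
    flags, width, spec = $~.captures
    next match[0, match.length / 2] unless spec # squeeze each '%%' pair to '%'

    value = { 'Y' => year, 'm' => mon, 'd' => mday }.fetch(spec)
    w = width ? width.to_i : (spec == 'Y' ? 4 : 2)
    if flags.include?('-')
      value.to_s                # '-': no padding
    elsif flags.include?('_')
      sprintf('%*d', w, value)  # '_': pad with spaces
    else
      sprintf('%0*d', w, value) # default: pad with zeros ('^', '#', '0' are ignored here)
    end
  end
end
```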

Review comment on lib/date/strftime.rb (outdated)

# Width specifier overflow check
unless width.empty?
if width.length > 10 || (width.length == 10 && width > '2147483647')

@nobu nobu Feb 23, 2026

Member

Since width is eventually compared with 1024, why is a comparison with '2147483647' needed?

Suggested change
if width.length > 10 || (width.length == 10 && width > '2147483647')
if width.length > 4
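The reasoning behind the suggestion, as a sketch: if the width digits come from a pattern like `[1-9]\d*` (no leading zeros, as in the gsub pattern suggested earlier in this review), then five or more digits already means at least 10000, which can never pass a 1024 limit. `checked_width`, `MAX_WIDTH`, and the `ArgumentError` are hypothetical; how the implementation actually reports an over-wide field is not shown here.

```ruby
MAX_WIDTH = 1024 # assumed limit, per the review comment

# Returns the width as an Integer, or nil when no width was given.
def checked_width(width)
  return nil if width.empty?
  # With no leading zeros, 5+ digits means >= 10_000, which always exceeds
  # MAX_WIDTH, so there is no need to compare against '2147483647' first.
  raise ArgumentError, "width too large: #{width}" if width.length > 4

  w = width.to_i
  raise ArgumentError, "width too large: #{width}" if w > MAX_WIDTH
  w
end
```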

Review comment on lib/date/strftime.rb, lines +354 to +356 (outdated)
sprintf("%#{prec}d", y)
else
sprintf("%0#{prec}d", y)

Member

Ruby's sprintf inherits the * width specifier from C, so you don't have to create a new format string each time.

Suggested change
sprintf("%#{prec}d", y)
else
sprintf("%0#{prec}d", y)
sprintf("%*d", prec, y)
else
sprintf("%0*d", prec, y)
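With `*`, the width is taken from the argument list, so the format string can stay a constant literal. A small illustration (`pad_year` is a hypothetical helper):

```ruby
# The width comes from the `prec` argument via '*'; the format literal never changes.
def pad_year(y, prec, zero_pad: true)
  zero_pad ? sprintf('%0*d', prec, y) : sprintf('%*d', prec, y)
end
```

As in C, the width counts the sign, so a negative year gets one fewer zero of padding.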

feat: Optimize Date#strftime with lookup tables, fast paths, and integer flags

Summary of changes:

1. Replace string-based flag accumulation with integer bitmask
     - Introduced FLAG_MINUS, FLAG_SPACE, FLAG_UPPER, FLAG_CHCASE, FLAG_ZERO
     - Eliminates per-call String allocation for format modifier parsing
2. Add fast path in strftime_format for simple Date objects
     - Detects simple Date (@df/@sf/@of all nil, @nth == 0) once per call
     - Bypasses tmx_* method chain for common specs (Y, m, d, A, B, etc.)
     - Precomputes f_year, f_month, f_day, f_wday from instance variables
3. Add FOUR_DIGIT precomputed lookup table
     - "0000".."9999" frozen string table avoids per-call sprintf for years 0..9999
     - Applied to fast paths (%F/%Y-%m-%d, composite, %c) and strftime_format
4. Move TWO_DIGIT, FOUR_DIGIT, and FLAG_* constants to constants.rb
     - Consolidates all Date constants in one file
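Items 1 and 3 can be sketched as follows. The flag bit values and the `parse_flags` / `four_digit_year` helpers are hypothetical; only the constant names come from the commit message. Note the range check before indexing `FOUR_DIGIT`: a negative year would otherwise silently index from the end of the array.

```ruby
# Hypothetical bit values; the real FLAG_* constants live in lib/date/constants.rb.
FLAG_MINUS  = 0x01 # '-' : no padding
FLAG_SPACE  = 0x02 # '_' : pad with spaces
FLAG_UPPER  = 0x04 # '^' : upcase
FLAG_CHCASE = 0x08 # '#' : change case
FLAG_ZERO   = 0x10 # '0' : pad with zeros

# Maps a flag byte to its bit, so flag parsing allocates no Strings.
FLAG_BY_BYTE = {
  '-'.ord => FLAG_MINUS, '_'.ord => FLAG_SPACE, '^'.ord => FLAG_UPPER,
  '#'.ord => FLAG_CHCASE, '0'.ord => FLAG_ZERO
}.freeze

# "0000".."9999", built once and frozen.
FOUR_DIGIT = Array.new(10_000) { |i| format('%04d', i).freeze }.freeze

# Accumulates flag bits starting at byte offset `pos`; returns [flags, next_pos].
def parse_flags(fmt, pos)
  flags = 0
  while (bit = FLAG_BY_BYTE[fmt.getbyte(pos)])
    flags |= bit
    pos += 1
  end
  [flags, pos]
end

# Table lookup for 0..9999; otherwise pad to at least 4 digits after the sign.
def four_digit_year(year)
  return FOUR_DIGIT[year] if year >= 0 && year <= 9999
  year.negative? ? format('-%04d', -year) : format('%04d', year)
end
```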

---
Performance comparison (Date#strftime, Ruby 4.0.1)

| Benchmark                | C extension (i/s) | Pure Ruby before (i/s) | Pure Ruby after (i/s) | vs before | vs C ext |
|--------------------------|-------------------|------------------------|-----------------------|-----------|----------|
| Date#strftime (default)  | 2,862,515         | 86,749                 | 1,341,000             | +15.5x    | 46.8%    |
| Date#strftime(%Y-%m-%d)  | 3,040,048         | 108,336                | 1,346,000             | +12.4x    | 44.3%    |
| Date#strftime(%A %B)     | 2,960,230         | 83,226                 | 265,000               | +3.2x     | 9.0%     |
| Date#strftime(%c)        | 2,001,595         | 43,985                 | 1,010,000             | +23.0x    | 50.5%    |
| Date#strftime(%x)        | 2,622,940         | 89,569                 | 1,438,000             | +16.1x    | 54.8%    |
| Date#strftime(composite) | 1,652,488         | 47,050                 | 1,268,000             | +27.0x    | 76.7%    |

feat: Add direct fast paths to Date#iso8601, #rfc2822, and #asctime

For simple Date objects (no time/offset, @nth == 0), bypass the strftime
machinery entirely and build the result string directly using the FOUR_DIGIT
and TWO_DIGIT precomputed tables.

Changes in lib/date/core.rb:
- Date#iso8601 / #xmlschema: build "%Y-%m-%d" string directly
- Date#rfc2822 / #rfc822: build RFC 2822 string directly using ABBR_DAYNAMES,
  ABBR_MONTHNAMES, FOUR_DIGIT; offset fixed to "+0000" for simple Date
- Date#asctime / #ctime: build ctime string directly with space-padded day;
  uses ABBR_DAYNAMES, ABBR_MONTHNAMES, FOUR_DIGIT
- All three methods fall back to strftime for non-simple Date objects (DateTime,
  objects with non-zero offset, etc.)
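A sketch of the direct string building, assuming a simple Date with a year in 0..9999 (the case the commit describes; anything else would fall back to `strftime`). The table constants are redefined locally so the block runs on its own, and the `fast_*` helpers are hypothetical names that take the date components as arguments.

```ruby
ABBR_DAYNAMES   = %w[Sun Mon Tue Wed Thu Fri Sat].freeze
ABBR_MONTHNAMES = [nil, *%w[Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec]].freeze
TWO_DIGIT  = Array.new(100) { |i| format('%02d', i).freeze }.freeze
FOUR_DIGIT = Array.new(10_000) { |i| format('%04d', i).freeze }.freeze

# "%Y-%m-%d" without going through strftime.
def fast_iso8601(year, mon, mday)
  "#{FOUR_DIGIT[year]}-#{TWO_DIGIT[mon]}-#{TWO_DIGIT[mday]}"
end

# Day of month is not padded; the offset is always +0000 for a plain Date.
def fast_rfc2822(year, mon, mday, wday)
  "#{ABBR_DAYNAMES[wday]}, #{mday} #{ABBR_MONTHNAMES[mon]} #{FOUR_DIGIT[year]} 00:00:00 +0000"
end

# ctime style: the day of month is space-padded to two characters.
def fast_asctime(year, mon, mday, wday)
  "#{ABBR_DAYNAMES[wday]} #{ABBR_MONTHNAMES[mon]} #{mday.to_s.rjust(2)} 00:00:00 #{FOUR_DIGIT[year]}"
end
```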

Performance comparison (Ruby 4.0.1, measured 2026-02-22):

  Benchmark       | Pure Ruby (after) |  C extension  | After / C ext
  ----------------|-------------------|---------------|---------------
  Date#iso8601    |       2,420,256   |   3,983,797   |       60.8 %
  Date#rfc2822    |       1,811,210   |   1,960,706   |       92.4 %
  Date#asctime    |       1,590,435   |   2,444,714   |       65.1 %

  Unit: iterations/second (i/s). "Pure Ruby (after)" is measured with
  Process.clock_gettime on Ruby 4.0.1 (docker ruby:4.0) after this change.

  All three methods previously delegated to strftime, which parsed the format
  string through strftime_format on every call. The new fast paths eliminate
  that overhead for the common case (simple Date created via Date.new /
  Date.civil). Date#rfc2822 reaches 92% of C extension performance.

feat: Optimize Date._strptime with fast path and byte-level digit scanning

Summary of changes in lib/date/strptime.rb:

1. Add fast path for '%F' / '%Y-%m-%d' (the default format)
   - Parse year/mon/mday directly with a single compiled regex, bypassing
     the full format-string scanner and _strptime_spec dispatch entirely.
   - Note: manual byte scanning was tested for this hot path but found to be
     slower than the C regex engine due to Ruby method-call overhead; the
     regex-based fast path is retained.

2. Remove unnecessary str = string.dup
   - The input string is read-only inside the parser; the copy was wasteful.

3. Cache fmt_len = format.length
   - Avoid repeated length calls on every loop iteration.

4. Replace String-based width accumulation with integer arithmetic
   - width_str = String.new + regex digit check replaced by d.ord - 48
     integer accumulation; field_width is nil (not specified) or Integer.

5. Replace format whitespace regex with explicit char comparison
   - format[i] =~ /\s/ replaced by direct comparison against
     ' ', "\t", "\n", "\r", "\v", "\f".

6. Change _strptime_spec calling convention to in-place hash modification
   - Old: returns {pos: new_pos, hash: h} — allocates two hashes per spec.
   - New: modifies the caller's hash directly and returns new_pos (Integer)
     or nil on failure — zero extra allocations per spec.
   - _strptime_composite updated to match the new convention.

7. Add scan_uint / scan_sint byte-level digit scanners
   - scan_uint(str, pos, max): reads unsigned digits via getbyte, no regex,
     no substring, no MatchData — returns [value, new_pos] or nil.
   - scan_sint(str, pos, max): handles optional leading +/- prefix.

8. Replace regex-based matching in _strptime_spec for all numeric specifiers
   - Affected: Y, C, y, m, d/e, j, H/k, I/l, M, S, L, N, w, u, U, W, V,
     G, g, s, Q.
   - Each str[pos..].match(/\A.../) call eliminated: no substring allocation,
     no MatchData object, no regex engine overhead per numeric field.
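A minimal sketch of item 7, following the described contract (`[value, new_pos]` on success, `nil` when no digit is found). The bodies are an assumption, not the committed code; in particular, whether the sign counts toward `max` in `scan_sint` is a guess.

```ruby
# Reads up to `max` ASCII digits of `str` starting at byte offset `pos`.
# Returns [value, new_pos], or nil if no digit is found. No substring,
# MatchData, or regex is involved.
def scan_uint(str, pos, max)
  start = pos
  limit = pos + max
  value = 0
  while pos < limit && (b = str.getbyte(pos)) && b >= 48 && b <= 57 # '0'..'9'
    value = value * 10 + (b - 48)
    pos += 1
  end
  pos == start ? nil : [value, pos]
end

# Like scan_uint, but accepts an optional leading '+' or '-'.
# In this sketch the sign does not count toward `max`.
def scan_sint(str, pos, max)
  sign = 1
  case str.getbyte(pos)
  when 43 # '+'
    pos += 1
  when 45 # '-'
    sign = -1
    pos += 1
  end
  value, new_pos = scan_uint(str, pos, max)
  value && [sign * value, new_pos]
end
```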

Performance comparison (Ruby 4.0.1, measured 2026-02-22):

  Benchmark                  | Before (i/s) | After (i/s) | C ext (i/s) | After / C ext
  ---------------------------|--------------|-------------|-------------|---------------
  Date._strptime  (default)  |       40,248 |     740,864 |   2,610,013 |        28.4 %
  Date.strptime   (default)  |       37,440 |     323,953 |   1,373,996 |        23.6 %
  Date._strptime  (complex)  |       24,015 |      75,532 |   1,097,796 |         6.9 %

  Unit: iterations/second (i/s).
  "Before" is taken from bench/results/20260222/4.0.1_local.tsv (prior to
  this change). "C ext" is taken from bench/results/20260222/4.0.1_system.tsv.
  "After" is measured with Process.clock_gettime on ruby:4.0 (Docker) after
  all changes in this commit.

  The default format (Date._strptime with no explicit format argument) improves
  18.4x over the baseline by hitting the '%F' fast path. The complex format
  ('%Y-%m-%d %H:%M:%S') improves 3.1x through the elimination of per-spec
  substring and MatchData allocations via scan_uint / scan_sint.

feat: Apply StringScanner (Approach A) to Date._strptime general parser

Replace the hand-rolled position-integer loop in _strptime with a
StringScanner-based approach to eliminate redundant String allocations.

Changes:
- Add `require 'strscan'`
- Main loop: use `format.getbyte(i)` (Integer comparison) instead of
  `format[i]` (String allocation) for every format character
- Literal character matching: `ss.string.getbyte(ss.pos) == fb` + `ss.pos += 1`
  instead of `str[pos] == c` (String allocation per literal char)
- Whitespace skipping: `ss.skip(/[ \t\n\r\v\f]*/)` instead of a
  hand-rolled while loop with per-char String comparisons
- `%p`/`%P`: `ss.scan(/a\.?m\.?|p\.?m\.?/i)` eliminates the
  `str[pos..].match(/\A.../)` substring allocation
- `%n`/`%t`: `ss.skip(/\s*/)` replaces `str[pos..].match(/\A\s+/)`
- `_strptime_spec` signature: `(ss, spec, width, hash, next_is_num)`
  — updates `ss.pos` in-place, returns `true`/`nil`
- `_strptime_composite` signature: `(ss, format, context_hash)`
  — uses `format.getbyte(i)` and `ss.string.getbyte(ss.pos)` throughout,
  returns the diff hash (or nil) rather than `{pos:, hash:}`
- The `%F`/`%Y-%m-%d` regex fast path is unchanged
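A small, self-contained illustration of the StringScanner techniques listed above: byte comparison for literals, `skip` for whitespace, and `scan` for `%p`. `match_byte` and `parse_time_ampm` are hypothetical helpers that parse an `"H:MM am/pm"` string; they are not the library's `_strptime`.

```ruby
require 'strscan'

# Matches one literal byte at the current position without allocating a String.
def match_byte(ss, byte)
  return false unless ss.string.getbyte(ss.pos) == byte
  ss.pos += 1
  true
end

# Parses "H:MM am/pm"; returns a hash (with :leftover if input remains) or nil.
def parse_time_ampm(str)
  ss = StringScanner.new(str)
  hour = ss.scan(/\d{1,2}/) or return nil
  match_byte(ss, ':'.ord) or return nil
  min = ss.scan(/\d{2}/) or return nil
  ss.skip(/[ \t\n\r\v\f]*/)
  merid = ss.scan(/a\.?m\.?|p\.?m\.?/i) or return nil
  h = { hour: hour.to_i % 12 + (merid.downcase.start_with?('p') ? 12 : 0), min: min.to_i }
  h[:leftover] = ss.rest unless ss.eos?
  h
end
```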

Performance (500,000 iterations, ruby 4.0, linux/amd64):

| Method                      | C ext (i/s) | Before (i/s) | After (i/s) | % of C |
|-----------------------------|-------------|--------------|-------------|--------|
| Date._strptime (default %F) | 2,610,014   |      740,864 |     747,515 |  28.6% |
| Date.strptime  (default %F) | 1,373,996   |      323,953 |     326,703 |  23.8% |
| Date._strptime (complex fmt)| 1,097,796   |       75,532 |     106,642 |   9.7% |

The default-format path shows only marginal gains because the regex
fast path (`%F`/`%Y-%m-%d`) bypasses the StringScanner loop entirely.
The complex-format path improves by ~41% over Approach C, driven by
eliminating per-character String allocations in the main parse loop.

feat: Expand _strptime fast paths for common datetime formats (Approach D)

Add direct regex fast paths for two additional format strings, bypassing
the StringScanner general parser loop entirely.

Changes:
- Refactor the existing `%F`/`%Y-%m-%d` fast path into a `case/when`
  dispatch for extensibility
- Add fast path for `'%Y-%m-%d %H:%M:%S'`: matches
  `/\A([+-]?\d+)-(\d{1,2})-(\d{1,2}) (\d{1,2}):(\d{1,2}):(\d{1,2})/`
  and returns `{year:, mon:, mday:, hour:, min:, sec:}` directly
- Add fast path for `'%Y-%m-%dT%H:%M:%S'`: same as above with `T`
  separator (ISO 8601 datetime)
- Both fast paths validate ranges (mon 1-12, mday 1-31, hour 0-24,
  min 0-59, sec 0-60) and set `:leftover` if trailing input remains
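A sketch of the exact-format dispatch, reusing the regex from the commit message. `FAST_DATETIME_RES` and `fast_strptime_datetime` are hypothetical names; a caller would use them only when the format string is one of the keys and fall back to the general parser otherwise.

```ruby
# Hypothetical fast-path table keyed by the exact format string.
FAST_DATETIME_RES = {
  '%Y-%m-%d %H:%M:%S' => /\A([+-]?\d+)-(\d{1,2})-(\d{1,2}) (\d{1,2}):(\d{1,2}):(\d{1,2})/,
  '%Y-%m-%dT%H:%M:%S' => /\A([+-]?\d+)-(\d{1,2})-(\d{1,2})T(\d{1,2}):(\d{1,2}):(\d{1,2})/
}.freeze

# Returns the parsed hash, or nil if the input does not match or is out of range.
# Only valid when FAST_DATETIME_RES.key?(fmt).
def fast_strptime_datetime(str, fmt)
  m = FAST_DATETIME_RES.fetch(fmt).match(str) or return nil
  year, mon, mday, hour, min, sec = m.captures.map(&:to_i)
  return nil unless (1..12).cover?(mon) && (1..31).cover?(mday) &&
                    (0..24).cover?(hour) && (0..59).cover?(min) && (0..60).cover?(sec)

  h = { year: year, mon: mon, mday: mday, hour: hour, min: min, sec: sec }
  h[:leftover] = m.post_match unless m.post_match.empty?
  h
end
```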

Performance (500,000 iterations, ruby 4.0, linux/amd64):

| Method                             | C ext (i/s) | Before (i/s) | After (i/s) | % of C |
|------------------------------------|-------------|--------------|-------------|--------|
| Date._strptime (default %F)        | 2,610,014   |      747,515 |     723,600 |  27.7% |
| Date.strptime  (default %F)        | 1,373,996   |      326,703 |     332,490 |  24.2% |
| Date._strptime (%Y-%m-%d %H:%M:%S) | 1,097,796   |      106,642 |     544,377 |  49.6% |

The complex datetime format improves ~5x by eliminating all StringScanner
and spec-dispatch overhead for the hot `%Y-%m-%d %H:%M:%S` pattern.
The default `%F` case is unchanged in behavior (before/after difference
is within benchmark noise).

feat: Optimize constructors, accessors, and strptime internals (Phases 4–8)

Changes

- strptime internals
  - Precompute NUM_PATTERN_SPECS_TABLE, STRPTIME_DAYNAME_BY_INT_KEY,
    and STRPTIME_MONNAME_BY_INT_KEY in constants.rb for O(1) byte-level lookup
  - Rewrite num_pattern_p with getbyte to eliminate String allocations
  - Replace ss.skip(/[ \t\n\r\v\f]*/) with skip_ws byte-loop helper (3 sites)
- Julian calendar fast path
  - Add integer-arithmetic Julian JD fast path in Date#initialize covering
    Date::JULIAN start and pre-reform years (e.g. Date.civil(-4712, 1, 1))
- Computed accessor hot paths
  - Add @jd fast path to Date#ajd, #amjd, #mjd, #ld, bypassing a 6-level method chain
- Arithmetic and conversion hot paths
  - Add @jd fast path to Date#to_datetime (skip decode_year + c_civil_to_jd)
  - Inline Date#wday as (@jd + 1) % 7
  - Optimize Date#-(Date) to direct Rational(@jd - other.@jd, 1)
- Constructor ivar reduction and accessor inlining
  - Reduce instance_variable_set calls in Date.jd / .ordinal / .commercial
    fast paths from 11 to 4 (allocate initializes remaining ivars to nil)
  - Inline early-return in Date#year, #month, #day
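The accessor fast paths rely on plain arithmetic on the chronological Julian Day Number. A sketch for a simple Date (no day fraction, no offset), where `simple_accessors` and `days_between` are hypothetical helpers and the epoch constants are the standard MJD and Lilian Day offsets:

```ruby
# Assumes a "simple" date: day fraction and offset are zero, so every value
# below is a pure function of the chronological Julian Day Number.
MJD_EPOCH_JD = 2_400_001 # MJD 0 is 1858-11-17
LD_EPOCH_JD  = 2_299_160 # Lilian Day 1 is 1582-10-15 (JD 2_299_161)

def simple_accessors(jd)
  {
    wday: (jd + 1) % 7,             # 0 = Sunday
    mjd:  jd - MJD_EPOCH_JD,
    ld:   jd - LD_EPOCH_JD,
    ajd:  Rational(2 * jd - 1, 2),  # midnight UTC is jd - 1/2 (astronomical days start at noon)
    amjd: Rational(jd - MJD_EPOCH_JD, 1)
  }
end

# Date - Date for two simple dates: a Rational number of days.
def days_between(jd_a, jd_b)
  Rational(jd_a - jd_b, 1)
end
```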

Performance: C extension vs pure Ruby (i/s, measured on Ruby 4.0.1)

Method                 |   C ext  |   Ruby   | Ruby/C
-----------------------|----------|----------|-------
Date.civil             | 4,626k   | 1,018k   |   22%
Date.civil(neg)        | 4,649k   | 1,258k   |   27%
Date.jd                | 4,884k   | 2,311k   |   47%
Date.ordinal           | 3,032k   |   947k   |   31%
Date.commercial        | 2,483k   |   809k   |   33%
Date#year              | 20,261k  | 13,268k  |   65%
Date#month             | 21,387k  | 15,796k  |   74%
Date#day               | 18,377k  | 15,798k  |   86%
Date#wday              | 20,868k  | 11,957k  |   57%
Date#jd                | 19,662k  | 16,496k  |   84%
Date#ajd               |  7,566k  |  4,643k  |   61%
Date#mjd               | 11,732k  |  8,089k  |   69%
Date#amjd              | 10,574k  |  5,746k  |   54%
Date#ld                | 11,805k  |  8,216k  |   70%
Date#yday              | 15,685k  | 19,099k  |  122%
Date#cwyear            |  4,494k  |  6,810k  |  152%
Date#cweek             |  4,550k  | 14,646k  |  322%
Date#-Date             |  1,858k  |  2,327k  |  125%
Date#-1                |  3,715k  |  3,612k  |   97%
Date#+1                |  5,516k  |  3,460k  |   63%
Date#to_datetime       |  6,686k  |  2,126k  |   32%
Date#iso8601           |  3,997k  |  2,505k  |   63%
Date#rfc3339           |  1,801k  |  2,242k  |  125%
Date#rfc2822           |  1,963k  |  1,687k  |   86%
Date#strftime          |  2,830k  |  1,347k  |   48%
Date._parse(iso)       |    237k  |     61k  |   26%
Date._strptime         |  2,719k  |    782k  |   29%
Date.strptime(complex) |  1,123k  |    114k  |   10%
@jinroq

jinroq commented Feb 23, 2026

Author

@jeremyevans @nobu

Thank you for your comments! I have worked on optimizing the Ruby implementation of date. The current benchmark results are as follows.

Performance Comparison: C Extension vs Pure Ruby Implementation

Environment: Ruby 4.0.1, measured with benchmark-ips (warmup: 1s, measurement: 2s)
(*) = Pure Ruby is faster than or equal to the C extension.

| Method | C ext (i/s) | Ruby (i/s) | Ruby/C |
|---|---|---|---|
| **Constructors** | | | |
| Date.civil | 4.65 M | 1.02 M | 21.9% |
| Date.civil(sg) | 4.46 M | 1.38 M | 31.1% |
| Date.civil(-1) | 4.68 M | 994.6 k | 21.3% |
| Date.civil(neg) | 4.56 M | 1.26 M | 27.6% |
| Date.jd | 4.96 M | 2.31 M | 46.6% |
| Date.ordinal | 3.02 M | 947.4 k | 31.3% |
| Date.commercial | 2.48 M | 809.0 k | 32.6% |
| Date.today | 176.9 k | 424.0 k | 239.6% (*) |
| **Validation** | | | |
| Date.valid_civil? | 10.75 M | 2.73 M | 25.4% |
| Date.valid_civil?(false) | 10.96 M | 2.73 M | 24.9% |
| Date.valid_ordinal? | 4.23 M | 3.89 M | 91.9% |
| Date.valid_commercial? | 3.18 M | 1.20 M | 37.7% |
| Date.valid_jd? | 16.61 M | 12.10 M | 72.8% |
| Date.gregorian_leap? | 14.91 M | 9.42 M | 63.2% |
| Date.gregorian_leap?(1900) | 14.71 M | 9.03 M | 61.4% |
| Date.julian_leap? | 17.20 M | 11.81 M | 68.7% |
| **Parsing** | | | |
| Date._parse(iso) | 232.8 k | 61.0 k | 26.2% |
| Date._parse(us) | 118.9 k | 36.1 k | 30.3% |
| Date._parse(eu) | 159.8 k | 51.9 k | 32.5% |
| Date._parse(rfc2822) | 81.1 k | 28.0 k | 34.5% |
| Date.parse(iso) | 212.3 k | 52.3 k | 24.6% |
| Date.parse(us) | 111.5 k | 31.9 k | 28.6% |
| Date.parse(eu) | 146.8 k | 44.7 k | 30.4% |
| Date.parse(compact) | 132.0 k | 47.2 k | 35.8% |
| Date._strptime | 2.72 M | 781.8 k | 28.7% |
| Date.strptime | 1.37 M | 325.9 k | 23.7% |
| Date.strptime(complex) | 1.11 M | 114.1 k | 10.2% |
| Date._iso8601 | 723.6 k | 464.9 k | 64.2% |
| Date._rfc3339 | 488.7 k | 117.3 k | 24.0% |
| Date._rfc2822 | 375.8 k | 98.9 k | 26.3% |
| Date._xmlschema | 766.3 k | 551.8 k | 72.0% |
| Date._httpdate | 403.5 k | 267.2 k | 66.2% |
| Date._jisx0301 | 713.4 k | 538.5 k | 75.5% |
| Date.iso8601 | 551.8 k | 250.6 k | 45.4% |
| Date.rfc3339 | 396.0 k | 100.5 k | 25.4% |
| Date.rfc2822 | 304.3 k | 85.2 k | 28.0% |
| Date.xmlschema | 552.7 k | 278.4 k | 50.4% |
| Date.httpdate | 320.2 k | 175.2 k | 54.7% |
| Date.jisx0301 | 550.4 k | 272.4 k | 49.5% |
| **Accessors** | | | |
| Date#year | 20.24 M | 13.27 M | 65.6% |
| Date#month | 20.63 M | 15.80 M | 76.6% |
| Date#day | 18.36 M | 15.80 M | 86.0% |
| Date#wday | 20.53 M | 11.96 M | 58.2% |
| Date#yday | 15.60 M | 19.10 M | 122.4% (*) |
| Date#jd | 19.42 M | 16.50 M | 85.0% |
| Date#ajd | 7.55 M | 4.64 M | 61.5% |
| Date#mjd | 11.60 M | 8.09 M | 69.7% |
| Date#amjd | 10.45 M | 5.75 M | 55.0% |
| Date#ld | 11.54 M | 8.22 M | 71.2% |
| Date#start | 20.29 M | 20.37 M | 100.4% (*) |
| Date#cwyear | 4.49 M | 6.81 M | 151.5% (*) |
| Date#cweek | 4.56 M | 14.65 M | 321.5% (*) |
| Date#cwday | 20.34 M | 8.58 M | 42.2% |
| Date#leap? | 18.67 M | 16.74 M | 89.7% |
| Date#julian? | 19.40 M | 2.68 M | 13.8% |
| Date#gregorian? | 19.69 M | 2.51 M | 12.8% |
| **Weekday predicates** | | | |
| Date#sunday? | 20.81 M | 9.52 M | 45.7% |
| Date#monday? | 20.78 M | 9.85 M | 47.4% |
| Date#saturday? | 20.79 M | 9.80 M | 47.1% |
| **Arithmetic** | | | |
| Date#+1 | 5.89 M | 3.46 M | 58.7% |
| Date#+100 | 5.52 M | 3.48 M | 63.0% |
| Date#-1 | 3.93 M | 3.61 M | 91.9% |
| Date#-Date | 1.88 M | 2.33 M | 124.1% (*) |
| Date#>>1 | 3.12 M | 1.84 M | 58.9% |
| Date#>>12 | 3.06 M | 1.84 M | 60.2% |
| Date#<<1 | 2.21 M | 1.86 M | 84.1% |
| Date#next_day | 5.41 M | 3.24 M | 59.9% |
| Date#prev_day | 3.92 M | 3.34 M | 85.2% |
| Date#next_month | 3.04 M | 1.77 M | 58.3% |
| Date#prev_month | 2.21 M | 1.75 M | 79.3% |
| Date#next_year | 2.74 M | 1.69 M | 61.7% |
| Date#prev_year | 2.03 M | 1.69 M | 83.1% |
| Date#succ | 5.57 M | 3.05 M | 54.7% |
| **Comparison** | | | |
| Date#<=> | 11.96 M | 3.21 M | 26.9% |
| Date#=== | 12.54 M | 3.13 M | 25.0% |
| Date#== | 2.76 M | 3.36 M | 121.7% (*) |
| Date#< | 7.69 M | 3.42 M | 44.5% |
| Date#> | 7.93 M | 3.41 M | 43.0% |
| Date#eql? | 11.34 M | 3.37 M | 29.7% |
| Date#hash | 13.64 M | 7.37 M | 54.0% |
| **Iteration** | | | |
| Date#upto(+30) | 154.5 k | 49.2 k | 31.9% |
| Date#downto(-30) | 118.0 k | 50.1 k | 42.4% |
| Date#step(+30,7) | 804.9 k | 242.0 k | 30.1% |
| **Formatting / Output** | | | |
| Date#to_s | 3.89 M | 1.84 M | 47.3% |
| Date#inspect | 548.8 k | 746.6 k | 136.0% (*) |
| Date#asctime | 2.45 M | 1.75 M | 71.2% |
| Date#strftime | 2.86 M | 1.35 M | 47.1% |
| Date#strftime(%Y-%m-%d) | 3.14 M | 1.54 M | 49.2% |
| Date#strftime(%A %B) | 3.11 M | 1.31 M | 42.1% |
| Date#strftime(%c) | 2.04 M | 1.19 M | 58.4% |
| Date#strftime(%x) | 2.68 M | 1.65 M | 61.7% |
| Date#strftime(composite) | 1.67 M | 1.46 M | 87.1% |
| Date#iso8601 | 4.00 M | 2.50 M | 62.6% |
| Date#rfc3339 | 1.90 M | 2.24 M | 118.2% (*) |
| Date#rfc2822 | 1.97 M | 1.69 M | 85.5% |
| Date#xmlschema | 3.96 M | 2.50 M | 63.3% |
| Date#httpdate | 1.66 M | 1.78 M | 106.9% (*) |
| Date#jisx0301 | 2.83 M | 2.04 M | 72.1% |
| **Conversion & Calendar** | | | |
| Date#to_date | 22.05 M | 21.08 M | 95.6% |
| Date#to_datetime | 6.24 M | 2.13 M | 34.1% |
| Date#to_time | 1.93 M | 861.5 k | 44.6% |
| Date#new_start | 4.89 M | 987.8 k | 20.2% |
| Date#julian | 5.78 M | 1.19 M | 20.6% |
| Date#gregorian | 5.60 M | 1.12 M | 20.1% |
| Date#italy | 5.76 M | 1.18 M | 20.4% |
| Date#england | 5.89 M | 1.25 M | 21.1% |
| **Serialization & Pattern matching** | | | |
| Date Marshal.dump | 534.4 k | 494.3 k | 92.5% |
| Date Marshal.load | 577.2 k | 572.9 k | 99.3% |
| Date#deconstruct_keys(nil) | 3.62 M | 3.07 M | 84.8% |
| Date#deconstruct_keys(year) | 5.61 M | 2.74 M | 48.9% |
| Date#deconstruct_keys(y/m/d) | 3.94 M | 1.64 M | 41.6% |

Notes

Methods where pure Ruby equals or exceeds the C extension (*):

  • Date.today (239.6%): The C extension pays an extra rb_funcall overhead for Time.now; the Ruby path calls it more directly.
  • Date#yday (122.4%), Date#cwyear (151.5%), Date#cweek (321.5%): Results are memoized after the first computation. The C extension recomputes on every call.
  • Date#-Date (124.1%), Date#== (121.7%): These are hot-pathed to direct integer arithmetic (@jd subtraction / comparison), avoiding the generalized minus_dd / coercion path used in C.
  • Date#rfc3339 (118.2%), Date#httpdate (106.9%), Date#inspect (136.0%): Fast-path string construction added to the Ruby implementation outperforms the C version's general-purpose formatter.
  • Date#start (100.4%): Direct @sg ivar read; effectively the same speed.

@jeremyevans

Contributor

In the vast majority of cases, the Ruby version is significantly slower than the C extension (2-5x in many cases, with a few cases worse). In the cases where it is faster:

  • Date.today could be implemented in Ruby even with the current extension.
  • Date#yday, Date#cwyear, Date#cweek could cache this information in the C struct, though I'm not sure whether the memory/CPU tradeoff is worth it, as usage of these methods is not nearly as common as other methods.
  • Other cases the performance difference is small, and maybe a similar approach could be used for the C extension to improve performance.

I appreciate that Ruby is easier to maintain than C, but I don't think the performance decrease here is acceptable (or even close to acceptable), considering that Date performance can be a bottleneck in application code.

jinroq added 3 commits March 2, 2026 01:17
Replace the C extension (ext/date/date_core.c, date_parse.c, date_strftime.c,
date_strptime.c) with a pure Ruby implementation while maintaining full
compatibility with the existing test suite (143 tests, 162,593 assertions).

Key implementation files:
- lib/date/core.rb: Date class with calendar conversions, arithmetic, and
  comparison operators
- lib/date/datetime.rb: DateTime class with time component handling
- lib/date/parse.rb: Date parsing (_parse, _rfc3339, _httpdate, _rfc2822,
  _xmlschema, _iso8601, _jisx0301) with byte-level fast paths
- lib/date/strftime.rb: strftime formatting engine
- lib/date/strptime.rb: strptime parsing engine
- lib/date/constants.rb: Consolidated constants
- lib/date/zonetab.rb: Timezone lookup table
- lib/date/time.rb: Time conversion methods

Performance optimizations:
- Byte-level fast paths for common date format parsing (ISO 8601, RFC 3339,
  RFC 2822, HTTP dates)
- Integer-based JD comparison instead of Rational arithmetic
- Lazy evaluation for civil date computation and deconstruct_keys
- Inlined private method calls to reduce __send__ overhead
- O(1) boolean lookup tables replacing Array linear scans in strptime

Add ext/date/generate-zonetab-rb script that reads zonetab.list and produces
lib/date/zonetab.rb, ensuring the Ruby hash table stays in sync with the C
gperf header (zonetab.h) from the same sources.

- Update ext/date/prereq.mk to run generate-zonetab-rb after update-abbr
- Update .github/workflows/update.yml to include lib/date/zonetab.rb in the
  weekly auto-commit
- Remove 75 "xxx standard time" entries from zonetab.rb that did not exist in
  zonetab.list or the C extension (leaving 316 entries, matching zonetab.list exactly)

The file already has the `# encoding: US-ASCII` magic comment, which makes
string literals US-ASCII by default. Additionally, `# frozen_string_literal: true`
makes .freeze unnecessary for string literals.

- Remove .encode(Encoding::US_ASCII) from MONTH_DAY_SUFFIX
  (keeping .freeze, since format() returns a new mutable string)
- Remove .encode(Encoding::US_ASCII).freeze from DEFAULT_STRFTIME_FMT
- Remove .encode(Encoding::US_ASCII).freeze from YMD_FMT
@jinroq

jinroq commented Mar 1, 2026

Author

@jeremyevans
Further optimizations have been made. Please see the benchmark here.

@jinroq

jinroq commented Mar 1, 2026

Author

@nobu

As for zonetab.rb, it should be updated in the same way that zonetab.h is automatically updated weekly with ext/date/update-abbr.

This has been addressed in d276c80.

@jinroq jinroq requested a review from nobu March 1, 2026 18:21
@jeremyevans

Contributor

@jeremyevans Further optimizations have been made. Please see the benchmark here.

The numbers are looking much better. Common actions (Date.civil) are still over 2x slower, though. I don't see benchmarks for Date.new, which I'm guessing is the most common way to instantiate Date objects.

What is ObjectSpace.memsize_of(Date.new) in the new implementation (it's 72 with the current implementation)?

@jinroq

jinroq commented Mar 2, 2026

Author

@jeremyevans

Date.new benchmark (Ruby 4.0.1)

Date.new goes through Class#new → initialize, while Date.civil is a direct class method. Here are the benchmark results:

| Method            | C ext (i/s) | Pure Ruby (i/s) | Ratio |
|-------------------|-------------|-----------------|-------|
| Date.new          | 3,974,217   | 1,531,119       | 38.5% |
| Date.new(no args) | 4,509,731   | 1,415,434       | 31.4% |

ObjectSpace.memsize_of(Date.new(2024, 1, 1))

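| Ruby   | Implementation | memsize (bytes) |
|--------|----------------|-----------------|
| 2.6.10 | C ext          | 72              |
| 2.7.8  | C ext          | 72              |
| 3.0.7  | C ext          | 72              |
| 3.1.7  | C ext          | 72              |
| 3.2.10 | C ext          | 72              |
| 3.3.10 | C ext          | 72              |
| 3.3.10 | Pure Ruby      | 80              |
| 3.4.8  | C ext          | 72              |
| 3.4.8  | Pure Ruby      | 80              |
| 4.0.1  | C ext          | 72              |
| 4.0.1  | Pure Ruby      | 80              |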

The pure Ruby implementation adds 8 bytes (72 → 80) per Date object. The C extension stores its fields in a fixed-size struct, while the pure Ruby version stores
them as instance variables (@jd, @sg, @df, etc.) on the object itself, which accounts for the difference.

Skip Class#new -> allocate -> initialize overhead by defining Date.new
as a class method that delegates to Date.civil. Also replace rescue-based
type guard with direct Integer === checks to eliminate rescue frame cost.

Add DateTime.new override to prevent infinite recursion from inheritance.

Performance comparison (Date.new):
| Method     | C ext (i/s)   | Pure Ruby Before (i/s) | Pure Ruby After (i/s) | vs C ext |
|------------|---------------|------------------------|-----------------------|----------|
| Date.new   | 4,648,499     | 1,531,119 (32.9%)      | 2,011,298 (43.3%)     | 43.3%    |
| Date.civil | 4,648,499     | 2,137,457 (46.0%)      | 2,118,545 (45.6%)     | 45.6%    |
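A sketch of both changes with hypothetical `FastNewDate` / `FastNewDateTime` classes. The commit does not say exactly how the recursion arises; this sketch assumes the subclass's `.civil` builds instances through `.new`, which is enough to show why an inherited `.new` that calls `civil` would loop forever without the override.

```ruby
class FastNewDate
  attr_reader :year, :month, :day

  # .new is redefined to skip Class#new -> allocate -> initialize and go
  # straight to .civil.
  def self.new(year = -4712, month = 1, day = 1)
    civil(year, month, day)
  end

  def self.civil(year = -4712, month = 1, day = 1)
    # Direct Integer === checks instead of a rescue-based type guard.
    unless Integer === year && Integer === month && Integer === day
      raise TypeError, 'expected Integer date components'
    end

    obj = allocate
    obj.instance_variable_set(:@year, year)
    obj.instance_variable_set(:@month, month)
    obj.instance_variable_set(:@day, day)
    obj
  end
end

class FastNewDateTime < FastNewDate
  attr_reader :hour

  def initialize(year = -4712, month = 1, day = 1, hour = 0)
    @year = year
    @month = month
    @day = day
    @hour = hour
  end

  # Assumption for this sketch: the subclass builds instances through .new.
  def self.civil(*args)
    new(*args)
  end

  # Without this override, FastNewDateTime.new would resolve to FastNewDate.new,
  # which calls self.civil (FastNewDateTime.civil), which calls .new again:
  # infinite recursion. Restore plain allocate + initialize instead.
  def self.new(*args)
    obj = allocate
    obj.send(:initialize, *args)
    obj
  end
end
```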
@jinroq

jinroq commented Mar 2, 2026

Author

@jeremyevans

I also included benchmarks for Ruby + YJIT. With YJIT enabled, the pure Ruby implementation matches or exceeds the C extension in many cases.

| Method | C ext (i/s) | Ruby (i/s) | Ruby vs C ext | Ruby+YJIT (i/s) | YJIT vs C ext |
|---|---|---|---|---|---|
| Date.new | 4,648,498.9 | 2,036,150.6 | 43.8% | 6,800,694.4 | 146.3% |
| Date.civil | 4,648,498.9 | 2,127,289.0 | 45.8% | 7,054,147.8 | 151.8% |
| Date.civil(sg) | 4,455,619.7 | 1,949,044.5 | 43.7% | 6,062,139.0 | 136.1% |
| Date.civil(-1) | 4,675,053.4 | 1,626,856.6 | 34.8% | 5,440,817.9 | 116.4% |
| Date.civil(neg) | 4,563,715.3 | 1,973,021.1 | 43.2% | 6,551,825.2 | 143.6% |
| Date.jd | 4,962,000.5 | 3,534,728.0 | 71.2% | 8,943,396.9 | 180.2% |
| Date.ordinal | 3,023,006.5 | 1,888,939.4 | 62.5% | 7,674,517.3 | 253.9% |
| Date.commercial | 2,478,941.0 | 1,668,324.9 | 67.3% | 6,896,332.2 | 278.2% |
| Date.today | 176,946.6 | 390,025.4 | 220.4% | 485,024.0 | 274.1% |
| Date.valid_civil? | 10,749,223.3 | 1,328,167.2 | 12.4% | 6,908,534.8 | 64.3% |
| Date.valid_civil?(false) | 10,961,294.3 | 1,762,002.3 | 16.1% | 7,834,300.6 | 71.5% |
| Date.valid_ordinal? | 4,233,714.4 | 1,350,027.0 | 31.9% | 6,867,174.2 | 162.2% |
| Date.valid_commercial? | 3,177,114.9 | 488,591.4 | 15.4% | 2,387,213.4 | 75.1% |
| Date.valid_jd? | 16,608,826.2 | 10,043,032.0 | 60.5% | 24,574,453.1 | 148.0% |
| Date.gregorian_leap? | 14,913,745.3 | 6,826,828.2 | 45.8% | 20,989,836.5 | 140.7% |
| Date.gregorian_leap?(1900) | 14,713,170.5 | 6,365,132.8 | 43.3% | 20,003,406.2 | 136.0% |
| Date.julian_leap? | 17,203,164.6 | 7,938,256.0 | 46.1% | 23,782,084.2 | 138.2% |
| Date._parse(iso) | 232,827.9 | 1,294,271.7 | 555.9% | 3,292,066.3 | 1413.9% |
| Date._parse(us) | 118,925.3 | 545,573.5 | 458.8% | 661,336.2 | 556.1% |
| Date._parse(eu) | 159,775.1 | 475,354.0 | 297.5% | 564,690.0 | 353.4% |
| Date._parse(rfc2822) | 81,117.7 | 215,561.9 | 265.7% | 290,587.7 | 358.2% |
| Date.parse(iso) | 212,254.8 | 613,834.4 | 289.2% | 1,921,631.3 | 905.3% |
| Date.parse(us) | 111,533.3 | 345,410.8 | 309.7% | 542,995.9 | 486.8% |
| Date.parse(eu) | 146,786.8 | 309,203.1 | 210.6% | 466,989.1 | 318.1% |
| Date.parse(compact) | 132,038.9 | 619,357.0 | 469.1% | 1,981,781.4 | 1500.9% |
| Date._strptime | 2,721,888.7 | 1,177,709.3 | 43.3% | 2,147,098.1 | 78.9% |
| Date.strptime | 1,374,554.3 | 800,628.7 | 58.2% | 2,014,174.4 | 146.5% |
| Date.strptime(complex) | 1,114,730.3 | 338,797.6 | 30.4% | 1,955,386.3 | 175.4% |
| Date._iso8601 | 723,572.8 | 1,368,965.6 | 189.2% | 4,485,893.6 | 620.0% |
| Date._rfc3339 | 488,706.5 | 583,054.6 | 119.3% | 2,056,642.5 | 420.8% |
| Date._rfc2822 | 375,804.7 | 507,272.7 | 135.0% | 1,272,314.2 | 338.6% |
| Date._xmlschema | 766,287.8 | 1,367,530.2 | 178.5% | 4,572,233.9 | 596.7% |
| Date._httpdate | 403,521.7 | 548,276.9 | 135.9% | 1,247,514.5 | 309.2% |
| Date._jisx0301 | 713,384.0 | 1,196,125.4 | 167.7% | 2,997,525.2 | 420.2% |
| Date.iso8601 | 551,797.8 | 627,678.5 | 113.8% | 1,968,689.2 | 356.8% |
| Date.rfc3339 | 395,958.4 | 384,319.2 | 97.1% | 1,130,586.8 | 285.5% |
| Date.rfc2822 | 304,311.2 | 342,968.4 | 112.7% | 826,264.6 | 271.5% |
| Date.xmlschema | 552,686.0 | 617,598.2 | 111.7% | 2,020,113.2 | 365.5% |
| Date.httpdate | 320,236.4 | 363,580.0 | 113.5% | 857,760.5 | 267.9% |
| Date.jisx0301 | 550,356.0 | 581,997.6 | 105.7% | 1,589,031.8 | 288.7% |
| Date#year | 20,235,535.8 | 10,774,110.8 | 53.2% | 27,655,324.0 | 136.7% |
| Date#month | 20,625,631.9 | 10,796,911.7 | 52.3% | 28,156,488.9 | 136.5% |
| Date#day | 18,362,005.8 | 10,705,688.4 | 58.3% | 27,045,512.4 | 147.3% |
| Date#wday | 20,530,546.4 | 11,000,804.2 | 53.6% | 24,403,360.2 | 118.9% |
| Date#yday | 15,601,172.0 | 12,876,713.8 | 82.5% | 27,694,425.0 | 177.5% |
| Date#jd | 19,418,473.5 | 17,071,262.0 | 87.9% | 29,569,337.8 | 152.3% |
| Date#ajd | 7,553,399.4 | 4,298,369.6 | 56.9% | 5,944,814.2 | 78.7% |
| Date#mjd | 11,597,881.3 | 17,084,185.2 | 147.3% | 28,895,735.6 | 249.1% |
| Date#amjd | 10,449,017.7 | 1,815,580.6 | 17.4% | 2,398,321.6 | 23.0% |
| Date#ld | 11,537,190.0 | 16,983,669.5 | 147.2% | 29,047,769.7 | 251.8% |
| Date#start | 20,286,534.5 | 17,836,924.1 | 87.9% | 29,304,177.3 | 144.5% |
| Date#cwyear | 4,493,912.4 | 16,430,249.9 | 365.6% | 28,214,927.1 | 627.8% |
| Date#cweek | 4,555,659.7 | 16,061,004.7 | 352.6% | 28,960,883.3 | 635.7% |
| Date#cwday | 20,343,305.6 | 8,371,271.9 | 41.2% | 23,049,924.0 | 113.3% |
| Date#leap? | 18,667,588.1 | 8,036,097.5 | 43.0% | 22,004,473.1 | 117.9% |
| Date#julian? | 19,396,307.1 | 8,110,830.0 | 41.8% | 23,137,616.8 | 119.3% |
| Date#gregorian? | 19,689,574.9 | 9,558,623.9 | 48.5% | 25,319,534.9 | 128.6% |
| Date#sunday? | 20,810,650.7 | 9,030,532.6 | 43.4% | 21,230,372.1 | 102.0% |
| Date#monday? | 20,777,318.6 | 8,922,244.8 | 42.9% | 23,217,857.0 | 111.7% |
| Date#saturday? | 20,790,164.4 | 8,963,911.4 | 43.1% | 23,168,896.6 | 111.4% |
| Date#+1 | 5,893,233.7 | 2,889,182.8 | 49.0% | 5,040,998.6 | 85.5% |
| Date#+100 | 5,520,702.9 | 2,937,903.5 | 53.2% | 5,033,091.2 | 91.2% |
| Date#-1 | 3,929,555.6 | 2,597,216.7 | 66.1% | 5,340,599.7 | 135.9% |
| Date#-Date | 1,875,155.2 | 1,972,961.1 | 105.2% | 3,117,628.6 | 166.3% |
| Date#>>1 | 3,119,218.3 | 1,613,270.3 | 51.7% | 4,226,230.4 | 135.5% |
| Date#>>12 | 3,061,617.7 | 1,631,701.9 | 53.3% | 4,189,177.5 | 136.8% |
| Date#<<1 | 2,206,893.6 | 1,547,786.7 | 70.1% | 4,157,267.8 | 188.4% |
| Date#next_day | 5,407,206.0 | 2,700,424.4 | 49.9% | 4,981,497.3 | 92.1% |
| Date#prev_day | 3,923,672.9 | 2,535,334.7 | 64.6% | 4,917,461.8 | 125.3% |
| Date#next_month | 3,038,007.5 | 1,574,504.9 | 51.8% | 4,099,210.9 | 134.9% |
| Date#prev_month | 2,207,619.7 | 1,497,764.9 | 67.8% | 4,114,609.8 | 186.4% |
| Date#next_year | 2,736,947.8 | 1,566,210.4 | 57.2% | 4,085,857.7 | 149.3% |
| Date#prev_year | 2,028,662.2 | 1,417,257.9 | 69.9% | 4,171,630.4 | 205.6% |
| Date#succ | 5,569,330.6 | 2,717,558.2 | 48.8% | 5,286,740.7 | 94.9% |
Date#<=> 11,961,484.4 6,609,541.3 55.3% 21,053,740.7 176.0%
Date#=== 12,543,036.5 5,541,720.8 44.2% 19,983,408.5 159.3%
Date#== 2,760,650.4 7,322,420.6 265.2% 22,861,312.3 828.1%
Date#< 7,685,771.4 6,455,497.0 84.0% 20,481,215.1 266.5%
Date#> 7,929,852.6 6,473,788.7 81.6% 21,216,662.3 267.6%
Date#eql? 11,335,321.9 7,930,323.2 70.0% 24,341,834.0 214.7%
Date#hash 13,639,648.1 10,546,833.3 77.3% 16,202,422.0 118.8%
Date#upto(+30) 154,548.8 134,861.8 87.3% 236,819.6 153.2%
Date#downto(-30) 118,018.3 132,404.6 112.2% 243,172.5 206.0%
Date#step(+30,7) 804,863.4 687,218.2 85.4% 1,316,888.1 163.6%
Date#to_s 3,889,377.7 3,769,828.9 96.9% 6,604,270.8 169.8%
Date#inspect 548,810.9 1,187,608.2 216.4% 1,706,568.4 311.0%
Date#asctime 2,452,656.1 1,505,310.9 61.4% 2,201,277.9 89.8%
Date#strftime 2,863,406.8 3,332,272.9 116.4% 6,691,922.9 233.7%
Date#strftime(%Y-%m-%d) 3,138,482.2 2,817,965.7 89.8% 4,125,145.7 131.4%
Date#strftime(%A %B) 3,109,861.1 1,553,015.9 49.9% 2,028,742.3 65.2%
Date#strftime(%c) 2,041,797.7 1,323,864.4 64.8% 1,765,369.8 86.5%
Date#strftime(%x) 2,676,711.2 1,903,867.7 71.1% 2,382,801.7 89.0%
Date#strftime(composite) 1,671,366.5 1,613,957.3 96.6% 2,224,556.5 133.1%
Date#iso8601 3,999,850.6 3,450,501.3 86.3% 6,286,520.0 157.2%
Date#rfc3339 1,896,360.7 2,111,494.2 111.3% 3,381,402.3 178.3%
Date#rfc2822 1,974,373.5 1,593,782.9 80.7% 2,289,653.2 116.0%
Date#xmlschema 3,955,976.0 3,500,299.9 88.5% 6,337,757.0 160.2%
Date#httpdate 1,663,870.8 1,779,108.2 106.9% 2,700,540.9 162.3%
Date#jisx0301 2,833,069.0 2,351,477.2 83.0% 3,091,380.4 109.1%
Date#to_date 22,050,815.1 18,402,764.7 83.5% 31,789,222.2 144.2%
Date#to_datetime 6,244,339.9 296,676.2 4.8% 715,839.5 11.5%
Date#to_time 1,933,709.8 471,209.1 24.4% 539,343.5 27.9%
Date#new_start 4,889,348.6 3,343,094.7 68.4% 4,996,316.0 102.2%
Date#julian 5,778,390.6 3,171,427.1 54.9% 4,960,678.0 85.8%
Date#gregorian 5,599,171.9 3,188,930.4 57.0% 4,961,054.6 88.6%
Date#italy 5,762,239.8 3,251,144.1 56.4% 5,045,699.5 87.6%
Date#england 5,893,742.2 3,233,278.2 54.9% 4,987,572.9 84.6%
Date Marshal.dump 534,350.2 545,965.0 102.2% 583,518.2 109.2%
Date Marshal.load 577,187.8 563,082.5 97.6% 626,781.8 108.6%
Date#deconstruct_keys(nil) 3,618,972.7 3,023,491.2 83.5% 4,757,871.6 131.5%
Date#deconstruct_keys(year) 5,607,895.3 3,859,001.1 68.8% 6,029,250.7 107.5%
Date#deconstruct_keys(y/m/d) 3,942,984.9 1,667,445.0 42.3% 3,673,872.7 93.2%
DateTime.civil 1,851,893.5 313,100.2 16.9% 720,137.6 38.9%
DateTime.jd 1,889,075.3 554,991.0 29.4% 1,067,002.2 56.5%
DateTime.ordinal 1,508,769.7 536,343.4 35.5% 1,514,865.1 100.4%
DateTime.commercial 1,351,171.4 311,674.3 23.1% 1,106,892.2 81.9%
DateTime.now 139,518.4 287,473.0 206.0% 365,739.0 262.1%
DateTime.parse(iso) 84,713.9 33,538.6 39.6% 48,763.1 57.6%
DateTime.parse(rfc2822) 74,627.0 121,937.2 163.4% 209,317.4 280.5%
DateTime.strptime 394,530.9 81,962.5 20.8% 309,988.4 78.6%
DateTime.iso8601 355,014.9 127,652.6 36.0% 246,696.0 69.5%
DateTime.rfc3339 379,544.4 204,046.4 53.8% 616,887.1 162.5%
DateTime.rfc2822 284,987.5 180,603.6 63.4% 467,337.6 164.0%
DateTime.xmlschema 358,991.2 142,858.9 39.8% 259,931.0 72.4%
DateTime.httpdate 298,051.0 196,967.1 66.1% 500,106.0 167.8%
DateTime.jisx0301 345,210.8 132,675.8 38.4% 256,861.4 74.4%
DateTime#year 19,881,470.7 11,071,056.1 55.7% 26,023,439.7 130.9%
DateTime#month 20,924,763.4 12,215,448.7 58.4% 25,207,601.2 120.5%
DateTime#day 18,785,995.5 12,171,972.8 64.8% 25,542,575.0 136.0%
DateTime#hour 21,502,513.9 12,824,548.2 59.6% 28,950,420.8 134.6%
DateTime#min 18,485,836.5 12,626,620.6 68.3% 28,900,737.2 156.3%
DateTime#sec 20,797,882.5 12,742,817.0 61.3% 29,335,953.4 141.1%
DateTime#sec_fraction 10,200,271.9 12,706,458.8 124.6% 29,223,527.0 286.5%
DateTime#offset 9,829,985.5 5,084,467.5 51.7% 6,634,926.1 67.5%
DateTime#zone 3,207,822.5 1,182,995.5 36.9% 1,448,983.9 45.2%
DateTime#wday 20,168,567.7 15,619,425.4 77.4% 24,279,235.7 120.4%
DateTime#yday 14,700,414.1 12,912,832.2 87.8% 25,935,006.8 176.4%
DateTime#jd 19,103,691.9 17,924,629.0 93.8% 21,115,737.4 110.5%
DateTime#ajd 2,100,589.4 757,860.4 36.1% 969,689.3 46.2%
DateTime#+1 5,385,075.7 1,477,393.3 27.4% 4,128,308.8 76.7%
DateTime#+frac 292,699.0 427,903.8 146.2% 705,517.6 241.0%
DateTime#-1 4,006,702.8 1,402,167.4 35.0% 3,811,043.9 95.1%
DateTime#-DT 1,316,024.9 335,266.8 25.5% 480,147.5 36.5%
DateTime#>>1 2,915,200.6 1,573,828.7 54.0% 3,767,400.8 129.2%
DateTime#<<1 2,091,904.2 1,534,607.3 73.4% 3,659,443.7 174.9%
DateTime#next_day 5,282,096.7 1,440,511.1 27.3% 4,050,846.1 76.7%
DateTime#prev_day 3,924,083.2 1,375,942.9 35.1% 3,960,172.7 100.9%
DateTime#next_month 2,789,292.9 1,532,713.4 54.9% 3,670,986.2 131.6%
DateTime#prev_month 2,052,433.3 1,445,021.4 70.4% 3,661,189.0 178.4%
DateTime#next_year 2,672,709.2 1,508,662.3 56.4% 3,683,351.2 137.8%
DateTime#prev_year 1,970,385.2 1,408,942.8 71.5% 3,720,693.5 188.8%
DateTime#<=> 12,133,082.2 6,597,448.7 54.4% 19,160,491.0 157.9%
DateTime#=== 11,302,900.6 4,964,157.5 43.9% 16,864,706.3 149.2%
DateTime#== 2,746,062.5 7,161,663.6 260.8% 20,932,614.6 762.3%
DateTime#eql? 11,282,592.6 7,817,051.7 69.3% 24,241,577.9 214.9%
DateTime#hash 13,883,068.1 6,540,332.4 47.1% 12,099,219.7 87.2%
DateTime#to_s 1,940,594.7 192,016.1 9.9% 484,311.3 25.0%
DateTime#inspect 470,005.4 169,330.0 36.0% 386,114.9 82.2%
DateTime#strftime 1,531,900.0 273,764.3 17.9% 537,769.3 35.1%
DateTime#strftime(%Y%m%d%z) 1,681,470.3 898,867.1 53.5% 1,371,085.3 81.5%
DateTime#strftime(%c) 1,998,483.9 944,923.6 47.3% 1,277,656.4 63.9%
DateTime#strftime(%s) 3,244,638.4 896,356.5 27.6% 1,782,512.8 54.9%
DateTime#iso8601 1,367,474.5 191,089.6 14.0% 491,047.4 35.9%
DateTime#rfc3339 1,374,890.5 191,139.4 13.9% 500,719.3 36.4%
DateTime#rfc2822 1,905,864.7 1,093,096.5 57.4% 1,637,555.2 85.9%
DateTime#xmlschema 1,380,004.6 195,651.5 14.2% 497,945.6 36.1%
DateTime#httpdate 1,666,736.7 1,181,090.2 70.9% 1,772,252.2 106.3%
DateTime#jisx0301 1,184,511.6 425,241.9 35.9% 531,411.1 44.9%
DateTime#new_offset(0) 5,652,278.9 1,014,871.0 18.0% 2,707,830.8 47.9%
DateTime#new_offset(str) 3,782,946.8 422,249.2 11.2% 758,318.2 20.0%
DateTime#new_offset(rat) 2,332,643.4 989,378.9 42.4% 2,375,151.0 101.8%
DateTime#to_date 6,722,539.6 2,827,305.6 42.1% 7,816,381.4 116.3%
DateTime#to_datetime 21,865,006.0 18,708,595.0 85.6% 31,341,332.2 143.3%
DateTime#to_time 910,402.1 594,570.5 65.3% 1,092,081.2 120.0%
DateTime Marshal.dump 529,753.9 335,623.4 63.4% 359,797.7 67.9%
DateTime Marshal.load 563,611.5 310,872.2 55.2% 345,545.9 61.3%
DateTime#deconstruct_keys(nil) 901,133.5 606,273.5 67.3% 708,956.3 78.7%
DateTime#deconstruct_keys(y/h) 4,464,776.5 2,094,465.9 46.9% 4,119,094.4 92.3%
Time#to_date 3,868,352.1 1,509,377.0 39.0% 5,625,967.0 145.4%
Time#to_datetime 1,216,319.1 964,642.1 79.3% 2,239,412.0 184.1%

@jeremyevans

Contributor

I hadn't realized the earlier benchmark numbers were without YJIT. Anyone who cares about performance is going to enable YJIT, and since the numbers with YJIT are significantly faster overall, I don't see any barriers to merging this. Thank you very much for your work on this.

jinroq added 2 commits March 2, 2026 16:22
Replace all getbyte-based byte manipulation with StringScanner and regex
patterns across parse.rb, strftime.rb, and strptime.rb for improved
readability and maintainability.

Key changes:
- Replace getbyte loops with StringScanner#scan/skip and regex patterns
- Optimize strptime fast paths using match? + byteslice instead of
  StringScanner allocation
- Add 17 pre-compiled regex constants for YJIT inline cache efficiency
- Inline sp_digits_sc and sp_num_p? helper methods
- Replace hash[:_fail] error propagation with throw/catch(:sp_fail)
- Extract compute_3key into lib/date/shared.rb

Performance (iterations/s, pure Ruby + YJIT vs C ext + YJIT):

| Benchmark              | C ext+YJIT  | pure Ruby+YJIT | Ratio |
|------------------------|-------------|-----------------|-------|
| Date._strptime         | 2,263,815   | 1,284,553       |  57%  |
| Date.strptime          | 1,156,519   | 1,189,801       | 103%  |
| Date.strptime(complex) |   905,983   |   424,281       |  47%  |
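The following is a minimal sketch of the techniques this commit message names. It assumes a simplified `%Y-%m-%d`-only format and shows three things: `StringScanner#scan`/`#skip` in place of `getbyte` loops, `throw`/`catch(:sp_fail)` instead of threading an error flag through the result hash, and a `match?` + `byteslice` fast path that never allocates a scanner. The method names `sketch_ymd` and `sketch_ymd_fast` and the pattern constants are hypothetical. They are not the PR's strptime code.

```ruby
require "strscan"

# Hypothetical pre-compiled patterns, defined once and reused on every call.
YEAR4   = /\d{4}/
NUM2    = /\d{1,2}/
DASH    = /-/
ISO_YMD = /\A\d{4}-\d{2}-\d{2}\z/

# Scanner-based path: any failed match throws :sp_fail, so catch returns nil.
def sketch_ymd(str)
  catch(:sp_fail) do
    ss = StringScanner.new(str)
    year = ss.scan(YEAR4) or throw :sp_fail
    ss.skip(DASH) or throw :sp_fail
    mon = ss.scan(NUM2) or throw :sp_fail
    ss.skip(DASH) or throw :sp_fail
    mday = ss.scan(NUM2) or throw :sp_fail
    h = { year: year.to_i, mon: mon.to_i, mday: mday.to_i }
    h[:leftover] = ss.rest unless ss.eos?
    h
  end
end

# Fast path: one anchored match? plus fixed byte offsets, with no scanner allocation.
# In Ruby regexps \d matches only ASCII digits, so the byte offsets are safe.
def sketch_ymd_fast(str)
  return sketch_ymd(str) unless ISO_YMD.match?(str)

  { year: str.byteslice(0, 4).to_i,
    mon: str.byteslice(5, 2).to_i,
    mday: str.byteslice(8, 2).to_i }
end
```

The fast path handles only the exact fixed-width shape. Anything else falls back to the general scanner path, which mirrors the "fast path first, general path second" structure the commit describes.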
@jinroq

jinroq commented Mar 3, 2026

Author

@nobu @jeremyevans

Replace all getbyte-based byte manipulation with StringScanner and regex patterns for improved readability and maintainability.
In exchange, the performance of Date._strptime and Date.strptime has decreased. Please give us your feedback on this.

| Benchmark | C ext+YJIT | pure Ruby+YJIT | Ratio |
|---|---|---|---|
| Date._strptime | 2,263,815 | 1,284,553 | 57% |
| Date.strptime | 1,156,519 | 1,189,801 | 103% |
| Date.strptime(complex) | 905,983 | 424,281 | 47% |

Here are benchmarks for all Date methods:

| Method | C ext | C ext+YJIT | pure Ruby | pure Ruby+YJIT | Ruby+YJIT vs C+YJIT |
|---|---|---|---|---|---|
| Date.new | 3.6M | 4.5M | 1.8M | 5.7M | 129% |
| Date.civil | 3.8M | 3.7M | 1.6M | 5.5M | 147% |
| Date.civil(sg) | 3.0M | 3.2M | 1.7M | 5.2M | 162% |
| Date.civil(-1) | 3.4M | 4.2M | 1.6M | 4.7M | 110% |
| Date.civil(neg) | 3.4M | 4.0M | 1.6M | 5.9M | 147% |
| Date.jd | 3.4M | 4.1M | 3.0M | 7.4M | 181% |
| Date.ordinal | 2.3M | 2.5M | 1.7M | 6.7M | 263% |
| Date.commercial | 2.0M | 2.2M | 1.5M | 5.9M | 265% |
| Date.today | 335.1k | 343.5k | 430.2k | 472.7k | 138% |
| Date.valid_civil? | 8.4M | 10.7M | 1.2M | 6.1M | 57% |
| Date.valid_civil?(false) | 8.3M | 12.5M | 1.6M | 7.0M | 56% |
| Date.valid_ordinal? | 3.5M | 4.0M | 1.3M | 6.2M | 152% |
| Date.valid_commercial? | 2.8M | 3.0M | 435.2k | 2.1M | 71% |
| Date.valid_jd? | 12.7M | 19.6M | 8.7M | 20.5M | 104% |
| Date.gregorian_leap? | 11.7M | 19.4M | 5.9M | 15.4M | 79% |
| Date.gregorian_leap?(1900) | 11.7M | 19.7M | 5.7M | 17.3M | 88% |
| Date.julian_leap? | 11.3M | 19.5M | 7.2M | 20.2M | 104% |
| Date._parse(iso) | 182.9k | 191.6k | 461.4k | 546.6k | 285% |
| Date._parse(us) | 92.6k | 96.1k | 449.9k | 566.0k | 589% |
| Date._parse(eu) | 120.5k | 123.1k | 403.2k | 491.5k | 399% |
| Date._parse(rfc2822) | 60.3k | 60.0k | 142.7k | 165.9k | 276% |
| Date.parse(iso) | 162.8k | 165.3k | 300.3k | 490.4k | 297% |
| Date.parse(us) | 87.1k | 87.6k | 293.7k | 467.0k | 533% |
| Date.parse(eu) | 113.6k | 114.4k | 267.9k | 409.9k | 358% |
| Date.parse(compact) | 104.4k | 106.2k | 290.8k | 457.5k | 431% |
| Date._strptime | 2.2M | 2.3M | 860.4k | 1.3M | 57% |
| Date.strptime | 1.1M | 1.2M | 614.1k | 1.2M | 103% |
| Date.strptime(complex) | 921.1k | 906.0k | 254.7k | 424.3k | 47% |
| Date._iso8601 | 534.1k | 575.7k | 462.8k | 578.6k | 100% |
| Date._rfc3339 | 388.4k | 385.2k | 190.5k | 308.9k | 80% |
| Date._rfc2822 | 265.2k | 221.9k | 153.2k | 232.5k | 105% |
| Date._xmlschema | 560.4k | 514.9k | 379.7k | 438.9k | 85% |
| Date._httpdate | 245.4k | 237.5k | 170.4k | 211.0k | 89% |
| Date._jisx0301 | 376.2k | 470.2k | 315.8k | 383.7k | 82% |
| Date.iso8601 | 410.8k | 404.9k | 258.3k | 439.5k | 109% |
| Date.rfc3339 | 304.0k | 267.7k | 167.7k | 228.1k | 85% |
| Date.rfc2822 | 209.6k | 162.9k | 119.9k | 194.9k | 120% |
| Date.xmlschema | 351.0k | 299.3k | 227.3k | 366.8k | 123% |
| Date.httpdate | 208.7k | 168.2k | 121.0k | 207.7k | 123% |
| Date.jisx0301 | 299.5k | 342.6k | 234.8k | 329.2k | 96% |
| Date#year | 9.9M | 14.6M | 7.0M | 18.0M | 124% |
| Date#month | 9.8M | 15.7M | 8.4M | 15.8M | 101% |
| Date#day | 11.3M | 19.4M | 7.4M | 13.1M | 67% |
| Date#wday | 13.6M | 15.7M | 8.2M | 15.2M | 97% |
| Date#yday | 7.7M | 14.3M | 11.3M | 17.6M | 122% |
| Date#jd | 11.3M | 16.7M | 12.9M | 22.7M | 136% |
| Date#ajd | 4.7M | 6.3M | 3.6M | 5.0M | 78% |
| Date#mjd | 8.1M | 8.7M | 14.3M | 24.1M | 277% |
| Date#amjd | 5.7M | 7.6M | 1.6M | 2.2M | 29% |
| Date#ld | 7.9M | 11.5M | 14.9M | 24.2M | 210% |
| Date#start | 15.4M | 23.0M | 15.2M | 25.0M | 108% |
| Date#cwyear | 3.8M | 4.3M | 15.0M | 25.0M | 589% |
| Date#cweek | 3.9M | 4.2M | 14.6M | 24.5M | 577% |
| Date#cwday | 15.6M | 23.4M | 7.5M | 18.4M | 79% |
| Date#leap? | 12.1M | 16.8M | 7.2M | 19.5M | 116% |
| Date#julian? | 13.1M | 21.0M | 7.4M | 19.7M | 94% |
| Date#gregorian? | 13.1M | 18.9M | 7.9M | 20.7M | 109% |
| Date#sunday? | 15.1M | 20.2M | 6.9M | 16.4M | 81% |
| Date#monday? | 13.7M | 17.7M | 7.2M | 17.1M | 97% |
| Date#saturday? | 12.0M | 18.9M | 7.7M | 14.0M | 74% |
| Date#+1 | 4.9M | 5.7M | 2.4M | 3.9M | 69% |
| Date#+100 | 4.7M | 5.4M | 2.4M | 4.4M | 81% |
| Date#-1 | 3.1M | 3.6M | 2.3M | 4.1M | 113% |
| Date#-Date | 1.5M | 1.5M | 1.8M | 2.9M | 187% |
| Date#>>1 | 2.5M | 2.6M | 1.4M | 3.2M | 123% |
| Date#>>12 | 2.4M | 2.3M | 1.5M | 3.6M | 154% |
| Date#<<1 | 1.7M | 1.9M | 1.3M | 3.4M | 184% |
| Date#next_day | 4.8M | 5.8M | 2.3M | 4.2M | 73% |
| Date#prev_day | 3.4M | 3.7M | 2.2M | 4.3M | 115% |
| Date#next_month | 2.5M | 2.6M | 1.3M | 3.5M | 136% |
| Date#prev_month | 1.7M | 1.7M | 1.3M | 3.5M | 208% |
| Date#next_year | 2.1M | 2.4M | 1.4M | 3.7M | 152% |
| Date#prev_year | 1.6M | 1.8M | 1.3M | 3.4M | 193% |
| Date#succ | 5.1M | 6.1M | 2.4M | 4.6M | 75% |
| Date#<=> | 8.1M | 10.9M | 5.8M | 19.1M | 175% |
| Date#=== | 8.3M | 12.9M | 5.0M | 17.2M | 133% |
| Date#== | 2.2M | 2.4M | 6.6M | 20.8M | 851% |
| Date#< | 6.0M | 7.0M | 5.9M | 14.8M | 210% |
| Date#> | 6.0M | 7.1M | 5.9M | 18.9M | 265% |
| Date#eql? | 8.0M | 10.5M | 7.2M | 21.8M | 207% |
| Date#hash | 10.3M | 16.8M | 9.6M | 14.8M | 88% |
| Date#upto(+30) | 134.1k | 139.9k | 112.8k | 184.6k | 132% |
| Date#downto(-30) | 95.9k | 99.0k | 116.7k | 194.0k | 196% |
| Date#step(+30,7) | 684.4k | 781.9k | 610.1k | 1.1M | 145% |
| Date#to_s | 3.1M | 3.4M | 3.4M | 6.1M | 180% |
| Date#inspect | 486.5k | 495.3k | 1.1M | 1.6M | 320% |
| Date#asctime | 1.9M | 2.0M | 1.3M | 2.0M | 99% |
| Date#strftime | 2.4M | 2.6M | 3.1M | 6.1M | 235% |
| Date#strftime(%Y-%m-%d) | 2.6M | 2.8M | 2.6M | 3.8M | 133% |
| Date#strftime(%A %B) | 2.4M | 2.6M | 1.1M | 1.7M | 64% |
| Date#strftime(%c) | 1.3M | 1.6M | 864.8k | 1.5M | 93% |
| Date#strftime(%x) | 1.5M | 2.3M | 1.7M | 2.2M | 94% |
| Date#strftime(composite) | 1.4M | 1.3M | 1.5M | 2.0M | 152% |
| Date#iso8601 | 2.8M | 3.5M | 3.3M | 6.0M | 172% |
| Date#rfc3339 | 1.5M | 1.6M | 2.0M | 3.2M | 203% |
| Date#rfc2822 | 1.6M | 1.6M | 1.5M | 2.1M | 129% |
| Date#xmlschema | 3.1M | 3.4M | 3.2M | 5.9M | 172% |
| Date#httpdate | 1.3M | 1.3M | 1.7M | 2.3M | 173% |
| Date#jisx0301 | 2.2M | 2.5M | 2.1M | 2.8M | 115% |
| Date#to_date | 17.0M | 25.8M | 16.2M | 17.4M | 67% |
| Date#to_datetime | 5.8M | 5.5M | 272.0k | 652.0k | 12% |
| Date#to_time | 471.9k | 441.7k | 427.6k | 483.9k | 110% |
| Date#new_start | 4.9M | 5.7M | 2.8M | 4.0M | 70% |
| Date#julian | 5.2M | 5.9M | 2.7M | 3.9M | 67% |
| Date#gregorian | 5.3M | 5.9M | 2.7M | 3.8M | 65% |
| Date#italy | 5.1M | 6.4M | 2.7M | 3.9M | 62% |
| Date#england | 5.2M | 6.1M | 2.8M | 4.3M | 71% |
| Date Marshal.dump | 461.3k | 457.4k | 485.4k | 531.5k | 116% |
| Date Marshal.load | 496.3k | 489.1k | 493.4k | 559.2k | 114% |
| Date#deconstruct_keys(nil) | 2.6M | 2.7M | 2.5M | 3.4M | 127% |
| Date#deconstruct_keys(year) | 3.6M | 3.8M | 3.2M | 4.8M | 126% |
| Date#deconstruct_keys(y/m/d) | 2.8M | 3.0M | 1.4M | 2.5M | 82% |
| DateTime.civil | 1.4M | 1.6M | 266.5k | 620.5k | 40% |
| DateTime.jd | 1.5M | 1.6M | 487.9k | 834.8k | 51% |
| DateTime.ordinal | 1.2M | 1.3M | 438.4k | 1.4M | 111% |
| DateTime.commercial | 1.0M | 1.2M | 294.5k | 1.0M | 85% |
| DateTime.now | 243.9k | 251.4k | 314.2k | 351.6k | 140% |
| DateTime.parse(iso) | 69.4k | 69.0k | 28.1k | 40.0k | 58% |
| DateTime.parse(rfc2822) | 57.8k | 59.9k | 91.2k | 141.4k | 236% |
| DateTime.strptime | 311.7k | 309.3k | 36.1k | 54.2k | 18% |
| DateTime.iso8601 | 294.2k | 291.6k | 96.1k | 155.3k | 53% |
| DateTime.rfc3339 | 304.2k | 312.4k | 126.1k | 204.3k | 65% |
| DateTime.rfc2822 | 211.0k | 208.7k | 107.6k | 180.1k | 86% |
| DateTime.xmlschema | 282.7k | 288.6k | 87.3k | 148.3k | 51% |
| DateTime.httpdate | 208.3k | 214.0k | 116.0k | 184.9k | 86% |
| DateTime.jisx0301 | 259.6k | 233.7k | 92.2k | 156.4k | 67% |
| DateTime#year | 13.4M | 21.6M | 10.0M | 17.7M | 82% |
| DateTime#month | 13.0M | 24.7M | 11.3M | 22.2M | 90% |
| DateTime#day | 16.0M | 24.6M | 11.0M | 20.9M | 85% |
| DateTime#hour | 16.0M | 22.9M | 11.4M | 18.9M | 83% |
| DateTime#min | 15.8M | 22.8M | 11.2M | 24.9M | 110% |
| DateTime#sec | 15.5M | 24.1M | 11.4M | 23.3M | 96% |
| DateTime#sec_fraction | 7.7M | 9.6M | 10.9M | 22.9M | 237% |
| DateTime#offset | 5.8M | 9.1M | 4.5M | 6.6M | 73% |
| DateTime#zone | 2.5M | 2.8M | 1.0M | 1.3M | 46% |
| DateTime#wday | 14.3M | 20.2M | 13.1M | 19.5M | 97% |
| DateTime#yday | 10.3M | 14.4M | 11.3M | 19.3M | 134% |
| DateTime#jd | 13.4M | 18.4M | 13.5M | 22.0M | 120% |
| DateTime#ajd | 1.5M | 1.7M | 638.8k | 916.6k | 54% |
| DateTime#+1 | 5.3M | 6.0M | 1.3M | 3.8M | 63% |
| DateTime#+frac | 236.9k | 247.4k | 359.4k | 640.7k | 259% |
| DateTime#-1 | 3.5M | 3.2M | 1.2M | 3.5M | 111% |
| DateTime#-DT | 1.1M | 1.1M | 320.8k | 451.8k | 41% |
| DateTime#>>1 | 2.6M | 2.7M | 1.5M | 3.5M | 130% |
| DateTime#<<1 | 1.8M | 1.9M | 1.4M | 3.3M | 176% |
| DateTime#next_day | 5.3M | 5.9M | 1.3M | 3.4M | 58% |
| DateTime#prev_day | 3.6M | 3.8M | 1.2M | 3.5M | 91% |
| DateTime#next_month | 2.5M | 2.6M | 1.4M | 3.5M | 133% |
| DateTime#prev_month | 1.8M | 1.9M | 1.4M | 3.3M | 169% |
| DateTime#next_year | 2.3M | 2.4M | 1.4M | 3.2M | 135% |
| DateTime#prev_year | 1.7M | 1.7M | 1.3M | 3.5M | 200% |
| DateTime#<=> | 9.7M | 11.5M | 6.2M | 16.9M | 147% |
| DateTime#=== | 8.3M | 11.0M | 4.6M | 16.4M | 149% |
| DateTime#== | 2.4M | 2.6M | 6.7M | 18.5M | 715% |
| DateTime#eql? | 8.8M | 10.8M | 7.3M | 21.9M | 203% |
| DateTime#hash | 10.5M | 16.7M | 6.2M | 11.3M | 67% |
| DateTime#to_s | 1.6M | 1.6M | 88.5k | 121.5k | 7% |
| DateTime#inspect | 428.9k | 398.6k | 63.8k | 115.4k | 29% |
| DateTime#strftime | 1.3M | 1.2M | 134.9k | 181.7k | 15% |
| DateTime#strftime(%Y%m%d%z) | 1.4M | 1.5M | 796.5k | 1.3M | 88% |
| DateTime#strftime(%c) | 1.7M | 1.8M | 832.8k | 1.2M | 68% |
| DateTime#strftime(%s) | 2.7M | 3.0M | 455.0k | 590.7k | 20% |
| DateTime#iso8601 | 1.1M | 1.2M | 85.9k | 121.5k | 10% |
| DateTime#rfc3339 | 1.1M | 1.2M | 86.2k | 119.6k | 10% |
| DateTime#rfc2822 | 1.5M | 1.6M | 992.8k | 1.5M | 96% |
| DateTime#xmlschema | 1.2M | 1.2M | 86.6k | 120.7k | 10% |
| DateTime#httpdate | 1.3M | 1.4M | 1.1M | 1.7M | 119% |
| DateTime#jisx0301 | 1.0M | 1.1M | 394.3k | 496.0k | 47% |
| DateTime#new_offset(0) | 4.9M | 6.0M | 922.9k | 2.4M | 40% |
| DateTime#new_offset(str) | 3.7M | 3.9M | 356.0k | 572.8k | 15% |
| DateTime#new_offset(rat) | 2.0M | 1.6M | 943.9k | 2.1M | 128% |
| DateTime#to_date | 5.7M | 5.6M | 2.7M | 7.0M | 126% |
| DateTime#to_datetime | 10.9M | 17.3M | 17.0M | 29.6M | 172% |
| DateTime#to_time | 588.6k | 721.2k | 554.6k | 997.2k | 138% |
| DateTime Marshal.dump | 463.1k | 463.7k | 310.5k | 324.9k | 70% |
| DateTime Marshal.load | 488.9k | 487.0k | 291.2k | 328.3k | 67% |
| DateTime#deconstruct_keys(nil) | 761.7k | 688.3k | 528.7k | 598.2k | 87% |
| DateTime#deconstruct_keys(y/h) | 3.4M | 3.6M | 1.9M | 3.6M | 99% |
| Time#to_date | 3.3M | 3.6M | 1.4M | 5.2M | 144% |
| Time#to_datetime | 1.1M | 1.0M | 983.7k | 2.2M | 209% |

Comment thread lib/date/core.rb Outdated
Comment on lines +16 to +24
if day >= 1 && day <= 28
  gy = month > 2 ? year : year - 1
  gjd_base = (1461 * (gy + 4716)) / 4 + GJD_MONTH_OFFSET[month] + day
  a = gy / 100
  jd_julian = gjd_base - 1524
  gjd = jd_julian + 2 - a + a / 4
  obj = allocate
  obj.__send__(:init_from_jd, gjd >= start ? gjd : jd_julian, start)
  return obj

@rhenium rhenium Mar 3, 2026

Member

Although I haven't worked on date and am unfamiliar with it, I think we should be cautious about the amount of code inlining/duplication in this patch. This particular fragment is duplicated more than 10 times, whereas it appeared only once in the C implementation (c_civil_to_jd() in date_core.c, I think).

More generally, I worry that a mechanical conversion combined with this kind of micro-optimization may make the codebase less maintainable for humans.
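One way to address the duplication is to keep the civil-to-JD arithmetic from the quoted fragment in a single helper and have every caller use it. The sketch below is hypothetical: the module name `CivilJD` and the constant `MONTH_OFFSET` are invented here. It assumes the fragment's `GJD_MONTH_OFFSET` holds the usual floor(30.6001 × (m + 1)) values, with January and February counted as months 13 and 14 of the previous year. It keeps the fragment's Julian/Gregorian choice against `start`.

```ruby
# Hypothetical standalone module; the factored helper in the PR may differ.
module CivilJD
  # floor(30.6001 * (m + 1)) for m = 3..14, indexed by calendar month 1..12.
  MONTH_OFFSET = [nil, 428, 459, 122, 153, 183, 214, 244, 275, 306, 336, 367, 397].freeze

  module_function

  # Julian Day Number of a civil date. `start` is the JD on which the Gregorian
  # calendar takes effect; dates whose Gregorian JD falls before it use the Julian calendar.
  def civil_to_jd(year, month, day, start)
    y = month > 2 ? year : year - 1
    base = (1461 * (y + 4716)) / 4 + MONTH_OFFSET[month] + day # Integer#/ floors
    jd_julian = base - 1524
    a = y / 100
    jd_gregorian = jd_julian + 2 - a + a / 4
    jd_gregorian >= start ? jd_gregorian : jd_julian
  end
end
```

With `start = 2299161` (Date::ITALY), `CivilJD.civil_to_jd(2026, 4, 21, 2299161)` yields the same JD, 2461152, that appears in the `Date.jd` example later in this thread. Each fast path in `civil`, `today`, and similar methods could then call this one method instead of repeating the arithmetic.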

Author

@rhenium
Thank you for your feedback. As you pointed out, we were over-optimizing by inlining. 7e80db9 has reduced inlining.

@jeremyevans

Contributor

> Replace all getbyte-based byte manipulation with StringScanner and regex patterns for improved readability and maintainability.
> In exchange, the performance of Date._strptime and Date.strptime has decreased. Please give us your feedback on this.

In general, getbyte-style optimizations should be used sparingly. If we are going to convert from C to Ruby, we should be converting to idiomatic Ruby as much as possible. The decrease looks pretty substantial, though. Maybe I should do a review of the code and see how maintainable the Ruby code actually looks.

More pressing is the issue that @rhenium mentioned about duplicated code. We want to make sure the Ruby code is as well factored as the C code, or at least close. Duplicating code from a method and inlining it into every caller location is not good.

@jinroq

jinroq commented Mar 7, 2026

Author

@eregon

> What's the reason to use the Ruby implementation only on Ruby >= 3.3?
> It would be nicer to avoid having to support 2 duplicate implementations of course.

There are several reasons.

When implementing ruby_date, we focused on faithfully reproducing the behavior of the C implementation, which led to heavy use of regular expressions. As a result, the test__parse_too_long_year test passes only on Ruby >= 3.3. Because this could not be resolved through changes to date itself, Ruby 3.3 became the minimum version.

After we overhauled the implementation strategy, tests failed on Ruby 2.6 through 3.1 with a force_encoding FrozenError and a test_memsize failure. In addition, on Ruby 3.2, test_date_new.rb fails when run standalone if the shape tree is cold and reallocation occurs. Rather than addressing these issues individually, we chose to require Ruby 3.3 or later.

Replace Rational(1, 2) with 0.5r and Rational(0) with 0r across
core.rb, datetime.rb, and strftime.rb. Rational literals are resolved
at compile time, avoiding method dispatch overhead on every call.

Note: Rational(4800001, 2) in amjd is intentionally kept as-is because
4800001/2r is parsed as Integer#/(Rational) division at runtime, which
is 2x slower than the direct Rational() constructor.

Benchmark (Ruby 4.0.1 +YJIT, benchmark-ips):

  Date.new!:
    Before (Rational(1,2)):  2.812M i/s (355.58 ns/i)
    After  (0.5r):           3.596M i/s (278.11 ns/i)
    Improvement:             +27.9%

  Date#day_fraction:
    Before (Rational(0)):   11.230M i/s  (89.05 ns/i)
    After  (0r):            29.947M i/s  (33.39 ns/i)
    Improvement:            +166.7%

  Micro-benchmark (literal vs method call):
    0r   vs Rational(0):    34.5M vs 11.3M i/s (+3.05x)
    0.5r vs Rational(1,2):  34.2M vs  8.8M i/s (+3.90x)
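The compile-time versus runtime difference this commit describes can be observed with CRuby's `RubyVM::InstructionSequence`. This is a CRuby-specific, implementation-internal API, and exact instruction names may vary between CRuby versions. The helper name `insns` is made up for this illustration. The check only inspects what each spelling compiles to; it is not a benchmark.

```ruby
# Disassemble a snippet with CRuby's compiler (CRuby-only API).
def insns(src)
  RubyVM::InstructionSequence.compile(src).disasm
end

# Literal: the Rational is built when the code is compiled and pushed with putobject.
puts insns("0.5r")

# Kernel#Rational: a method call (a send instruction) on every evaluation.
puts insns("Rational(1, 2)")

# Integer#/ with a Rational literal operand: the division (opt_div) runs at runtime.
puts insns("4800001/2r")

# Each literal spelling equals its constructor form.
p 0.5r == Rational(1, 2)
p 4800001/2r == Rational(4800001, 2)
```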
@jeremyevans

Contributor

I spent some time today reviewing the patch. Due to its size, I could only review part of it, but from my review, I'm OK with this approach. In general, the generated Ruby code is reasonable, and at least no worse than the C code it replaces.

In some cases, the generated Ruby code is significantly easier to follow than the C code it replaces. For example, the implementation of gregorian_leap? in the Ruby code (gregorian_leap? -> internal_gregorian_leap?) is significantly simpler than the equivalent code in C (gregorian_leap? -> decode_year -> c_gregorian_leap_p).
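For reference, the Gregorian leap-year rule fits in a single expression on an Integer year. That helps explain why a direct Ruby version can be much shorter than a chain through a year-decoding step. The method name below is hypothetical and is not the PR's internal_gregorian_leap?.

```ruby
# Hypothetical: Gregorian leap-year rule on an Integer year.
# Only divisibility matters, so the result is the same for negative (astronomical) years.
def sketch_gregorian_leap?(year)
  (year % 4).zero? && (!(year % 100).zero? || (year % 400).zero?)
end
```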

However, there are probably a number of cases not covered by tests where behavior between the two implementations differs. One example of this is type conversions. The C extension uses NUM2INT, while the Ruby code converts using Integer (or to_i if the object is known to be Numeric):

# C:
Date.new(2026, "0xa")
# invalid month (not numeric) (TypeError)
Date.new("0x7ea")
# invalid year (not numeric) (TypeError)
Date.jd("0x258de0")
# invalid jd (not numeric) (TypeError)

# Ruby:
Date.new(2026, "0xa")
# => #<Date: 2026-10-01 ((2461315j,0s,0n),+0s,2299161j)>
Date.new("0x7ea")
# => #<Date: 2026-01-01 ((2461042j,0s,0n),+0s,2299161j)>
Date.jd("0x258de0")
# => #<Date: 2026-04-21 ((2461152j,0s,0n),+0s,2299161j)>

I think that in general, the Ruby version should not convert using Integer. Instead, it should reject non-Numeric input.
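Here is a minimal sketch of the suggested behavior. It assumes the goal is to match the C extension's error messages shown above: any Numeric is accepted unchanged, and anything else raises TypeError before a conversion such as Integer() can coerce it. The helper names `check_numeric` and `sketch_civil_args` are hypothetical.

```ruby
# Hypothetical helper: pass Numeric values through and reject everything else
# with the same message shape as the C extension ("invalid month (not numeric)").
def check_numeric(obj, field)
  raise TypeError, "invalid #{field} (not numeric)" unless obj.is_a?(Numeric)

  obj
end

# Hypothetical caller: validates every component before any conversion,
# so a String such as "0xa" is never coerced.
def sketch_civil_args(year = -4712, month = 1, day = 1)
  [check_numeric(year, "year"), check_numeric(month, "month"), check_numeric(day, "day")]
end

begin
  sketch_civil_args(2026, "0xa")
rescue TypeError => e
  puts e.message
end
```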

The Ruby code includes public methods defined in the C extension only when debugging is enabled:

  • Date.weeknum
  • Date.nth_kday
  • Date.new!
  • Date#nth_kday?

These should be removed, as well as related private methods (e.g. Date.weeknum_to_jd, Date.nth_kday_to_jd).

In general, the documentation for the Ruby code should not need explicit call-seq lines, as rdoc can parse the method definition. The explicit call-seq does indicate return types, but I'm not sure how much value that adds.

Note that the Ruby code is in some cases missing one of the main optimizations introduced in the C rewrite in Ruby 1.9.3: values are computed lazily, only when they are actually needed. For example, for Date.civil(2026, 4, 21), the C extension does not calculate the Julian day until you call a method on the returned Date instance that requires it; instead, it just stores the year, month, and day. For some reason, the generated Ruby code does not use this approach, so even though it is still faster than the C extension with YJIT, it is slower than it could be.

I did a brief test of making the calculation lazy, and performance of Date.civil increased by over 10%. I'm including an example diff at the end. However, be aware that it is incomplete. Code currently accessing @jd in instance methods needs to be changed to call jd.
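The lazy approach can be sketched on its own. The `LazyCivil` class below is hypothetical and is not the PR's Date. It stores the civil fields, derives the Julian Day Number only on first use, and forces that value in `#freeze`, because a frozen object can no longer memoize into an instance variable. The PoC diff below adds a `jd` call to `#freeze` for the same reason. The stdlib `Date` is used only as a stand-in for the civil-to-JD arithmetic.

```ruby
require "date"

# Hypothetical sketch class: stores civil fields and computes the JD on demand.
class LazyCivil
  attr_reader :year, :month, :day, :start

  def initialize(year, month, day, start = Date::ITALY)
    @year = year
    @month = month
    @day = day
    @start = start
    @jd = nil # always assigned, so every instance gets the same set of ivars
  end

  # Computed on first call, then memoized. `||=` is safe because a JD of 0 is
  # truthy in Ruby; only nil triggers the computation.
  def jd
    @jd ||= compute_jd
  end

  def jd_computed?
    !@jd.nil?
  end

  # Force the lazy value before freezing; after freeze, @jd cannot be assigned.
  def freeze
    jd
    super
  end

  private

  # Stand-in for the library's own civil-to-JD conversion.
  def compute_jd
    Date.civil(@year, @month, @day, @start).jd
  end
end
```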

Other possible issues, potential performance enhancements, and minor notes:

  • The initialization code is not currently shape friendly. It should be changed to initialize all unset instance variables to nil to ensure all instances have the same shape.
  • .new calls .civil instead of being an alias to .civil. Similarly, #iso8601 should probably be an alias to #to_s.
  • The Date.idiv private method is probably not worth it; I would switch idiv(a, b) to a.div(b).
  • Date#initialize is defined, but nothing calls it. It should either be removed, or be defined like def initialize(jd, year, month, day, start) and just set instance variables. Then, instead of calling init_from_jd, you can alias new to _new and call _new(jd, nil, nil, nil, start). This would remove the need for explicit allocate calls and simplify the implementation in general. I believe the pre-1.9.3 date library used a similar approach.
  • I'm not sure it's worth caching yday, cweek, or cwyear. If we do decide to cache them, they should always be stored as instance variables so that all instances have the same shape.
  • Date#day_fraction returns Integer in the C implementation, and Rational in the Ruby implementation. I would avoid defining @df in Date. Date code should call day_fraction instead of accessing @df. DateTime can set @df and override #day_fraction to return @df.
  • In addition to defining Date#<=>, the Ruby implementation also defines #<, #>, #==. These should probably be removed, and the implementation from Comparable used.
  • Date#deconstruct_keys should probably call methods (year, month, day, wday) instead of inlining the code.
  • Instead of @sg and @df, we may want more friendly instance variable names (@jd can probably stay as the method has the same name).
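The constructor layout from the Date#initialize bullet, combined with the shape-friendly and `new`-as-alias points, might look roughly like the following. `SketchDate` is a hypothetical class for illustration only. It shows how a private `_new` replaces explicit `allocate` + `init_from_jd` calls and does not include any real date arithmetic.

```ruby
# Hypothetical class illustrating the suggested constructor layout.
class SketchDate
  ITALY = 2299161

  # Assigns every instance variable in a fixed order, so all instances share one shape.
  def initialize(jd, year, month, day, start)
    @jd = jd
    @year = year
    @month = month
    @day = day
    @start = start
  end

  attr_reader :year, :month, :day, :start

  def jd_known?
    !@jd.nil?
  end

  class << self
    # Keep the default Class#new (allocate + initialize) under a private name...
    alias_method :_new, :new
    private :_new

    # ...and build the public constructors on top of it, with no explicit allocate.
    def civil(year = -4712, month = 1, day = 1, start = ITALY)
      _new(nil, year, month, day, start)
    end

    def jd(jd = 0, start = ITALY)
      _new(jd, nil, nil, nil, start)
    end

    # .new as a true alias of .civil rather than a wrapper method.
    alias_method :new, :civil
  end
end
```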

I do not know to what extent the Ruby code was manually reviewed prior to submitting this PR. Considering the issues I found during my brief (2.5 hours) review, I suspect this code has not previously been fully manually reviewed. If you are submitting AI generated code for inclusion, I think you should be manually reviewing every line of generated code to check it for sanity. However, as far as I am aware, we don't have an official policy on this, so these are just my personal feelings, not official policy.

Lazy initialization PoC:

diff --git a/lib/date/core.rb b/lib/date/core.rb
index 06bebea..1e12f3e 100644
--- a/lib/date/core.rb
+++ b/lib/date/core.rb
@@ -14,7 +14,7 @@ class Date
     def civil(year = -4712, month = 1, day = 1, start = DEFAULT_SG)
       if Integer === year && Integer === month && Integer === day && month >= 1 && month <= 12
         if day >= 1 && day <= 28
-          return new_from_jd(civil_to_jd(year, month, day, start), start)
+          return new_from_civil(year, month, day, start)
         elsif day >= -31
           dim = if month == 2
             if start == Float::INFINITY
@@ -27,16 +27,13 @@ class Date
           end
           d = day < 0 ? day + dim + 1 : day
           if d >= 1 && d <= dim
-            return new_from_jd(civil_to_jd(year, month, d, start), start)
+            return new_from_civil(year, month, d, start)
           end
         end
       end
       civil_fallback(year, month, day, start)
     end
-
-    def new(year = -4712, month = 1, day = 1, start = DEFAULT_SG)
-      civil(year, month, day, start)
-    end
+    alias new civil
 
     # call-seq:
     #   Date.valid_civil?(year, month, mday, start = Date::ITALY) -> true or false
@@ -312,8 +309,7 @@ class Date
     #
     def today(start = DEFAULT_SG)
       t = Time.now
-      jd = civil_to_jd(t.year, t.mon, t.mday, start)
-      new_from_jd(jd, start)
+      new_from_civil(t.year, t.mon, t.mday, start)
     end
 
     # :nodoc:
@@ -591,6 +587,12 @@ class Date
       obj
     end
 
+    def new_from_civil(year, month, day, sg, df = nil)
+      obj = allocate
+      obj.__send__(:init_from_civil, year, month, day, sg, df)
+      obj
+    end
+
     # Parse offset string like "+09:00", "-07:30", "Z" to seconds.
     def offset_str_to_sec(str)
       case str
@@ -723,7 +725,7 @@ class Date
   #    DateTime.new(2001,2,3,4,5,6,'+7').jd	#=> 2451944
   #    DateTime.new(2001,2,3,4,5,6,'-7').jd	#=> 2451944
   def jd
-    @jd
+    @jd ||= self.class.civil_to_jd(@year, @month, @day, @sg)
   end
 
   # call-seq:
@@ -1836,6 +1838,7 @@ class Date
 
   # override
   def freeze
+    jd
     internal_civil  # compute and cache civil date before freezing
     super
   end
@@ -1848,6 +1851,16 @@ class Date
 
   def init_from_jd(jd, sg, df = nil)
     @jd = jd
+    @year = @month = @day = nil
+    @sg = sg
+    @df = df
+  end
+
+  def init_from_civil(year, month, day, sg, df = nil)
+    @jd = nil
+    @year = year
+    @month = month
+    @day = day
     @sg = sg
     @df = df
   end

jinroq added 8 commits June 6, 2026 21:22
…, fix day_fraction type, shape-friendly init

Implements the first batch of fixes from jeremyevans's review comment on PR ruby#155.

1. Reject non-Numeric arguments (match C's check_numeric)
   The C extension validates constructor arguments with check_numeric and
   raises TypeError for non-Numeric input, while the Ruby code used
   Integer(), which coerces strings such as "0xa". A private check_numeric
   helper is now called from Date.jd, .ordinal, .commercial, the civil
   fallback path, and #initialize, so non-Numeric input raises TypeError:

     Date.new(2026, "0xa")  # => TypeError (was 2026-10-01)
     Date.new("0x7ea")      # => TypeError (was 2026-01-01)
     Date.jd("0x258de0")    # => TypeError (was 2026-04-21)

2. Remove debug-only methods
   Date.weeknum, Date.nth_kday, Date.new!, Date#nth_kday?,
   DateTime.weeknum and DateTime.nth_kday are guarded by #ifndef NDEBUG in
   the C extension and therefore absent from production builds. They are
   removed here, together with their private helpers weeknum_to_jd and
   nth_kday_to_jd. The corresponding tests self-skip via respond_to?
   guards, so the test suite is unaffected.

3. Fix Date#day_fraction return type
   C returns Integer 0 (INT2FIX(0)) for a simple Date without a fraction
   and a Rational only when a fraction is present. The Ruby code returned
   Rational 0r unconditionally; it now returns Integer 0 for simple dates.

4. Shape-friendly initialization
   Every Date instance lazily added @year/@month/@day/@yday/@cweek/@cwyear
   through its accessors, giving instances divergent object shapes over
   their lifetime. All of these ivars are now initialized to nil up front
   in init_from_jd, initialize_copy and DateTime#_init_datetime, so every
   instance of a class shares a single shape. As a side effect this fixes
   the previously failing test_memsize.
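Items 1, 3 and 4 can be illustrated together. The following is a minimal sketch, not the PR's code: `SketchDate` is a hypothetical, heavily reduced stand-in, and the TypeError message text is an assumption. It rejects non-Numeric input instead of coercing it, returns Integer 0 from `day_fraction` when no fraction is stored, and assigns every instance variable in one initializer so all instances get the same ivars in the same order (and therefore the same object shape).

```ruby
# Hypothetical, heavily reduced stand-in for the pure Ruby Date (not the PR's code).
class SketchDate
  ITALY = 2299161 # Julian Day Number of the Gregorian reform, same value as Date::ITALY

  def self.jd(jd = 0, start = ITALY)
    check_numeric(jd, "jd") # "0x258de0" raises instead of being coerced
    d = allocate
    d.send(:init_from_jd, jd, start, nil)
    d
  end

  # Raise TypeError for anything that is not a Numeric; Integer() would
  # instead accept strings such as "0xa". The message text is an assumption.
  def self.check_numeric(obj, field)
    return if obj.is_a?(Numeric)

    raise TypeError, "invalid #{field} (not numeric)"
  end
  private_class_method :check_numeric

  attr_reader :jd, :start

  # Integer 0 for a simple date; the stored fraction only when one exists.
  def day_fraction
    @day_fraction || 0
  end

  private

  # Every ivar is assigned here, in the same order, for every instance, so
  # no accessor adds a new ivar later and all instances share one shape.
  def init_from_jd(jd, start, day_fraction)
    @jd = jd
    @start = start
    @day_fraction = day_fraction
    @year = nil # placeholders for cached civil fields
    @month = nil
    @day = nil
  end
end

d = SketchDate.jd(2461152)
p d.day_fraction          # Integer, not Rational
p d.instance_variables
```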

Performance (Ruby 4.1.0dev + YJIT vs C extension baseline
bench/results/20260223_r1/4.0.1_system.tsv; ips, higher is better):

  method            C ext ips      Ruby ips    Ruby/C
  Date.civil        4,648,499     7,012,418      151%
  Date.civil(sg)    4,455,620     6,620,097      149%
  Date.civil(-1)    4,675,053     6,554,829      140%
  Date.civil(neg)   4,563,715     7,185,497      157%
  Date.jd           4,962,000     9,284,179      187%
  Date.ordinal      3,023,006     7,005,965      232%
  Date.commercial   2,478,941     6,068,962      245%
  Date.today          176,947       507,147      287%

  Overall: 113/179 methods >= 100% of C, median 120%.

The shape-friendly change adds six nil ivar assignments per construction,
yet construction stays well above the C extension (140-287%).

Continues addressing jeremyevans's review of PR ruby#155.

* Inline the idiv private helper
  idiv(a, b) was a thin wrapper around a.div(b) (floor division). Per the
  review it is not worth a dedicated method, so its twelve call sites in
  jd_to_gregorian and jd_to_julian are replaced with direct (expr).div(n)
  calls and the helper is removed. Integer#div performs floor division
  identically for negative operands, so the behavior is unchanged
  (verified across a wide jd range and historical boundaries such as the
  1582-10-15 Gregorian reform).

* Remove unused Date#initialize
  Date.new is overridden to delegate to Date.civil, which constructs via
  allocate + init_from_jd, so Date#initialize was never reached. DateTime
  defines its own #initialize and does not call super, and no test invokes
  Date#initialize directly. The dead method is removed; its Date.new
  call-seq documentation is moved onto the Date.new singleton method so the
  generated docs are unaffected. Construction now funnels solely through
  civil -> new_from_jd -> init_from_jd.

No behavior change. Full test suite passes (the two remaining errors,
test_string_argument and test_strftime__offset, are pre-existing Test::Unit
helper gaps unrelated to these changes).
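The equivalence claim for inlining idiv rests on Integer#div flooring toward negative infinity. A small check, assuming `idiv(a, b)` was exactly `a.div(b)` as described above; `c_style_div` is a hypothetical helper that models C's truncating integer division for contrast:

```ruby
# The removed helper, reconstructed from the description (a thin wrapper).
def idiv(a, b)
  a.div(b)
end

# Hypothetical model of C's integer division, which truncates toward zero.
def c_style_div(a, b)
  Rational(a, b).truncate
end

# Floor division and truncation agree when the quotient is non-negative and
# differ when it is negative, which is where calendar math needs floor.
[[7, 2], [-7, 2], [7, -2], [-7, -2]].each do |a, b|
  puts format("%3d / %2d: div=%3d truncate=%3d", a, b, idiv(a, b), c_style_div(a, b))
end
```

Because Ruby's Integer#/ also floors, `(expr) / n` would behave the same as `(expr).div(n)` on Integers; the difference only shows up against truncating division.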
…ay/cweek/cwyear caching

Continues addressing jeremyevans's review of PR ruby#155.

* Delegate #<, #>, #== to Comparable
  The C extension includes Comparable and defines only #<=> and #===, so
  #<, #>, #==, #<=, #>= all come from Comparable. The Ruby code additionally
  defined explicit #<, #>, #== whose error paths diverged from the C
  behavior. They are removed, leaving #<=> (and #===, #eql?, #hash) as the
  basis. A characterization test confirms every successful comparison and
  every #== result is unchanged; only the ArgumentError message for an
  uncomparable operand changes to Comparable's wording
  ("... failed: comparator returned nil"), which now matches the C
  extension.

* Simplify Date#/DateTime#deconstruct_keys
  Call the accessor methods (year, month, day, wday, yday and, for
  DateTime, hour, min, sec, sec_fraction, zone) instead of inlining ivar
  reads and repeating "internal_civil unless @year". Behavior, including
  pattern matching and frozen instances, is unchanged.

* Stop caching yday/cweek/cwyear
  The C extension caches only year/mon/mday in its struct and recomputes
  yday, cweek and cwyear on every access. The Ruby code now does the same:
  the @yday/@cweek/@cwyear ivars and their "unless frozen?" cache writes are
  removed, and compute_commercial is a pure computation. This drops the
  per-instance ivar count (Date 9 -> 6, DateTime 14 -> 11) while keeping a
  single shared object shape, and removes the frozen-instance special cases.

No behavior change other than the Comparable error-message alignment noted
above. Full test suite passes (the two remaining errors, test_string_argument
and test_strftime__offset, are pre-existing Test::Unit helper gaps unrelated
to these changes).
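A minimal illustration of the Comparable delegation, using a hypothetical `Day` class rather than Date. Only `<=>` is defined; `<`, `>`, `==`, `between?` and the rest come from Comparable. When `<=>` returns nil, Comparable#== returns false and Comparable#< raises ArgumentError. The exact message wording can differ between Ruby versions, so the sketch prints it rather than relying on it:

```ruby
class Day
  include Comparable

  attr_reader :jd

  def initialize(jd)
    @jd = jd
  end

  # The only comparison primitive. Returning nil (no branch matched) tells
  # Comparable that the operands are uncomparable.
  def <=>(other)
    case other
    when Day then jd <=> other.jd
    when Numeric then jd <=> other
    end
  end
end

a = Day.new(10)
p a < Day.new(20)   # Comparable#< built on <=>
p a == Day.new(10)  # Comparable#==
p a == "x"          # false, because <=> returned nil
begin
  a < "x"
rescue ArgumentError => e
  puts e.message
end
```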
Continues addressing jeremyevans's review of PR ruby#155.

Per the review, Date logic should call #day_fraction rather than reading
the @df ivar directly. The computational call sites now do so:

* #ajd          : `@df ? r + @df : r`              -> `r + day_fraction`
* #+ (Numeric)  : `r + (@df || 0)`                 -> `r + day_fraction`
* #- (Date)     : `... + (@df || 0) - other.day_fraction`
                  -> `... + day_fraction - other.day_fraction`
* #- (Numeric)  : `(@df || 0) - r`                 -> `day_fraction - r`

These methods are overridden by DateTime, so they run only on plain Date
instances, where #day_fraction returns `@df || 0`; the rewrite is therefore
behavior-preserving.

@df is kept as the storage for complex (fractional) Date instances, which
the C extension also supports (Date + Rational/Float returns a fractional
Date via d_complex_new_internal). The remaining @df references are all
storage operations: the #day_fraction accessor itself, the df argument
passed through in #+/#-, initialize_copy, marshal_dump, and init_from_jd.

No behavior change: fractional Date arithmetic, #ajd, and both simple and
complex dates are unchanged. Full test suite passes (the two remaining
errors, test_string_argument and test_strftime__offset, are pre-existing
Test::Unit helper gaps unrelated to these changes).

Per jeremyevans's review of PR ruby#155, replace the C-derived abbreviated
instance variable names with names matching their public accessors:

* @sg -> @start          (the #start method)
* @df -> @day_fraction   (the #day_fraction method)

This is a pure rename across core.rb and datetime.rb, including the
:@sg/:@df symbol forms used with instance_variable_get/set. Marshal is
unaffected (it uses a positional array format, not ivar names) and no test
depends on the ivar names. Assignment alignment was tidied accordingly.

The Integer fast path in Date#+ and Date#- built the result with
Date.allocate plus instance_variable_set for only @jd and @start. This had
two problems:

* It produced a 2-ivar object whose shape diverged from the canonical
  6-ivar shape established by init_from_jd, defeating the shape-friendly
  initialization.
* It dropped @day_fraction, so adding or subtracting an Integer to a
  fractional Date silently lost the fraction:
  (Date.new(2024,1,1) + Rational(1,2) + 1).day_fraction returned 0
  instead of 1/2. The C extension preserves the fraction here.

Both branches now route through new_from_jd(@jd +/- other, @start,
@day_fraction), which goes through init_from_jd (full canonical shape) and
carries the day fraction, matching the C extension.
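A sketch of the corrected fast path. It assumes, consistent with the description above but not copied from the PR, that `new_from_jd(jd, start, day_fraction)` is a class-level factory that goes through `init_from_jd`; `FracDate` is hypothetical and keeps fractions as Rationals in [0, 1):

```ruby
class FracDate
  attr_reader :jd, :start

  # Single construction funnel: every instance gets the same ivars.
  def self.new_from_jd(jd, start, day_fraction)
    d = allocate
    d.send(:init_from_jd, jd, start, day_fraction)
    d
  end

  def day_fraction
    @day_fraction || 0
  end

  def +(other)
    case other
    when Integer
      # Fast path: shift the day count and carry the fraction through.
      self.class.new_from_jd(@jd + other, @start, @day_fraction)
    when Rational
      total = day_fraction + other
      days = total.floor   # whole days moved by the sum
      frac = total - days  # remainder stays in [0, 1)
      self.class.new_from_jd(@jd + days, @start, frac.zero? ? nil : frac)
    else
      raise TypeError, "expected Integer or Rational"
    end
  end

  private

  def init_from_jd(jd, start, day_fraction)
    @jd = jd
    @start = start
    @day_fraction = day_fraction
  end
end

d = FracDate.new_from_jd(2460311, 2299161, nil) # JD of 2024-01-01
p((d + Rational(1, 2) + 1).day_fraction)       # fraction survives the Integer step
```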
Old marshal formats (1.8 and 1.9.2) store the start value (sg) as a
Date::Infinity object. marshal_load kept it as-is, so Date#start returned a
Date::Infinity instance instead of a Float. On MRI this slipped through
because Float#== falls back to the right operand's #== (so
Date::GREGORIAN == d.start happened to be true), but on TruffleRuby
Float#== does not fall back and the comparison returned false, failing
test_marshal18 and test_marshal192 on CI.

The C extension stores start as a double and Date#start returns a Float, so
the loaded value should be a Float too. A normalize_start helper converts a
Date::Infinity start to its Float equivalent (+/-Float::INFINITY) in both
Date#marshal_load and DateTime#marshal_load (formats 1.8 and current);
Integer starts such as Date::ITALY are left untouched.
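A minimal sketch of such a helper, assuming it only has to map a `Date::Infinity` start to the matching Float and pass every other value through; the PR's actual `normalize_start` may be written differently. It relies on `Date::Infinity#<=>` returning the infinity's sign when compared with a plain Numeric:

```ruby
require 'date'

# Hypothetical reconstruction: convert a Date::Infinity start (as found in
# old marshal formats) to +/-Float::INFINITY; leave Integer/Float starts alone.
def normalize_start(start)
  return start unless start.is_a?(Date::Infinity)

  (start <=> 0).positive? ? Float::INFINITY : -Float::INFINITY
end

p normalize_start(Date::Infinity.new)     # positive infinity start
p normalize_start(Date::Infinity.new(-1)) # negative infinity start
p normalize_start(Date::ITALY)            # Integer, unchanged
```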
@jinroq

jinroq commented Jun 6, 2026

Author

@jeremyevans

Thanks very much for the detailed review — it was extremely helpful. I've gone through the points and pushed a series of commits. Here is a summary of what changed, what I deferred, and a full benchmark across every method.

Addressed

  • Type conversions. Constructors no longer coerce with Integer(). A private check_numeric (mirroring the C check_numeric) now rejects non-Numeric input in Date.jd/.ordinal/.civil/.commercial, so Date.new(2026, "0xa"), Date.new("0x7ea"), Date.jd("0x258de0") raise TypeError exactly as the C extension does.
  • Debug-only methods removed. Date.weeknum, Date.nth_kday, Date.new!, Date#nth_kday?, DateTime.weeknum, DateTime.nth_kday and the private weeknum_to_jd/nth_kday_to_jd are gone (they are #ifndef NDEBUG in C). The existing tests already guard these with respond_to?, so they self-skip.
  • Date#day_fraction now returns Integer 0 for a simple date (was Rational), matching INT2FIX(0).
  • Shape-friendly initialization. All ivars are initialized up front in init_from_jd/initialize_copy/DateTime#_init_datetime, so every instance shares a single shape. This also fixed the previously failing test_memsize.
  • @df access. Date methods (ajd, +, -) now call #day_fraction instead of reading @df directly; @df is retained only as storage for fractional dates.
  • #<, #>, #== removed in favor of Comparable (the C extension includes Comparable and defines only #<=>/#===). A characterization test confirmed every successful comparison and #== result is unchanged; only the ArgumentError message for an uncomparable operand changes to Comparable's wording, which now matches C.
  • idiv inlined to a.div(b).
  • Unused Date#initialize removed; its Date.new documentation moved onto the Date.new singleton method.
  • yday/cweek/cwyear caching removed (the C struct caches only year/mon/mday).
  • deconstruct_keys calls accessor methods instead of inlining ivar reads.
  • Renamed ivars @sg -> @start and @df -> @day_fraction to match their accessors.

Bug found while doing the above

The Integer fast path in Date#+/#- rebuilt the object with instance_variable_set for only @jd/@start, which both diverged from the canonical shape and dropped the day fraction:

(Date.new(2024,1,1) + Rational(1,2) + 1).day_fraction  # was 0, C gives (1/2)

Both paths now go through new_from_jd(..., @day_fraction), restoring the shape and preserving the fraction.

Deferred / not done

  • Lazy JD computation. I kept eager computation for now. With YJIT the construction paths are already comfortably above the C extension (see below), so I prioritized the correctness/cleanup items; I'm happy to revisit lazy @jd if you'd like the extra headroom.
  • Removing call-seq. I left these in place. Several public methods default start to a private constant (DEFAULT_SG), so dropping call-seq makes rdoc render start = DEFAULT_SG instead of Date::ITALY, and we'd also lose the documented return types and multi-form signatures. The trade-off didn't seem worth it, but I can revisit if you feel strongly.

Benchmarks (all methods)

Ruby 4.1.0dev with --yjit vs the C extension baseline. Numbers are ips (higher is better); Ruby/C is the ratio.

Summary: 112 of 179 methods meet or exceed the C extension; median 118%. The parse/format paths and core construction/accessors are well above C. The notable ones still below C are #ajd/#amjd (Rational arithmetic), #to_datetime/#to_time (object construction), Date.valid_civil? (C's tight inline check), and most DateTime methods (deprecated; not the optimization focus).

Method C ext (ips) Ruby+YJIT (ips) Ruby/C
Date.civil 4,648,499 7,472,940 161%
Date.civil(sg) 4,455,620 7,164,244 161%
Date.civil(-1) 4,675,053 7,030,232 150%
Date.civil(neg) 4,563,715 7,646,748 168%
Date.jd 4,962,000 9,719,874 196%
Date.ordinal 3,023,006 7,537,550 249%
Date.commercial 2,478,941 7,413,193 299%
Date.today 176,947 513,708 290%
Date.valid_civil? 10,749,223 5,049,352 47%
Date.valid_civil?(false) 10,961,294 6,188,926 56%
Date.valid_ordinal? 4,233,714 4,981,744 118%
Date.valid_commercial? 3,177,115 3,365,717 106%
Date.valid_jd? 16,608,826 24,159,598 145%
Date.gregorian_leap? 14,913,745 23,626,324 158%
Date.gregorian_leap?(1900) 14,713,170 21,331,599 145%
Date.julian_leap? 17,203,165 24,635,634 143%
Date._parse(iso) 232,828 772,920 332%
Date._parse(us) 118,925 710,705 598%
Date._parse(eu) 159,775 617,090 386%
Date._parse(rfc2822) 81,118 221,154 273%
Date.parse(iso) 212,255 589,131 278%
Date.parse(us) 111,533 543,819 488%
Date.parse(eu) 146,787 482,612 329%
Date.parse(compact) 132,039 577,069 437%
Date._strptime 2,721,889 1,645,489 60%
Date.strptime 1,374,554 1,308,795 95%
Date.strptime(complex) 1,114,730 535,278 48%
Date._iso8601 723,573 890,599 123%
Date._rfc3339 488,706 397,127 81%
Date._rfc2822 375,805 321,045 85%
Date._xmlschema 766,288 822,689 107%
Date._httpdate 403,522 406,890 101%
Date._jisx0301 713,384 673,195 94%
Date.iso8601 551,798 588,401 107%
Date.rfc3339 395,958 315,395 80%
Date.rfc2822 304,311 267,608 88%
Date.xmlschema 552,686 579,625 105%
Date.httpdate 320,236 323,183 101%
Date.jisx0301 550,356 480,351 87%
Date#year 20,235,536 30,136,861 149%
Date#month 20,625,632 30,033,905 146%
Date#day 18,362,006 30,266,692 165%
Date#wday 20,530,546 26,490,974 129%
Date#yday 15,601,172 13,482,286 86%
Date#jd 19,418,474 30,633,907 158%
Date#ajd 7,553,399 4,634,411 61%
Date#mjd 11,597,881 30,602,678 264%
Date#amjd 10,449,018 2,341,409 22%
Date#ld 11,537,190 30,151,036 261%
Date#start 20,286,534 30,658,859 151%
Date#cwyear 4,493,912 3,787,172 84%
Date#cweek 4,555,660 3,841,480 84%
Date#cwday 20,343,306 23,778,325 117%
Date#leap? 18,667,588 23,261,054 125%
Date#julian? 19,396,307 24,238,306 125%
Date#gregorian? 19,689,575 21,164,419 107%
Date#sunday? 20,810,651 19,757,901 95%
Date#monday? 20,777,319 24,419,299 118%
Date#saturday? 20,790,164 24,314,691 117%
Date#+1 5,893,234 10,537,284 179%
Date#+100 5,520,703 10,092,922 183%
Date#-1 3,929,556 9,971,656 254%
Date#-Date 1,875,155 5,104,893 272%
Date#>>1 3,119,218 2,234,928 72%
Date#>>12 3,061,618 2,222,326 73%
Date#<<1 2,206,894 2,159,905 98%
Date#next_day 5,407,206 8,889,346 164%
Date#prev_day 3,923,673 9,399,057 240%
Date#next_month 3,038,008 2,184,688 72%
Date#prev_month 2,207,620 2,133,319 97%
Date#next_year 2,736,948 2,187,178 80%
Date#prev_year 2,028,662 2,145,250 106%
Date#succ 5,569,331 9,012,136 162%
Date#<=> 11,961,484 20,950,114 175%
Date#=== 12,543,036 20,604,272 164%
Date#== 2,760,650 2,866,475 104%
Date#< 7,685,771 9,177,654 119%
Date#> 7,929,853 9,299,661 117%
Date#eql? 11,335,322 20,133,074 178%
Date#hash 13,639,648 17,175,203 126%
Date#upto(+30) 154,549 339,110 219%
Date#downto(-30) 118,018 328,546 278%
Date#step(+30,7) 804,863 1,839,163 229%
Date#to_s 3,889,378 7,493,054 193%
Date#inspect 548,811 1,869,719 341%
Date#asctime 2,452,656 2,469,375 101%
Date#strftime 2,863,407 7,475,781 261%
Date#strftime(%Y-%m-%d) 3,138,482 4,565,817 145%
Date#strftime(%A %B) 3,109,861 2,200,826 71%
Date#strftime(%c) 2,041,798 1,958,385 96%
Date#strftime(%x) 2,676,711 2,513,444 94%
Date#strftime(composite) 1,671,366 2,474,704 148%
Date#iso8601 3,999,851 7,256,680 181%
Date#rfc3339 1,896,361 4,008,497 211%
Date#rfc2822 1,974,374 2,551,559 129%
Date#xmlschema 3,955,976 6,989,694 177%
Date#httpdate 1,663,871 2,992,879 180%
Date#jisx0301 2,833,069 3,402,668 120%
Date#to_date 22,050,815 30,374,708 138%
Date#to_datetime 6,244,340 565,555 9%
Date#to_time 1,933,710 539,443 28%
Date#new_start 4,889,349 7,080,373 145%
Date#julian 5,778,391 7,073,751 122%
Date#gregorian 5,599,172 6,971,030 125%
Date#italy 5,762,240 6,809,367 118%
Date#england 5,893,742 7,132,000 121%
Date Marshal.dump 534,350 683,702 128%
Date Marshal.load 577,188 743,423 129%
Date#deconstruct_keys(nil) 3,618,973 4,156,274 115%
Date#deconstruct_keys(year) 5,607,895 6,983,154 125%
Date#deconstruct_keys(y/m/d) 3,942,985 3,777,458 96%
DateTime.civil 1,851,894 538,784 29%
DateTime.jd 1,889,075 1,365,579 72%
DateTime.ordinal 1,508,770 1,469,835 97%
DateTime.commercial 1,351,171 1,476,950 109%
DateTime.now 139,518 384,917 276%
DateTime.parse(iso) 84,714 45,718 54%
DateTime.parse(rfc2822) 74,627 170,434 228%
DateTime.strptime 394,531 66,242 17%
DateTime.iso8601 355,015 191,372 54%
DateTime.rfc3339 379,544 260,134 69%
DateTime.rfc2822 284,988 222,650 78%
DateTime.xmlschema 358,991 200,076 56%
DateTime.httpdate 298,051 248,864 83%
DateTime.jisx0301 345,211 193,827 56%
DateTime#year 19,881,471 29,038,359 146%
DateTime#month 20,924,763 29,253,935 140%
DateTime#day 18,785,996 29,293,320 156%
DateTime#hour 21,502,514 25,127,221 117%
DateTime#min 18,485,836 30,714,762 166%
DateTime#sec 20,797,882 30,326,507 146%
DateTime#sec_fraction 10,200,272 30,753,116 301%
DateTime#offset 9,829,986 7,584,424 77%
DateTime#zone 3,207,822 1,598,958 50%
DateTime#wday 20,168,568 26,264,795 130%
DateTime#yday 14,700,414 12,583,394 86%
DateTime#jd 19,103,692 29,947,692 157%
DateTime#ajd 2,100,589 1,178,891 56%
DateTime#+1 5,385,076 5,845,842 109%
DateTime#+frac 292,699 862,628 295%
DateTime#-1 4,006,703 5,170,300 129%
DateTime#-DT 1,316,025 582,528 44%
DateTime#>>1 2,915,201 2,067,412 71%
DateTime#<<1 2,091,904 2,023,426 97%
DateTime#next_day 5,282,097 5,590,898 106%
DateTime#prev_day 3,924,083 5,064,271 129%
DateTime#next_month 2,789,293 2,047,363 73%
DateTime#prev_month 2,052,433 1,982,994 97%
DateTime#next_year 2,672,709 2,030,525 76%
DateTime#prev_year 1,970,385 2,008,742 102%
DateTime#<=> 12,133,082 20,039,511 165%
DateTime#=== 11,302,901 19,226,894 170%
DateTime#== 2,746,062 2,886,440 105%
DateTime#eql? 11,282,593 18,516,718 164%
DateTime#hash 13,883,068 11,530,258 83%
DateTime#to_s 1,940,595 142,270 7%
DateTime#inspect 470,005 132,130 28%
DateTime#strftime 1,531,900 207,859 14%
DateTime#strftime(%Y%m%d%z) 1,681,470 1,514,645 90%
DateTime#strftime(%c) 1,998,484 1,406,947 70%
DateTime#strftime(%s) 3,244,638 722,105 22%
DateTime#iso8601 1,367,474 139,132 10%
DateTime#rfc3339 1,374,890 137,990 10%
DateTime#rfc2822 1,905,865 1,776,105 93%
DateTime#xmlschema 1,380,005 137,857 10%
DateTime#httpdate 1,666,737 1,970,665 118%
DateTime#jisx0301 1,184,512 605,114 51%
DateTime#new_offset(0) 5,652,279 3,356,476 59%
DateTime#new_offset(str) 3,782,947 956,250 25%
DateTime#new_offset(rat) 2,332,643 2,801,594 120%
DateTime#to_date 6,722,540 12,368,693 184%
DateTime#to_datetime 21,865,006 33,795,421 155%
DateTime#to_time 910,402 1,240,739 136%
DateTime Marshal.dump 529,754 396,681 75%
DateTime Marshal.load 563,612 396,916 70%
DateTime#deconstruct_keys(nil) 901,134 755,244 84%
DateTime#deconstruct_keys(y/h) 4,464,776 4,514,662 101%
Time#to_date 3,868,352 6,555,722 169%
Time#to_datetime 1,216,319 2,390,846 197%

@jeremyevans

Contributor

@jinroq Thank you for your continued work on this. On July 15 (RubyConf Hack Day), I plan to review the updates to date/core.rb and start reviewing the changes to the other files. I am generally positive about merging this before Ruby 4.1.

As a general point, I think we should drop the C extension when this is merged. Keeping both C and Ruby implementations would increase the maintenance burden, and one advantage of having the code in Ruby is a decreased maintenance burden (performance is the other advantage). We can bump required_ruby_version to Ruby 3.3 (or an earlier version if the tests pass on that version).

Comment thread ext/date/generate-zonetab-rb Outdated
next unless offset_expr

abbr.strip!
offset_expr.strip!

Member

No need to strip, as eval skips leading/trailing spaces.

Suggested change
- offset_expr.strip!

Comment thread ext/date/generate-zonetab-rb Outdated
entries[abbr] = offset
end

sorted = entries.sort_by { |k, _| k }

Member

Suggested change
- sorted = entries.sort_by { |k, _| k }
+ sorted = entries.sort

Comment thread ext/date/generate-zonetab-rb Outdated
Comment on lines +52 to +53
key = abbr.include?(' ') ? %Q("#{abbr}") : %Q("#{abbr}")
f.puts " %-#{max_key_len + 3}s => %d," % [key, offset]

Member

Both branches of the ternary are the same expression.

Suggested change
- key = abbr.include?(' ') ? %Q("#{abbr}") : %Q("#{abbr}")
- f.puts " %-#{max_key_len + 3}s => %d," % [key, offset]
+ f.printf " %-*p => %d,\n", max_key_len + 3, abbr, offset

Comment thread ext/date/generate-zonetab-rb Outdated
Comment on lines +15 to +16
File.foreach(list_path) do |line|
line.chomp!

Member

Suggested change
- File.foreach(list_path) do |line|
- line.chomp!
+ File.foreach(list_path, chomp: true) do |line|

Comment thread ext/date/generate-zonetab-rb Outdated
Comment on lines +13 to +25
in_entries = false

File.foreach(list_path) do |line|
line.chomp!
if line == '%%'
if in_entries
break
else
in_entries = true
next
end
end
next unless in_entries

Member

Suggested change
- in_entries = false
- File.foreach(list_path) do |line|
- line.chomp!
- if line == '%%'
- if in_entries
- break
- else
- in_entries = true
- next
- end
- end
- next unless in_entries
+ sections = 0
+ File.foreach(list_path) do |line|
+ line.chomp!
+ break if line == '%%' and (sections += 1) > 1
+ next if sections < 1

Comment thread ext/date/generate-zonetab-rb Outdated
f.puts '# frozen_string_literal: true'
f.puts
f.puts '# Timezone name => UTC offset (seconds) mapping table.'
f.puts '# Auto-generated from ext/date/zonetab.list by ext/date/generate-zonetab-rb.'

Member

Suggested change
- f.puts '# Auto-generated from ext/date/zonetab.list by ext/date/generate-zonetab-rb.'
+ f.puts "# Auto-generated from #{list_path} by ext/date/generate-zonetab-rb."

Comment thread lib/date/shared.rb
Comment on lines +9 to +10
b = s.bytes
((b[0] | 0x20) << 16) | ((b[1] | 0x20) << 8) | (b[2] | 0x20)

Member

Suggested change
- b = s.bytes
- ((b[0] | 0x20) << 16) | ((b[1] | 0x20) << 8) | (b[2] | 0x20)
+ b0, b1, b2 = s.unpack("C3")
+ ((b0 | 0x20) << 16) | ((b1 | 0x20) << 8) | (b2 | 0x20)

Author

Thanks for the suggestion. s.unpack("C3") is functionally equivalent here, and I agree it expresses the "exactly 3 bytes" intent more clearly. However, compute_3key is called from the hot parsing loops in parse.rb and strptime.rb, so I benchmarked it under YJIT:

method ips
bytes (current) 8.26M i/s
unpack C3 4.08M i/s - 2.03x slower

unpack("C3") is about 2x slower than the current s.bytes (the format-string handling has noticeable overhead), so on this hot path I'd prefer to keep s.bytes. The results are otherwise identical — I verified equivalence over the actual day/month abbreviations and all 17,576 lowercase 3-letter combinations.

Member

If you are micro-optimizing, you might want to try:

b0, b1, b2 = s.getbyte(0), s.getbyte(1), s.getbyte(2)

It should save the array allocation, and also YJIT has codegen for String#getbyte.

Member

# frozen_string_literal: true
require 'benchmark/ips'

s = "ABC"
Benchmark.ips do |x|
  x.report("bytes") do
    b = s.bytes
    ((b[0] | 0x20) << 16) | ((b[1] | 0x20) << 8) | (b[2] | 0x20)
  end

  x.report("unpack") do
    b0, b1, b2 = s.unpack("C3")
    ((b0 | 0x20) << 16) | ((b1 | 0x20) << 8) | (b2 | 0x20)
  end

  x.report("getbyte") do
    ((s.getbyte(0) | 0x20) << 16) | ((s.getbyte(1) | 0x20) << 8) | (s.getbyte(2) | 0x20)
  end

  x.compare!(order: :baseline)
end
ruby 4.0.5 (2026-05-20 revision 64336ffd0e) +YJIT +PRISM [arm64-darwin25]
Warming up --------------------------------------
               bytes     1.649M i/100ms
              unpack     1.189M i/100ms
             getbyte     4.452M i/100ms
Calculating -------------------------------------
               bytes     18.127M (± 0.6%) i/s   (55.17 ns/i) -     90.679M in   5.002478s
              unpack     12.623M (± 0.6%) i/s   (79.22 ns/i) -     64.189M in   5.084961s
             getbyte     52.338M (± 0.4%) i/s   (19.11 ns/i) -    262.667M in   5.018661s

Comparison:
  bytes: 18126826.3 i/s
getbyte: 52338075.0 i/s - 2.89x  faster
 unpack: 12623204.8 i/s - 1.44x  slower

Author

I understand the usefulness of getbyte, but I avoided it because I overused it in the past, leading to poor maintainability.

@nobu
What do you think?

Member

I only meant that extracting the rest of the bytes would be wasted work.

Member

Since you're getting the byte values here anyway, getbyte seems perfectly fine and is the right method for this.
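For reference, the three variants discussed in this thread can be checked for equivalence. A sketch with hypothetical method names, assuming the argument has at least three bytes (how the real callers in parse.rb/strptime.rb ensure that is not shown here). OR-ing each byte with 0x20 folds ASCII upper case to lower case:

```ruby
# Builds an Array of every byte in the string, then uses the first three.
def key_via_bytes(s)
  b = s.bytes
  ((b[0] | 0x20) << 16) | ((b[1] | 0x20) << 8) | (b[2] | 0x20)
end

# Reads only three bytes, but goes through format-string handling.
def key_via_unpack(s)
  b0, b1, b2 = s.unpack("C3")
  ((b0 | 0x20) << 16) | ((b1 | 0x20) << 8) | (b2 | 0x20)
end

# Reads three bytes directly, with no intermediate Array.
def key_via_getbyte(s)
  ((s.getbyte(0) | 0x20) << 16) | ((s.getbyte(1) | 0x20) << 8) | (s.getbyte(2) | 0x20)
end

%w[Jan JAN jan Sunday wed].each do |w|
  keys = [key_via_bytes(w), key_via_unpack(w), key_via_getbyte(w)]
  puts "#{w}: #{keys.uniq.inspect}" # one distinct value per word
end
```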

Comment thread lib/date/constants.rb
next if n.nil?
b = n.downcase.bytes
h[(b[0] << 16) | (b[1] << 8) | b[2]] = [i, MONTHNAMES[i].length].freeze
}.freeze

Member

Shouldn't this use Date.compute_3key for consistency?

Comment thread lib/date/patterns.rb Outdated

Member

This file looks unused.

Comment thread lib/date/constants.rb

Member

Some constants seem to be used only locally.
For instance, HAVE_* and FL_* are used only in a method.

jinroq added 15 commits June 16, 2026 22:39
The `offset_expr` value is only consumed by `eval(offset_expr)`, and `eval` ignores leading/trailing whitespace when parsing its argument as Ruby source. Stripping it beforehand has no effect, so drop the call.
The `abbr.strip!` above is kept because `abbr` is used as a hash key, where surrounding whitespace would be significant.
Verified by regenerating lib/date/zonetab.rb (316 entries) before and after the change: the output is byte-for-byte identical.

Per review by @nobu:
ruby#155 (comment)
Replace `entries.sort_by { |k, _| k }` with `entries.sort`. Hash#sort compares `[key, value]` pairs, but since hash keys are unique the value is never reached as a tiebreaker, so the result is identical to sorting by key alone.
Verified by regenerating lib/date/zonetab.rb (316 entries) before and after the change: the output is byte-for-byte identical.
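A quick way to see why the two calls agree: Hash#sort compares `[key, value]` pairs element by element, and with unique keys the first elements already decide every comparison. The sample data below is illustrative only:

```ruby
# Illustrative offsets in seconds; the real table is generated from zonetab.list.
entries = { "pst" => -8 * 3600, "est" => -5 * 3600, "utc" => 0, "jst" => 9 * 3600 }

by_key = entries.sort_by { |k, _| k }
by_pair = entries.sort # compares [key, value] arrays; keys never tie

p by_pair == by_key
p by_pair.map(&:first)
```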

Per review by @nobu:
ruby#155 (comment)
The previous ternary `abbr.include?(' ') ? %Q("#{abbr}") : %Q("#{abbr}")` had identical branches, so the condition was dead code. Replace the whole block with a single `f.printf "%-*p"` call: `%p` inspects the key (adding the surrounding quotes) and `*` takes the field width argument, removing the redundant ternary and the intermediate `key` variable.

Using inspect via `%p` is also more robust than interpolating quotes by hand, since it would properly escape special characters if a key ever contained them.

Verified by regenerating lib/date/zonetab.rb (316 entries) before and after the change: the output is byte-for-byte identical.
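How the `%-*p` directive expands, shown with sample entries (hypothetical values, not taken from zonetab.list): `*` takes the width from the argument list, `-` left-justifies within that width, and `%p` formats the argument with `#inspect`, so the quotes and any escaping come for free:

```ruby
# Sample entries for illustration only; the real generator reads zonetab.list.
entries = { "est" => -5 * 3600, "jst" => 9 * 3600, "e. australia" => 10 * 3600 }
max_key_len = entries.keys.map(&:length).max

out = +""
entries.sort.each do |abbr, offset|
  # Same directive as: f.printf " %-*p => %d,\n", max_key_len + 3, abbr, offset
  out << format(" %-*p => %d,\n", max_key_len + 3, abbr, offset)
end
puts out
```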

Per review by @nobu:
ruby#155 (comment)
Pass `chomp: true` to `File.foreach` so each line is yielded already chomped, removing the explicit `line.chomp!` call inside the block.

zonetab.list uses LF-only line endings, so this is equivalent to the previous `chomp!` here.

Verified by regenerating lib/date/zonetab.rb (316 entries) before and after the change: the output is byte-for-byte identical.

Per review by @nobu:
ruby#155 (comment)
Replace the `in_entries` boolean and its nested `if`/`else` block with a `sections` counter. `break if line == '%%' and (sections += 1) > 1` stops at the second `%%` (end of the entries section), and `next if sections < 1` skips the declarations before the first `%%`. The first `%%` line passes that guard but is then dropped by the existing `next unless offset_expr`, since it has no comma.

The `line.chomp!` from the original suggestion is omitted here because `File.foreach` is already called with `chomp: true`.

Verified by regenerating lib/date/zonetab.rb (316 entries) before and after the change: the output is byte-for-byte identical.
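The counter logic can be exercised on a small in-memory sample. This is a sketch: the sample text only imitates the gperf input layout (declarations, `%%`, `abbr, offset_expr` entries, `%%`, trailer), and `parse_entries` is a hypothetical wrapper around the loop described above:

```ruby
# Made-up sample in the same overall layout as a gperf input file.
SAMPLE = <<~LIST
  %{
  declarations
  %}
  struct zone;
  %%
  utc, 0*3600
  est, -5*3600
  jst, 9*3600
  %%
  trailing code, ignored
LIST

def parse_entries(text)
  entries = {}
  sections = 0
  text.each_line(chomp: true) do |line|
    # The first %% only bumps the counter; the second one ends the loop.
    break if line == '%%' and (sections += 1) > 1
    next if sections < 1 # still in the declarations section

    abbr, offset_expr = line.split(',', 2)
    next unless offset_expr # the first %% line has no comma
    entries[abbr.strip] = eval(offset_expr)
  end
  entries
end

p parse_entries(SAMPLE)
```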

Per review by @nobu:
ruby#155 (comment)
Take the list and output paths from ARGV instead of hardcoding them relative to __dir__:

```
  list_path, output_path = ARGV
```

This makes the generator path-agnostic so the caller controls where it reads from and writes to. The Usage comment and the auto-generated header line are updated accordingly, the latter now interpolating the actual list_path:

```
  # Auto-generated from #{list_path} by ext/date/generate-zonetab-rb.
```

The only caller, the `update-zonetab` target in ext/date/prereq.mk, is updated to pass the paths. It now runs from $(top_srcdir) and passes the repo-relative paths, so list_path resolves to exactly "ext/date/zonetab.list" and the generated header stays unchanged:

```
  $(RUBY) -C $(top_srcdir) ext/date/generate-zonetab-rb \
    ext/date/zonetab.list lib/date/zonetab.rb
```

Verified by regenerating lib/date/zonetab.rb (316 entries) through the updated invocation: the output, including the header comment, is byte-for-byte identical.

Per reviews by @nobu:
ruby#155 (comment)
ruby#155 (comment)
ruby#155 (comment)
The ABBR_DAY_3KEY / ABBR_MONTH_3KEY tables in constants.rb open-coded the 3-byte key computation that is already factored into the shared
Date.compute_3key helper. Use the helper instead, so the table keys and the runtime lookups in parse.rb / strptime.rb are produced by a single function and cannot drift apart.

To make the helper available while the tables are built at load time, require date/shared before date/constants (shared.rb references no constants, so the reorder is safe). The helper is a private class method, so it is called without an explicit receiver; its `| 0x20` case-folding also makes the previous `.downcase` unnecessary.

This runs only at load time, so there is no runtime performance impact.

Verified that ABBR_DAY_3KEY (7 entries) and ABBR_MONTH_3KEY (12 entries) are identical to the previous inline result, and the date test suite shows no new failures (0 failures; only the 3 pre-existing environment-related errors remain).
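A sketch of deriving a lookup table from the same key function used at lookup time. `SketchNames` is hypothetical; it assumes, as the earlier open-coded version suggests, that each month entry maps to `[month_number, full_name_length]`, and it takes the names from `Date::ABBR_MONTHNAMES` and `Date::MONTHNAMES`:

```ruby
require 'date'

module SketchNames
  # Case-folding 3-byte key, shared by table construction and lookup.
  def self.compute_3key(s)
    ((s.getbyte(0) | 0x20) << 16) | ((s.getbyte(1) | 0x20) << 8) | (s.getbyte(2) | 0x20)
  end

  # key => [month number, length of the full month name]
  ABBR_MONTH_3KEY = Date::ABBR_MONTHNAMES.each_with_index.each_with_object({}) { |(n, i), h|
    next if n.nil? # index 0 is nil
    h[compute_3key(n)] = [i, Date::MONTHNAMES[i].length].freeze
  }.freeze

  def self.month_from_prefix(str)
    ABBR_MONTH_3KEY[compute_3key(str)]
  end
end

p SketchNames.month_from_prefix("SEPTEMBER") # only the first three bytes matter
```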

Per review by @nobu:
ruby#155 (comment)
This file defined 40 regex constants (TIME_PAT, PARSE_*, ISO8601_*, etc.) but was never required from anywhere (lib/test/ext), and none of its constants were referenced by any other file. The parsing code in parse.rb uses its own patterns (the runtime parse regexes live in constants.rb), so the file was dead code left over from an earlier stage of the port.

Confirmed unused before removal:
- no `require`/`autoload` of date/patterns anywhere
- all 40 constants have zero references outside the file
- after `require "date"`, Date::TIME_PAT and friends are undefined
- Date.parse / Date._parse / Date.rfc3339 all work without it

The date test suite is unchanged by the removal: 144 tests, 162568 assertions, 0 failures (only the 3 pre-existing environment-related errors remain).

Per review by @nobu:
ruby#155 (comment)
The FL_* (strftime flag bits) and HAVE_* (parse character-class bits) constants were defined in the shared constants.rb but are each used in only one file: FL_* by strftime.rb (across internal_strftime, fmt_year, pad_num, fmt_str, fmt_z) and HAVE_* by parse.rb (solely within _parse).

Move each group next to its sole user:
- FL_* -> lib/date/strftime.rb
- HAVE_* -> lib/date/parse.rb

They stay as private constants on Date. HAVE_* is placed directly under `class Date` (outside `class << self`) so the singleton _parse method still resolves the bare names and they remain Date::HAVE_*. Visibility and lookup paths are unchanged.

Verified:
- Date::FL_LEFT / Date::HAVE_ALPHA etc. still raise "private constant ... referenced" when referenced via scope resolution
- strftime (all flag paths) and _parse / parse behave identically
- no -w warnings
- date test suite unchanged: 144 tests, 162568 assertions, 0 failures (only the 3 pre-existing environment-related errors remain)
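The lookup rule behind this placement can be reproduced in isolation. In the sketch below, `DateSketch` is a hypothetical stand-in for `Date`, and the bit values are illustrative rather than copied from lib/date/parse.rb. Constants defined directly in the class body remain reachable by bare name from a singleton method defined inside `class << self`, because that block is lexically nested in the class body. `private_constant` still blocks `DateSketch::HAVE_ALPHA` from outside.

```ruby
# Hypothetical stand-in for Date: the class name and bit values are
# illustrative, not copied from lib/date/parse.rb.
class DateSketch
  # Defined directly under the class body (not inside `class << self`),
  # so they are DateSketch::HAVE_* and are also visible lexically below.
  HAVE_ALPHA = 1 << 0
  HAVE_DIGIT = 1 << 1
  HAVE_DASH  = 1 << 2
  private_constant :HAVE_ALPHA, :HAVE_DIGIT, :HAVE_DASH

  class << self
    # Bare constant names resolve through the lexical scope, which
    # includes the enclosing `class DateSketch` body.
    def char_classes(str)
      bits = 0
      bits |= HAVE_ALPHA if str.match?(/[A-Za-z]/)
      bits |= HAVE_DIGIT if str.match?(/\d/)
      bits |= HAVE_DASH if str.include?("-")
      bits
    end
  end
end

DateSketch.char_classes("2001-02-03") # => 6 (HAVE_DIGIT | HAVE_DASH)
# DateSketch::HAVE_ALPHA raises NameError: private constant ... referenced
```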

Per review by @nobu:
ruby#155 (comment)
Following the discussion on ruby#155, remove the C extension and ship date
as a pure Ruby library. Maintaining both a C and a Ruby implementation
that must stay behaviorally identical is a real maintenance burden; the
Ruby implementation is now the single source of truth. Performance-sensitive
users can enable YJIT, which substantially speeds up the pure Ruby
implementation on the supported Ruby versions.

Changes:
- lib/date.rb: drop the `RUBY_VERSION >= "3.3"` gate and the
  `else require 'date_core'` fallback; require the pure Ruby files
  unconditionally.
- date.gemspec: bump required_ruby_version to ">= 3.3.0", drop
  s.extensions, and remove the C sources from the files list.
- Remove the C extension assets: date_core.c, date_parse.c,
  date_strftime.c, date_strptime.c, date_tmx.h, extconf.rb, and the
  gperf-generated zonetab.h. Keep zonetab.list, generate-zonetab-rb,
  update-abbr, and a trimmed prereq.mk, which still regenerate
  lib/date/zonetab.rb.
- Rakefile: drop the version gate and the Rake::ExtensionTask /
  zonetab.h branch; keep the pure Ruby test task (compile is a no-op).
- .github/workflows/test.yml: min_version 2.6 -> 3.3, drop the gperf
  install, and run `rake test` (compile is no longer needed).
- .github/workflows/update.yml: drop gperf and the `make zonetab.h`
  step; zonetab.rb regeneration is unchanged.
- strptime.rb: reword two comments that referenced the now-deleted
  ext/date/*.c files, keeping the original C function names as
  provenance.

required_ruby_version was chosen empirically: the suite passes on 3.0
and 3.1, the library smoke-loads on 3.2, but breaks on 2.7/2.6 (a
frozen-string incompatibility in strftime). 3.3 is kept as the floor to
match the prior gate, align with the YJIT performance rationale, and
avoid already-EOL versions.

Verified with `rake test`: 144 tests, 162595 assertions, 0 failures,
0 errors, 100% passed; no -w warnings; gemspec loads with
required_ruby_version ">= 3.3.0" and no extensions.
Date validation under a finite cutover (e.g. ITALY, ENGLAND) used the Gregorian rule unconditionally, so dates on or before the reform were mishandled. Using the C extension as ground truth, the pure Ruby implementation was found to diverge on:

  - Julian-only leap days, e.g. Date.new(1500,2,29,ITALY) raised instead
    of returning 1500-02-29, and valid_civil? returned false.
  - Reform-gap days, e.g. Date.new(1582,10,10,ITALY) silently returned
    1582-10-20 (data corruption) instead of raising; ENGLAND 1752-09-03
    .. 13 likewise.
  - The shortened reform year, e.g. Date.ordinal(1582,356,ITALY) and
    Date.commercial(1582,52,1,ITALY) returned dates in the next year
    instead of raising.

Root cause: the fast paths and internal_valid_civil?/internal_valid_ordinal? chose the calendar by `sg == Float::INFINITY` only, and never rejected days skipped at the reform.

Fixes:
  - civil fast path: inline civil_to_jd for days 1..28 and detect the
    reform gap (gjd < start <= jjd); delegate the rest to civil_fallback.
  - ordinal / commercial fast paths: bound the result by the start of the
    next (commercial) year so the shortened reform year is respected.
  - internal_valid_civil?: validate on the Gregorian side first, then the
    Julian side (accepting Julian-only leap days and rejecting gap days
    via jjd >= sg) without a round-trip.
  - internal_valid_ordinal?: derive the true year length from the
    difference of the two Jan 1 JDs, which already accounts for the
    reform year.
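The validation order and the gap test above can be sketched as follows. This is a standalone illustration, not the PR's code. Helper names (`civil_jd`, `reform_gap?`, `valid_civil_jd`, `jan1_jd`, `year_length`) are hypothetical. The standard integer Julian Day Number formulas used here assume years after -4800, and only positive mday is handled. `start` may also be `Float::INFINITY` (proleptic Julian) or `-Float::INFINITY` (proleptic Gregorian).

```ruby
# Hypothetical sketch, not the PR's code. Integer JDN formulas assume
# years > -4800; negative mday/yday are out of scope.
ITALY = 2299161 # same value as Date::ITALY (JD of 1582-10-15)

MONTH_DAYS = [nil, 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31].freeze

def leap?(y, gregorian)
  gregorian ? (y % 4 == 0 && y % 100 != 0) || y % 400 == 0 : y % 4 == 0
end

def month_days(y, m, gregorian)
  m == 2 && leap?(y, gregorian) ? 29 : MONTH_DAYS[m]
end

# Chronological Julian Day Number of a civil date in either calendar.
def civil_jd(y, m, d, gregorian)
  a  = (14 - m) / 12
  y2 = y + 4800 - a
  m2 = m + 12 * a - 3
  jd = d + (153 * m2 + 2) / 5 + 365 * y2 + y2 / 4
  gregorian ? jd - y2 / 100 + y2 / 400 - 32045 : jd - 32083
end

# A day skipped by the reform: its Gregorian reading falls before the
# cutover while its Julian reading falls on or after it.
def reform_gap?(y, m, d, start)
  civil_jd(y, m, d, true) < start && start <= civil_jd(y, m, d, false)
end

# Gregorian side first, then Julian side; nil for invalid or skipped days.
def valid_civil_jd(y, m, d, start)
  return nil unless (1..12).cover?(m) && d >= 1
  if d <= month_days(y, m, true)
    gjd = civil_jd(y, m, d, true)
    return gjd if gjd >= start
  end
  if d <= month_days(y, m, false)
    jjd = civil_jd(y, m, d, false)
    return jjd if jjd < start # jjd >= start means a gap day: rejected
  end
  nil
end

# Jan 1 in whichever calendar is in force for that year.
def jan1_jd(y, start)
  gjd = civil_jd(y, 1, 1, true)
  gjd >= start ? gjd : civil_jd(y, 1, 1, false)
end

# True year length; the reform year comes out shortened automatically.
def year_length(y, start)
  jan1_jd(y + 1, start) - jan1_jd(y, start)
end

valid_civil_jd(1500, 2, 29, ITALY)  # => 2268992 (Julian-only leap day)
valid_civil_jd(1582, 10, 10, ITALY) # => nil (skipped by the reform)
year_length(1582, ITALY)            # => 355, so yday 356 is rejected
```

Taking the year length as a difference of Jan 1 JDs avoids special-casing the reform year: the ten skipped days simply never appear between the two endpoints.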

All 64 calendar oracle cases (ITALY/ENGLAND/JULIAN/GREGORIAN, gap days, pre-reform leap, reform-year ordinal/commercial, negative mday/yday) now match the C extension. Official suite: 144 tests, 162595 assertions, 0 failures, 0 errors, 100%; no -w warnings.

Performance (YJIT, i/s) vs the C baseline (4.0.1_system.tsv):

  method             C ext       pure Ruby    ratio
  civil              4,648,499   6,816,000     147%
  ordinal            3,023,007   7,224,000     239%
  commercial         2,478,941   5,209,000     210%
  valid_civil?      10,749,223   4,752,000      44%
  valid_ordinal?     4,233,714  12,035,000     284%
  valid_commercial?  3,177,115   3,199,000     101%

valid_civil? stays below the C baseline, as it already did before this change (~47%); C's valid_civil? is exceptionally fast and the pure Ruby version has never matched it. This change lowers it by only ~6% (5.05M -> 4.75M i/s) in exchange for the added correctness.
The C extension keeps the calendar-reform cutover as a double, so Date#start always returns a Float (e.g. ITALY -> 2299161.0). The pure Ruby version stored whatever was passed, so #start returned an Integer for the finite sentinels. Besides the type mismatch, this left a latent hash inconsistency: a freshly built date (Integer @start) and a date unmarshaled from C bytes (Float @start) compared eql? but hashed differently, because Date#hash mixes in @start.

Store @start as a Float at every construction point:
  - init_from_jd and _init_datetime: @start = sg.to_f
  - new_start (and thus gregorian/julian/italy/england): start.to_f
  - step/upto/downto and initialize_copy already copy an existing
    @start, so they inherit the Float.

Date#inspect formatted @start directly, which would now show "2299161.0j"; render it the way the C extension does instead:
  - Float::INFINITY  -> "Inf"
  - -Float::INFINITY -> "-Inf"
  - whole numbers    -> integer ("2299161")
This also fixes a pre-existing divergence where the infinity cutovers were shown as "Infinity"/"-Infinity" instead of "Inf"/"-Inf".
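The normalization and the inspect rendering can be sketched together. `StartSketch` is hypothetical, not the real Date class. The underlying point is that an Integer 2299161 and a Float 2299161.0 are `==` but not `eql?`, so mixing them into `#hash` can split otherwise-equal dates. Converting once at construction removes that split. The non-whole branch is an assumption, since the text above only specifies infinities and whole numbers.

```ruby
# Hypothetical sketch; StartSketch is not the real Date class.
class StartSketch
  attr_reader :start

  def initialize(start)
    # Normalize at construction so #start is always a Float
    # (an Integer cutover such as 2299161 becomes 2299161.0).
    @start = start.to_f
  end

  # Renders the cutover in the forms listed above.
  def inspect_sg
    if @start == Float::INFINITY
      "Inf"
    elsif @start == -Float::INFINITY
      "-Inf"
    elsif @start.finite? && @start == @start.truncate
      @start.to_i.to_s # whole number: "2299161", not "2299161.0"
    else
      @start.to_s      # assumption: other values are left as-is
    end
  end
end

StartSketch.new(2299161).start      # => 2299161.0
StartSketch.new(2299161).inspect_sg # => "2299161"
```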

Verified against the C extension: #start type/value across new, jd, ordinal, commercial, parse, gregorian, julian, italy, new_start, +, -, next, dup and marshal round-trip all match, as do the three inspect forms; a C-marshaled date and a fresh date are now eql? with equal hashes. Official suite: 144 tests, 162595 assertions, 0 failures, 0 errors, 100%; no -w warnings.

Performance (YJIT, i/s) vs the C baseline (4.0.1_system.tsv); the added to_f returns a flonum and allocates nothing, so constructors are unaffected:

  method      C ext       pure Ruby    ratio
  civil       4,648,499   7,296,000     157%
  ordinal     3,023,007   6,973,000     231%
  commercial  2,478,941   5,408,000     218%
- lib/date.rb: drop `require 'timeout'`; the pure Ruby implementation has no Timeout reference (the parse `limit:` is a string-length check).
- lib/date/constants.rb: remove the dead STRFTIME_DATE_DEFAULT_FMT constant; it had no references (DEFAULT_STRFTIME_FMT is the one in use).

No behavior change; the date test suite still passes 144 tests, 162595 assertions, 0 failures, 0 errors, 100%.
DateTime had no #inspect of its own and inherited Date#inspect, so its output started with "#<Date:" and always showed "0s" for the time, diverging from the C extension which shows "#<DateTime:" and the UTC seconds-into-day.

Add DateTime#inspect that reproduces the C format:
  - Julian Day number and seconds are given in UTC: derive them from the
    locally stored fields as (jd*86400 + h*3600 + m*60 + s - offset),
    then divmod by 86400.
  - sub-second is shown as nanoseconds, sec_fraction * 10**9 kept as a
    Rational: an integer when whole ("123456789n"), otherwise a
    parenthesized fraction ("(1000000000/3)n", "(1/2)n").
  - the offset ("+Ns"/"-Ns") and start follow.
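The three formatting rules above can be sketched in a few lines. The helper names and argument order are hypothetical, and `sg_text` stands for the already-formatted cutover. Multiplying by the Integer `1_000_000_000` (not the Float `1e9`) keeps the nanosecond value an exact Rational, and `Rational#inspect` produces the parenthesized form.

```ruby
# Hypothetical helpers; names and argument order are illustrative only.
SECONDS_IN_DAY = 86_400

# Local civil fields -> [UTC Julian Day, UTC seconds into the day].
def utc_jd_and_seconds(jd, hour, min, sec, offset)
  (jd * SECONDS_IN_DAY + hour * 3600 + min * 60 + sec - offset).divmod(SECONDS_IN_DAY)
end

# sec_fraction (a fraction of a second) -> "<ns>n", exact via Rational.
def format_ns(sec_fraction)
  ns = Rational(sec_fraction) * 1_000_000_000
  (ns.denominator == 1 ? ns.to_i : ns).inspect + "n"
end

def inspect_body(jd, hour, min, sec, sec_fraction, offset, sg_text)
  ujd, usec = utc_jd_and_seconds(jd, hour, min, sec, offset)
  format("((%dj,%ds,%s),%+ds,%sj)", ujd, usec, format_ns(sec_fraction), offset, sg_text)
end

# 2000-01-01T01:00:00+09:00 is 16:00 UTC on the previous day.
inspect_body(2451545, 1, 0, 0, 0, 32_400, "2299161")
# => "((2451544j,57600s,0n),+32400s,2299161j)"
```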

Factor the start (cutover) formatting out of Date#inspect into a shared private helper (inspect_sg) used by both classes.

Verified against the C extension across offsets, fractional and sub-nanosecond seconds, the reform boundary and negative years: Date and DateTime #inspect now match exactly. Official suite: 144 tests, 162595 assertions, 0 failures, 0 errors, 100%; no -w warnings.

Date#inspect (YJIT) stays well above the C baseline (4.0.1_system.tsv):

  method         C ext      pure Ruby    ratio
  Date#inspect   548,811    1,608,000     293%

(DateTime#inspect is not benchmarked: DateTime is deprecated.)
Several long-standing bugs in the pure Ruby port, found by differential testing against the C extension (all present before this branch's recent work):

A. DateTime#>>, #<<, next_month, prev_month, next_year and prev_year
   raised "TypeError: no implicit conversion of nil into Integer".
   Date#>> rebuilt the result with new_from_jd, which only initializes
   the Date fields, so the inherited path produced a DateTime with nil
   time-of-day ivars that blew up on the next #to_s/strftime.
   Extract the month-shift JD computation into a shared private helper
   (month_shifted_jd) and override DateTime#>> to rebuild via
   _new_dt_from_jd_time, preserving the time of day and offset. #<< and
   the next_*/prev_* helpers delegate to #>> and recover automatically.

B. DateTime#new_offset kept the wall-clock fields and only relabeled the
   offset, instead of re-expressing the same instant. Shift the stored
   fields by (new_offset - old_offset) and recompute jd/h/m/s.

C. strftime %y/%g/%D/%x were wrong for negative years: they used
   `year.abs % 100` (e.g. -44 -> "44") instead of `year % 100`
   (-44 -> "56"), diverging from the C extension. %C was already correct.

D. The day-clamp in the month shift used the Gregorian month length for
   any finite cutover, so e.g. Date.new(1500,1,31,ITALY) >> 1 gave
   1500-02-28 instead of 1500-02-29; month_shifted_jd now picks the
   Julian/Gregorian length according to where the target month falls
   relative to the reform.
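The arithmetic behind D, C and B can be sketched in isolation; the DateTime rebuild in A is not shown. All helper names are hypothetical. The month-end clamp swaps in a decrement-until-valid loop in place of the PR's month_shifted_jd, and the integer JDN formula assumes years after -4800. The loop decides validity per calendar in force, so the Julian leap day survives before the reform.

```ruby
# Hypothetical sketch: helper names are illustrative, and the clamp uses a
# decrement-until-valid loop rather than the PR's month_shifted_jd.
ITALY = 2299161 # same value as Date::ITALY
MONTH_DAYS = [nil, 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31].freeze

def leap?(y, gregorian)
  gregorian ? (y % 4 == 0 && y % 100 != 0) || y % 400 == 0 : y % 4 == 0
end

def month_days(y, m, gregorian)
  m == 2 && leap?(y, gregorian) ? 29 : MONTH_DAYS[m]
end

# Integer Julian Day Number; assumes years > -4800.
def civil_jd(y, m, d, gregorian)
  a  = (14 - m) / 12
  y2 = y + 4800 - a
  m2 = m + 12 * a - 3
  jd = d + (153 * m2 + 2) / 5 + 365 * y2 + y2 / 4
  gregorian ? jd - y2 / 100 + y2 / 400 - 32045 : jd - 32083
end

# Valid if it is a real Gregorian date on/after the cutover, or a real
# Julian date before it.
def valid_civil?(y, m, d, start)
  (d <= month_days(y, m, true) && civil_jd(y, m, d, true) >= start) ||
    (d <= month_days(y, m, false) && civil_jd(y, m, d, false) < start)
end

# D: shift by n months, then clamp the day with the calendar in force.
def shift_months(y, m, d, n, start)
  y2, m0 = (y * 12 + (m - 1) + n).divmod(12) # floored, so n < 0 works
  m2 = m0 + 1
  until valid_civil?(y2, m2, d, start)
    d -= 1
    raise ArgumentError, "invalid date" if d < 1
  end
  [y2, m2, d]
end

# C: Ruby's % is floored, so -44 % 100 == 56 (abs would give 44).
def two_digit_year(year)
  format("%02d", year % 100)
end

# B: same instant under a new offset: shift by (new - old), re-split.
def shift_offset(jd, sec_of_day, old_offset, new_offset)
  (jd * 86_400 + sec_of_day + (new_offset - old_offset)).divmod(86_400)
end

shift_months(1500, 1, 31, 1, ITALY) # => [1500, 2, 29] (Julian leap year)
two_digit_year(-44)                 # => "56"
```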

Verified against the C extension: DateTime >>/<</next_*/prev_* (with time-of-day, sub-second and end-of-month clamping), new_offset across offsets and fractions, %y/%g/%D for negative years, and the reform-aware clamp all match. Official suite: 144 tests, 162595 assertions, 0 failures, 0 errors, 100%; no -w warnings.

Performance (YJIT, i/s) vs the C baseline (4.0.1_system.tsv):

  method            C ext       pure Ruby    ratio
  Date#>>1          3,119,218   3,750,000     120%
  Date#next_month   3,038,007   3,677,000     121%

(DateTime is deprecated and not benchmarked.)
jinroq requested a review from nobu on June 20, 2026 07:47