
swift 6.0+ performance tweaks 3x-6x #9067

Merged
mustiikhalil merged 10 commits into google:master from ordo-one:swift-performance-tweaks
Jun 18, 2026

Conversation

@blindspotbounty

Contributor

We were experimenting with the latest changes to the Swift FlatBuffers runtime and found several tweaks that make it significantly faster:

  1. Remove exclusivity checks by making Blob let
  2. Mark Blob as @frozen for library evolution mode
  3. Mark Blob as ~Copyable to avoid compiler inserted copies (swift 6.0+)
  4. Add BitwiseCopyable annotation for directly read types (swift 6.0+)
  5. Add exclusivity(unchecked) for FlatbuffersBuilder - assuming it is always exclusive anyway
  6. Fix func duplicate to not crash in debug with default value
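
To make items 1-4 concrete, here is a minimal sketch of what such annotations can look like on a storage type. It is an illustration only: this `Blob` is a hypothetical stand-in, not the runtime's actual type, and the `read`/`write` helpers and `roundTripDouble` are names assumed for the example. It requires Swift 6.0+ for `BitwiseCopyable`.

```swift
// Hypothetical stand-in for a buffer storage type (NOT the real FlatBuffers `Blob`).
// @frozen fixes the layout for clients when library evolution is enabled;
// ~Copyable suppresses implicit copies, so the compiler cannot insert any.
@frozen
public struct Blob: ~Copyable {
  // `let` storage cannot be mutated after init, so reads of these
  // properties need no dynamic exclusivity enforcement.
  public let pointer: UnsafeMutableRawPointer
  public let capacity: Int

  public init(capacity: Int) {
    self.capacity = capacity
    self.pointer = UnsafeMutableRawPointer.allocate(byteCount: capacity, alignment: 8)
    pointer.initializeMemory(as: UInt8.self, repeating: 0, count: capacity)
  }

  // A noncopyable struct may own its memory and free it exactly once.
  deinit { pointer.deallocate() }

  // Constraining T to BitwiseCopyable tells the compiler the value can be
  // loaded with a plain memory copy, without consulting type metadata.
  @inline(__always)
  public func read<T: BitwiseCopyable>(_ type: T.Type, at offset: Int) -> T {
    precondition(offset >= 0 && offset + MemoryLayout<T>.size <= capacity)
    return pointer.loadUnaligned(fromByteOffset: offset, as: T.self)
  }

  @inline(__always)
  public func write<T: BitwiseCopyable>(_ value: T, at offset: Int) {
    precondition(offset >= 0 && offset + MemoryLayout<T>.size <= capacity)
    pointer.storeBytes(of: value, toByteOffset: offset, as: T.self)
  }
}

// Usage: write a Double and read it back through the borrowed blob.
func roundTripDouble() -> Double {
  let blob = Blob(capacity: 16)
  blob.write(3.5, at: 8)
  return blob.read(Double.self, at: 8)
}
```

Item 5 (`exclusivity(unchecked)`) is left out of the sketch on purpose, since it trades a runtime safety check for speed and only makes sense where access is already known to be exclusive.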

Benchmarks give a lot of false negative/positive results when comparing main vs main on my machine (i.e. the deviation is more than ±5%). I guess this is mainly due to allocations.
However, these are the main improvements:

==============================================================
Threshold deviations for FlatbuffersBenchmarks:Reading Doubles
==============================================================
╒══════════════════════════════════════════╤═════════════════╤═════════════════╤═════════════════╤═════════════════╕
│ Time (wall clock) (ms, %)                │          tweaks │            main │    Difference % │     Threshold % │
╞══════════════════════════════════════════╪═════════════════╪═════════════════╪═════════════════╪═════════════════╡
│ p25                                      │               5 │              52 │            1051 │               5 │
├──────────────────────────────────────────┼─────────────────┼─────────────────┼─────────────────┼─────────────────┤
│ p50                                      │               5 │              52 │            1039 │               5 │
├──────────────────────────────────────────┼─────────────────┼─────────────────┼─────────────────┼─────────────────┤
│ p75                                      │               5 │              53 │            1036 │               5 │
╘══════════════════════════════════════════╧═════════════════╧═════════════════╧═════════════════╧═════════════════╛

╒══════════════════════════════════════════╤═════════════════╤═════════════════╤═════════════════╤═════════════════╕
│ Time (total CPU) (ms, %)                 │          tweaks │            main │    Difference % │     Threshold % │
╞══════════════════════════════════════════╪═════════════════╪═════════════════╪═════════════════╪═════════════════╡
│ p25                                      │               5 │              52 │            1050 │               5 │
├──────────────────────────────────────────┼─────────────────┼─────────────────┼─────────────────┼─────────────────┤
│ p50                                      │               5 │              53 │            1040 │               5 │
├──────────────────────────────────────────┼─────────────────┼─────────────────┼─────────────────┼─────────────────┤
│ p75                                      │               5 │              53 │            1036 │               5 │
╘══════════════════════════════════════════╧═════════════════╧═════════════════╧═════════════════╧═════════════════╛

╒══════════════════════════════════════════╤═════════════════╤═════════════════╤═════════════════╤═════════════════╕
│ Releases (K, %)                          │          tweaks │            main │    Difference % │     Threshold % │
╞══════════════════════════════════════════╪═════════════════╪═════════════════╪═════════════════╪═════════════════╡
│ p25                                      │               0 │            1000 │        50000000 │               5 │
├──────────────────────────────────────────┼─────────────────┼─────────────────┼─────────────────┼─────────────────┤
│ p50                                      │               0 │            1000 │        50000000 │               5 │
├──────────────────────────────────────────┼─────────────────┼─────────────────┼─────────────────┼─────────────────┤
│ p75                                      │               0 │            1000 │        50000000 │               5 │
╘══════════════════════════════════════════╧═════════════════╧═════════════════╧═════════════════╧═════════════════╛

For serialization I probably need to add one more test. However the difference is quite noticeable.

Was:
image

Became:
image

All improvements are mainly due to removing Swift runtime calls: exclusivity checks and runtime metadata access.
In practice, all improvements together give up to a 3x-6x gain at scale.

@mustiikhalil I put all tweaks that we have currently together.
Let me know if I should split them or if something is going to be implemented other way (or if I need to make separate cases for those improvements).

@mustiikhalil

Collaborator

@blindspotbounty

1- is there any chance that you also add or investigate the same changes to the flexbuffer implementation?
2- Can we wait on #8983? It will reduce the number of #if

@hassila

hassila commented Apr 28, 2026

Contributor

Sorry for extra ping, I was a bit trigger happy there - missed you were in @mustiikhalil - will let @blindspotbounty respond.

@hassila

hassila commented Apr 28, 2026

Contributor

2- Can we wait on #8983? It will reduce the number of #if

When do you think it will go in approx? Looks like a great update, 5.10 and older is more or less gone now I think.

@mustiikhalil

mustiikhalil commented Apr 28, 2026

Collaborator

When do you think it will go in approx? Looks like a great update, 5.10 and older is more or less gone now I think.

Not sure, I pinged for help to review it, let's see though! If it doesn't get reviewed today or tomorrow then we can merge this and update the other branch

@mustiikhalil mustiikhalil self-requested a review April 28, 2026 15:32
@blindspotbounty

Contributor Author

@blindspotbounty

1- is there any chance that you also add or investigate the same changes to the flexbuffer implementation?
2- Can we wait on #8983? It will reduce the number of #if

Yeah, why not!
I'll take a look at #8983.
Also can look at FlexBuffers (can't promise this week though)

@mustiikhalil mustiikhalil left a comment

Collaborator


Amazing work! Some minor changes! And ofc the CI hasn't passed

Comment thread swift/Sources/FlatBuffers/FlatBufferBuilder.swift Outdated
Comment thread swift/Sources/FlatBuffers/ByteBuffer.swift
Comment thread swift/Sources/FlatBuffers/ByteBuffer.swift Outdated
Comment thread swift/Sources/FlatBuffers/ByteBuffer.swift
@mustiikhalil

mustiikhalil commented May 6, 2026

Collaborator

@blindspotbounty The swift upgrade PR has been merged! Please pull, and update the current PR with the changes required.

We will need to check if the gRPC changes will run with the new code change that you've done.

Then it would be interesting to look at the changes for flexbuffers (I use those more than flatbuffers nowadays)

@mustiikhalil

Collaborator

What happened to this pr? any updates?
@blindspotbounty

@blindspotbounty

Contributor Author

What happened to this pr? any updates? @blindspotbounty

@mustiikhalil thank you for pinging me. Sorry, I was busy with other stuff.
I think now I can look at it again!

@mustiikhalil

Collaborator

@blindspotbounty no worries! And thanks again

@blindspotbounty

Contributor Author

@mustiikhalil I pushed changes addressing the feedback.

I also added several FlexBuffers benchmarks as a separate target (let me know if they should go in the same target).

Re-measured latest main vs branch locally:

  FlatBuffers — origin/master → optimized (p50, wall-clock)
  
  ┌───────────────────────────────┬────────────┬────────┬────────┬─────────────────────┐
  │           Benchmark           │    Path    │ Before │ After  │        Δ p50        │
  ├───────────────────────────────┼────────────┼────────┼────────┼─────────────────────┤
  │ Reading Doubles               │ read       │ 39 ns  │ 5 ns   │ +87% (releases 1→0) │
  ├───────────────────────────────┼────────────┼────────┼────────┼─────────────────────┤
  │ Vector 100 Ints               │ read+write │ 230 ns │ 189 ns │ +18%                │
  ├───────────────────────────────┼────────────┼────────┼────────┼─────────────────────┤
  │ Strings 100                   │ write      │ 33 ns  │ 27 ns  │ +18%                │
  ├───────────────────────────────┼────────────┼────────┼────────┼─────────────────────┤
  │ Vector 1 Ints                 │ read+write │ 223 ns │ 188 ns │ +16%                │
  ├───────────────────────────────┼────────────┼────────┼────────┼─────────────────────┤
  │ Vector 100 Bytes              │ read+write │ 47 ns  │ 40 ns  │ +15%                │
  ├───────────────────────────────┼────────────┼────────┼────────┼─────────────────────┤
  │ Vector 1 Bytes                │ read+write │ 58 ns  │ 50 ns  │ +14%                │
  ├───────────────────────────────┼────────────┼────────┼────────┼─────────────────────┤
  │ Vector 100 ContiguousBytes    │ read+write │ 62 ns  │ 56 ns  │ +10%                │
  ├───────────────────────────────┼────────────┼────────┼────────┼─────────────────────┤
  │ Vector of Offsets             │ read+write │ 89 ns  │ 80 ns  │ +10%                │
  └───────────────────────────────┴────────────┴────────┴────────┴─────────────────────┘
  
  FlexBuffers — origin/master → optimized (p50, wall-clock)
  
  ┌─────────────────────────┬───────┬─────────┬─────────┬─────────────────────┐
  │        Benchmark        │ Path  │ Before  │  After  │        Δ p50        │
  ├─────────────────────────┼───────┼─────────┼─────────┼─────────────────────┤
  │ Reading Doubles (raw)   │ read  │ 50 ns   │ 9 ns    │ +82% (releases 1→0) │
  ├─────────────────────────┼───────┼─────────┼─────────┼─────────────────────┤
  │ Decode Map Scalar       │ read  │ 277 ns  │ 142 ns  │ +49%                │
  ├─────────────────────────┼───────┼─────────┼─────────┼─────────────────────┤
  │ Decode Map String       │ read  │ 350 ns  │ 223 ns  │ +36%                │
  ├─────────────────────────┼───────┼─────────┼─────────┼─────────────────────┤
  │ Decode Vector (100)     │ read  │ 4498 ns │ 3729 ns │ +17%                │
  └─────────────────────────┴───────┴─────────┴─────────┴─────────────────────┘
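
The Δ p50 column appears to be the relative reduction in p50 time, `(before - after) / before`. That reading is an assumption inferred from the numbers, not something stated in the benchmark output. A small sketch with hypothetical helper names shows the arithmetic and the equivalent speedup factor:

```swift
// Hypothetical helpers; the names are illustrative and not part of any benchmark tool.

/// Relative time reduction in percent: (before - after) / before * 100.
func reductionPercent(before: Double, after: Double) -> Double {
  (before - after) / before * 100
}

/// Speedup factor: how many times faster the optimized run is.
func speedup(before: Double, after: Double) -> Double {
  before / after
}

// "Reading Doubles" (FlatBuffers): 39 ns -> 5 ns
let readingDoublesReduction = reductionPercent(before: 39, after: 5)  // about 87.2
let readingDoublesSpeedup = speedup(before: 39, after: 5)             // 7.8

// "Decode Map Scalar" (FlexBuffers): 277 ns -> 142 ns
let decodeMapScalarReduction = reductionPercent(before: 277, after: 142)  // about 48.7
```

Read this way, an 87% reduction on the read path corresponds to roughly a 7.8x speedup, while the 10-18% rows correspond to roughly 1.1x-1.2x.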

@blindspotbounty

Contributor Author

Feels like all CI passes except typescript but that one seems unrelated...

@mustiikhalil mustiikhalil left a comment

Collaborator


LGTM!

@blindspotbounty thanks for opening the PR and taking it to the finish line!

@blindspotbounty

Contributor Author

@mustiikhalil let me know if I should do anything about TS CI which is required.
Perhaps I should update branch at some point (currently on the latest master) or CI should be restarted by someone who has privileges.

@mustiikhalil

Collaborator

@blindspotbounty Yes, there is a pr that will hopefully fix the TS ci #9140.

@mustiikhalil

Collaborator

@blindspotbounty can you rebase from master?

@blindspotbounty

Contributor Author

@mustiikhalil rebased! Waiting for CI!

@mustiikhalil

mustiikhalil commented Jun 18, 2026

Collaborator

@blindspotbounty thanks for your contribution. Should I merge?

@blindspotbounty

Contributor Author

Go ahead!

@mustiikhalil mustiikhalil merged commit 81edeb1 into google:master Jun 18, 2026
54 checks passed
@blindspotbounty blindspotbounty deleted the swift-performance-tweaks branch June 18, 2026 13:26

3 participants