HDDS-15417. Handle mixed datanode versions on the replication path #10570

Open
dombizita wants to merge 3 commits into apache:HDDS-14496-zdu from dombizita:HDDS-15417

What changes were proposed in this pull request?

Datanodes act as client and server to one another during replication. The process itself is simple: a command is sent to a datanode hosting a container replica (the replication client), telling it to push/upload the data to another node (the replication server). Incompatible changes in this area can be handled in a similar way to write requests.

SCM knows the apparent version of every Datanode as of its last heartbeat. The last known apparent version of the target/server Datanode can therefore be included in the replicate command that SCM sends to the source/client Datanode. It is always the job of the newer component to handle compatibility, whether the difference is in software version or apparent version. If the source/client Datanode is newer in software or apparent version, it must "downgrade" its request to the lowest common apparent version, using the server version indicated by the SCM command. A newer target/server Datanode will not make incompatible changes to existing APIs, so old client Datanodes will continue to work.
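As a rough illustration of the "downgrade" step, the sketch below shows how a source Datanode might pick a request format once it knows the lowest common apparent version. Everything in it is hypothetical: the class name, the `Format` enum, the use of plain integers for apparent versions, and the threshold `HYPOTHETICAL_FEATURE_VERSION` are assumptions made for illustration and are not taken from this patch or from Ozone.

```java
/** Hypothetical sketch; none of these names exist in the Ozone code base. */
final class RequestFormatSketch {

  /** Made-up request formats: one every version understands, one newer. */
  enum Format { LEGACY, EXTENDED }

  /** Assumed apparent version at which the made-up EXTENDED format appears. */
  static final int HYPOTHETICAL_FEATURE_VERSION = 3;

  /**
   * Picks the newest format that both sides understand. The argument is the
   * lowest common apparent version of the source and target datanodes.
   */
  static Format chooseFormat(int lowestCommonApparentVersion) {
    return lowestCommonApparentVersion >= HYPOTHETICAL_FEATURE_VERSION
        ? Format.EXTENDED
        : Format.LEGACY;
  }

  public static void main(String[] args) {
    // Target is older than the feature: the source falls back to LEGACY.
    System.out.println(chooseFormat(2));
    // Both sides are at or above the feature version: EXTENDED is allowed.
    System.out.println(chooseFormat(3));
  }
}
```

The point of the sketch is only that the newer side makes the decision; an older target never has to know that a newer format exists.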

Changes in this patch:

  • Added peerApparentVersion field to ReplicateContainerCommandProto.
  • SCM (ReplicationManager): When sending a replicate command, SCM looks up the peer datanode's last known apparent version (from heartbeat reports) and sets it on the command. For push replication the peer is the target DN, so its apparent version is used; for pull replication the value is the minimum apparent version across all source DNs.
  • Datanode (ReplicateContainerCommandHandler): Upon receiving the command, the datanode computes min(own apparent version, peer apparent version) and stores it as lowestCommonApparentVersion on the ReplicationTask.
  • ReplicationTask: Exposes getLowestCommonApparentVersion() for replicators to use when gating version-dependent protocol features in the future.
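The version arithmetic described in the list above can be sketched as follows. This is a simplified model, not the code in this patch: the class and method names are hypothetical, and apparent versions are assumed to be plain integers where a larger number means a newer version.

```java
import java.util.Collection;
import java.util.List;

/** Hypothetical sketch of the version selection; not the patch's code. */
final class PeerVersionSketch {

  /** SCM side, push replication: the peer is the single target datanode. */
  static int peerVersionForPush(int targetApparentVersion) {
    return targetApparentVersion;
  }

  /** SCM side, pull replication: the minimum across all source datanodes. */
  static int peerVersionForPull(Collection<Integer> sourceApparentVersions) {
    return sourceApparentVersions.stream()
        .mapToInt(Integer::intValue)
        .min()
        .orElseThrow(() -> new IllegalArgumentException(
            "pull replication needs at least one source"));
  }

  /** Datanode side: min(own apparent version, peer apparent version). */
  static int lowestCommonApparentVersion(int ownVersion, int peerVersion) {
    return Math.min(ownVersion, peerVersion);
  }

  public static void main(String[] args) {
    // Pull from three sources, then combine with the local datanode's version.
    int peer = peerVersionForPull(List.of(5, 3, 4));
    System.out.println(lowestCommonApparentVersion(4, peer));
  }
}
```

Taking the minimum across sources for pull replication is the conservative choice: whichever source ends up serving the request, the request stays within what that source understands.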

Used Claude Opus 4.6.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-15417

How was this patch tested?

Green CI on my fork: https://github.com/dombizita/ozone/actions/runs/27760621062

@github-actions (bot) added the zdu label (Pull requests for Zero Downtime Upgrade (ZDU), https://issues.apache.org/jira/browse/HDDS-14496) on Jun 22, 2026.
@errose28 self-requested a review on June 23, 2026 13:09.