Skip to content

FizzyFlow/doublesync

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

11 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

DoubleSync

Tree snapshot and delta sync in JavaScript. Content-defined chunking splits files into variable-size pieces whose boundaries are determined by content, not position — so inserting a byte at the start of a file only invalidates the chunks immediately around the insertion. Chunks are stored by SHA-256 hash, giving automatic cross-file deduplication. Folder trees are fingerprinted as Merkle trees, so unchanged subtrees can be skipped with a single hash compare.

Zero dependencies beyond @noble/hashes.

Installation

pnpm add @fizzyflow/doublesync

Quick Start

import DoubleSync, {
    CDCStore,
    DoubleSyncMemoryFolder,
    DoubleSyncMemoryFile,
    DoubleSyncSession,
} from '@fizzyflow/doublesync';

const sync = new DoubleSync({ avgSize: 8192 });

// --- Sender ---
const root = new DoubleSyncMemoryFolder('project');
const src = await root.addFolder('src');
await src.addFile('main.js', new TextEncoder().encode('console.log("hello")'));

const senderStore = new CDCStore();
const sender = new DoubleSyncSession({ sync, senderStore });

const patch = await sender.snapshot(root);  // full patch (first call)

// --- Receiver ---
const dest = new DoubleSyncMemoryFolder('copy');
const receiverStore = new CDCStore();
const receiver = new DoubleSyncSession({ sync, store: receiverStore, dest });

await receiver.apply(patch);
// dest now has src/main.js with the same content

// --- Incremental update ---
await src.addFile('main.js', new TextEncoder().encode('console.log("updated")'));

const diffPatch = await sender.snapshot(root);  // diff patch (subsequent calls)
await receiver.apply(diffPatch);
// only changed chunks are transferred

How It Works

Content-Defined Chunking (FastCDC)

Files are split into variable-size chunks using the FastCDC algorithm with normalized chunking. A rolling Gear hash scans byte-by-byte and declares boundaries when the hash's low bits match a mask. Two masks keep chunk sizes tightly centered around avgSize:

  • Strict mask in [minSize, avgSize) — fewer boundaries, avoids tiny chunks
  • Lenient mask in [avgSize, maxSize) — more boundaries, avoids huge chunks
  • Hard clamps at minSize (default: avgSize / 4, min 64) and maxSize (default: avgSize * 8)

Because boundaries are determined by content, inserting or deleting bytes only affects the immediately surrounding chunks — all other chunks keep their identity (same hash) and don't need retransmission.

Snapshots

A snapshot is a binary document describing a folder tree at a point in time. Each file entry carries a CDC manifest (ordered list of chunk hashes) and a fingerprint (SHA-256 of content). Each folder carries a fingerprint computed recursively from its sorted children, forming a Merkle tree.

Patches

A full patch bundles a snapshot with every chunk it references — fully self-contained.

A diff patch carries only the operations needed to go from the previous snapshot to the current one (add file, replace file, remove, add empty folder), plus only the new chunks. The receiver must hold the previous snapshot to apply it.

DoubleSyncSession manages this automatically: first call to snapshot() returns a full patch, subsequent calls return diff patches.

Compression

Any patch can be wrapped in a gzip compression envelope. The receiver auto-detects and unwraps transparently. Enable with compress: 'gzip' on DoubleSyncSession.

API

DoubleSync

The core sync engine.

const sync = new DoubleSync({ avgSize: 8192, enableMerkleSkip: true });
Parameter Type Default Description
avgSize number 8192 Target average chunk size (must be power of 2)
minSize number avgSize / 4 Minimum chunk size (min 64)
maxSize number avgSize * 8 Maximum chunk size
enableMerkleSkip boolean true Skip unchanged subtrees by fingerprint comparison

Methods

Method Returns Description
buildSnapshot({ root, store }) Promise<Uint8Array> Walk tree, chunk files into store, return snapshot document
buildPatch({ root, senderStore, receiverStore? }) Promise<Uint8Array> Build self-contained patch (snapshot + all chunks)
buildDiffPatch({ root, senderStore, receiverStore?, prevSnapshot }) Promise<{ patch, newSnapshot }> Build incremental diff patch against previous snapshot
applySnapshot({ snapshot, store, dest }) Promise<void> Reconstruct tree from snapshot + chunks in store
applyPatch({ patch, store, dest }) Promise<void> Apply full patch to destination
applyDiffPatch({ patch, store, dest, prevSnapshot }) Promise<Uint8Array> Apply diff patch; returns new snapshot
missingChunks({ snapshot, store }) Array<{ hash, length }> List chunks referenced by snapshot that store lacks

DoubleSyncSession

Stateful sender/receiver wrapper that tracks the previous snapshot automatically.

// Sender
const sender = new DoubleSyncSession({ sync, senderStore, compress: 'gzip' });
const patch = await sender.snapshot(root);

// Receiver
const receiver = new DoubleSyncSession({ sync, store, dest });
await receiver.apply(patch);
Parameter Type Required Description
sync DoubleSync yes Sync engine instance
senderStore CDCStore for snapshot() Chunk store for building patches
store CDCStore for apply() Chunk store for receiving
dest DoubleSyncFolder for apply() Target folder to write into
compress 'gzip' | false no Compress outgoing patches
Method Returns Description
snapshot(root) Promise<Uint8Array> Build next patch (full on first call, diff on subsequent)
apply(patch) Promise<void> Apply patch (auto-detects full vs diff, handles decompression)
Property Type Description
lastSnapshot Uint8Array | null Snapshot from the last snapshot() or apply() call

CDCSync

Low-level chunking engine for a single byte buffer (no tree awareness).

const cdc = new CDCSync({ avgSize: 8192 });
const { manifest, chunks } = cdc.split(data);
const rebuilt = cdc.reconstruct({ manifest, store });
Method Returns Description
split(data) { manifest, chunks } Split buffer, return manifest + chunk list
splitInto({ data, store }) Uint8Array Split + insert chunks into store; return manifest bytes
reconstruct({ manifest, store }) Uint8Array Reassemble buffer from manifest + store
missingChunks({ manifest, store }) Array<{ hash, length }> Chunks manifest references that store lacks

CDCStore

In-memory content-addressable chunk store.

const store = new CDCStore();
const hash = store.put(bytes);         // store by content, returns SHA-256
store.putWithHash(hash, bytes);        // store by known hash
store.has(hash);                       // boolean
const data = store.get(hash);          // Uint8Array | null
store.delete(hash);                    // boolean
Static Method Returns Description
CDCStore.hashOf(bytes) Uint8Array SHA-256 of bytes
CDCStore.hashToHex(hash) string 32-byte hash to lowercase hex
CDCStore.hashFromHex(hex) Uint8Array Hex string to 32-byte hash

Filesystem Abstraction

DoubleSync operates on an abstract filesystem interface. Two in-memory implementations are provided; extend DoubleSyncFile / DoubleSyncFolder to back them with real FS, S3, a database, etc.

DoubleSyncFolder (abstract)

Method Returns Description
list() Promise<Array> Direct children (files and folders)
walk(prefix?) async generator Depth-first { path, file } for all files
findByPath(parts) Promise<File | Folder | null> Navigate by path segments
addFile(name, bytes) Promise<DoubleSyncFile> Create or replace file
addFolder(name) Promise<DoubleSyncFolder> Create or return existing subfolder
removeChild(name) Promise<void> Remove direct child
ensurePath(parts) Promise<DoubleSyncFolder> Create nested folder path

DoubleSyncFile (abstract)

Method Returns Description
getContent() Promise<Uint8Array> Read file bytes
getSize() Promise<number> File size
getFingerprint() Promise<Uint8Array> SHA-256 of content
setContent(bytes) Promise<void> Write file bytes

Concrete implementations

  • DoubleSyncMemoryFolder — children stored in a Map
  • DoubleSyncMemoryFile — content stored as Uint8Array

Format Detection

import { DoubleSyncFormat, MAGICS } from '@fizzyflow/doublesync';

DoubleSyncFormat.detect(bytes);        // 'manifest' | 'snapshot' | 'patch' | 'diff-patch' | 'compressed' | null
DoubleSyncFormat.isDoubleSync(bytes);  // true if any known format
DoubleSyncFormat.isPatch(bytes);       // true if full patch
DoubleSyncFormat.isDiffPatch(bytes);   // true if diff patch
DoubleSyncFormat.isCompressed(bytes);  // true if compression envelope

CDCManifest

Parsed view of a manifest document — the ordered list of chunk hashes and lengths that fully describes one file under CDC. Pair with CDCStore to reconstruct the original bytes.

import { CDCManifest } from '@fizzyflow/doublesync';

const m = new CDCManifest(bytes);   // parse from wire format
m.chunkCount;                        // number
m.totalLength;                       // bigint — original file byte count
for (const { hash, length } of m.chunks()) { /* hash: Uint8Array(32), length: number */ }

CDCManifest.fromChunks(chunks);     // build manifest from an array of { hash, length }

diffSnapshots

Pure diff algorithm: given two parsed DoubleSyncSnapshot instances, returns the ordered list of operations that turn prev into next. Used internally by DoubleSync.buildDiffPatch but available for advanced use.

import { diffSnapshots } from '@fizzyflow/doublesync';

const ops = diffSnapshots(prevSnapshot, nextSnapshot);
// ops: Array<{ kind, path, manifest?, fingerprint? }>
// kind: 'replace_file' | 'add_file' | 'add_empty_folder' | 'remove'

Patch Documents

For advanced use, parse patch documents directly:

import { DoubleSyncPatch, DoubleSyncDiffPatch, DoubleSyncSnapshot } from '@fizzyflow/doublesync';

// Full patch
const p = new DoubleSyncPatch(bytes);
p.snapshot;    // Uint8Array — embedded snapshot
p.chunkCount;  // number
for (const { hash, bytes } of p.chunks()) { /* ... */ }

// Diff patch
const d = new DoubleSyncDiffPatch(bytes);
d.prevSnapshotHash;  // Uint8Array (32 bytes)
d.opCount;           // number
for (const { kind, path, manifest, fingerprint } of d.ops()) { /* ... */ }
// kind: 'replace_file' | 'add_file' | 'add_empty_folder' | 'remove'

// Snapshot
const s = new DoubleSyncSnapshot(bytes);
s.fileCount;     // number
s.folderCount;   // number
s.totalBytes;    // bigint
for (const file of s.files()) { /* ... */ }

Low-level exports

The following are also exported for advanced/internal use:

Export Description
FastCDC Raw FastCDC chunker (used by CDCSync internally)
GEAR The 256-entry Gear hash table used by FastCDC
folderFingerprint Compute a Merkle fingerprint from a list of { name, fingerprint } child specs

Wire Format

All documents use little-endian byte order.

Document Magic Description
CDC Manifest 0xCDCAA001 Ordered chunk hash + length list for one file
Snapshot 0xCDCB0001 Full tree structure with manifests and fingerprints
Full Patch 0xCDCB0101 Snapshot + all referenced chunks
Diff Patch 0xCDCB0102 Prev-snapshot hash + ops + new chunks only
Compressed 0xCDCB02FF Gzip envelope wrapping any of the above

Testing

pnpm test

Tests use vitest:

  • test/cdc.test.js — FastCDC chunking, manifest round-trip, deduplication
  • test/double-sync.test.js — end-to-end tree sync: build snapshot, apply, verify
  • test/double-sync-diff.test.js — incremental diff patches, add/remove/replace ops
  • test/fingerprint.test.js — folder Merkle fingerprint correctness
  • test/double-sync-compressed.test.js — gzip compression envelope round-trip
  • test/double-sync-stress.test.js — large trees, edge cases
  • test/format.test.js — magic number detection and classification

License

AGPL-3.0

About

Tree snapshot and delta sync in JavaScript. Content-defined chunking splits files into variable-size pieces whose boundaries are determined by content, not position — so inserting a byte at the start of a file only invalidates the chunks immediately around the insertion. Chunks are stored by SHA-256 hash, giving automatic cross-file deduplication.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors