Add property preprocessors to @fedify/vocab-tools #792

@dahlia

Fedify needs a way to normalize some wire-level ActivityStreams shapes before the generated vocabulary decoder applies the declared TypeScript-facing range of a property.

This came up while investigating #790. ActivityStreams 2.0 allows Object.icon and Object.image to contain either Image or Link, but Fedify has exposed those properties as Image-oriented APIs since 0.4.0. Restoring Image | Link at the TypeScript API level is probably the right long-term fix, but it should wait for Fedify 3.0 because it would widen public constructor and accessor types.

For the current major version, we need a narrower compatibility path: keep the public icon and image APIs Image-oriented, but normalize incoming explicit Link objects into the best available Image representation during decoding.

Proposed schema extension

@fedify/vocab-tools could allow a property schema to list preprocessors. The generated decoder would run them before the normal range decoder for each expanded JSON-LD property value.

For example, packages/vocab/src/object.yaml could keep the public range as Image, while adding a normalizer:

- pluralName: icons
  singularName: icon
  singularAccessor: true
  compactName: icon
  uri: "https://www.w3.org/ns/activitystreams#icon"
  description: |
    Indicates an entity that describes an icon for this object.
    The image should have an aspect ratio of one (horizontal) to one
    (vertical) and should be suitable for presentation at a small size.
  range:
  - "https://www.w3.org/ns/activitystreams#Image"
  preprocessors:
  - module: ./preprocessors.ts
    function: normalizeLinkToImage

- pluralName: images
  singularName: image
  singularAccessor: true
  compactName: image
  uri: "https://www.w3.org/ns/activitystreams#image"
  description: |
    Indicates an entity that describes an image for this object.
    Unlike the icon property, there are no aspect ratio or display size
    limitations assumed.
  range:
  - "https://www.w3.org/ns/activitystreams#Image"
  preprocessors:
  - module: ./preprocessors.ts
    function: normalizeLinkToImage

The module path should be resolved relative to the generated vocabulary source file, or otherwise follow a clearly documented resolution rule. For generated code inside the same package, a relative module path may be safer than @fedify/vocab/preprocessors; a package subpath export for consumers can be handled separately.
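
To make the resolution rule concrete, here is a minimal sketch of how a generator might turn the preprocessors entries into import statements. The PreprocessorRef type and renderPreprocessorImports function are hypothetical names used only for illustration; they are not part of @fedify/vocab-tools, and the real generator may emit imports differently.

```typescript
// Hypothetical shape of one `preprocessors` entry in a property schema.
interface PreprocessorRef {
  module: string; // e.g. "./preprocessors.ts"
  function: string; // e.g. "normalizeLinkToImage"
}

// Groups referenced functions by module and renders one import statement per
// module, so a preprocessor shared by several properties is imported once.
// The module specifier is emitted verbatim, which means a relative path is
// resolved from wherever the generated file is written.
function renderPreprocessorImports(
  refs: readonly PreprocessorRef[],
): string[] {
  const byModule = new Map<string, Set<string>>();
  for (const ref of refs) {
    const names = byModule.get(ref.module) ?? new Set<string>();
    names.add(ref.function);
    byModule.set(ref.module, names);
  }
  return Array.from(byModule.entries()).map(([module, names]) => {
    const list = Array.from(names).sort().join(", ");
    return `import { ${list} } from ${JSON.stringify(module)};`;
  });
}
```

Emitting the specifier verbatim keeps the rule simple to document: whatever appears in the YAML is exactly what the generated file imports.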

Preprocessor contract

A preprocessor should receive an expanded JSON-LD property value and return one of three results:

  • a vocabulary object matching the property's declared range, when it handled the value
  • undefined, when it did not handle the value and the normal decoder should continue
  • an Error, when it recognized the value but failed while converting it

The generated decoder should treat an Error result as a decoding failure and throw it immediately. undefined should mean only “not handled.”

The exact runtime type can be adjusted during implementation, but the shape should be close to this:

import type { TracerProvider } from "@opentelemetry/api";
import type { DocumentLoader, VocabularyObject } from "@fedify/vocab-runtime";

export type Json =
  | string
  | number
  | boolean
  | null
  | readonly Json[]
  | { readonly [key: string]: Json };

export interface PropertyPreprocessorContext {
  documentLoader?: DocumentLoader;
  contextLoader?: DocumentLoader;
  tracerProvider?: TracerProvider;
  baseUrl?: URL;
}

export type PropertyPreprocessor<T extends VocabularyObject> = (
  value: Json,
  context: PropertyPreprocessorContext,
) => T | undefined | Error | Promise<T | undefined | Error>;

VocabularyObject may need to be introduced as a shared runtime type for generated vocabulary classes, or replaced with the closest existing base type if there is already a better fit.

The context argument is useful because preprocessors may need to reuse the same documentLoader, contextLoader, tracerProvider, or baseUrl that the generated decoder already passes to fromJsonLd(). This matters for relative URL handling and for preprocessors that delegate to generated vocabulary parsers.
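
As an illustration of why baseUrl matters, the sketch below resolves a possibly relative reference against the context's base URL. ContextSketch and resolveAgainstBase are hypothetical stand-ins for this example only; the real context type would be the PropertyPreprocessorContext proposed above.

```typescript
// Hypothetical, trimmed-down stand-in for the proposed context type.
interface ContextSketch {
  baseUrl?: URL;
}

// Resolves `ref` to an absolute URL. A relative reference can only be
// resolved when the context carries a base URL; otherwise the URL
// constructor throws and this helper reports undefined.
function resolveAgainstBase(
  ref: string,
  context: ContextSketch,
): URL | undefined {
  try {
    return context.baseUrl == null
      ? new URL(ref)
      : new URL(ref, context.baseUrl);
  } catch {
    return undefined; // relative without a base, or malformed
  }
}
```

Without the context argument, a preprocessor would have no way to interpret a relative href the same way the generated decoder does.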

Generated decoder behavior

For each expanded property value, the generated decoder should run preprocessors before the normal range decoder:

let decoded: Image | undefined;

for (const preprocessor of preprocessors) {
  const preprocessed = await preprocessor(v, {
    documentLoader: options.documentLoader,
    contextLoader: options.contextLoader,
    tracerProvider: options.tracerProvider,
    baseUrl: values["@id"] == null ? options.baseUrl : new URL(values["@id"]),
  });
  if (preprocessed instanceof Error) throw preprocessed;
  if (preprocessed !== undefined) {
    decoded = preprocessed;
    break;
  }
}

if (decoded === undefined) {
  decoded = await Image.fromJsonLd(v, {
    ...options,
    baseUrl: values["@id"] == null ? options.baseUrl : new URL(values["@id"]),
  });
}

icons.push(decoded);

This should apply to generated decoding only. Serialization can continue to use the property's declared public range.
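
The same control flow can be factored into a small generic helper, which may be easier to unit-test than inlined generated code. This is only a sketch: applyPreprocessors, PreprocessResult, and Preprocessor are hypothetical names, and the fallback parameter stands in for the normal range decoder (for example Image.fromJsonLd).

```typescript
// The three outcomes of the contract: a decoded value, "not handled"
// (undefined), or a conversion failure (Error).
type PreprocessResult<T> = T | undefined | Error;

type Preprocessor<T, C> = (
  value: unknown,
  context: C,
) => PreprocessResult<T> | Promise<PreprocessResult<T>>;

// Runs preprocessors in order. The first non-undefined result wins; an
// Error result is thrown immediately; if every preprocessor declines,
// the fallback (the normal range decoder) handles the value.
async function applyPreprocessors<T, C>(
  value: unknown,
  context: C,
  preprocessors: readonly Preprocessor<T, C>[],
  fallback: (value: unknown) => T | Promise<T>,
): Promise<T> {
  for (const preprocess of preprocessors) {
    const result = await preprocess(value, context);
    if (result instanceof Error) throw result;
    if (result !== undefined) return result;
  }
  return await fallback(value);
}
```

Whether the generator inlines the loop or calls a shared runtime helper like this one is an implementation choice; the observable behavior should be the same.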

Example preprocessor

normalizeLinkToImage would handle expanded JSON-LD values whose @type contains https://www.w3.org/ns/activitystreams#Link. Since expanded JSON-LD represents @type as an array, the implementation should not check for a bare string.

import { Image, Link } from "@fedify/vocab";
import type { PropertyPreprocessor } from "@fedify/vocab-runtime";

export const normalizeLinkToImage: PropertyPreprocessor<Image> = async (
  value,
  context,
) => {
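  // Decline anything that is not an expanded node object typed as Link;
  // returning undefined lets the normal range decoder handle the value.
  if (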
    typeof value !== "object" ||
    value === null ||
    Array.isArray(value) ||
    !("@type" in value) ||
    !Array.isArray(value["@type"]) ||
    !value["@type"].includes("https://www.w3.org/ns/activitystreams#Link")
  ) {
    return undefined;
  }

  let link: Link;
  try {
    link = await Link.fromJsonLd(value, context);
  } catch (error) {
    return error instanceof Error ? error : new Error(String(error));
  }

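  // A Link without an href cannot be turned into an Image; decline so the
  // normal range decoder decides what to do with the value.
  if (link.href == null) return undefined;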

  return new Image({
    url: link.href,
    mediaType: link.mediaType,
    names: link.names,
    width: link.width,
    height: link.height,
  });
};
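
To show why the type check has to look for an array, the sketch below compares an expanded Link node with its compact counterpart. hasExpandedType is a hypothetical helper written for this illustration, and the sample values are made up; only the expanded shape (array-valued @type, full property IRIs) follows JSON-LD expansion.

```typescript
const AS_LINK = "https://www.w3.org/ns/activitystreams#Link";

// True only for a node object whose "@type" is an array containing typeUri,
// which is how expanded JSON-LD represents types.
function hasExpandedType(value: unknown, typeUri: string): boolean {
  if (typeof value !== "object" || value === null || Array.isArray(value)) {
    return false;
  }
  const types = (value as Record<string, unknown>)["@type"];
  return Array.isArray(types) && types.includes(typeUri);
}

// Expanded form: what a preprocessor would receive.
const expandedLink = {
  "@type": [AS_LINK],
  "https://www.w3.org/ns/activitystreams#href": [
    { "@id": "https://example.com/icon.png" },
  ],
  "https://www.w3.org/ns/activitystreams#mediaType": [
    { "@value": "image/png" },
  ],
};

// Compact form: what appears on the wire, before expansion.
const compactLink = {
  type: "Link",
  href: "https://example.com/icon.png",
  mediaType: "image/png",
};
```

Here hasExpandedType(expandedLink, AS_LINK) is true, while the compact object and a node with a bare string @type are both rejected.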

This does not try to solve the bare URL string case from #420. Bare strings are normalized by JSON-LD into @id references because the ActivityStreams context uses @type: @id; deciding how Fedify should treat those references is a separate policy question.
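
For comparison, a bare URL string such as "icon": "https://example.com/icon.png" reaches the decoder as a node that carries only @id. The guard below is a hypothetical sketch of how that shape could be told apart from an explicit Link; it is not proposed as part of normalizeLinkToImage.

```typescript
// True for an expanded node reference that carries nothing but "@id",
// e.g. { "@id": "https://example.com/icon.png" }.
function isIdOnlyReference(value: unknown): value is { "@id": string } {
  if (typeof value !== "object" || value === null || Array.isArray(value)) {
    return false;
  }
  const keys = Object.keys(value);
  return (
    keys.length === 1 &&
    keys[0] === "@id" &&
    typeof (value as Record<string, unknown>)["@id"] === "string"
  );
}
```

Such a value has no @type at all, so normalizeLinkToImage declines it and the existing behavior for references stays unchanged.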

Expected outcome

With this feature, Fedify can fix #790 in a maintenance-friendly way:

  • Object.icon and Object.image remain Image-oriented in the public TypeScript API.
  • Explicit ActivityStreams Link objects in incoming icon and image values no longer fail parsing.
  • @fedify/vocab-tools gains a reusable hook for future cases where the wire format should be normalized before the generated range decoder runs.

Metadata

Assignees

No one assigned

    Labels

    component/vocab (Activity Vocabulary related), component/vocab-tools (Vocabulary code generation (@fedify/vocab-tools))

    Priority

    Medium

    Effort

    Medium
