Add multi-page asset-aware markdown-to-html storage implementation #5

Open
Arlodotexe wants to merge 9 commits into main from feature/blog-generator-multipage

Arlodotexe commented Nov 24, 2025

Major refactoring of the blog generation asset pipeline to improve flexibility and maintainability.

Core Interface Changes:

  • Renamed IAssetInclusionStrategy -> IAssetStrategy with nullable return type
  • Updated method signatures across asset interfaces for clarity and consistency
  • Renamed ReferencedAsset record -> PageAsset to better reflect its purpose
  • Renamed PostPageDataModel -> HtmlMarkdownDataTemplateModel for generic use

Asset Strategy System:

  • Deleted ReferenceOnlyInclusionStrategy.cs (replaced with new architecture)
  • Added KnownAssetStrategy.cs: configurable strategy using known asset ID lists
    • Supports both included and referenced asset file ID sets
    • Includes fallback behavior options (Reference/Include/Drop)
  • Added FaultStrategy.cs: enum for unknown asset handling (None/LogWarn/LogError/Throw)
  • Added page-aware markdown route rewriting through the existing IAssetStrategy seam
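
As a rough, language-neutral illustration of the strategy described above (not the project's C# API), the decision for a single asset can be sketched as follows. The function name `decide`, the string results, and the order in which the fault strategy and the fallback are applied are assumptions made for this sketch; only the option names (Reference/Include/Drop and None/LogWarn/LogError/Throw) come from this PR.

```python
from enum import Enum, auto


class Fallback(Enum):
    """What to do with an asset that is in neither known ID set."""
    REFERENCE = auto()
    INCLUDE = auto()
    DROP = auto()


class FaultStrategy(Enum):
    """How loudly to report an unknown asset (option names mirror the PR)."""
    NONE = auto()
    LOG_WARN = auto()
    LOG_ERROR = auto()
    THROW = auto()


def decide(asset_id, included_ids, referenced_ids,
           fallback=Fallback.REFERENCE, fault=FaultStrategy.NONE, log=print):
    """Return 'include', 'reference', or None (dropped) for one asset ID."""
    if asset_id in included_ids:
        return "include"
    if asset_id in referenced_ids:
        return "reference"

    # Unknown asset: report it first (assumed ordering for this sketch).
    if fault is FaultStrategy.THROW:
        raise LookupError(f"unknown asset: {asset_id}")
    if fault is FaultStrategy.LOG_WARN:
        log(f"warning: unknown asset {asset_id}")
    elif fault is FaultStrategy.LOG_ERROR:
        log(f"error: unknown asset {asset_id}")

    # Then apply the configured fallback. None models the nullable return.
    return {Fallback.REFERENCE: "reference",
            Fallback.INCLUDE: "include",
            Fallback.DROP: None}[fallback]
```

Returning None for a dropped asset corresponds to the nullable return type mentioned under Core Interface Changes.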

Recursive Linked Markdown Inclusion:

  • Added source-derived markdown page route indexing
  • Recursively follows linked markdown from the seed source tree
  • Materializes linked external markdown pages under generated _linked/<hash>/ routes
  • Rewrites markdown-to-markdown links to generated page routes instead of raw .md paths
  • Preserves fragments while avoiding raw markdown copies
  • Handles directory links through default markdown landing files such as wct.md, planning-log.md, log.md, and index.md
  • Handles legacy copied note paths with URL-encoded separators, unique Notes-root suffix resolution, and a narrow copied-context alias for the March 3.26 planning-log quote
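
The link rewriting described above can be sketched roughly as below. This is an illustration under assumptions, not the code in this PR: `route_index`, `rewrite_markdown_link`, and `linked_route` are hypothetical names, the routes are site-root-relative for simplicity, and the choice of a truncated SHA-256 for the `_linked/<hash>/` segment is a guess, since the PR does not say how the hash is derived.

```python
import hashlib
import posixpath


def linked_route(external_source_id: str) -> str:
    """Hypothetical route for an external markdown page: _linked/<hash>/."""
    digest = hashlib.sha256(external_source_id.encode("utf-8")).hexdigest()[:12]
    return f"_linked/{digest}/"


def rewrite_markdown_link(href: str, current_dir: str, route_index: dict) -> str:
    """Map a relative .md link to its generated page route, keeping #fragment."""
    path, sep, fragment = href.partition("#")
    if not path.lower().endswith(".md"):
        return href  # not a markdown page link; leave untouched

    # Resolve the link against the directory of the page that contains it.
    source_path = posixpath.normpath(posixpath.join(current_dir, path))
    route = route_index.get(source_path)
    if route is None:
        return href  # unknown page: this sketch leaves the link as-is
    return route + sep + fragment
```

With `route_index = {"docs/log.md": "docs/log/"}`, a link `../docs/log.md#today` found in a page under `notes` becomes `docs/log/#today`, so the fragment survives and no raw `.md` path is emitted.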

Asset Detection Improvements:

  • Enhanced RegexAssetLinkDetector to detect markdown links/images and HTML href/src attributes directly
  • Removed arbitrary filename substring detection that could rewrite path fragments incorrectly
  • Added protocol/absolute-link filtering to avoid treating external URLs as local assets
  • Fixed relative path resolution in RelativePathAssetResolver
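
For illustration, detection of markdown links/images and HTML href/src values with protocol filtering might look like the sketch below. The regular expressions and the function name are assumptions written for this example and are not the patterns used by RegexAssetLinkDetector; treating root-absolute paths and bare fragments as non-assets is also an assumption.

```python
import re

# Markdown links/images: [text](target) or ![alt](target "optional title")
MD_LINK = re.compile(r'!?\[[^\]]*\]\(\s*([^)\s]+)')
# HTML attributes: href="..." or src='...'
HTML_ATTR = re.compile(r'\b(?:href|src)\s*=\s*["\']([^"\']+)["\']', re.IGNORECASE)
# Targets starting with a URI scheme (https:, mailto:, data:) or // are external.
EXTERNAL = re.compile(r'^(?:[a-zA-Z][a-zA-Z0-9+.-]*:|//)')


def detect_local_asset_links(text: str) -> list[str]:
    """Return relative link targets in first-seen order, without duplicates."""
    found = MD_LINK.findall(text) + HTML_ATTR.findall(text)
    result = []
    for target in found:
        # Skip external URLs, root-absolute paths, and same-page fragments.
        if EXTERNAL.match(target) or target.startswith(("/", "#")):
            continue
        if target not in result:
            result.append(target)
    return result
```

Because only complete link targets are captured, a filename that merely appears as a substring of prose or of another path is never reported, which is the failure mode the removed substring detection had.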

Processing Pipeline Refactoring:

  • Moved asset detection earlier in pipeline to include template file assets
  • Changed AssetAwareHtmlTemplatedMarkdownFile to scan both template and markdown
  • Updated post-processing to return nullable for dropped assets
  • Reworked rendered-link replacement to rewrite complete href/src values after URL normalization, avoiding partial path replacements
  • Refactored PagesCommand to build the full markdown page route graph before materialization and emit every indexed page route
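
A minimal sketch of rewriting complete href/src values after normalization, as opposed to replacing path substrings, is shown below. The helper names and the particular normalization (percent-decoding plus `posixpath.normpath`) are assumptions for illustration only.

```python
import posixpath
import re
from urllib.parse import unquote

# Captures the attribute prefix, the quote character, and the full value.
ATTR = re.compile(r'(\b(?:href|src)\s*=\s*)(["\'])(.*?)\2', re.IGNORECASE)


def normalize(url: str) -> str:
    """Simplified normalization: decode %xx escapes and collapse ./ and ../."""
    return posixpath.normpath(unquote(url))


def rewrite_links(html: str, replacements: dict) -> str:
    """replacements maps normalized original paths to rewritten paths."""
    def swap(match):
        prefix, quote, value = match.groups()
        # Only a whole-value match is rewritten; anything else is kept verbatim.
        new_value = replacements.get(normalize(value), value)
        return f"{prefix}{quote}{new_value}{quote}"
    return ATTR.sub(swap, html)
```

With `replacements = {"a.png": "assets/a.png"}`, `src="./a.png"` is rewritten while `src="data/a.png"` is left alone, whereas a naive string replace of `a.png` would have corrupted the second value.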

Command Materialization:

  • Added shared page asset materialization for generated page assets
  • Completed output materialization for the single-page page command
  • Fixed asset copying in the multi-page pages command so that each rewritten file path is split into a parent folder and a file name
  • Prevented source markdown files from being copied into generated output
  • Kept materialization file-focused while markdown links are handled as generated page routes
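
The parent folder + file split can be illustrated with the sketch below, which uses the local filesystem rather than the storage abstractions used in this PR. `materialize_asset` and its parameters are hypothetical names for this example.

```python
import posixpath
import shutil
from pathlib import Path


def materialize_asset(source_file: Path, output_root: Path, rewritten_path: str) -> Path:
    """Copy one asset to output_root/<rewritten_path>, creating parent folders."""
    # Split "assets/img/logo.png" into ("assets/img", "logo.png").
    parent, file_name = posixpath.split(rewritten_path)

    # Create the parent folder chain first, then place the file inside it.
    target_dir = output_root / parent if parent else output_root
    target_dir.mkdir(parents=True, exist_ok=True)

    target = target_dir / file_name
    shutil.copyfile(source_file, target)
    return target
```

Treating the whole rewritten path as a single file name would put a file with slashes in its name at the output root (or fail outright), which is why the folder part is created separately.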

Command Structure:

  • Deleted legacy PostPageCommand.cs, PostPageFolder.cs, IndexHtmlFile.cs, PostPageAssetFolder.cs
  • Added new PageCommand.cs for single-page generation
  • Updated WacsdkBlogCommands to use new command structure

Dependencies:

  • Added OwlCore.Extensions package reference for enhanced functionality

Tests and Validation:

  • Updated test references from InclusionStrategy -> AssetStrategy
  • Replaced stale reference-only test usage with KnownAssetStrategy reference behavior
  • Added command-level materialization coverage for page and pages
  • Added route-index and markdown page link rewrite coverage
  • Added resolver fallback coverage for directory landing pages, copied legacy suffix links, and copied-context planning-log links
  • Verified dotnet build and dotnet test (17 tests)
  • Full April 26 WCT acceptance gate:
    • generated index.html: 92
    • copied .md files: 0
    • checked relative href/src links: 652
    • existing relative link targets: 652
    • broken relative links: 0
    • raw .md href/src links: 0
    • user-level IPFS CID: QmPD49UyfL5dQfdboNQpn5dJpMCdFfhdvuF6UJphQpBvkA
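
Counts of this kind could be gathered with a small audit script along the lines of the sketch below. This is not the tool that produced the numbers above; the function name, the stat keys, and the rule that a directory link counts as existing when it contains index.html are all assumptions.

```python
import re
from pathlib import Path
from urllib.parse import unquote

ATTR = re.compile(r'\b(?:href|src)\s*=\s*["\']([^"\']+)["\']', re.IGNORECASE)
# Schemes, protocol-relative URLs, root-absolute paths, and fragments are skipped.
NOT_RELATIVE = re.compile(r'^(?:[a-zA-Z][a-zA-Z0-9+.-]*:|//|/|#)')


def audit_site(root: Path) -> dict:
    stats = {"index_html": 0, "md_files": 0, "checked": 0,
             "broken": 0, "raw_md_links": 0}
    stats["md_files"] = sum(1 for _ in root.rglob("*.md"))

    for page in root.rglob("*.html"):
        if page.name == "index.html":
            stats["index_html"] += 1
        for link in ATTR.findall(page.read_text(encoding="utf-8")):
            if NOT_RELATIVE.match(link):
                continue
            # Strip fragment and query, then decode %xx escapes.
            path = unquote(link.split("#", 1)[0].split("?", 1)[0])
            stats["checked"] += 1
            if path.lower().endswith(".md"):
                stats["raw_md_links"] += 1
            target = page.parent / path
            # Assumed rule: a directory route exists if it has an index.html.
            if target.is_dir():
                target = target / "index.html"
            if not target.exists():
                stats["broken"] += 1
    return stats
```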

This refactoring enables fine-grained control over asset handling, supporting scenarios like template-based asset inclusion vs markdown-based asset referencing, with configurable fallback behavior for unknown assets. It also verifies that generated sites materialize their index.html, stylesheet, image assets, and recursively linked markdown pages as real output while keeping raw .md files out of the generated site.

…ReferencedAsset tracking

Changes to asset resolution architecture:
- Modified IAssetResolver to accept markdown source per-call instead of storing it as state
- Updated RelativePathAssetResolver to be stateless, receiving context file in ResolveAsync()
- Enables shared resolver instance across multiple pages without coupling
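
As a rough, language-neutral sketch of the stateless design (not the C# `IAssetResolver` interface itself), a resolver that receives the context file on every call could look like this. The class name, the `resolve` signature, and the `existing_files` set standing in for the storage layer are assumptions for illustration.

```python
import posixpath
from typing import Optional


class RelativePathResolver:
    """Holds no per-page state, so one instance can serve every page."""

    def resolve(self, context_file: str, link: str,
                existing_files: set) -> Optional[str]:
        # Resolve the link against the folder of the file that contains it.
        base = posixpath.dirname(context_file)
        candidate = posixpath.normpath(posixpath.join(base, link))
        return candidate if candidate in existing_files else None
```

The same instance gives different answers for the same link depending on the context file passed in, which is what lets it be shared across pages.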

Introduced ReferencedAsset record:
- Captures complete asset reference info: original path, rewritten path, and resolved file
- Facilitates materialization in consumer code (PagesCommand)
- Replaces IFile collection with structured ReferencedAsset collection

AssetAwareHtmlTemplatedMarkdownFile improvements:
- Tracks all referenced assets (both included and referenced) via ReferencedAsset
- Unified asset processing for markdown source and template file detection
- Extracted ProcessAssetLinkAsync() method for shared pipeline logic

Virtual structure simplification:
- Removed automatic asset yielding from AssetAwareHtmlTemplatedMarkdownPageFolder
- Removed template asset yielding from HtmlTemplatedMarkdownPageFolder base class
- Assets now tracked in ReferencedAsset collection for explicit consumer materialization

Implemented PagesCommand:
- CLI command for multi-page blog generation using AssetAwareHtmlTemplatedMarkdownPagesFolder
- Materializes virtual structure by iterating page folders and copying files
- Uses ReferencedAsset.RewrittenPath for correct asset output placement

Added comprehensive test coverage:
- AssetAwareHtmlTemplatedMarkdownPagesFolderTests with 7 test cases
- Covers markdown discovery, hierarchy preservation, asset resolution, and link rewriting

Minor cleanup:
- Removed debug logging from PostPageAssetFolder
- Fixed formatting/whitespace in PostPageFolder and HtmlTemplatedMarkdownPageFolder
- Changed return types to IChildFolder for Pages-related classes

Simplifies asset resolver architecture:
- Removed IFolder SourceFolder property from IAssetResolver interface
- Removed SourceFolder property from RelativePathAssetResolver implementation
- Asset resolution now fully stateless, relying only on markdown source context

Updated PagesCommand implementation:
- Simplified folder materialization logic with better variable naming
- Changed --template-file option to --template-file-name for clarity
- Removed debug logging statements
- Used DepthFirstRecursiveFolder at pagesFolder level instead of per-page
- Improved code formatting and inline documentation
- Streamlined folder creation using CreateFoldersAlongRelativePathAsync pattern

Restored HtmlTemplatedMarkdownPageFolder template asset yielding:
- Re-added template folder enumeration and asset passthrough
- Template HTML file exclusion logic restored
- Enables PostPage scenario to continue working as expected

Updated test setup:
- Removed SourceFolder initialization from RelativePathAssetResolver instances
- Tests now reflect stateless resolver design
…attern

Major refactoring of the blog generation asset pipeline to improve flexibility and maintainability:

**Core Interface Changes:**
- Renamed `IAssetInclusionStrategy` → `IAssetStrategy` with nullable return type
- Updated method signatures across asset interfaces for clarity and consistency
- Renamed `ReferencedAsset` record → `PageAsset` to better reflect its purpose
- Renamed `PostPageDataModel` → `HtmlMarkdownDataTemplateModel` for generic use

**Asset Strategy System:**
- Deleted `ReferenceOnlyInclusionStrategy.cs` (replaced with new architecture)
- Added `KnownAssetStrategy.cs`: configurable strategy using known asset ID lists
  - Supports both included and referenced asset file ID sets
  - Includes fallback behavior options (Reference/Include/Drop)
- Added `FaultStrategy.cs`: enum for unknown asset handling (None/LogWarn/LogError/Throw)

**Asset Detection Improvements:**
- Enhanced `RegexAssetLinkDetector` with improved path matching patterns
- Added protocol scheme detection to filter out absolute URLs
- Added standalone filename pattern detection
- Fixed relative path resolution in `RelativePathAssetResolver`

**Processing Pipeline Refactoring:**
- Moved asset detection earlier in pipeline to include template file assets
- Changed `AssetAwareHtmlTemplatedMarkdownFile` to scan both template and markdown
- Updated post-processing to return nullable for dropped assets
- Refactored `PagesCommand` to configure new asset strategy with separate ID sets

**Command Structure:**
- Deleted legacy `PostPageCommand.cs`, `PostPageFolder.cs`, `IndexHtmlFile.cs`, `PostPageAssetFolder.cs`
- Added new `PageCommand.cs` for single-page generation
- Updated `WacsdkBlogCommands` to use new command structure

**Dependencies:**
- Added `OwlCore.Extensions` package reference for enhanced functionality

**Tests:**
- Updated test references from `InclusionStrategy` → `AssetStrategy`
- Created temporary `ReferenceOnlyAssetStrategy` for test compatibility

This refactoring enables fine-grained control over asset handling, supporting scenarios like template-based asset inclusion vs markdown-based asset referencing, with configurable fallback behavior for unknown assets.
Arlodotexe marked this pull request as draft November 24, 2025 16:02
Arlodotexe marked this pull request as ready for review May 16, 2026 18:28