prompts/.github/prompts/plan-step4.prompt.md
John Lancaster adaa4177fe adjustments
2026-06-20 13:33:18 -05:00

Step 4 Results: Docs Registry Loader Design (importlib.resources + Fail-Fast Validation)

This section finalizes Step 4 by defining a production-ready docs registry loader that reads packaged docs through Python resource APIs, parses SKILL.md frontmatter, validates schema and cross-links, and builds an immutable in-memory registry keyed by skill_id.

Greenfield Framing (Normative)

This Step 4 design is for the greenfield target state:

  1. No legacy metadata sidecars (metadata.yaml) are part of the runtime contract.
  2. No dual-loader compatibility path is required.
  3. Registry loading from packaged resources is the only runtime source of truth.
  4. Compatibility shims are prohibited.

Research Baseline (Python + Design Guidance)

Authoritative references used for this step:

  1. Python importlib.resources docs (files, as_file, Traversable APIs)
  2. Python importlib.resources.abc docs (Traversable, path traversal semantics, joinpath compatibility notes)
  3. Pydantic v2 model/validation docs (model_validate, ValidationError, strictness and extra handling)
  4. Python packaging guidance for including package data in wheels/sdists

Best-practice conclusions applied to this design:

  1. Prefer importlib.resources.files(<package>).joinpath(...) over filesystem assumptions so stdio deployments from installed wheels work.
  2. Treat resources as potentially non-filesystem artifacts (zip-import compatible); only use as_file(...) when an actual OS path is required.
  3. Validate metadata with explicit Pydantic models and fail startup on contract violations.
  4. Keep registry load deterministic (sorted traversal, stable error messages, no hidden fallback mutations).
  5. Resolve references via manifest ids declared in frontmatter, not by global file conventions.

Loader Responsibilities (Normative)

The Step 4 loader MUST:

  1. Read canonical docs from package resources (not repo-root paths).
  2. Discover all skill directories under docs/skills/ in packaged resources.
  3. For each skill, read and parse SKILL.md frontmatter.
  4. Validate frontmatter using the Step 2 schema contract.
  5. Validate directory/id invariants from Step 1 (directory name equals frontmatter id).
  6. Validate URI/reference semantics from Step 3 assumptions.
  7. Build a single in-memory registry keyed by skill_id.
  8. Fail fast on any integrity error before FastMCP resource registration.
  9. Precompute compact discovery projections so index resources can be served without reading full markdown bodies at request time.
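Responsibilities 1 to 3 can be sketched as follows. This is a minimal illustration, not the final loader: the function names skills_root and discover_skill_documents are hypothetical, and the sketch returns missing SKILL.md findings as a list so a caller can aggregate them with other startup errors. It chains single-segment joinpath calls because multi-segment joinpath is only documented for newer Python versions.

```python
from importlib.resources import files


def skills_root(package_anchor: str, docs_root: str = "docs"):
    """Return the Traversable for <anchor>/<docs_root>/skills."""
    node = files(package_anchor)
    # One segment per call keeps this compatible with older Traversable APIs.
    for segment in (docs_root, "skills"):
        node = node.joinpath(segment)
    return node


def discover_skill_documents(root):
    """Return ({skill_dir_name: SKILL.md text}, [dirs missing SKILL.md])."""
    if not root.is_dir():
        raise FileNotFoundError(f"skill root is not a directory: {root}")
    documents = {}
    missing = []
    # Sorted traversal keeps load order stable across runs and platforms.
    for entry in sorted(root.iterdir(), key=lambda item: item.name):
        if not entry.is_dir():
            continue  # stray files under docs/skills are not skills
        skill_md = entry.joinpath("SKILL.md")
        if skill_md.is_file():
            documents[entry.name] = skill_md.read_text(encoding="utf-8")
        else:
            missing.append(entry.name)
    return documents, missing
```

Because the functions only use Traversable methods (joinpath, iterdir, is_dir, is_file, read_text), the same code should work whether the package is installed as plain files or imported from a zip archive.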

Package Resource Contract

Runtime anchor:

  1. The loader resolves content from an importable package anchor, for example personal_mcp.
  2. Docs root is located as files(anchor).joinpath("docs") when docs are packaged at package root, or an equivalent configured subpath.
  3. Skill root is docs/skills.

Resource assumptions:

  1. SKILL.md is UTF-8 text.
  2. Reference files declared in frontmatter are UTF-8 markdown by default unless otherwise declared.
  3. Path resolution always remains inside the same skill directory.
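The in-skill path rule can be checked on the declared string before any resource is opened. The sketch below is one possible check, assuming manifest paths are written in POSIX form and must sit under references/ (as the validation pipeline requires); the function name check_reference_relpath and the message wording are illustrative only.

```python
from pathlib import PurePosixPath


def check_reference_relpath(relpath: str) -> list:
    """Return rule violations for one manifest path (empty list means valid)."""
    problems = []
    if "\\" in relpath:
        # Assumption: stored paths are POSIX-only, so backslashes are rejected.
        problems.append("path must use POSIX separators")
    path = PurePosixPath(relpath)
    if path.is_absolute():
        problems.append("path must be relative")
    if ".." in path.parts:
        problems.append("path must not traverse out of the skill directory")
    if len(path.parts) < 2 or path.parts[0] != "references":
        problems.append("path must point to a file under references/")
    return problems
```

A purely lexical check like this does not prove the target exists; existence and is-a-file checks still have to run against the packaged resource tree.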

Registry Data Model

Build immutable runtime records with explicit structure:

  1. SkillRecord

    • skill_id
    • name
    • description
    • version
    • tags
    • capabilities
    • depends_on
    • document_uri
    • document_relpath (canonical resource-relative path)
    • references map keyed by ref_id
  2. ReferenceRecord

    • ref_id
    • uri
    • relpath
    • mime_type
    • title
  3. DocsRegistry

    • skills_by_id: dict[str, SkillRecord]
    • skills_in_load_order: list[str] (deterministic ordering)
    • helper indexes for catalog payload generation
    • skills_summary_in_load_order: list[SkillSummaryRecord] for progressive discovery responses
    • filter indexes (for example by tag/capability) derived once at startup
  4. SkillSummaryRecord

    • skill_id
    • name
    • description
    • tags
    • capabilities
    • document_uri
    • optional version

Immutability rule:

  1. Once built, registry records are treated as read-only for the process lifetime.
  2. No runtime mutation during requests; refresh only via process restart.
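One way to make the read-only rule hard to violate by accident is to use frozen dataclasses, tuples, and read-only mapping views. The sketch below is an assumption about representation, not a settled design: it swaps the dict and list fields listed above for Mapping and tuple types, omits SkillSummaryRecord and the helper indexes for brevity, and introduces a hypothetical freeze_registry helper.

```python
from dataclasses import dataclass
from types import MappingProxyType
from typing import Mapping, Tuple


@dataclass(frozen=True)
class ReferenceRecord:
    ref_id: str
    uri: str
    relpath: str
    mime_type: str
    title: str


@dataclass(frozen=True)
class SkillRecord:
    skill_id: str
    name: str
    description: str
    version: str
    tags: Tuple[str, ...]
    capabilities: Tuple[str, ...]
    depends_on: Tuple[str, ...]
    document_uri: str
    document_relpath: str
    references: Mapping[str, ReferenceRecord]


@dataclass(frozen=True)
class DocsRegistry:
    skills_by_id: Mapping[str, SkillRecord]
    skills_in_load_order: Tuple[str, ...]


def freeze_registry(skills) -> DocsRegistry:
    """Build a registry whose containers reject mutation after construction."""
    ordered = sorted(skills, key=lambda skill: skill.skill_id)
    return DocsRegistry(
        # MappingProxyType is a read-only view; the backing dict is not exposed.
        skills_by_id=MappingProxyType({s.skill_id: s for s in ordered}),
        skills_in_load_order=tuple(s.skill_id for s in ordered),
    )
```

With this shape, assigning to a record field raises dataclasses.FrozenInstanceError and assigning into skills_by_id raises TypeError, so accidental request-time mutation fails loudly.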

Frontmatter Parsing Contract

SKILL.md parse steps:

  1. Read full markdown text from resource.
  2. Parse YAML frontmatter block at file start (between the first two --- delimiters).
  3. Parse YAML with safe loader semantics.
  4. Validate parsed object with Step 2 Pydantic model(s).
  5. Preserve markdown body as document content payload.

Parsing failure behavior:

  1. Missing frontmatter block: startup error.
  2. Invalid YAML: startup error with skill path and YAML parser detail.
  3. Missing required fields (name, description, x-personal-mcp contract fields): startup error.
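The delimiter handling in parse step 2 can be sketched with the standard library alone. The names FrontmatterError and split_frontmatter are hypothetical. The returned YAML text would then be passed to a safe loader (for example PyYAML's yaml.safe_load) and to the Step 2 Pydantic model, neither of which is shown here.

```python
class FrontmatterError(ValueError):
    """Raised when SKILL.md has no well-formed frontmatter block."""


def split_frontmatter(text: str):
    """Return (frontmatter_yaml, markdown_body) for a SKILL.md document."""
    lines = text.splitlines(keepends=True)
    # The block must open on the very first line of the file.
    if not lines or lines[0].strip() != "---":
        raise FrontmatterError("missing frontmatter block at file start")
    for index in range(1, len(lines)):
        if lines[index].strip() == "---":
            frontmatter = "".join(lines[1:index])
            body = "".join(lines[index + 1:])
            return frontmatter, body
    raise FrontmatterError("frontmatter block is not closed by a second '---'")
```

Only the first two delimiter lines are treated as frontmatter boundaries, so a later --- in the markdown body (for example a horizontal rule) stays part of the document payload.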

Validation Pipeline (Fail-Fast)

Validation happens in this order:

  1. Structural discovery validation
    • skill directory exists under docs/skills
    • required SKILL.md exists for each discovered skill
  2. Schema validation
    • Pydantic frontmatter validation for all required and constrained fields
  3. Identity validation
    • frontmatter name equals x-personal-mcp.id
    • frontmatter id (x-personal-mcp.id) equals skill directory name
  4. Reference manifest validation
    • unique ref_id keys per skill
    • each manifest path is relative, in-skill, and under references/
    • each manifest target exists and is a file
  5. Dependency graph validation
    • every depends_on target exists in discovered skill set
    • no self-dependency
    • cycle detection enabled (hard error on cycle)
  6. Capability sanity checks
    • required primary capability resource://skills/{skill_id}/document is present
  7. Global uniqueness checks
    • no duplicate skill_id
    • no duplicate canonical resource URIs generated from registry
  8. Discovery payload checks
    • summary fields required by catalog index are present and non-empty
    • summary generation does not require reading markdown body content during request handling
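Phase 5 (dependency graph validation) can be sketched with the standard-library graphlib module. The function name check_dependency_graph and the message wording are illustrative; the input is assumed to be a mapping from each discovered skill_id to its depends_on list.

```python
from graphlib import CycleError, TopologicalSorter


def check_dependency_graph(depends_on: dict) -> list:
    """Return dependency problems for a mapping of skill_id -> depends_on ids."""
    problems = []
    for skill_id in sorted(depends_on):
        for target in sorted(depends_on[skill_id]):
            if target == skill_id:
                problems.append(f"{skill_id}: depends on itself")
            elif target not in depends_on:
                problems.append(f"{skill_id}: unknown dependency '{target}'")
    if problems:
        # Cycle search is only meaningful once every edge points at a known skill.
        return problems
    try:
        TopologicalSorter(depends_on).prepare()
    except CycleError as exc:
        # graphlib exposes the detected cycle as the second element of args.
        problems.append("dependency cycle: " + " -> ".join(exc.args[1]))
    return problems
```

Iterating in sorted order keeps the reported problems stable between runs, which matters when the messages end up in startup diagnostics.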

Error Model and Reporting

Error handling contract:

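  1. Collect errors per validation phase, then raise a single startup exception that contains every finding. "Fail fast" here means startup aborts before registration, not that validation stops at the first error.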
  2. Error messages must include:
    • skill id (when known)
    • packaged relative path
    • violated rule
    • actionable fix hint
  3. If any error exists, registry is not published and FastMCP resource registration does not proceed.

Recommended exception shape:

  1. DocsRegistryValidationError(errors: list[RegistryIssue])
  2. RegistryIssue fields: code, message, skill_id, path, hint
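A minimal sketch of the recommended exception shape follows. The field names come from the list above; the rendered message format, the sort key, and the raise_if_invalid helper are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RegistryIssue:
    code: str
    message: str
    skill_id: Optional[str]  # None when the skill id could not be determined
    path: Optional[str]      # packaged relative path, when known
    hint: str


class DocsRegistryValidationError(Exception):
    """Single startup exception carrying every collected issue."""

    def __init__(self, errors):
        self.errors = tuple(errors)
        lines = [f"{len(self.errors)} docs registry issue(s):"]
        for issue in self.errors:
            where = issue.skill_id or "<unknown skill>"
            lines.append(
                f"  [{issue.code}] {where} ({issue.path}): "
                f"{issue.message}. Hint: {issue.hint}"
            )
        super().__init__("\n".join(lines))


def raise_if_invalid(issues) -> None:
    """Raise once with all findings, in a stable order; do nothing if clean."""
    if issues:
        ordered = sorted(
            issues, key=lambda i: (i.skill_id or "", i.path or "", i.code)
        )
        raise DocsRegistryValidationError(ordered)
```

Startup code would call raise_if_invalid after the last validation phase and only publish the registry if it returns normally.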

Determinism and Runtime Safety

Determinism rules:

  1. Traverse directories in sorted order.
  2. Normalize all stored relative paths to POSIX form.
  3. Normalize ids/tags exactly once at parse boundary.
  4. Produce stable catalog ordering to reduce client churn.
  5. Produce stable summary projections and filter indexes from the same normalized source records.
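Rules 3 and 5 can be illustrated with a small normalization and index-building sketch. The normalization rule shown (trim, lowercase, de-duplicate, sort) is an assumption, since this document does not fix the exact rule, and both function names are hypothetical.

```python
def normalize_tags(raw_tags) -> tuple:
    """Normalize tags once at the parse boundary (assumed rule, see lead-in)."""
    cleaned = {tag.strip().lower() for tag in raw_tags if tag.strip()}
    return tuple(sorted(cleaned))


def build_tag_index(tags_by_skill: dict) -> dict:
    """Map tag -> tuple of skill ids, with sorted keys and sorted members."""
    index = {}
    for skill_id in sorted(tags_by_skill):
        for tag in tags_by_skill[skill_id]:
            index.setdefault(tag, []).append(skill_id)
    # Rebuild with sorted keys so iteration order is stable for catalog output.
    return {tag: tuple(index[tag]) for tag in sorted(index)}
```

Because the index is derived from already-normalized records in sorted order, two loads of the same packaged docs should produce identical filter indexes.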

Runtime safety rules:

  1. No dependence on Path(__file__) or repository root.
  2. No ad-hoc fallback probing across multiple locations.
  3. No lazy validation deferred until first request.

Integration Plan for Existing Modules

Primary integration target:

  1. Implement the canonical package-resource-based registry loader in src/personal_mcp/skills/document_loader.py as the only supported runtime loader path.

Catalog integration:

  1. Update src/personal_mcp/catalog/server.py to consume the shared in-memory registry as the only catalog data source.
  2. Keep catalog payload normalization deterministic and sourced from registry records only.

Startup wiring:

  1. Initialize registry once during app/server startup in src/personal_mcp/main.py or equivalent composition point.
  2. Pass registry to resource registration step (Step 5).

Proposed Loader API Surface

Use a small, testable API:

  1. load_docs_registry(*, package_anchor: str, docs_root: str = "docs") -> DocsRegistry
  2. read_skill_document(registry: DocsRegistry, skill_id: str) -> DocumentPayload
  3. read_skill_reference(registry: DocsRegistry, skill_id: str, ref_id: str) -> DocumentPayload

Design constraints:

  1. Loader functions are pure with respect to their inputs: the result depends only on the packaged resources and the arguments passed in.
  2. No global mutable singleton required for unit tests.
  3. Caching is explicit and owned by startup composition.

Test and Validation Plan (Step 4 Scope)

Unit tests:

  1. valid multi-skill registry load from packaged test fixtures
  2. duplicate id detection
  3. missing SKILL.md detection
  4. invalid frontmatter field constraints
  5. broken reference target detection
  6. invalid depends_on target detection
  7. cycle detection in depends_on graph
  8. deterministic output ordering across runs

Packaging/runtime tests:

  1. install built wheel in isolated env
  2. load registry via importlib.resources.files(...)
  3. assert representative skill document/reference are readable

Expected command path in this repo:

  1. uv run pytest -q

Acceptance Criteria for Step 4 Completion

Step 4 is complete when all are true:

  1. Registry loads exclusively from packaged resources.
  2. All Step 2 and Step 3 dependent validations are enforced at startup.
  3. Invalid docs state blocks startup with actionable diagnostics.
  4. Registry is deterministic and immutable for runtime use.
  5. Catalog and later resource registration can consume registry without direct filesystem scanning.

Non-goals for Step 4

  1. No FastMCP resource registration wiring details (Step 5).
  2. No discovery-tool fallback behavior design (Step 6).
  3. No final packaging/build-system migration mechanics (Step 7).
  4. No backward-compat alias rollout mechanics in the greenfield baseline.
  5. No compatibility layer of any kind (URI aliases, dual reads, adapter shims, or legacy schema bridges).