From 85355a85099d0cb0944ca12d48fdc8f05f3658b5 Mon Sep 17 00:00:00 2001
From: John Lancaster <32917998+jsl12@users.noreply.github.com>
Date: Sat, 20 Jun 2026 13:22:03 -0500
Subject: [PATCH] added step 4 and 5

---
 .github/prompts/plan-step4.prompt.md | 238 +++++++++++++++++++++++++++
 .github/prompts/plan-step5.prompt.md | 210 +++++++++++++++++++++++
 2 files changed, 448 insertions(+)
 create mode 100644 .github/prompts/plan-step4.prompt.md
 create mode 100644 .github/prompts/plan-step5.prompt.md
diff --git a/.github/prompts/plan-step4.prompt.md b/.github/prompts/plan-step4.prompt.md
new file mode 100644
index 0000000..3545ecc
--- /dev/null
+++ b/.github/prompts/plan-step4.prompt.md
@@ -0,0 +1,238 @@
+**Step 4 Results: Docs Registry Loader Design (importlib.resources + Fail-Fast Validation)**
+
+This section finalizes Step 4 by defining a production-ready docs registry loader that reads packaged docs through Python resource APIs, parses SKILL.md frontmatter, validates schema and cross-links, and builds an immutable in-memory registry keyed by skill_id.
+
+### Research Baseline (Python + Design Guidance)
+
+Authoritative references used for this step:
+
+1. Python `importlib.resources` docs (`files`, `as_file`, `Traversable` APIs)
+2. Python `importlib.resources.abc` docs (`Traversable`, path traversal semantics, joinpath compatibility notes)
+3. Pydantic v2 model/validation docs (`model_validate`, `ValidationError`, strictness and extra handling)
+4. Python packaging guidance for including package data in wheels/sdists
+
+Best-practice conclusions applied to this design:
+
+1. Prefer `importlib.resources.files(<package>).joinpath(...)` over filesystem assumptions so stdio deployments from installed wheels work.
+2. Treat resources as potentially non-filesystem artifacts (zip-import compatible); only use `as_file(...)` when an actual OS path is required.
+3. Validate metadata with explicit Pydantic models and fail startup on contract violations.
+4. Keep registry load deterministic (sorted traversal, stable error messages, no hidden fallback mutations).
+5. Resolve references via manifest ids declared in frontmatter, not by global file conventions.
+
+### Loader Responsibilities (Normative)
+
+The Step 4 loader MUST:
+
+1. Read canonical docs from package resources (not repo-root paths).
+2. Discover all skill directories under `docs/skills/` in packaged resources.
+3. For each skill, read and parse `SKILL.md` frontmatter.
+4. Validate frontmatter using the Step 2 schema contract.
+5. Validate directory/id invariants from Step 1 (directory name equals frontmatter id).
+6. Validate URI/reference semantics from Step 3 assumptions.
+7. Build a single in-memory registry keyed by `skill_id`.
+8. Fail fast on any integrity error before FastMCP resource registration.
+9. Precompute compact discovery projections so index resources can be served without reading full markdown bodies at request time.
+
+### Package Resource Contract
+
+Runtime anchor:
+
+1. The loader resolves content from an importable package anchor, for example `personal_mcp`.
+2. Docs root is located as `files(anchor).joinpath("docs")` when docs are packaged at package root, or an equivalent configured subpath.
+3. Skill root is `docs/skills`.
+
+Resource assumptions:
+
+1. `SKILL.md` is UTF-8 text.
+2. Reference files declared in frontmatter are UTF-8 markdown by default unless otherwise declared.
+3. Path resolution always remains inside the same skill directory.
+
+### Registry Data Model
+
+Build immutable runtime records with explicit structure:
+
+1. `SkillRecord`
+   - `skill_id`
+   - `name`
+   - `description`
+   - `version`
+   - `tags`
+   - `capabilities`
+   - `depends_on`
+   - `document_uri`
+   - `document_relpath` (canonical resource-relative path)
+   - `references` map keyed by `ref_id`
+2. `ReferenceRecord`
+   - `ref_id`
+   - `uri`
+   - `relpath`
+   - `mime_type`
+   - `title`
+3. `DocsRegistry`
+   - `skills_by_id: dict[str, SkillRecord]`
+   - `skills_in_load_order: list[str]` (deterministic ordering)
+   - helper indexes for catalog payload generation
+   - `skills_summary_in_load_order: list[SkillSummaryRecord]` for progressive discovery responses
+   - filter indexes (for example by tag/capability) derived once at startup
+
+4. `SkillSummaryRecord`
+   - `skill_id`
+   - `name`
+   - `description`
+   - `tags`
+   - `capabilities`
+   - `document_uri`
+   - optional `version`
+
+Immutability rule:
+
+1. Once built, registry records are treated as read-only for the process lifetime.
+2. No runtime mutation during requests; refresh only via process restart.
+
+### Frontmatter Parsing Contract
+
+`SKILL.md` parse steps:
+
+1. Read full markdown text from resource.
+2. Parse YAML frontmatter block at file start (between the first two `---` delimiters).
+3. Parse YAML with safe loader semantics.
+4. Validate parsed object with Step 2 Pydantic model(s).
+5. Preserve markdown body as document content payload.
+
+Parsing failure behavior:
+
+1. Missing frontmatter block: startup error.
+2. Invalid YAML: startup error with skill path and YAML parser detail.
+3. Missing required fields (`name`, `description`, `x-personal-mcp` contract fields): startup error.
+
+### Validation Pipeline (Fail-Fast)
+
+Validation happens in this order:
+
+1. Structural discovery validation
+   - skill directory exists under `docs/skills`
+   - required `SKILL.md` exists for each discovered skill
+2. Schema validation
+   - Pydantic frontmatter validation for all required and constrained fields
+3. Identity validation
+   - frontmatter `name` equals `x-personal-mcp.id`
+   - frontmatter id equals skill directory name
+4. Reference manifest validation
+   - unique `ref_id` keys per skill
+   - each manifest path is relative, in-skill, and under `references/`
+   - each manifest target exists and is a file
+5. Dependency graph validation
+   - every `depends_on` target exists in discovered skill set
+   - no self-dependency
+   - cycle detection enabled (hard error on cycle)
+6. Capability sanity checks
+   - required primary capability `resource://skills/{skill_id}/document` is present
+7. Global uniqueness checks
+   - no duplicate `skill_id`
+   - no duplicate canonical resource URIs generated from registry
+8. Discovery payload checks
+   - summary fields required by catalog index are present and non-empty
+   - summary generation does not require reading markdown body content during request handling
+
+### Error Model and Reporting
+
+Error handling contract:
+
+1. Collect errors per validation phase for clarity, then raise one startup exception containing all findings.
+2. Error messages must include:
+   - skill id (when known)
+   - packaged relative path
+   - violated rule
+   - actionable fix hint
+3. If any error exists, registry is not published and FastMCP resource registration does not proceed.
+
+Recommended exception shape:
+
+1. `DocsRegistryValidationError(errors: list[RegistryIssue])`
+2. `RegistryIssue` fields: `code`, `message`, `skill_id`, `path`, `hint`
+
+### Determinism and Runtime Safety
+
+Determinism rules:
+
+1. Traverse directories in sorted order.
+2. Normalize all stored relative paths to POSIX form.
+3. Normalize ids/tags exactly once at parse boundary.
+4. Produce stable catalog ordering to reduce client churn.
+5. Produce stable summary projections and filter indexes from the same normalized source records.
+
+Runtime safety rules:
+
+1. No dependence on `Path(__file__)` or repository root.
+2. No ad-hoc fallback probing across multiple locations.
+3. No lazy validation deferred until first request.
+
+### Integration Plan for Existing Modules
+
+Primary integration target:
+
+1. Replace path-based logic in `src/personal_mcp/skills/document_loader.py` with package-resource-based registry loading.
+
+Catalog integration:
+
+1. Update `src/personal_mcp/catalog/server.py` to consume the shared in-memory registry instead of scanning `metadata.yaml` files.
+2. Keep catalog payload normalization deterministic and sourced from registry records only.
+
+Startup wiring:
+
+1. Initialize registry once during app/server startup in `src/personal_mcp/main.py` or equivalent composition point.
+2. Pass registry to resource registration step (Step 5).
+
+### Proposed Loader API Surface
+
+Use a small, testable API:
+
+1. `load_docs_registry(*, package_anchor: str, docs_root: str = "docs") -> DocsRegistry`
+2. `read_skill_document(registry: DocsRegistry, skill_id: str) -> DocumentPayload`
+3. `read_skill_reference(registry: DocsRegistry, skill_id: str, ref_id: str) -> DocumentPayload`
+
+Design constraints:
+
+1. Loader functions are pure relative to package resources and input args.
+2. No global mutable singleton required for unit tests.
+3. Caching is explicit and owned by startup composition.
+
+### Test and Validation Plan (Step 4 Scope)
+
+Unit tests:
+
+1. valid multi-skill registry load from packaged test fixtures
+2. duplicate id detection
+3. missing SKILL.md detection
+4. invalid frontmatter field constraints
+5. broken reference target detection
+6. invalid depends_on target detection
+7. cycle detection in depends_on graph
+8. deterministic output ordering across runs
+
+Packaging/runtime tests:
+
+1. install built wheel in isolated env
+2. load registry via `importlib.resources.files(...)`
+3. assert representative skill document/reference are readable
+
+Expected command path in this repo:
+
+1. `uv run pytest -q`
+
+### Acceptance Criteria for Step 4 Completion
+
+Step 4 is complete when all are true:
+
+1. Registry loads exclusively from packaged resources.
+2. All Step 2 and Step 3 dependent validations are enforced at startup.
+3. Invalid docs state blocks startup with actionable diagnostics.
+4. Registry is deterministic and immutable for runtime use.
+5. Catalog and later resource registration can consume registry without direct filesystem scanning.
+
+### Non-goals for Step 4
+
+1. No FastMCP resource registration wiring details (Step 5).
+2. No discovery-tool fallback behavior design (Step 6).
+3. No final packaging/build-system migration mechanics (Step 7).
+4. No backward-compat alias rollout mechanics beyond validation readiness.
diff --git a/.github/prompts/plan-step5.prompt.md b/.github/prompts/plan-step5.prompt.md
new file mode 100644
index 0000000..efcf3e8
--- /dev/null
+++ b/.github/prompts/plan-step5.prompt.md
@@ -0,0 +1,210 @@
+**Step 5 Results: Registry-Driven FastMCP Resource Registration (RFC6570 + Startup Safety)**
+
+This section finalizes Step 5 by defining how FastMCP resources are registered from the Step 4 docs registry using RFC6570 URI templates, explicit metadata, and strict duplicate-registration safety.
+
+### Research Baseline (FastMCP + URI Templates)
+
+Authoritative references used for this step:
+
+1. FastMCP Resources and Templates docs (resource decorator, template behavior)
+2. FastMCP RFC6570 support docs (simple params, wildcard params, query params)
+3. FastMCP duplicate handling docs (`on_duplicate_resources`)
+4. FastMCP annotations guidance (`readOnlyHint`, `idempotentHint`)
+
+Best-practice conclusions applied to this design:
+
+1. Use URI templates for parameterized resources instead of generating N static resource handlers.
+2. Use wildcard template parameters (`{path*}`) for hierarchical docs paths.
+3. Set startup duplicate policy to `on_duplicate_resources="error"` to fail fast on contract collisions.
+4. Set explicit `mime_type` and resource annotations for all docs resources.
+5. Keep registration deterministic and sourced only from the validated Step 4 registry.
+
+### Registration Responsibilities (Normative)
+
+The Step 5 registration layer MUST:
+
+1. Consume only the validated in-memory registry produced by Step 4.
+2. Register canonical resource discovery surfaces and skill document/reference surfaces.
+3. Use RFC6570 templates where URI patterns are parameterized.
+4. Use wildcard templates where path depth is variable.
+5. Attach read-only/idempotent annotations to documentation resources.
+6. Set explicit MIME types for all registered resources.
+7. Fail startup if duplicate URI/template keys are encountered.
+
+### Canonical Resource Surface (from Registry)
+
+The preferred resources registered in this phase are:
+
+1. `resource://catalog/skills_index`
+2. `resource://catalog/skills_index{?q,tag,capability,cursor,limit}` (optional filtered/paginated discovery template)
+3. `resource://catalog/skills/{skill_id}`
+4. `resource://skills/{skill_id}/document`
+5. `resource://skills/{skill_id}/references/{ref_id}`
+6. `resource://docs/{path*}`
+
+Registration decision rules:
+
+1. Use static resource registration for fixed singleton endpoints (for example `skills_index`).
+2. Use template registration for parameterized endpoints (`{skill_id}`, `{ref_id}`) and optional discovery query templates.
+3. Use wildcard template registration for hierarchical docs routing (`{path*}`).
+4. Keep the singleton and query-template discovery surfaces semantically equivalent (same schema, query template adds filtering/pagination only).
+
+### Progressive Discovery Contract
+
+Discovery-first behavior for Step 5 resources:
+
+1. `skills_index` returns summaries only (no embedded full SKILL.md bodies).
+2. Each summary includes canonical follow-up URIs so clients can progressively fetch detail (`catalog/skills/{skill_id}` then `skills/{skill_id}/document`).
+3. Filtered/paginated discovery uses RFC6570 query params (`q`, `tag`, `capability`, `cursor`, `limit`) with deterministic ordering.
+4. Handlers should enforce bounded page size and return explicit continuation metadata when pagination is active.
+5. Errors for unsupported filter params or invalid cursor/limit are explicit and actionable.
+
+### RFC6570 Template Contract
+
+Path parameters:
+
+1. `{skill_id}` and `{ref_id}` are single-segment template params.
+2. `{path*}` is a wildcard param and may capture multi-segment paths separated by `/`.
+
+Validation contract at resource-read time:
+
+1. `skill_id` must exist in registry.
+2. `ref_id` must exist in that skill’s reference manifest.
+3. wildcard `path*` must normalize to an allowed docs-relative markdown path.
+4. invalid params return explicit not-found or validation errors (no silent fallback).
+
+Template function signature contract:
+
+1. Required URI params must exist as function parameters.
+2. Avoid hidden implicit params not represented in template.
+3. Keep template handlers side-effect free.
+
+### Metadata and Annotation Contract
+
+Each docs/resource registration should specify explicit metadata:
+
+1. `mime_type`
+   - skill docs and references: `text/markdown`
+   - catalog payloads: `application/json`
+2. `annotations`
+   - `readOnlyHint: true`
+   - `idempotentHint: true`
+3. `tags`
+   - include stable categories such as `catalog`, `skill-doc`, `reference`, `docs`
+4. `version`
+   - project-defined version from registry metadata where applicable
+5. `meta`
+   - include normalized identifiers (for example `skill_id`, `ref_id`, `source_relpath`) when useful
+
+### Startup Safety and Duplicate Policy
+
+FastMCP initialization contract for this phase:
+
+1. Construct the root server with `on_duplicate_resources="error"`.
+2. Register all Step 5 resources during startup composition before serving traffic.
+3. Treat duplicate registration as a hard startup failure.
+
+Duplicate conflict classes covered:
+
+1. static URI vs static URI collision
+2. static URI vs template key collision
+3. template URI vs template URI collision
+4. conflicting registrations introduced by future aliases without explicit migration handling
+
+### Registration Architecture
+
+Use one dedicated registration module that converts registry records into FastMCP resources.
+
+Recommended API:
+
+1. `register_docs_resources(mcp: FastMCP, registry: DocsRegistry) -> None`
+
+Responsibilities of `register_docs_resources`:
+
+1. register singleton catalog resources
+2. register parameterized catalog/detail templates
+3. register skill document and reference templates
+4. register docs wildcard template
+5. apply shared annotations and MIME defaults consistently
+
+Separation of concerns:
+
+1. Step 4 validates and normalizes docs state.
+2. Step 5 only registers handlers and reads from validated registry state.
+3. Request handlers do not re-discover filesystem/package structure.
+
+### Handler Behavior Contract
+
+Catalog handlers:
+
+1. `skills_index` returns compact deterministic discovery payload (summary records only) and supports progressive follow-up links.
+2. `skills/{skill_id}` returns one normalized detail record or not-found.
+
+Skill document handlers:
+
+1. `skills/{skill_id}/document` returns canonical SKILL markdown content.
+2. MIME type is always `text/markdown`.
+
+Reference handlers:
+
+1. `skills/{skill_id}/references/{ref_id}` resolves via frontmatter manifest mapping.
+2. MIME type is explicit from manifest or defaults to `text/markdown`.
+
+Wildcard docs handler:
+
+1. `docs/{path*}` serves markdown docs under canonical packaged docs tree.
+2. traversal outside docs root is blocked.
+
+### Integration Plan for Existing Modules
+
+Primary composition updates:
+
+1. Introduce registry-driven registration in [src/personal_mcp/mcp.py](src/personal_mcp/mcp.py).
+2. Keep [src/personal_mcp/main.py](src/personal_mcp/main.py) responsible for startup wiring order (load registry first, then register resources).
+3. Refactor [src/personal_mcp/catalog/server.py](src/personal_mcp/catalog/server.py) toward registry-backed handlers.
+
+Lifecycle order (required):
+
+1. load and validate registry (Step 4)
+2. initialize FastMCP with duplicate error policy
+3. register all Step 5 resources/templates
+4. start server
+
+### Testing Plan (Step 5 Scope)
+
+Unit/integration tests:
+
+1. resource registration succeeds with valid registry
+2. duplicate resource registration fails at startup
+3. `skills/{skill_id}` template resolves expected record
+4. `skills/{skill_id}/document` returns markdown with correct MIME
+5. `skills/{skill_id}/references/{ref_id}` resolves manifest-mapped file
+6. `docs/{path*}` resolves nested docs paths and blocks traversal attempts
+7. all registered docs resources include `readOnlyHint` and `idempotentHint`
+8. catalog payload order is deterministic
+9. filtered/paginated `skills_index{?q,tag,capability,cursor,limit}` responses are deterministic and schema-compatible with the singleton index response
+10. catalog index payload excludes full markdown bodies and includes follow-up URIs for progressive reads
+
+Smoke tests:
+
+1. list resources includes singleton and template entries
+2. read representative skill doc URI and reference URI successfully
+3. read representative wildcard docs URI successfully
+
+### Acceptance Criteria for Step 5 Completion
+
+Step 5 is complete when all are true:
+
+1. Resource registration is fully registry-driven (no per-skill hardcoded decorators required for core docs surfaces).
+2. RFC6570 templates are used for parameterized URI families, including wildcard where needed.
+3. All docs resources declare explicit MIME types and read-only/idempotent annotations.
+4. `on_duplicate_resources="error"` is enabled and verified by tests.
+5. Startup fails safely on registration conflicts.
+
+### Non-goals for Step 5
+
+1. No tool fallback discovery behavior implementation (Step 6).
+2. No packaging build inclusion mechanics (Step 7).
+3. No CI gate expansion details (Step 9).
+4. No migration shims for legacy URI aliases beyond what is needed to preserve current behavior.
+5. No ranking-strategy implementation for discovery tools beyond what is needed to preserve deterministic resource-first discovery contracts.