From 85355a85099d0cb0944ca12d48fdc8f05f3658b5 Mon Sep 17 00:00:00 2001 From: John Lancaster <32917998+jsl12@users.noreply.github.com> Date: Sat, 20 Jun 2026 13:22:03 -0500 Subject: [PATCH] added step 4 and 5 --- .github/prompts/plan-step4.prompt.md | 238 +++++++++++++++++++++++++++ .github/prompts/plan-step5.prompt.md | 210 +++++++++++++++++++++++ 2 files changed, 448 insertions(+) create mode 100644 .github/prompts/plan-step4.prompt.md create mode 100644 .github/prompts/plan-step5.prompt.md diff --git a/.github/prompts/plan-step4.prompt.md b/.github/prompts/plan-step4.prompt.md new file mode 100644 index 0000000..3545ecc --- /dev/null +++ b/.github/prompts/plan-step4.prompt.md @@ -0,0 +1,238 @@ +**Step 4 Results: Docs Registry Loader Design (importlib.resources + Fail-Fast Validation)** + +This section finalizes Step 4 by defining a production-ready docs registry loader that reads packaged docs through Python resource APIs, parses SKILL.md frontmatter, validates schema and cross-links, and builds an immutable in-memory registry keyed by skill_id. + +### Research Baseline (Python + Design Guidance) + +Authoritative references used for this step: + +1. Python `importlib.resources` docs (`files`, `as_file`, `Traversable` APIs) +2. Python `importlib.resources.abc` docs (`Traversable`, path traversal semantics, joinpath compatibility notes) +3. Pydantic v2 model/validation docs (`model_validate`, `ValidationError`, strictness and extra handling) +4. Python packaging guidance for including package data in wheels/sdists + +Best-practice conclusions applied to this design: + +1. Prefer `importlib.resources.files().joinpath(...)` over filesystem assumptions so stdio deployments from installed wheels work. +2. Treat resources as potentially non-filesystem artifacts (zip-import compatible); only use `as_file(...)` when an actual OS path is required. +3. Validate metadata with explicit Pydantic models and fail startup on contract violations. +4. Keep registry load deterministic (sorted traversal, stable error messages, no hidden fallback mutations). +5. Resolve references via manifest ids declared in frontmatter, not by global file conventions. + +### Loader Responsibilities (Normative) + +The Step 4 loader MUST: + +1. Read canonical docs from package resources (not repo-root paths). +2. Discover all skill directories under `docs/skills/` in packaged resources. +3. For each skill, read and parse `SKILL.md` frontmatter. +4. Validate frontmatter using the Step 2 schema contract. +5. Validate directory/id invariants from Step 1 (directory name equals frontmatter id). +6. Validate URI/reference semantics from Step 3 assumptions. +7. Build a single in-memory registry keyed by `skill_id`. +8. Fail fast on any integrity error before FastMCP resource registration. +9. Precompute compact discovery projections so index resources can be served without reading full markdown bodies at request time. + +### Package Resource Contract + +Runtime anchor: + +1. The loader resolves content from an importable package anchor, for example `personal_mcp`. +2. Docs root is located as `files(anchor).joinpath("docs")` when docs are packaged at package root, or an equivalent configured subpath. +3. Skill root is `docs/skills`. + +Resource assumptions: + +1. `SKILL.md` is UTF-8 text. +2. Reference files declared in frontmatter are UTF-8 markdown by default unless otherwise declared. +3. Path resolution always remains inside the same skill directory. + +### Registry Data Model + +Build immutable runtime records with explicit structure: + +1. `SkillRecord` + - `skill_id` + - `name` + - `description` + - `version` + - `tags` + - `capabilities` + - `depends_on` + - `document_uri` + - `document_relpath` (canonical resource-relative path) + - `references` map keyed by `ref_id` +2. `ReferenceRecord` + - `ref_id` + - `uri` + - `relpath` + - `mime_type` + - `title` +3. `DocsRegistry` + - `skills_by_id: dict[str, SkillRecord]` + - `skills_in_load_order: list[str]` (deterministic ordering) + - helper indexes for catalog payload generation + - `skills_summary_in_load_order: list[SkillSummaryRecord]` for progressive discovery responses + - filter indexes (for example by tag/capability) derived once at startup + +4. `SkillSummaryRecord` + - `skill_id` + - `name` + - `description` + - `tags` + - `capabilities` + - `document_uri` + - optional `version` + +Immutability rule: + +1. Once built, registry records are treated as read-only for the process lifetime. +2. No runtime mutation during requests; refresh only via process restart. + +### Frontmatter Parsing Contract + +`SKILL.md` parse steps: + +1. Read full markdown text from resource. +2. Parse YAML frontmatter block at file start (between the first two `---` delimiters). +3. Parse YAML with safe loader semantics. +4. Validate parsed object with Step 2 Pydantic model(s). +5. Preserve markdown body as document content payload. + +Parsing failure behavior: + +1. Missing frontmatter block: startup error. +2. Invalid YAML: startup error with skill path and YAML parser detail. +3. Missing required fields (`name`, `description`, `x-personal-mcp` contract fields): startup error. + +### Validation Pipeline (Fail-Fast) + +Validation happens in this order: + +1. Structural discovery validation + - skill directory exists under `docs/skills` + - required `SKILL.md` exists for each discovered skill +2. Schema validation + - Pydantic frontmatter validation for all required and constrained fields +3. Identity validation + - frontmatter `name` equals `x-personal-mcp.id` + - frontmatter id equals skill directory name +4. Reference manifest validation + - unique `ref_id` keys per skill + - each manifest path is relative, in-skill, and under `references/` + - each manifest target exists and is a file +5. Dependency graph validation + - every `depends_on` target exists in discovered skill set + - no self-dependency + - cycle detection enabled (hard error on cycle) +6. Capability sanity checks + - required primary capability `resource://skills/{skill_id}/document` is present +7. Global uniqueness checks + - no duplicate `skill_id` + - no duplicate canonical resource URIs generated from registry +8. Discovery payload checks + - summary fields required by catalog index are present and non-empty + - summary generation does not require reading markdown body content during request handling + +### Error Model and Reporting + +Error handling contract: + +1. Collect errors per validation phase for clarity, then raise one startup exception containing all findings. +2. Error messages must include: + - skill id (when known) + - packaged relative path + - violated rule + - actionable fix hint +3. If any error exists, registry is not published and FastMCP resource registration does not proceed. + +Recommended exception shape: + +1. `DocsRegistryValidationError(errors: list[RegistryIssue])` +2. `RegistryIssue` fields: `code`, `message`, `skill_id`, `path`, `hint` + +### Determinism and Runtime Safety + +Determinism rules: + +1. Traverse directories in sorted order. +2. Normalize all stored relative paths to POSIX form. +3. Normalize ids/tags exactly once at parse boundary. +4. Produce stable catalog ordering to reduce client churn. +5. Produce stable summary projections and filter indexes from the same normalized source records. + +Runtime safety rules: + +1. No dependence on `Path(__file__)` or repository root. +2. No ad-hoc fallback probing across multiple locations. +3. No lazy validation deferred until first request. + +### Integration Plan for Existing Modules + +Primary integration target: + +1. Replace path-based logic in `src/personal_mcp/skills/document_loader.py` with package-resource-based registry loading. + +Catalog integration: + +1. Update `src/personal_mcp/catalog/server.py` to consume the shared in-memory registry instead of scanning `metadata.yaml` files. +2. Keep catalog payload normalization deterministic and sourced from registry records only. + +Startup wiring: + +1. Initialize registry once during app/server startup in `src/personal_mcp/main.py` or equivalent composition point. +2. Pass registry to resource registration step (Step 5). + +### Proposed Loader API Surface + +Use a small, testable API: + +1. `load_docs_registry(*, package_anchor: str, docs_root: str = "docs") -> DocsRegistry` +2. `read_skill_document(registry: DocsRegistry, skill_id: str) -> DocumentPayload` +3. `read_skill_reference(registry: DocsRegistry, skill_id: str, ref_id: str) -> DocumentPayload` + +Design constraints: + +1. Loader functions are pure relative to package resources and input args. +2. No global mutable singleton required for unit tests. +3. Caching is explicit and owned by startup composition. + +### Test and Validation Plan (Step 4 Scope) + +Unit tests: + +1. valid multi-skill registry load from packaged test fixtures +2. duplicate id detection +3. missing SKILL.md detection +4. invalid frontmatter field constraints +5. broken reference target detection +6. invalid depends_on target detection +7. cycle detection in depends_on graph +8. deterministic output ordering across runs + +Packaging/runtime tests: + +1. install built wheel in isolated env +2. load registry via `importlib.resources.files(...)` +3. assert representative skill document/reference are readable + +Expected command path in this repo: + +1. `uv run pytest -q` + +### Acceptance Criteria for Step 4 Completion + +Step 4 is complete when all are true: + +1. Registry loads exclusively from packaged resources. +2. All Step 2 and Step 3 dependent validations are enforced at startup. +3. Invalid docs state blocks startup with actionable diagnostics. +4. Registry is deterministic and immutable for runtime use. +5. Catalog and later resource registration can consume registry without direct filesystem scanning. + +### Non-goals for Step 4 + +1. No FastMCP resource registration wiring details (Step 5). +2. No discovery-tool fallback behavior design (Step 6). +3. No final packaging/build-system migration mechanics (Step 7). +4. No backward-compat alias rollout mechanics beyond validation readiness. diff --git a/.github/prompts/plan-step5.prompt.md b/.github/prompts/plan-step5.prompt.md new file mode 100644 index 0000000..efcf3e8 --- /dev/null +++ b/.github/prompts/plan-step5.prompt.md @@ -0,0 +1,210 @@ +**Step 5 Results: Registry-Driven FastMCP Resource Registration (RFC6570 + Startup Safety)** + +This section finalizes Step 5 by defining how FastMCP resources are registered from the Step 4 docs registry using RFC6570 URI templates, explicit metadata, and strict duplicate-registration safety. + +### Research Baseline (FastMCP + URI Templates) + +Authoritative references used for this step: + +1. FastMCP Resources and Templates docs (resource decorator, template behavior) +2. FastMCP RFC6570 support docs (simple params, wildcard params, query params) +3. FastMCP duplicate handling docs (`on_duplicate_resources`) +4. FastMCP annotations guidance (`readOnlyHint`, `idempotentHint`) + +Best-practice conclusions applied to this design: + +1. Use URI templates for parameterized resources instead of generating N static resource handlers. +2. Use wildcard template parameters (`{path*}`) for hierarchical docs paths. +3. Set startup duplicate policy to `on_duplicate_resources="error"` to fail fast on contract collisions. +4. Set explicit `mime_type` and resource annotations for all docs resources. +5. Keep registration deterministic and sourced only from the validated Step 4 registry. + +### Registration Responsibilities (Normative) + +The Step 5 registration layer MUST: + +1. Consume only the validated in-memory registry produced by Step 4. +2. Register canonical resource discovery surfaces and skill document/reference surfaces. +3. Use RFC6570 templates where URI patterns are parameterized. +4. Use wildcard templates where path depth is variable. +5. Attach read-only/idempotent annotations to documentation resources. +6. Set explicit MIME types for all registered resources. +7. Fail startup if duplicate URI/template keys are encountered. + +### Canonical Resource Surface (from Registry) + +The preferred resources registered in this phase are: + +1. `resource://catalog/skills_index` +2. `resource://catalog/skills_index{?q,tag,capability,cursor,limit}` (optional filtered/paginated discovery template) +3. `resource://catalog/skills/{skill_id}` +4. `resource://skills/{skill_id}/document` +5. `resource://skills/{skill_id}/references/{ref_id}` +6. `resource://docs/{path*}` + +Registration decision rules: + +1. Use static resource registration for fixed singleton endpoints (for example `skills_index`). +2. Use template registration for parameterized endpoints (`{skill_id}`, `{ref_id}`) and optional discovery query templates. +3. Use wildcard template registration for hierarchical docs routing (`{path*}`). +4. Keep the singleton and query-template discovery surfaces semantically equivalent (same schema, query template adds filtering/pagination only). + +### Progressive Discovery Contract + +Discovery-first behavior for Step 5 resources: + +1. `skills_index` returns summaries only (no embedded full SKILL.md bodies). +2. Each summary includes canonical follow-up URIs so clients can progressively fetch detail (`catalog/skills/{skill_id}` then `skills/{skill_id}/document`). +3. Filtered/paginated discovery uses RFC6570 query params (`q`, `tag`, `capability`, `cursor`, `limit`) with deterministic ordering. +4. Handlers should enforce bounded page size and return explicit continuation metadata when pagination is active. +5. Errors for unsupported filter params or invalid cursor/limit are explicit and actionable. + +### RFC6570 Template Contract + +Path parameters: + +1. `{skill_id}` and `{ref_id}` are single-segment template params. +2. `{path*}` is a wildcard param and may capture multi-segment paths separated by `/`. + +Validation contract at resource-read time: + +1. `skill_id` must exist in registry. +2. `ref_id` must exist in that skill’s reference manifest. +3. wildcard `path*` must normalize to an allowed docs-relative markdown path. +4. invalid params return explicit not-found or validation errors (no silent fallback). + +Template function signature contract: + +1. Required URI params must exist as function parameters. +2. Avoid hidden implicit params not represented in template. +3. Keep template handlers side-effect free. + +### Metadata and Annotation Contract + +Each docs/resource registration should specify explicit metadata: + +1. `mime_type` + - skill docs and references: `text/markdown` + - catalog payloads: `application/json` +2. `annotations` + - `readOnlyHint: true` + - `idempotentHint: true` +3. `tags` + - include stable categories such as `catalog`, `skill-doc`, `reference`, `docs` +4. `version` + - project-defined version from registry metadata where applicable +5. `meta` + - include normalized identifiers (for example `skill_id`, `ref_id`, `source_relpath`) when useful + +### Startup Safety and Duplicate Policy + +FastMCP initialization contract for this phase: + +1. Construct the root server with `on_duplicate_resources="error"`. +2. Register all Step 5 resources during startup composition before serving traffic. +3. Treat duplicate registration as a hard startup failure. + +Duplicate conflict classes covered: + +1. static URI vs static URI collision +2. static URI vs template key collision +3. template URI vs template URI collision +4. conflicting registrations introduced by future aliases without explicit migration handling + +### Registration Architecture + +Use one dedicated registration module that converts registry records into FastMCP resources. + +Recommended API: + +1. `register_docs_resources(mcp: FastMCP, registry: DocsRegistry) -> None` + +Responsibilities of `register_docs_resources`: + +1. register singleton catalog resources +2. register parameterized catalog/detail templates +3. register skill document and reference templates +4. register docs wildcard template +5. apply shared annotations and MIME defaults consistently + +Separation of concerns: + +1. Step 4 validates and normalizes docs state. +2. Step 5 only registers handlers and reads from validated registry state. +3. Request handlers do not re-discover filesystem/package structure. + +### Handler Behavior Contract + +Catalog handlers: + +1. `skills_index` returns compact deterministic discovery payload (summary records only) and supports progressive follow-up links. +2. `skills/{skill_id}` returns one normalized detail record or not-found. + +Skill document handlers: + +1. `skills/{skill_id}/document` returns canonical SKILL markdown content. +2. MIME type is always `text/markdown`. + +Reference handlers: + +1. `skills/{skill_id}/references/{ref_id}` resolves via frontmatter manifest mapping. +2. MIME type is explicit from manifest or defaults to `text/markdown`. + +Wildcard docs handler: + +1. `docs/{path*}` serves markdown docs under canonical packaged docs tree. +2. traversal outside docs root is blocked. + +### Integration Plan for Existing Modules + +Primary composition updates: + +1. Introduce registry-driven registration in [src/personal_mcp/mcp.py](src/personal_mcp/mcp.py). +2. Keep [src/personal_mcp/main.py](src/personal_mcp/main.py) responsible for startup wiring order (load registry first, then register resources). +3. Refactor [src/personal_mcp/catalog/server.py](src/personal_mcp/catalog/server.py) toward registry-backed handlers. + +Lifecycle order (required): + +1. load and validate registry (Step 4) +2. initialize FastMCP with duplicate error policy +3. register all Step 5 resources/templates +4. start server + +### Testing Plan (Step 5 Scope) + +Unit/integration tests: + +1. resource registration succeeds with valid registry +2. duplicate resource registration fails at startup +3. `skills/{skill_id}` template resolves expected record +4. `skills/{skill_id}/document` returns markdown with correct MIME +5. `skills/{skill_id}/references/{ref_id}` resolves manifest-mapped file +6. `docs/{path*}` resolves nested docs paths and blocks traversal attempts +7. all registered docs resources include `readOnlyHint` and `idempotentHint` +8. catalog payload order is deterministic +9. filtered/paginated `skills_index{?q,tag,capability,cursor,limit}` responses are deterministic and schema-compatible with the singleton index response +10. catalog index payload excludes full markdown bodies and includes follow-up URIs for progressive reads + +Smoke tests: + +1. list resources includes singleton and template entries +2. read representative skill doc URI and reference URI successfully +3. read representative wildcard docs URI successfully + +### Acceptance Criteria for Step 5 Completion + +Step 5 is complete when all are true: + +1. Resource registration is fully registry-driven (no per-skill hardcoded decorators required for core docs surfaces). +2. RFC6570 templates are used for parameterized URI families, including wildcard where needed. +3. All docs resources declare explicit MIME types and read-only/idempotent annotations. +4. `on_duplicate_resources="error"` is enabled and verified by tests. +5. Startup fails safely on registration conflicts. + +### Non-goals for Step 5 + +1. No tool fallback discovery behavior implementation (Step 6). +2. No packaging build inclusion mechanics (Step 7). +3. No CI gate expansion details (Step 9). +4. No migration shims for legacy URI aliases beyond what is needed to preserve current behavior. +5. No ranking-strategy implementation for discovery tools beyond what is needed to preserve deterministic resource-first discovery contracts.