**Step 4 Results: Docs Registry Loader Design (importlib.resources + Fail-Fast Validation)**

This section finalizes Step 4 by defining a production-ready docs registry loader that reads packaged docs through Python resource APIs, parses `SKILL.md` frontmatter, validates schema and cross-links, and builds an immutable in-memory registry keyed by `skill_id`.
### Research Baseline (Python + Design Guidance)

Authoritative references used for this step:

1. Python `importlib.resources` docs (`files`, `as_file`, `Traversable` APIs)
2. Python `importlib.resources.abc` docs (`Traversable`, path traversal semantics, `joinpath` compatibility notes)
3. Pydantic v2 model/validation docs (`model_validate`, `ValidationError`, strictness and extra-field handling)
4. Python packaging guidance for including package data in wheels/sdists

Best-practice conclusions applied to this design:

1. Prefer `importlib.resources.files(<package>).joinpath(...)` over filesystem assumptions so that stdio deployments running from installed wheels work.
2. Treat resources as potentially non-filesystem artifacts (zip-import compatible); use `as_file(...)` only when an actual OS path is required.
3. Validate metadata with explicit Pydantic models and fail startup on contract violations.
4. Keep registry loading deterministic (sorted traversal, stable error messages, no hidden fallback mutations).
5. Resolve references via manifest ids declared in frontmatter, not via global file conventions.
### Loader Responsibilities (Normative)

The Step 4 loader MUST:

1. Read canonical docs from package resources (not repo-root paths).
2. Discover all skill directories under `docs/skills/` in packaged resources.
3. For each skill, read and parse `SKILL.md` frontmatter.
4. Validate frontmatter using the Step 2 schema contract.
5. Validate directory/id invariants from Step 1 (directory name equals frontmatter id).
6. Validate URI/reference semantics from Step 3 assumptions.
7. Build a single in-memory registry keyed by `skill_id`.
8. Fail fast on any integrity error before FastMCP resource registration.
9. Precompute compact discovery projections so index resources can be served without reading full markdown bodies at request time.
### Package Resource Contract

Runtime anchor:

1. The loader resolves content from an importable package anchor, for example `personal_mcp`.
2. The docs root is located as `files(anchor).joinpath("docs")` when docs are packaged at the package root, or at an equivalent configured subpath.
3. The skill root is `docs/skills`.

Resource assumptions:

1. `SKILL.md` is UTF-8 text.
2. Reference files declared in frontmatter are UTF-8 markdown unless the manifest declares otherwise.
3. Path resolution always remains inside the same skill directory.
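Resource assumption 3, together with the pipeline rule that each manifest path is relative, in-skill, and under `references/`, reduces to a pure string check that can run before any resource is opened. A minimal sketch, where `validate_reference_relpath` is a hypothetical name:

```python
from pathlib import PurePosixPath


def validate_reference_relpath(raw: str) -> str:
    """Return a normalized POSIX relpath or raise ValueError.

    Accepts only relative paths of the form references/<...> that cannot
    escape the skill directory.
    """
    if not raw or "\\" in raw:
        raise ValueError(f"{raw!r}: use a non-empty POSIX-style relative path")
    path = PurePosixPath(raw)
    if path.is_absolute():
        raise ValueError(f"{raw!r}: absolute paths are not allowed")
    # PurePosixPath already collapses "." segments; ".." is kept, so reject it.
    parts = path.parts
    if ".." in parts:
        raise ValueError(f"{raw!r}: '..' segments would leave the skill directory")
    if len(parts) < 2 or parts[0] != "references":
        raise ValueError(f"{raw!r}: reference files must live under references/")
    return "/".join(parts)
```

The existence check ("manifest target exists and is a file") stays separate: it runs `is_file()` on the skill's `Traversable` using the normalized path returned here.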
### Registry Data Model

Build immutable runtime records with explicit structure:

1. `SkillRecord`
   - `skill_id`
   - `name`
   - `description`
   - `version`
   - `tags`
   - `capabilities`
   - `depends_on`
   - `document_uri`
   - `document_relpath` (canonical resource-relative path)
   - `references` map keyed by `ref_id`
2. `ReferenceRecord`
   - `ref_id`
   - `uri`
   - `relpath`
   - `mime_type`
   - `title`
3. `DocsRegistry`
   - `skills_by_id: dict[str, SkillRecord]`
   - `skills_in_load_order: list[str]` (deterministic ordering)
   - helper indexes for catalog payload generation
   - `skills_summary_in_load_order: list[SkillSummaryRecord]` for progressive discovery responses
   - filter indexes (for example by tag/capability) derived once at startup
4. `SkillSummaryRecord`
   - `skill_id`
   - `name`
   - `description`
   - `tags`
   - `capabilities`
   - `document_uri`
   - optional `version`

Immutability rule:

1. Once built, registry records are treated as read-only for the process lifetime.
2. No runtime mutation during requests; refresh only via process restart.
### Frontmatter Parsing Contract

`SKILL.md` parse steps:

1. Read the full markdown text from the resource.
2. Parse the YAML frontmatter block at the start of the file (between the first two `---` delimiter lines).
3. Parse YAML with safe-loader semantics.
4. Validate the parsed object with the Step 2 Pydantic model(s).
5. Preserve the markdown body as the document content payload.

Parsing failure behavior:

1. Missing frontmatter block: startup error.
2. Invalid YAML: startup error with the skill path and YAML parser detail.
3. Missing required fields (`name`, `description`, `x-personal-mcp` contract fields): startup error.
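Step 2 (locating the block) needs no third-party code. In this sketch, `split_frontmatter` is a hypothetical helper; the returned YAML text would then go to a safe loader (for example PyYAML's `yaml.safe_load`) and on to the Step 2 Pydantic model via `model_validate`. Those calls are left out so the example stays stdlib-only.

```python
def split_frontmatter(text: str, source: str) -> tuple[str, str]:
    """Split SKILL.md text into (frontmatter_yaml, markdown_body).

    The file must open with a line that is exactly '---' (trailing whitespace
    allowed); the block ends at the next such line. Raises ValueError otherwise.
    """
    text = text.removeprefix("\ufeff")  # tolerate a UTF-8 byte-order mark
    lines = text.splitlines(keepends=True)
    if not lines or lines[0].rstrip() != "---":
        raise ValueError(f"{source}: missing frontmatter block (file must start with '---')")
    for index in range(1, len(lines)):
        if lines[index].rstrip() == "---":
            frontmatter = "".join(lines[1:index])
            body = "".join(lines[index + 1:])
            return frontmatter, body
    raise ValueError(f"{source}: frontmatter block is not closed by a second '---' line")
```

Passing the packaged relative path as `source` keeps the error message aligned with the reporting contract (skill path plus the violated rule).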
### Validation Pipeline (Fail-Fast)

Validation happens in this order:

1. Structural discovery validation
   - skill directory exists under `docs/skills`
   - required `SKILL.md` exists for each discovered skill
2. Schema validation
   - Pydantic frontmatter validation for all required and constrained fields
3. Identity validation
   - frontmatter `name` equals `x-personal-mcp.id`
   - frontmatter id equals skill directory name
4. Reference manifest validation
   - unique `ref_id` keys per skill
   - each manifest path is relative, in-skill, and under `references/`
   - each manifest target exists and is a file
5. Dependency graph validation
   - every `depends_on` target exists in the discovered skill set
   - no self-dependency
   - cycle detection enabled (hard error on cycle)
6. Capability sanity checks
   - required primary capability `resource://skills/{skill_id}/document` is present
7. Global uniqueness checks
   - no duplicate `skill_id`
   - no duplicate canonical resource URIs generated from the registry
8. Discovery payload checks
   - summary fields required by the catalog index are present and non-empty
   - summary generation does not require reading markdown body content during request handling
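Phase 5 maps directly onto the standard library's `graphlib` (Python 3.9+). A sketch, assuming `depends_on` has already been parsed into a dict from each discovered `skill_id` to its declared dependencies; `validate_dependencies` is a hypothetical name:

```python
from graphlib import CycleError, TopologicalSorter


def validate_dependencies(depends_on: dict[str, list[str]]) -> list[str]:
    """Check depends_on edges; return human-readable issues (empty if valid)."""
    issues: list[str] = []
    known = set(depends_on)
    graph: dict[str, list[str]] = {}
    for skill_id in sorted(depends_on):  # sorted for a stable error order
        edges = []
        for target in depends_on[skill_id]:
            if target == skill_id:
                issues.append(f"{skill_id}: depends_on must not include itself")
            elif target not in known:
                issues.append(f"{skill_id}: depends_on target {target!r} does not exist")
            else:
                edges.append(target)
        graph[skill_id] = edges
    try:
        # TopologicalSorter treats each value as the predecessors of its key;
        # static_order() is lazy, so tuple() forces the cycle check.
        tuple(TopologicalSorter(graph).static_order())
    except CycleError as exc:
        cycle = exc.args[1]  # node list whose first and last entries are equal
        issues.append("dependency cycle: " + " -> ".join(cycle))
    return issues
```

Self-edges and unknown targets are reported as their own issues and left out of the graph, so the cycle check only reports genuine multi-skill cycles.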
### Error Model and Reporting

Error handling contract:

1. Collect errors per validation phase for clarity, then raise one startup exception containing all findings.
2. Error messages must include:
   - skill id (when known)
   - packaged relative path
   - violated rule
   - actionable fix hint
3. If any error exists, registry is not published and FastMCP resource registration does not proceed.

Recommended exception shape:

1. `DocsRegistryValidationError(errors: list[RegistryIssue])`
2. `RegistryIssue` fields: `code`, `message`, `skill_id`, `path`, `hint`
### Determinism and Runtime Safety

Determinism rules:

1. Traverse directories in sorted order.
2. Normalize all stored relative paths to POSIX form.
3. Normalize ids/tags exactly once, at the parse boundary.
4. Produce stable catalog ordering to reduce client churn.
5. Produce stable summary projections and filter indexes from the same normalized source records.

Runtime safety rules:

1. No dependence on `Path(__file__)` or the repository root.
2. No ad-hoc fallback probing across multiple locations.
3. No lazy validation deferred until the first request.
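Rules 3 and 5 can be combined: normalize tags once, then derive the filter index from those normalized values in load order. In this sketch the normalization rule itself (strip and lowercase, drop empties) is an assumption, since the document only requires that it happen once; `normalize_tag` and `build_filter_index` are hypothetical names.

```python
from types import MappingProxyType
from typing import Mapping


def normalize_tag(raw: str) -> str:
    # Assumed rule; applied once at the parse boundary and nowhere else.
    return raw.strip().lower()


def build_filter_index(
    summaries: list[tuple[str, list[str]]],
) -> Mapping[str, tuple[str, ...]]:
    """Map each normalized tag to skill_ids, preserving load order.

    `summaries` holds (skill_id, raw_tags) pairs already in load order.
    """
    index: dict[str, list[str]] = {}
    for skill_id, raw_tags in summaries:
        for tag in sorted({normalize_tag(t) for t in raw_tags} - {""}):
            index.setdefault(tag, []).append(skill_id)
    # Sorted keys plus load-ordered values give byte-stable catalog output.
    return MappingProxyType({tag: tuple(ids) for tag, ids in sorted(index.items())})
```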
### Integration Plan for Existing Modules

Primary integration target:

1. Replace path-based logic in `src/personal_mcp/skills/document_loader.py` with package-resource-based registry loading.

Catalog integration:

1. Update `src/personal_mcp/catalog/server.py` to consume the shared in-memory registry instead of scanning `metadata.yaml` files.
2. Keep catalog payload normalization deterministic and sourced from registry records only.

Startup wiring:

1. Initialize registry once during app/server startup in `src/personal_mcp/main.py` or equivalent composition point.
2. Pass registry to resource registration step (Step 5).
### Proposed Loader API Surface

Use a small, testable API:

1. `load_docs_registry(*, package_anchor: str, docs_root: str = "docs") -> DocsRegistry`
2. `read_skill_document(registry: DocsRegistry, skill_id: str) -> DocumentPayload`
3. `read_skill_reference(registry: DocsRegistry, skill_id: str, ref_id: str) -> DocumentPayload`

Design constraints:

1. Loader functions are pure relative to package resources and input args.
2. No global mutable singleton required for unit tests.
3. Caching is explicit and owned by startup composition.
### Test and Validation Plan (Step 4 Scope)

Unit tests:

1. valid multi-skill registry load from packaged test fixtures
2. duplicate id detection
3. missing `SKILL.md` detection
4. invalid frontmatter field constraints
5. broken reference target detection
6. invalid `depends_on` target detection
7. cycle detection in `depends_on` graph
8. deterministic output ordering across runs

Packaging/runtime tests:

1. install built wheel in isolated env
2. load registry via `importlib.resources.files(...)`
3. assert representative skill document/reference are readable

Expected command path in this repo:

1. `uv run pytest -q`
### Acceptance Criteria for Step 4 Completion

Step 4 is complete when all are true:

1. Registry loads exclusively from packaged resources.
2. All Step 2 and Step 3 dependent validations are enforced at startup.
3. Invalid docs state blocks startup with actionable diagnostics.
4. Registry is deterministic and immutable for runtime use.
5. Catalog and later resource registration can consume registry without direct filesystem scanning.
### Non-goals for Step 4

1. No FastMCP resource registration wiring details (Step 5).
2. No discovery-tool fallback behavior design (Step 6).
3. No final packaging/build-system migration mechanics (Step 7).
4. No backward-compat alias rollout mechanics beyond validation readiness.
**Step 5 Results: Registry-Driven FastMCP Resource Registration (RFC6570 + Startup Safety)**

This section finalizes Step 5 by defining how FastMCP resources are registered from the Step 4 docs registry using RFC6570 URI templates, explicit metadata, and strict duplicate-registration safety.
### Research Baseline (FastMCP + URI Templates)

Authoritative references used for this step:

1. FastMCP Resources and Templates docs (resource decorator, template behavior)
2. FastMCP RFC6570 support docs (simple params, wildcard params, query params)
3. FastMCP duplicate handling docs (`on_duplicate_resources`)
4. FastMCP annotations guidance (`readOnlyHint`, `idempotentHint`)

Best-practice conclusions applied to this design:

1. Use URI templates for parameterized resources instead of generating N static resource handlers.
2. Use wildcard template parameters (`{path*}`) for hierarchical docs paths.
3. Set startup duplicate policy to `on_duplicate_resources="error"` to fail fast on contract collisions.
4. Set explicit `mime_type` and resource annotations for all docs resources.
5. Keep registration deterministic and sourced only from the validated Step 4 registry.
### Registration Responsibilities (Normative)

The Step 5 registration layer MUST:

1. Consume only the validated in-memory registry produced by Step 4.
2. Register canonical resource discovery surfaces and skill document/reference surfaces.
3. Use RFC6570 templates where URI patterns are parameterized.
4. Use wildcard templates where path depth is variable.
5. Attach read-only/idempotent annotations to documentation resources.
6. Set explicit MIME types for all registered resources.
7. Fail startup if duplicate URI/template keys are encountered.
### Canonical Resource Surface (from Registry)

The preferred resources registered in this phase are:

1. `resource://catalog/skills_index`
2. `resource://catalog/skills_index{?q,tag,capability,cursor,limit}` (optional filtered/paginated discovery template)
3. `resource://catalog/skills/{skill_id}`
4. `resource://skills/{skill_id}/document`
5. `resource://skills/{skill_id}/references/{ref_id}`
6. `resource://docs/{path*}`

Registration decision rules:

1. Use static resource registration for fixed singleton endpoints (for example `skills_index`).
2. Use template registration for parameterized endpoints (`{skill_id}`, `{ref_id}`) and optional discovery query templates.
3. Use wildcard template registration for hierarchical docs routing (`{path*}`).
4. Keep the singleton and query-template discovery surfaces semantically equivalent (same schema; the query template only adds filtering/pagination).
### Progressive Discovery Contract

Discovery-first behavior for Step 5 resources:

1. `skills_index` returns summaries only (no embedded full `SKILL.md` bodies).
2. Each summary includes canonical follow-up URIs so clients can progressively fetch detail (`catalog/skills/{skill_id}`, then `skills/{skill_id}/document`).
3. Filtered/paginated discovery uses RFC6570 query params (`q`, `tag`, `capability`, `cursor`, `limit`) with deterministic ordering.
4. Handlers should enforce a bounded page size and return explicit continuation metadata when pagination is active.
5. Errors for unsupported filter params or invalid `cursor`/`limit` values are explicit and actionable.
### RFC6570 Template Contract

Path parameters:

1. `{skill_id}` and `{ref_id}` are single-segment template params.
2. `{path*}` is a wildcard param and may capture multi-segment paths separated by `/`.

Validation contract at resource-read time:

1. `skill_id` must exist in the registry.
2. `ref_id` must exist in that skill’s reference manifest.
3. The wildcard `path*` capture must normalize to an allowed docs-relative markdown path.
4. Invalid params return explicit not-found or validation errors (no silent fallback).

Template function signature contract:

1. Required URI params must exist as function parameters.
2. Avoid hidden implicit params not represented in the template.
3. Keep template handlers side-effect free.
### Metadata and Annotation Contract

Each docs/resource registration should specify explicit metadata:

1. `mime_type`
   - skill docs and references: `text/markdown`
   - catalog payloads: `application/json`
2. `annotations`
   - `readOnlyHint: true`
   - `idempotentHint: true`
3. `tags`
   - include stable categories such as `catalog`, `skill-doc`, `reference`, `docs`
4. `version`
   - project-defined version from registry metadata where applicable
5. `meta`
   - include normalized identifiers (for example `skill_id`, `ref_id`, `source_relpath`) when useful
### Startup Safety and Duplicate Policy

FastMCP initialization contract for this phase:

1. Construct the root server with `on_duplicate_resources="error"`.
2. Register all Step 5 resources during startup composition before serving traffic.
3. Treat duplicate registration as a hard startup failure.

Duplicate conflict classes covered:

1. static URI vs static URI collision
2. static URI vs template key collision
3. template URI vs template URI collision
4. conflicting registrations introduced by future aliases without explicit migration handling
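FastMCP's `on_duplicate_resources="error"` remains the authoritative guard. The hypothetical precheck below lets the conflict classes be unit-tested without a server. It is deliberately approximate and somewhat stricter than an exact-key comparison: templates are compared by shape (parameter names ignored), and a static URI that a path-parameter template would also match is reported as an overlap. Query-only templates such as `skills_index{?...}` are exempt because the design pairs them with their static singleton on purpose.

```python
import re
from collections import Counter

_PARAM = re.compile(r"\{(\w+)(\*?)\}")


def _path_part(uri: str) -> str:
    # Drop an RFC6570 query expression such as {?q,tag}; only paths are compared.
    return uri.split("{?", 1)[0]


def _shape(template: str) -> str:
    # Parameter names are irrelevant: {a} and {b} match exactly the same URIs.
    return _PARAM.sub(lambda m: "{*}" if m.group(2) else "{}", _path_part(template))


def _regex(path: str) -> re.Pattern[str]:
    out, pos = [], 0
    for m in _PARAM.finditer(path):
        out.append(re.escape(path[pos:m.start()]))
        out.append(".+" if m.group(2) else "[^/]+")
        pos = m.end()
    out.append(re.escape(path[pos:]))
    return re.compile("".join(out))


def find_conflicts(static_uris: list[str], templates: list[str]) -> list[str]:
    """Return conflict descriptions in a stable order (empty list if none)."""
    issues: list[str] = []
    for uri, count in sorted(Counter(static_uris).items()):
        if count > 1:
            issues.append(f"static/static duplicate: {uri}")
    seen: dict[str, str] = {}
    for t in sorted(templates):
        key = _shape(t)
        if key in seen:
            issues.append(f"template/template collision: {seen[key]} vs {t}")
        else:
            seen[key] = t
    for t in sorted(templates):
        path = _path_part(t)
        if not _PARAM.search(path):
            continue  # query-only variant of a static endpoint, intended by design
        pattern = _regex(path)
        for uri in sorted(set(static_uris)):
            if pattern.fullmatch(uri):
                issues.append(f"static/template overlap: {uri} matches {t}")
    return issues
```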
### Registration Architecture

Use one dedicated registration module that converts registry records into FastMCP resources.

Recommended API:

1. `register_docs_resources(mcp: FastMCP, registry: DocsRegistry) -> None`

Responsibilities of `register_docs_resources`:

1. register singleton catalog resources
2. register parameterized catalog/detail templates
3. register skill document and reference templates
4. register docs wildcard template
5. apply shared annotations and MIME defaults consistently

Separation of concerns:

1. Step 4 validates and normalizes docs state.
2. Step 5 only registers handlers and reads from validated registry state.
3. Request handlers do not re-discover filesystem/package structure.
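Responsibility 5 is easiest to keep consistent if the resource surface is first described as data and then registered in a single loop. `ResourceSpec` and `plan_docs_resources` are hypothetical names. `register_docs_resources` would iterate the plan and attach each handler through FastMCP's `mcp.resource(...)` decorator, passing the URI, `mime_type`, `tags`, and `annotations`; keyword support (especially for `annotations`) varies between FastMCP releases, so check the installed version. `version` and `meta` are omitted for brevity.

```python
from dataclasses import dataclass, field

# Shared defaults applied to every docs resource in the plan.
DOCS_ANNOTATIONS = {"readOnlyHint": True, "idempotentHint": True}


@dataclass(frozen=True)
class ResourceSpec:
    uri: str
    mime_type: str
    tags: frozenset[str]
    # Copied per spec so that editing one spec's dict cannot affect the others.
    annotations: dict[str, bool] = field(default_factory=lambda: dict(DOCS_ANNOTATIONS))


def plan_docs_resources() -> list[ResourceSpec]:
    """Describe the Step 5 resource surface as data, in a fixed order."""
    json_t, md = "application/json", "text/markdown"
    return [
        ResourceSpec("resource://catalog/skills_index", json_t, frozenset({"catalog"})),
        ResourceSpec("resource://catalog/skills_index{?q,tag,capability,cursor,limit}",
                     json_t, frozenset({"catalog"})),
        ResourceSpec("resource://catalog/skills/{skill_id}", json_t, frozenset({"catalog"})),
        ResourceSpec("resource://skills/{skill_id}/document", md, frozenset({"skill-doc"})),
        ResourceSpec("resource://skills/{skill_id}/references/{ref_id}", md,
                     frozenset({"reference"})),
        ResourceSpec("resource://docs/{path*}", md, frozenset({"docs"})),
    ]
```

A plan like this also makes test 7 (every docs resource carries `readOnlyHint` and `idempotentHint`) checkable without starting a server.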
### Handler Behavior Contract

Catalog handlers:

1. `skills_index` returns a compact, deterministic discovery payload (summary records only) and supports progressive follow-up links.
2. `catalog/skills/{skill_id}` returns one normalized detail record or not-found.

Skill document handlers:

1. `skills/{skill_id}/document` returns the canonical `SKILL.md` markdown content.
2. MIME type is always `text/markdown`.

Reference handlers:

1. `skills/{skill_id}/references/{ref_id}` resolves via the frontmatter manifest mapping.
2. MIME type is taken from the manifest when declared; otherwise it defaults to `text/markdown`.

Wildcard docs handler:

1. `docs/{path*}` serves markdown docs under the canonical packaged docs tree.
2. Traversal outside the docs root is blocked.
### Integration Plan for Existing Modules

Primary composition updates:

1. Introduce registry-driven registration in [src/personal_mcp/mcp.py](src/personal_mcp/mcp.py).
2. Keep [src/personal_mcp/main.py](src/personal_mcp/main.py) responsible for startup wiring order (load registry first, then register resources).
3. Refactor [src/personal_mcp/catalog/server.py](src/personal_mcp/catalog/server.py) toward registry-backed handlers.

Lifecycle order (required):

1. load and validate registry (Step 4)
2. initialize FastMCP with duplicate error policy
3. register all Step 5 resources/templates
4. start server
### Testing Plan (Step 5 Scope)

Unit/integration tests:

1. resource registration succeeds with a valid registry
2. duplicate resource registration fails at startup
3. `catalog/skills/{skill_id}` template resolves the expected record
4. `skills/{skill_id}/document` returns markdown with the correct MIME type
5. `skills/{skill_id}/references/{ref_id}` resolves the manifest-mapped file
6. `docs/{path*}` resolves nested docs paths and blocks traversal attempts
7. all registered docs resources include `readOnlyHint` and `idempotentHint`
8. catalog payload order is deterministic
9. filtered/paginated `skills_index{?q,tag,capability,cursor,limit}` responses are deterministic and schema-compatible with the singleton index response
10. catalog index payload excludes full markdown bodies and includes follow-up URIs for progressive reads

Smoke tests:

1. list resources includes singleton and template entries
2. read representative skill doc URI and reference URI successfully
3. read representative wildcard docs URI successfully
### Acceptance Criteria for Step 5 Completion

Step 5 is complete when all are true:

1. Resource registration is fully registry-driven (no per-skill hardcoded decorators required for core docs surfaces).
2. RFC6570 templates are used for parameterized URI families, including wildcard templates where needed.
3. All docs resources declare explicit MIME types and read-only/idempotent annotations.
4. `on_duplicate_resources="error"` is enabled and verified by tests.
5. Startup fails safely on registration conflicts.
### Non-goals for Step 5

1. No tool fallback discovery behavior implementation (Step 6).
2. No packaging build inclusion mechanics (Step 7).
3. No CI gate expansion details (Step 9).
4. No migration shims for legacy URI aliases beyond what is needed to preserve current behavior.
5. No ranking-strategy implementation for discovery tools beyond what is needed to preserve deterministic resource-first discovery contracts.