Compare commits

2 Commits

Author SHA1 Message Date
John Lancaster 85355a8509 added step 4 and 5 2026-06-20 13:22:03 -05:00
John Lancaster 127e56692e the plan 2026-06-20 12:50:38 -05:00
6 changed files with 1009 additions and 0 deletions
@@ -0,0 +1,46 @@
## Plan: Docs-First FastMCP End State
Create a docs-first FastMCP architecture with four properties: all Markdown stays in docs/ as the only source of truth; each skill lives in its own Anthropic-compatible directory; skill metadata lives in SKILL.md frontmatter; and packaged docs are served through importlib.resources so stdio deployments work from installed wheels.
**Steps**
1. Phase 1: Define the end-state content contract. Confirm canonical structure as docs/skills/<skill-id>/SKILL.md plus docs/skills/<skill-id>/references/..., with strict per-skill ownership and no metadata.yaml sidecar. Also define stable skill-id rules (kebab-case, immutable after release).
2. Phase 1: Define SKILL.md frontmatter schema with Pydantic-compatible fields: id, version, name, description, tags, capabilities, depends_on, and references manifest entries. The references manifest must map logical reference ids to relative paths so each skill can reorganize references internally without changing global server code. Depends on step 1.
3. Phase 1: Define URI contract and lightweight compatibility policy. Recommend resource://catalog/skills_index, resource://catalog/skills/{skill_id}, resource://skills/{skill_id}/document, resource://skills/{skill_id}/references/{ref_id}, and resource://docs/{path*}. Add change-friendly guidance for evolving URIs and reference ids with optional short-lived aliases when needed. Depends on steps 1-2.
4. Phase 2: Build a docs registry loader that reads packaged docs via importlib.resources.files(...) Traversable APIs, parses SKILL.md frontmatter, validates schema, and creates an in-memory registry keyed by skill_id. Fail fast for duplicate ids, missing files, broken reference mappings, or invalid depends_on. Depends on steps 2-3.
5. Phase 2: Register FastMCP resources from the registry using RFC 6570 templates (including wildcard paths where appropriate), read-only/idempotent annotations, explicit mime types, and on_duplicate_resources="error" for startup safety. Depends on step 4.
6. Phase 2: Add discovery surfaces as resources first, then tool fallback. Keep catalog discovery in resources, then add ResourcesAsTools for tool-only clients. Add thin discovery tools only for parity and optional BM25/regex tool search when catalog/tool volume grows enough to affect token efficiency. Depends on step 5.
7. Phase 3: Implement packaging so docs/ is copied into package resource space at build time (wheel + sdist) while docs/ remains canonical in source control. Use importlib.resources at runtime only; avoid direct filesystem assumptions. Depends on steps 4-6.
8. Phase 3: Remove materialization coupling between skill source modules and docs. The website build reads docs/ directly, while MCP reads packaged docs resources from the installed package. This preserves one authored source with two distribution surfaces. Depends on step 7.
9. Phase 4: Add validation and CI gates: frontmatter schema checks, URI uniqueness checks, reference integrity checks, docs build check, package content check, and stdio smoke checks that read representative skill/document resources from an installed wheel. Depends on steps 5-8.
10. Phase 4: Add long-term maintainability guardrails: architecture decision record for URI and schema contracts, skill authoring checklist, and release checklist for evolving references safely within one skill. Parallel with step 9 after core architecture is stable.
**Relevant files**
- /home/john/Documents/prompts/docs/index.md — Keep top-level docs entry and explain docs-first architecture contract.
- /home/john/Documents/prompts/docs/skills — Canonical location for all skill content, including SKILL.md and references.
- /home/john/Documents/prompts/pyproject.toml — Build inclusion rules for packaged markdown resources in wheel/sdist.
- /home/john/Documents/prompts/src/personal_mcp/main.py — App/server startup wiring for resource registry initialization.
- /home/john/Documents/prompts/src/personal_mcp/mcp.py — FastMCP instance composition and transform registration.
- /home/john/Documents/prompts/src/personal_mcp/catalog/server.py — Catalog resource and fallback discovery behavior.
- /home/john/Documents/prompts/src/personal_mcp/skills/document_loader.py — Replace file-path assumptions with importlib.resources docs registry loading.
- /home/john/Documents/prompts/src/personal_mcp/web/materialize_skill_docs.py — De-scope or retire materialization once docs-first runtime is authoritative.
**Verification**
1. Run `uv run zensical build` to verify docs/ remains valid and site output is stable.
2. Run `uv run pytest -q` with tests that validate frontmatter parsing, URI generation, reference mapping, and catalog responses.
3. Run a packaging integrity check using importlib.resources.files(...) to confirm packaged docs resources exist and are readable from an installed wheel.
4. Run a stdio MCP smoke test that lists resources and reads at least one skill document and one reference document.
5. Run fallback-client smoke tests verifying list_resources/read_resource tools work and return expected metadata for both static and templated resources.
**Decisions**
- Anthropic compatibility: strict skill directory pattern with SKILL.md and references subtree.
- Metadata strategy: YAML frontmatter in SKILL.md (no separate metadata file).
- Discovery strategy: resource-first catalog with tool fallback for tool-only MCP clients.
- Included scope: ideal end-state architecture, contracts, validation, and packaging for stdio operation.
- Excluded scope: migration mechanics from current implementation, backward-compat shim details, and docs visual redesign.
**Further Considerations**
1. Prefer recursive references support under each skill plus frontmatter manifest ids, so skill teams can reorganize internal reference folders without URI churn.
2. Define a hard rule that skill_id and directory name must match exactly, which eliminates the namespace/slug drift class of bugs.
3. Prefer optional, short-lived URI aliases only when active clients depend on older catalog or skill resource paths; otherwise simplify quickly.
@@ -0,0 +1,80 @@
**Step 1 Results: End-State Content Contract**
This section finalizes Step 1 by defining the canonical authored content model.
### Canonical source of truth
- All authored Markdown lives under `docs/`.
- MCP resources and static docs are two distribution surfaces of the same authored files.
- No parallel authored markdown is allowed in `src/` or other package-only paths.
### Canonical skill shape (Anthropic-compatible)
Each skill is one directory under `docs/skills/`:
```text
docs/
  skills/
    <skill-id>/
      SKILL.md
      references/
        ... (one or more markdown files, optional nested folders)
```
Rules:
- `SKILL.md` is required for every skill.
- `references/` is the only place for skill-specific supporting docs.
- Nested folders inside `references/` are allowed so a skill can reorganize internals without changing global architecture.
- Skill directories are independent ownership boundaries; no cross-skill file writes.
### File placement and ownership boundaries
- Top-level project docs stay in `docs/*.md`.
- Skill docs stay in `docs/skills/<skill-id>/...`.
- A skill may link to other skills, but must not store content inside another skill's directory.
- Server/runtime code may index and serve docs, but must not be the source of authored markdown.
### Metadata location constraint
- Skill metadata is embedded in YAML frontmatter in `SKILL.md`.
- No `metadata.yaml` sidecar in the end state.
- Reference lookup metadata (ids to relative paths) is declared from `SKILL.md` frontmatter, not inferred as a hidden global convention.
### Skill-id contract (change-friendly)
`skill-id` is the public identifier and SHOULD satisfy all rules below:
- Format: lowercase kebab-case only.
- Character set: `a-z`, `0-9`, and `-`.
- Must start with a letter.
- No underscores, spaces, dots, or uppercase characters.
- Directory name should equal `skill-id` in each committed revision.
- Frontmatter `id` should equal directory name in each committed revision.
- Prefer keeping `skill-id` stable, but renames are allowed when needed if mappings are updated together.
Example valid ids:
- `fastapi-uv-docker`
- `zensical-docs`
- `pytest-scaffolding`
Example invalid ids:
- `fastapi_uv_docker` (underscore)
- `Zensical-Docs` (uppercase)
- `docs.zensical` (dot)
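The format rules above can be checked mechanically. A minimal sketch, assuming a hypothetical helper named `is_valid_skill_id` (not part of the plan) that encodes only the character-set and leading-letter rules; the directory and frontmatter alignment rules need registry context and are not covered here:

```python
import re

# Hypothetical pattern for illustration: a-z, 0-9, '-', starting with a letter.
SKILL_ID_RE = re.compile(r"^[a-z][a-z0-9-]*$")


def is_valid_skill_id(candidate: str) -> bool:
    """Return True when the candidate satisfies the Step 1 format rules."""
    return SKILL_ID_RE.fullmatch(candidate) is not None


# Walk through the valid and invalid examples listed above.
for skill_id in ("fastapi-uv-docker", "fastapi_uv_docker", "Zensical-Docs", "docs.zensical"):
    print(skill_id, is_valid_skill_id(skill_id))
```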
### Invariants this contract guarantees
- One authored source tree (`docs/`) for both website and MCP.
- One skill directory maps to one skill identity per revision.
- Namespace/slug drift is minimized by keeping directory and frontmatter ids aligned per revision.
- Per-skill reference structure can evolve without changing cross-skill architecture.
- Packaging for stdio is deterministic because authored content is path-stable.
### Non-goals for Step 1
- No URI versioning policy details yet (handled in Step 3).
- No full frontmatter schema details yet (handled in Step 2).
- No migration instructions from current architecture (out of scope for this plan).
@@ -0,0 +1,294 @@
**Step 2 Results: SKILL.md Frontmatter and FastMCP Metadata Contract**
This section finalizes Step 2 by defining the canonical SKILL.md frontmatter schema, separating Anthropic-supported fields from repository extension fields, and mapping frontmatter to FastMCP-native metadata surfaces for resources and tools.
### Anthropic Frontmatter Support (Research Baseline)
Across Anthropic API and Agent Skills specification surfaces:
- Required for custom skill bundles: `name`, `description`.
- `name` constraints (Agent Skills API docs): 1-64 chars, lowercase letters/numbers/hyphens, no XML tags, and must not use reserved words `anthropic` or `claude`.
- `description` constraints (Agent Skills API docs): 1-1024 chars, non-empty, no XML tags.
Portable optional fields from the Agent Skills specification:
- `license`
- `compatibility`
- `metadata`
- `allowed-tools` (experimental)
Claude Code-specific optional fields (supported by Claude Code skills docs):
- `when_to_use`, `argument-hint`, `arguments`
- `disable-model-invocation`, `user-invocable`
- `allowed-tools`, `disallowed-tools`
- `model`, `effort`, `context`, `agent`, `hooks`, `paths`, `shell`
Contract decision for this repository:
- Treat `name` and `description` as required in all SKILL.md files, even where a client could infer defaults.
- Keep Anthropic-facing semantics in standard fields and keep MCP indexing metadata in a namespaced extension block.
- Preserve forward compatibility by allowing additive optional metadata fields over time.
### Canonical Frontmatter Schema For This Repository
Use this exact two-layer pattern:
1. Anthropic layer (portable): top-level fields intended for Anthropic/Agent Skills behavior.
2. Repository layer (runtime indexing): one namespaced block, `x-personal-mcp`, for MCP catalog and routing metadata.
Canonical shape:
```yaml
---
name: <skill-id>
description: <what this skill does and when to use it>
# Optional Anthropic/Agent Skills fields (use only when needed)
when_to_use: <extra trigger guidance>
allowed-tools: <space-separated string or YAML list>
disable-model-invocation: false
user-invocable: true
license: <optional>
compatibility: <optional>
# Repository-specific metadata (authoritative for MCP indexing)
x-personal-mcp:
  id: <skill-id>
  version: <semver>
  tags:
    - <tag>
  capabilities:
    - resource://skills/<skill-id>/document
  depends_on: []
  references:
    <ref-id>:
      path: references/<file>.md
      mime_type: text/markdown
      title: <short title>
---
```
### Repository Metadata Field Rules (`x-personal-mcp`)
- `id` required: must follow Step 1 skill-id rules and equal directory name.
- `version` required: semantic version string.
- `tags` optional: list of kebab-case discovery labels.
- `capabilities` required: list of MCP URIs this skill publishes.
- `depends_on` optional: list of other skill ids.
- `references` optional map:
  - key is `ref-id` (kebab-case).
  - `path` is a skill-relative markdown path and must stay inside the same skill directory.
  - nested folders under `references/` are allowed.
  - `mime_type` defaults to `text/markdown` if omitted.
  - `title` is an optional display label.
  - renaming `ref-id` values is allowed when needed; optional aliases may be used during transitions.
### Pydantic Models For Frontmatter Validation
Define the Step 2 contract with Pydantic v2 models and change-friendly validation.
Normative model sketch:
```python
from __future__ import annotations

import re
from pathlib import PurePosixPath
from typing import Any

from pydantic import BaseModel, ConfigDict, Field, field_validator, model_validator

# Step 1 id rules: lowercase letters, digits, hyphens; must start with a letter.
SKILL_ID_RE = re.compile(r"^[a-z][a-z0-9-]*$")
# MAJOR.MINOR.PATCH with optional pre-release and optional build metadata.
SEMVER_RE = re.compile(
    r"^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)"
    r"(?:-[0-9A-Za-z.-]+)?(?:\+[0-9A-Za-z.-]+)?$"
)


class ReferenceEntry(BaseModel):
    """One entry in the `x-personal-mcp.references` manifest."""

    model_config = ConfigDict(extra="ignore", str_strip_whitespace=True)

    path: str
    mime_type: str = "text/markdown"
    title: str | None = None

    @field_validator("path")
    @classmethod
    def validate_reference_path(cls, value: str) -> str:
        p = PurePosixPath(value)
        if p.is_absolute() or ".." in p.parts:
            raise ValueError("reference path must be a relative in-skill path")
        # str(p) is normalized, so this also rejects look-alikes such as "referencesX/...".
        if not str(p).startswith("references/"):
            raise ValueError("reference path must stay under references/")
        if p.suffix.lower() != ".md":
            raise ValueError("reference path must target a markdown file")
        return str(p)


class PersonalMcpMetadata(BaseModel):
    """Repository extension block (`x-personal-mcp`)."""

    model_config = ConfigDict(extra="ignore", str_strip_whitespace=True)

    id: str
    version: str
    tags: list[str] = Field(default_factory=list)
    capabilities: list[str] = Field(min_length=1)
    depends_on: list[str] = Field(default_factory=list)
    references: dict[str, ReferenceEntry] = Field(default_factory=dict)

    @field_validator("id")
    @classmethod
    def validate_id(cls, value: str) -> str:
        if not SKILL_ID_RE.fullmatch(value):
            raise ValueError("id must be lowercase kebab-case and start with a letter")
        return value

    @field_validator("version")
    @classmethod
    def validate_version(cls, value: str) -> str:
        if not SEMVER_RE.fullmatch(value):
            raise ValueError("version must be semver")
        return value

    @field_validator("depends_on")
    @classmethod
    def validate_depends_on(cls, value: list[str]) -> list[str]:
        for dep in value:
            if not SKILL_ID_RE.fullmatch(dep):
                raise ValueError(f"invalid depends_on skill id: {dep}")
        return value

    @field_validator("references")
    @classmethod
    def validate_reference_ids(cls, value: dict[str, ReferenceEntry]) -> dict[str, ReferenceEntry]:
        # Reference ids follow the same kebab-case format as skill ids.
        for ref_id in value:
            if not SKILL_ID_RE.fullmatch(ref_id):
                raise ValueError(f"invalid reference id: {ref_id}")
        return value

    @model_validator(mode="after")
    def ensure_primary_capability(self) -> "PersonalMcpMetadata":
        expected = f"resource://skills/{self.id}/document"
        if expected not in self.capabilities:
            raise ValueError(f"capabilities must include {expected}")
        return self


class SkillFrontmatter(BaseModel):
    """Full SKILL.md frontmatter: Anthropic layer plus repository layer."""

    # extra="ignore" lets unknown additive fields pass without blocking startup.
    model_config = ConfigDict(extra="ignore", str_strip_whitespace=True)

    # Anthropic/Agent Skills fields
    name: str = Field(min_length=1, max_length=64)
    description: str = Field(min_length=1, max_length=1024)
    when_to_use: str | None = None
    allowed_tools: str | list[str] | None = Field(default=None, alias="allowed-tools")
    disallowed_tools: str | list[str] | None = Field(default=None, alias="disallowed-tools")
    disable_model_invocation: bool | None = Field(default=None, alias="disable-model-invocation")
    user_invocable: bool | None = Field(default=None, alias="user-invocable")
    argument_hint: str | None = Field(default=None, alias="argument-hint")
    arguments: str | list[str] | None = None
    license: str | None = None
    compatibility: str | None = None
    metadata: dict[str, str] | None = None

    # Repository extension block
    x_personal_mcp: PersonalMcpMetadata = Field(alias="x-personal-mcp")

    @field_validator("name")
    @classmethod
    def validate_name(cls, value: str) -> str:
        if not SKILL_ID_RE.fullmatch(value):
            raise ValueError("name must be lowercase kebab-case and start with a letter")
        if "anthropic" in value or "claude" in value:
            raise ValueError("name must not contain reserved words anthropic or claude")
        return value

    @model_validator(mode="after")
    def cross_validate(self) -> "SkillFrontmatter":
        if self.x_personal_mcp.id != self.name:
            raise ValueError("x-personal-mcp.id must exactly match name")
        return self


def validate_skill_frontmatter(raw: dict[str, Any], skill_dir_name: str) -> SkillFrontmatter:
    """Validate parsed frontmatter and enforce the directory-name invariant."""
    model = SkillFrontmatter.model_validate(raw)
    if model.name != skill_dir_name:
        raise ValueError("frontmatter name must exactly match skill directory name")
    return model
```
Validation behavior contract:
- Validate required core fields and relationships during registry load before FastMCP resource/tool registration.
- Allow unknown additive fields so frontmatter can evolve without blocking startup.
- Treat hard contract violations (missing required fields, invalid ids, broken required mappings) as startup errors.
- Treat non-critical compatibility issues as warnings when possible.
- Error messages should include skill path and failing field for CI readability.
Projection mode contract (for Anthropic API upload pipelines):
- Parse with `SkillFrontmatter` first.
- Emit Anthropic-safe frontmatter with standard fields only.
- Serialize repository metadata into standard `metadata` as namespaced keys.
- Preserve the canonical authored source in `x-personal-mcp`; projection output is a build artifact.
### Anthropic Upload Compatibility Rule
- Anthropic documentation guarantees behavior for standard frontmatter fields but does not explicitly guarantee handling of arbitrary unknown top-level keys.
- Therefore, publishing pipelines that target strict API compatibility should support a projection mode that emits only standard frontmatter fields for upload.
- In projection mode, repository extension metadata is serialized into the standard `metadata` field (for example as namespaced keys or JSON-encoded values), while source-of-truth authoring remains in `x-personal-mcp`.
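As an illustration of projection mode, the sketch below copies only standard fields and folds the extension block into `metadata` as namespaced keys. The function name `project_for_upload`, the `x-personal-mcp.<key>` key format, and the choice to JSON-encode non-string values are assumptions made for this example, not a settled format:

```python
import json
from typing import Any

# Assumption: these portable Agent Skills fields are the ones kept for upload.
STANDARD_FIELDS = ("name", "description", "license", "compatibility", "allowed-tools")
EXTENSION_KEY = "x-personal-mcp"


def project_for_upload(frontmatter: dict[str, Any]) -> dict[str, Any]:
    """Return a new mapping with standard fields only; extension data moves into `metadata`."""
    projected = {key: frontmatter[key] for key in STANDARD_FIELDS if key in frontmatter}
    # Start from any metadata the author already declared, coerced to strings.
    metadata = {str(k): str(v) for k, v in (frontmatter.get("metadata") or {}).items()}
    for key, value in (frontmatter.get(EXTENSION_KEY) or {}).items():
        # Strings pass through; lists and maps are JSON-encoded to keep values flat.
        encoded = value if isinstance(value, str) else json.dumps(value, sort_keys=True)
        metadata[f"{EXTENSION_KEY}.{key}"] = encoded
    if metadata:
        projected["metadata"] = metadata
    return projected
```

The input mapping is left untouched, which matches the rule that the authored `x-personal-mcp` block stays canonical and the projection is only a build artifact.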
### FastMCP Native Metadata Surfaces (Research Baseline)
Resources (`@mcp.resource` and templates) support native definition metadata:
- `name`, `description`, `mime_type`, `tags`
- `annotations` (`readOnlyHint`, `idempotentHint`)
- `icons`
- `meta` (custom metadata passed through to the MCP client resource object)
- `version`
- `enabled` (deprecated in v3; prefer server-level `mcp.enable()` / `mcp.disable()`)
Resources support runtime metadata:
- `ResourceContent.meta` (item-level)
- `ResourceResult.meta` (result-level `_meta`)
Tools (`@mcp.tool`) support native definition metadata:
- `name`, `description`, `tags`
- `annotations` (`title`, `readOnlyHint`, `destructiveHint`, `idempotentHint`, `openWorldHint`)
- `icons`
- `meta` (custom metadata passed through to the MCP client tool object)
- `version`
- `timeout`, `output_schema`, `run_in_thread`
- `enabled` (deprecated in v3; prefer server-level `mcp.enable()` / `mcp.disable()`)
Tools support runtime metadata:
- `ToolResult.meta` (execution-level metadata for each call)
### Frontmatter To FastMCP Mapping Contract
At server startup, map `x-personal-mcp` fields into FastMCP registration as follows:
- `x-personal-mcp.id` -> canonical URI namespace and identity checks.
- `description` -> default `description` for the primary skill document resource.
- `x-personal-mcp.tags` -> `tags` on resources/tools.
- `x-personal-mcp.version` -> `version` on resources/tools.
- `x-personal-mcp.capabilities` -> registered URI list plus catalog exposure.
- `x-personal-mcp.references[*]` -> resource templates or concrete resources with:
  - `mime_type` from reference entry (or default)
  - `meta` including `skill_id`, `ref_id`, and source `path`
  - read-only annotations for documentation resources
- `x-personal-mcp.depends_on` -> catalog dependency graph metadata and validation checks.
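One way to picture the mapping for a single reference entry is as a plain dictionary of registration arguments. This sketch deliberately avoids importing FastMCP: the helper name `resource_kwargs_for_reference` and the `uri` key are hypothetical, and the remaining keys simply mirror the metadata names listed in this section, so they should be checked against the FastMCP documentation before use:

```python
from typing import Any


def resource_kwargs_for_reference(
    skill_id: str,
    version: str,
    tags: list[str],
    ref_id: str,
    entry: dict[str, Any],
) -> dict[str, Any]:
    """Build registration arguments for one reference resource as a plain dict."""
    return {
        "uri": f"resource://skills/{skill_id}/references/{ref_id}",
        "name": entry.get("title") or ref_id,
        # Reference entries default to markdown when mime_type is omitted.
        "mime_type": entry.get("mime_type", "text/markdown"),
        "tags": sorted(tags),
        "version": version,
        # Documentation resources are read-only and safe to re-read.
        "annotations": {"readOnlyHint": True, "idempotentHint": True},
        "meta": {"skill_id": skill_id, "ref_id": ref_id, "path": entry["path"]},
    }
```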
### Invariants This Contract Guarantees
- Anthropic-required frontmatter stays valid for custom skill upload and Claude Code loading.
- MCP-specific metadata remains embedded in SKILL.md frontmatter, with no `metadata.yaml` sidecar.
- FastMCP registration uses only native metadata fields for resources/tools.
- Reference ids and metadata can evolve with low-friction updates while internal file layout under `references/` stays refactor-friendly.
### Non-goals For Step 2
- No URI versioning/deprecation rollout policy details (handled in Step 3).
- No migration script design from existing `metadata.yaml` files.
- No runtime caching/indexing performance tuning details.
@@ -0,0 +1,141 @@
**Step 3 Results: URI Contract and Compatibility Policy**
This section finalizes Step 3 by defining the canonical resource URI contract, template parameter rules, and explicit compatibility/versioning policy for URIs and reference ids.
### Canonical URI Surface (Normative)
The public, preferred URIs are:
1. `resource://catalog/skills_index`
2. `resource://catalog/skills/{skill_id}`
3. `resource://skills/{skill_id}/document`
4. `resource://skills/{skill_id}/references/{ref_id}`
5. `resource://docs/{path*}`
Contract intent:
- Catalog URIs are discovery surfaces.
- Skill URIs are primary per-skill guidance surfaces.
- Docs wildcard URI is a direct authored-markdown access surface under `docs/`.
### URI Semantics
`resource://catalog/skills_index`
- Returns a compact list of skill records for discovery.
- One entry per `skill_id`.
- Must include enough metadata for client-side selection (at minimum id, name, description, tags, capabilities).
`resource://catalog/skills/{skill_id}`
- Returns one normalized record for `skill_id`.
- Must include canonical document URI and declared reference ids.
- Returns not-found when `skill_id` does not exist.
`resource://skills/{skill_id}/document`
- Returns the canonical `SKILL.md` authored content for that skill.
- `skill_id` must match Step 1 stable id rules.
`resource://skills/{skill_id}/references/{ref_id}`
- Returns one reference document declared in the skill frontmatter references manifest.
- `ref_id` is the stable public handle for that reference document.
`resource://docs/{path*}`
- Returns authored markdown at a normalized relative path under `docs/`.
- Supports nested paths via RFC 6570 wildcard expansion.
- Typical examples: `index.md`, `usage.md`, `skills/<skill-id>/SKILL.md`, `skills/<skill-id>/references/<file>.md`.
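To make the five URI shapes concrete, the following sketch classifies a URI string and extracts its template parameters with regular expressions. It is an illustration only: the `match_uri` helper and the pattern table are hypothetical, and a real server would rely on the framework's own template matching rather than this code:

```python
import re
from typing import Optional

# Hypothetical patterns mirroring the five canonical URI shapes described above.
_ID = r"[a-z][a-z0-9-]*"
URI_PATTERNS = [
    ("skills_index", re.compile(r"resource://catalog/skills_index")),
    ("catalog_skill", re.compile(rf"resource://catalog/skills/(?P<skill_id>{_ID})")),
    ("skill_document", re.compile(rf"resource://skills/(?P<skill_id>{_ID})/document")),
    (
        "skill_reference",
        re.compile(rf"resource://skills/(?P<skill_id>{_ID})/references/(?P<ref_id>{_ID})"),
    ),
    ("docs_path", re.compile(r"resource://docs/(?P<path>.+)")),
]


def match_uri(uri: str) -> Optional[tuple[str, dict[str, str]]]:
    """Return (surface name, extracted parameters), or None when no shape matches."""
    for kind, pattern in URI_PATTERNS:
        found = pattern.fullmatch(uri)
        if found:
            return kind, found.groupdict()
    return None
```

Note that `docs_path` accepts any non-empty remainder here; path safety is a separate validation concern.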
### Template Parameter and Validation Rules
`skill_id`
- Lowercase kebab-case.
- Must satisfy Step 1 stable id rules.
`ref_id`
- Lowercase kebab-case.
- Must be declared in the skill's references manifest.
`path*`
- Relative POSIX path only.
- No leading slash.
- No `..` traversal segments.
- Resolves only inside `docs/`.
- This surface is markdown-only in end state (`.md` files).
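The `path*` rules translate directly into a small validator. A minimal sketch, assuming a hypothetical function named `normalize_docs_path`; it checks the string form only and does not confirm that the target exists under `docs/`:

```python
from pathlib import PurePosixPath


def normalize_docs_path(raw: str) -> str:
    """Validate a `path*` value and return it in normalized POSIX form."""
    # Reject empty values, leading slashes, and Windows-style separators.
    if not raw or raw.startswith("/") or "\\" in raw:
        raise ValueError("path must be a relative POSIX path")
    path = PurePosixPath(raw)
    if ".." in path.parts:
        raise ValueError("path must not contain '..' segments")
    if path.suffix.lower() != ".md":
        raise ValueError("only .md files are served from this surface")
    # PurePosixPath collapses duplicate slashes and '.' segments.
    return str(path)
```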
### Legacy URI Policy (Current-to-Target Transition)
Current catalog URIs in this repo (`resource://catalog/patterns`, `resource://catalog/patterns_by_id`, `resource://catalog/skills_details`) are treated as compatibility aliases during migration.
Rules:
- Keep aliases only when needed for active clients.
- Prefer simple canonical URIs for new clients.
- Remove aliases once consumers have moved.
### URI Versioning Policy
Default rule:
- Keep URIs unversioned by default.
- Allow URI and payload updates when they improve clarity or implementation simplicity.
Breaking-change rule:
- If clients are already using an older shape, provide either:
  - a short-lived alias, or
  - a versioned URI family such as `resource://catalog/v2/...`.
- Choose the lightest migration path that minimizes maintenance overhead.
FastMCP version metadata usage:
- Resource `version` metadata MAY be used for implementation/version discovery.
- URI readability and maintainability remain the primary contract.
### Deprecation Policy For URIs
When deprecating a URI:
1. Document the replacement URI in changelog/docs.
2. Optionally return deprecation metadata while an alias exists.
3. Remove deprecated aliases when no active client needs them.
Recommended deprecation metadata fields in resource responses:
- `deprecated: true`
- `replacement_uri: <uri>`
- `sunset_at: <ISO-8601 timestamp>`
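A deprecation payload with these three fields could be assembled as follows. The helper name `deprecation_meta` is hypothetical, and normalizing `sunset_at` to UTC is a choice made for this sketch rather than a requirement stated in the policy:

```python
from datetime import datetime, timezone


def deprecation_meta(replacement_uri: str, sunset_at: datetime) -> dict[str, object]:
    """Build the recommended deprecation fields for an aliased resource."""
    if sunset_at.tzinfo is None:
        # A naive timestamp would be ambiguous for remote clients.
        raise ValueError("sunset_at must be timezone-aware")
    return {
        "deprecated": True,
        "replacement_uri": replacement_uri,
        "sunset_at": sunset_at.astimezone(timezone.utc).isoformat(),
    }
```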
### Reference ID Compatibility Policy
`ref_id` is the public identifier for a reference document, separate from file path.
Rules:
- Prefer keeping `ref_id` stable when practical.
- File paths may change without URI churn as long as the mapped `ref_id` resolves.
- If a reference is renamed, introduce a new `ref_id`; keep an alias only if clients depend on the old id.
- Avoid reusing retired `ref_id` values for unrelated content.
Alias behavior for renamed references:
- If alias is kept, old `ref_id` continues to resolve and points to the replacement.
- Remove old alias as soon as migration is complete.
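Alias resolution for renamed references can stay very small. In this sketch the `aliases` mapping (old id to new id) is an assumption: the Step 2 frontmatter schema does not define where aliases are stored, so the helper takes them as a plain argument:

```python
def resolve_ref_id(ref_id: str, declared: set[str], aliases: dict[str, str]) -> str:
    """Return the declared ref_id for `ref_id`, following at most one alias hop."""
    if ref_id in declared:
        return ref_id
    target = aliases.get(ref_id)
    # An alias is only honored when it points at a currently declared id.
    if target is not None and target in declared:
        return target
    raise KeyError(f"unknown ref_id: {ref_id}")
```

Limiting resolution to one hop keeps alias chains from accumulating, which fits the rule that aliases are removed as soon as migration is complete.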
### Invariants This Contract Guarantees
- One canonical URI pattern per core capability surface.
- Fast, low-friction URI evolution with optional compatibility aliases.
- Explicit migration path for catalog URI consolidation when needed.
- Reference mappings can evolve with minimal churn.
### Non-goals For Step 3
- No implementation-specific transform wiring details (`VersionFilter`, mounts, provider composition).
- No migration script mechanics for auto-generating aliases.
- No authorization policy design for URI-level access control.
@@ -0,0 +1,238 @@
**Step 4 Results: Docs Registry Loader Design (importlib.resources + Fail-Fast Validation)**
This section finalizes Step 4 by defining a production-ready docs registry loader that reads packaged docs through Python resource APIs, parses SKILL.md frontmatter, validates schema and cross-links, and builds an immutable in-memory registry keyed by skill_id.
### Research Baseline (Python + Design Guidance)
Authoritative references used for this step:
1. Python `importlib.resources` docs (`files`, `as_file`, `Traversable` APIs)
2. Python `importlib.resources.abc` docs (`Traversable`, path traversal semantics, joinpath compatibility notes)
3. Pydantic v2 model/validation docs (`model_validate`, `ValidationError`, strictness and extra handling)
4. Python packaging guidance for including package data in wheels/sdists
Best-practice conclusions applied to this design:
1. Prefer `importlib.resources.files(<package>).joinpath(...)` over filesystem assumptions so stdio deployments from installed wheels work.
2. Treat resources as potentially non-filesystem artifacts (zip-import compatible); only use `as_file(...)` when an actual OS path is required.
3. Validate metadata with explicit Pydantic models and fail startup on contract violations.
4. Keep registry load deterministic (sorted traversal, stable error messages, no hidden fallback mutations).
5. Resolve references via manifest ids declared in frontmatter, not by global file conventions.
### Loader Responsibilities (Normative)
The Step 4 loader MUST:
1. Read canonical docs from package resources (not repo-root paths).
2. Discover all skill directories under `docs/skills/` in packaged resources.
3. For each skill, read and parse `SKILL.md` frontmatter.
4. Validate frontmatter using the Step 2 schema contract.
5. Validate directory/id invariants from Step 1 (directory name equals frontmatter id).
6. Validate URI/reference semantics from Step 3 assumptions.
7. Build a single in-memory registry keyed by `skill_id`.
8. Fail fast on any integrity error before FastMCP resource registration.
9. Precompute compact discovery projections so index resources can be served without reading full markdown bodies at request time.
### Package Resource Contract
Runtime anchor:
1. The loader resolves content from an importable package anchor, for example `personal_mcp`.
2. Docs root is located as `files(anchor).joinpath("docs")` when docs are packaged at package root, or an equivalent configured subpath.
3. Skill root is `docs/skills`.
Resource assumptions:
1. `SKILL.md` is UTF-8 text.
2. Reference files declared in frontmatter are UTF-8 markdown by default unless otherwise declared.
3. Path resolution always remains inside the same skill directory.
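The resource contract above can be sketched with `importlib.resources.files(...)` and the `Traversable` methods `joinpath`, `iterdir`, `is_dir`, and `read_text`. The three helper names below are hypothetical, and the layout (docs packaged at `<anchor>/docs/skills`) is the assumption stated in this section:

```python
from importlib.resources import files


def skills_root(package_anchor: str, docs_root: str = "docs"):
    """Locate docs/skills inside an importable package (returns a Traversable)."""
    return files(package_anchor).joinpath(docs_root).joinpath("skills")


def discover_skill_dirs(root) -> list[str]:
    """Return skill directory names in sorted order; `root` is any Traversable."""
    return sorted(child.name for child in root.iterdir() if child.is_dir())


def read_skill_markdown(root, skill_id: str) -> str:
    """Read SKILL.md as UTF-8 text without assuming an OS-level file path."""
    return root.joinpath(skill_id).joinpath("SKILL.md").read_text(encoding="utf-8")
```

With an installed package this would be driven as `discover_skill_dirs(skills_root("personal_mcp"))`. Because `pathlib.Path` offers the same methods, unit tests can pass a temporary directory in place of a packaged resource root.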
### Registry Data Model
Build immutable runtime records with explicit structure:
1. `SkillRecord`
   - `skill_id`
   - `name`
   - `description`
   - `version`
   - `tags`
   - `capabilities`
   - `depends_on`
   - `document_uri`
   - `document_relpath` (canonical resource-relative path)
   - `references` map keyed by `ref_id`
2. `ReferenceRecord`
   - `ref_id`
   - `uri`
   - `relpath`
   - `mime_type`
   - `title`
3. `DocsRegistry`
   - `skills_by_id: dict[str, SkillRecord]`
   - `skills_in_load_order: list[str]` (deterministic ordering)
   - helper indexes for catalog payload generation
     - `skills_summary_in_load_order: list[SkillSummaryRecord]` for progressive discovery responses
     - filter indexes (for example by tag/capability) derived once at startup
4. `SkillSummaryRecord`
   - `skill_id`
   - `name`
   - `description`
   - `tags`
   - `capabilities`
   - `document_uri`
   - optional `version`
Immutability rule:
1. Once built, registry records are treated as read-only for the process lifetime.
2. No runtime mutation during requests; refresh only via process restart.
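One possible way to get read-only records is frozen dataclasses with tuple fields and a read-only mapping view. This is a sketch of two of the records only; using `dataclasses` and `MappingProxyType` is an assumption of the example, and frozen Pydantic models would serve equally well:

```python
from dataclasses import dataclass, field
from types import MappingProxyType
from typing import Mapping, Optional


@dataclass(frozen=True)
class ReferenceRecord:
    ref_id: str
    uri: str
    relpath: str
    mime_type: str = "text/markdown"
    title: Optional[str] = None


@dataclass(frozen=True)
class SkillRecord:
    skill_id: str
    name: str
    description: str
    version: str
    document_uri: str
    document_relpath: str
    # Tuples instead of lists so the collections cannot be mutated in place.
    tags: tuple[str, ...] = ()
    capabilities: tuple[str, ...] = ()
    depends_on: tuple[str, ...] = ()
    # Read-only mapping view keyed by ref_id.
    references: Mapping[str, ReferenceRecord] = field(
        default_factory=lambda: MappingProxyType({})
    )
```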
### Frontmatter Parsing Contract
`SKILL.md` parse steps:
1. Read full markdown text from resource.
2. Parse YAML frontmatter block at file start (between the first two `---` delimiters).
3. Parse YAML with safe loader semantics.
4. Validate parsed object with Step 2 Pydantic model(s).
5. Preserve markdown body as document content payload.
Parsing failure behavior:
1. Missing frontmatter block: startup error.
2. Invalid YAML: startup error with skill path and YAML parser detail.
3. Missing required fields (`name`, `description`, `x-personal-mcp` contract fields): startup error.
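The first parse steps, locating the frontmatter block and separating it from the markdown body, need only the standard library. The helper name `split_frontmatter` is hypothetical. The returned YAML text would then go to a safe loader such as PyYAML's `yaml.safe_load`; that is a third-party dependency this plan does not name, so it is left out of the sketch:

```python
def split_frontmatter(text: str) -> tuple[str, str]:
    """Split SKILL.md text into (yaml_block, markdown_body).

    Raises ValueError when the file does not start with a `---` delimited block.
    """
    lines = text.splitlines(keepends=True)
    if not lines or lines[0].rstrip() != "---":
        raise ValueError("missing frontmatter: file must start with '---'")
    for index in range(1, len(lines)):
        # The first later line consisting of '---' closes the block.
        if lines[index].rstrip() == "---":
            return "".join(lines[1:index]), "".join(lines[index + 1:])
    raise ValueError("unterminated frontmatter: closing '---' not found")
```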
### Validation Pipeline (Fail-Fast)
Validation happens in this order:
1. Structural discovery validation
   - skill directory exists under `docs/skills`
   - required `SKILL.md` exists for each discovered skill
2. Schema validation
   - Pydantic frontmatter validation for all required and constrained fields
3. Identity validation
   - frontmatter `name` equals `x-personal-mcp.id`
   - frontmatter id equals skill directory name
4. Reference manifest validation
   - unique `ref_id` keys per skill
   - each manifest path is relative, in-skill, and under `references/`
   - each manifest target exists and is a file
5. Dependency graph validation
   - every `depends_on` target exists in discovered skill set
   - no self-dependency
   - cycle detection enabled (hard error on cycle)
6. Capability sanity checks
   - required primary capability `resource://skills/{skill_id}/document` is present
7. Global uniqueness checks
   - no duplicate `skill_id`
   - no duplicate canonical resource URIs generated from registry
8. Discovery payload checks
   - summary fields required by catalog index are present and non-empty
   - summary generation does not require reading markdown body content during request handling
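The dependency graph phase is the least obvious of these checks, so here is a minimal sketch of it. It assumes the graph is already available as a mapping from skill id to its `depends_on` list; the function name `find_dependency_issues` and the message strings are hypothetical. Cycle detection uses a depth-first search with three node states:

```python
def find_dependency_issues(depends_on: dict[str, list[str]]) -> list[str]:
    """Report unknown targets, self-dependencies, and cycles in sorted order."""
    issues: list[str] = []
    for skill_id in sorted(depends_on):
        for dep in depends_on[skill_id]:
            if dep == skill_id:
                issues.append(f"{skill_id}: self-dependency")
            elif dep not in depends_on:
                issues.append(f"{skill_id}: unknown dependency {dep}")

    unvisited, in_progress, done = 0, 1, 2
    state = {skill_id: unvisited for skill_id in depends_on}

    def visit(node: str, trail: list[str]) -> None:
        state[node] = in_progress
        for dep in sorted(depends_on[node]):
            if dep not in depends_on or dep == node:
                continue  # already reported by the first pass
            if state[dep] == in_progress:
                # dep is on the current DFS path, so the trail closes a cycle.
                cycle = trail[trail.index(dep):] + [dep]
                issues.append("cycle: " + " -> ".join(cycle))
            elif state[dep] == unvisited:
                visit(dep, trail + [dep])
        state[node] = done

    for skill_id in sorted(depends_on):
        if state[skill_id] == unvisited:
            visit(skill_id, [skill_id])
    return issues
```

Sorting both the roots and each adjacency list keeps the reported issues stable across runs, in line with the determinism rules for registry loading.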
### Error Model and Reporting
Error handling contract:
1. Collect errors per validation phase for clarity, then raise one startup exception containing all findings.
2. Error messages must include:
   - skill id (when known)
   - packaged relative path
   - violated rule
   - actionable fix hint
3. If any error exists, registry is not published and FastMCP resource registration does not proceed.
Recommended exception shape:
1. `DocsRegistryValidationError(errors: list[RegistryIssue])`
2. `RegistryIssue` fields: `code`, `message`, `skill_id`, `path`, `hint`
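The recommended exception shape could look like the sketch below. The field names come from the list above; the message layout and the use of a frozen dataclass are assumptions made for illustration:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RegistryIssue:
    code: str
    message: str
    skill_id: Optional[str] = None
    path: Optional[str] = None
    hint: Optional[str] = None


class DocsRegistryValidationError(Exception):
    """Raised once at startup with every collected issue attached."""

    def __init__(self, errors: list[RegistryIssue]) -> None:
        self.errors = list(errors)
        # One line per issue so CI logs stay readable.
        lines = [
            f"[{e.code}] {e.skill_id or '?'} ({e.path or '?'}): {e.message}"
            + (f" (hint: {e.hint})" if e.hint else "")
            for e in self.errors
        ]
        super().__init__(f"{len(self.errors)} docs registry issue(s):\n" + "\n".join(lines))
```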
### Determinism and Runtime Safety
Determinism rules:
1. Traverse directories in sorted order.
2. Normalize all stored relative paths to POSIX form.
3. Normalize ids/tags exactly once at parse boundary.
4. Produce stable catalog ordering to reduce client churn.
5. Produce stable summary projections and filter indexes from the same normalized source records.
Runtime safety rules:
1. No dependence on `Path(__file__)` or repository root.
2. No ad-hoc fallback probing across multiple locations.
3. No lazy validation deferred until first request.
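
The determinism rules can be illustrated with a small sketch. It assumes the skills root is a Traversable-like object (anything offering `iterdir`, `is_dir`, `joinpath`, `is_file`, and `name`, as `importlib.resources.files(...)` results do). The helper names and the lowercase/strip tag normalization are assumptions for this example only.

```python
from __future__ import annotations

from pathlib import PureWindowsPath


def discover_skill_dirs(skills_root) -> list[str]:
    """Return names of child directories containing SKILL.md, in sorted order."""
    names = []
    for entry in skills_root.iterdir():  # iteration order is not guaranteed
        if entry.is_dir() and entry.joinpath("SKILL.md").is_file():
            names.append(entry.name)
    return sorted(names)                 # sort explicitly for stable output


def to_posix_relpath(raw: str) -> str:
    """Normalize a stored relative path to POSIX form (accepts either separator)."""
    return PureWindowsPath(raw).as_posix()


def normalize_tags(tags: list[str]) -> tuple[str, ...]:
    """Assumed rule: trim, lowercase, drop duplicates, sort. Run once at parse time."""
    return tuple(sorted({tag.strip().lower() for tag in tags if tag.strip()}))
```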
### Integration Plan for Existing Modules
Primary integration target:
1. Replace path-based logic in `src/personal_mcp/skills/document_loader.py` with package-resource-based registry loading.
Catalog integration:
1. Update `src/personal_mcp/catalog/server.py` to consume the shared in-memory registry instead of scanning `metadata.yaml` files.
2. Keep catalog payload normalization deterministic and sourced from registry records only.
Startup wiring:
1. Initialize registry once during app/server startup in `src/personal_mcp/main.py` or equivalent composition point.
2. Pass registry to resource registration step (Step 5).
### Proposed Loader API Surface
Use a small, testable API:
1. `load_docs_registry(*, package_anchor: str, docs_root: str = "docs") -> DocsRegistry`
2. `read_skill_document(registry: DocsRegistry, skill_id: str) -> DocumentPayload`
3. `read_skill_reference(registry: DocsRegistry, skill_id: str, ref_id: str) -> DocumentPayload`
Design constraints:
1. Loader functions are pure with respect to package resources and input arguments.
2. No global mutable singleton required for unit tests.
3. Caching is explicit and owned by startup composition.
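
A rough sketch of the read side of this API follows. The type names match the signatures above, but every field shown (`text`, `mime_type`, `document`, `references`, `skills`) is an assumption, and `load_docs_registry` is left out because its body depends on the frontmatter schema.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class DocumentPayload:
    text: str
    mime_type: str = "text/markdown"


@dataclass(frozen=True)
class SkillRecord:
    skill_id: str
    document: Any                  # Traversable-like handle to SKILL.md
    references: Mapping[str, Any]  # ref_id -> Traversable-like handle


@dataclass(frozen=True)
class DocsRegistry:
    skills: Mapping[str, SkillRecord]  # keyed by skill_id


def read_skill_document(registry: DocsRegistry, skill_id: str) -> DocumentPayload:
    record = registry.skills[skill_id]  # KeyError signals an unknown skill
    return DocumentPayload(text=record.document.read_text(encoding="utf-8"))


def read_skill_reference(
    registry: DocsRegistry, skill_id: str, ref_id: str
) -> DocumentPayload:
    handle = registry.skills[skill_id].references[ref_id]
    return DocumentPayload(text=handle.read_text(encoding="utf-8"))
```

Because the registry is passed in explicitly, unit tests can build one from fixture files without touching any module-level state.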
### Test and Validation Plan (Step 4 Scope)
Unit tests:
1. valid multi-skill registry load from packaged test fixtures
2. duplicate id detection
3. missing SKILL.md detection
4. invalid frontmatter field constraints
5. broken reference target detection
6. invalid depends_on target detection
7. cycle detection in depends_on graph
8. deterministic output ordering across runs
Packaging/runtime tests:
1. install built wheel in isolated env
2. load registry via `importlib.resources.files(...)`
3. assert representative skill document/reference are readable
Expected command path in this repo:
1. `uv run pytest -q`
### Acceptance Criteria for Step 4 Completion
Step 4 is complete when all are true:
1. Registry loads exclusively from packaged resources.
2. All Step 2 and Step 3 dependent validations are enforced at startup.
3. Invalid docs state blocks startup with actionable diagnostics.
4. Registry is deterministic and immutable for runtime use.
5. Catalog and later resource registration can consume registry without direct filesystem scanning.
### Non-goals for Step 4
1. No FastMCP resource registration wiring details (Step 5).
2. No discovery-tool fallback behavior design (Step 6).
3. No final packaging/build-system migration mechanics (Step 7).
4. No backward-compat alias rollout mechanics beyond validation readiness.
@@ -0,0 +1,210 @@
**Step 5 Results: Registry-Driven FastMCP Resource Registration (RFC6570 + Startup Safety)**
This section finalizes Step 5 by defining how FastMCP resources are registered from the Step 4 docs registry using RFC6570 URI templates, explicit metadata, and strict duplicate-registration safety.
### Research Baseline (FastMCP + URI Templates)
Authoritative references used for this step:
1. FastMCP Resources and Templates docs (resource decorator, template behavior)
2. FastMCP RFC6570 support docs (simple params, wildcard params, query params)
3. FastMCP duplicate handling docs (`on_duplicate_resources`)
4. FastMCP annotations guidance (`readOnlyHint`, `idempotentHint`)
Best-practice conclusions applied to this design:
1. Use URI templates for parameterized resources instead of generating N static resource handlers.
2. Use wildcard template parameters (`{path*}`) for hierarchical docs paths.
3. Set startup duplicate policy to `on_duplicate_resources="error"` to fail fast on contract collisions.
4. Set explicit `mime_type` and resource annotations for all docs resources.
5. Keep registration deterministic and sourced only from the validated Step 4 registry.
### Registration Responsibilities (Normative)
The Step 5 registration layer MUST:
1. Consume only the validated in-memory registry produced by Step 4.
2. Register canonical resource discovery surfaces and skill document/reference surfaces.
3. Use RFC6570 templates where URI patterns are parameterized.
4. Use wildcard templates where path depth is variable.
5. Attach read-only/idempotent annotations to documentation resources.
6. Set explicit MIME types for all registered resources.
7. Fail startup if duplicate URI/template keys are encountered.
### Canonical Resource Surface (from Registry)
The preferred resources registered in this phase are:
1. `resource://catalog/skills_index`
2. `resource://catalog/skills_index{?q,tag,capability,cursor,limit}` (optional filtered/paginated discovery template)
3. `resource://catalog/skills/{skill_id}`
4. `resource://skills/{skill_id}/document`
5. `resource://skills/{skill_id}/references/{ref_id}`
6. `resource://docs/{path*}`
Registration decision rules:
1. Use static resource registration for fixed singleton endpoints (for example `skills_index`).
2. Use template registration for parameterized endpoints (`{skill_id}`, `{ref_id}`) and optional discovery query templates.
3. Use wildcard template registration for hierarchical docs routing (`{path*}`).
4. Keep the singleton and query-template discovery surfaces semantically equivalent (same schema, query template adds filtering/pagination only).
### Progressive Discovery Contract
Discovery-first behavior for Step 5 resources:
1. `skills_index` returns summaries only (no embedded full SKILL.md bodies).
2. Each summary includes canonical follow-up URIs so clients can progressively fetch detail (`catalog/skills/{skill_id}` then `skills/{skill_id}/document`).
3. Filtered/paginated discovery uses RFC6570 query params (`q`, `tag`, `capability`, `cursor`, `limit`) with deterministic ordering.
4. Handlers should enforce bounded page size and return explicit continuation metadata when pagination is active.
5. Errors for unsupported filter params or invalid cursor/limit are explicit and actionable.
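
The discovery contract above might look like this in handler logic. It is a sketch under stated assumptions: the summary keys (`skill_id`, `name`, `tags`, `capabilities`), the `links` and `next_cursor` field names, the offset-based cursor encoding, and the `MAX_LIMIT` value are all invented for illustration.

```python
from __future__ import annotations

MAX_LIMIT = 50  # assumed upper bound on page size


def skills_index_page(summaries, *, q=None, tag=None, capability=None,
                      cursor=None, limit=20):
    """Filter and paginate summary records with deterministic ordering."""
    if not 1 <= limit <= MAX_LIMIT:
        raise ValueError(f"limit must be between 1 and {MAX_LIMIT}, got {limit}")
    try:
        offset = int(cursor) if cursor is not None else 0
    except ValueError:
        raise ValueError(f"invalid cursor {cursor!r}: expected a non-negative integer") from None
    if offset < 0:
        raise ValueError(f"invalid cursor {cursor!r}: expected a non-negative integer")

    matches = [
        s for s in sorted(summaries, key=lambda s: s["skill_id"])
        if (q is None or q.lower() in s["name"].lower())
        and (tag is None or tag in s["tags"])
        and (capability is None or capability in s["capabilities"])
    ]
    page = matches[offset:offset + limit]
    next_offset = offset + limit

    def with_links(summary):
        # Follow-up URIs let clients fetch detail progressively.
        sid = summary["skill_id"]
        links = {
            "detail": f"resource://catalog/skills/{sid}",
            "document": f"resource://skills/{sid}/document",
        }
        return {**summary, "links": links}

    return {
        "items": [with_links(s) for s in page],
        "next_cursor": str(next_offset) if next_offset < len(matches) else None,
    }
```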
### RFC6570 Template Contract
Path parameters:
1. `{skill_id}` and `{ref_id}` are single-segment template params.
2. `{path*}` is a wildcard param and may capture multi-segment paths separated by `/`.
Validation contract at resource-read time:
1. `skill_id` must exist in registry.
2. `ref_id` must exist in that skill's reference manifest.
3. wildcard `path*` must normalize to an allowed docs-relative markdown path.
4. invalid params return explicit not-found or validation errors (no silent fallback).
Template function signature contract:
1. Required URI params must exist as function parameters.
2. Avoid hidden implicit params not represented in template.
3. Keep template handlers side-effect free.
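
As an illustration of the wildcard rule in the read-time validation contract, the sketch below normalizes a captured `path*` value and rejects anything that is not a docs-relative Markdown path. The function name `normalize_docs_path` is hypothetical, and the sketch assumes the value has already been URL-decoded by the time the handler sees it.

```python
from __future__ import annotations

from pathlib import PurePosixPath


def normalize_docs_path(raw: str) -> str:
    """Return a clean docs-relative POSIX path or raise ValueError."""
    if "\\" in raw or raw.startswith("/"):
        raise ValueError(f"path must be relative and use '/' separators: {raw!r}")
    # PurePosixPath drops empty and '.' segments but keeps '..' segments.
    parts = PurePosixPath(raw).parts
    if not parts:
        raise ValueError("path must not be empty")
    if ".." in parts:
        raise ValueError(f"path must not leave the docs root: {raw!r}")
    if not parts[-1].endswith(".md"):
        raise ValueError(f"only markdown documents are served: {raw!r}")
    return "/".join(parts)
```

The normalized string can then be looked up in the validated registry, so the handler never probes the package layout itself.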
### Metadata and Annotation Contract
Each docs resource registration should specify explicit metadata:
1. `mime_type`
- skill docs and references: `text/markdown`
- catalog payloads: `application/json`
2. `annotations`
- `readOnlyHint: true`
- `idempotentHint: true`
3. `tags`
- include stable categories such as `catalog`, `skill-doc`, `reference`, `docs`
4. `version`
- project-defined version from registry metadata where applicable
5. `meta`
- include normalized identifiers (for example `skill_id`, `ref_id`, `source_relpath`) when useful
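
One way to keep this metadata consistent is a single helper that builds the options for each resource kind. The sketch below is an assumption-laden illustration: `resource_options` and the kind names are hypothetical, and whether each key maps directly onto a decorator argument should be checked against the installed FastMCP version.

```python
from __future__ import annotations

DOC_ANNOTATIONS = {"readOnlyHint": True, "idempotentHint": True}

# Resource kind -> MIME type, following the contract above.
_MIME_BY_KIND = {
    "catalog": "application/json",
    "skill-doc": "text/markdown",
    "reference": "text/markdown",
    "docs": "text/markdown",
}


def resource_options(kind: str, *, version: str | None = None, **meta) -> dict:
    """Build shared registration options for one resource kind."""
    if kind not in _MIME_BY_KIND:
        raise ValueError(f"unknown resource kind: {kind!r}")
    options = {
        "mime_type": _MIME_BY_KIND[kind],
        "annotations": dict(DOC_ANNOTATIONS),  # copy so callers cannot mutate the default
        "tags": {kind},
        # Normalized identifiers such as skill_id, ref_id, source_relpath.
        "meta": {key: value for key, value in sorted(meta.items()) if value is not None},
    }
    if version is not None:
        options["version"] = version
    return options
```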
### Startup Safety and Duplicate Policy
FastMCP initialization contract for this phase:
1. Construct the root server with `on_duplicate_resources="error"`.
2. Register all Step 5 resources during startup composition before serving traffic.
3. Treat duplicate registration as a hard startup failure.
Duplicate conflict classes covered:
1. static URI vs static URI collision
2. static URI vs template key collision
3. template URI vs template URI collision
4. conflicting registrations introduced by future aliases without explicit migration handling
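
Independently of FastMCP's own duplicate policy, a project-side pre-flight check over the planned URI list could surface these conflict classes before registration. The sketch below is hypothetical: `find_collisions` is not a FastMCP API, and treating a static URI that matches a template as a conflict is this sketch's interpretation of "static URI vs template key collision".

```python
from __future__ import annotations

import re

# Matches simple path params such as {skill_id} and wildcards such as {path*}.
_PARAM = re.compile(r"\{(\w+)(\*?)\}")


def template_shape(uri: str) -> str:
    """Collapse parameter names so templates differing only by name compare equal."""
    return _PARAM.sub(lambda m: "{**}" if m.group(2) else "{*}", uri)


def template_regex(uri: str) -> re.Pattern[str]:
    """Build a regex that matches concrete URIs served by a path template."""
    pattern, pos = "", 0
    for match in _PARAM.finditer(uri):
        pattern += re.escape(uri[pos:match.start()])
        pattern += "(.+)" if match.group(2) else "([^/]+)"
        pos = match.end()
    return re.compile(pattern + re.escape(uri[pos:]) + r"\Z")


def find_collisions(uris: list[str]) -> list[str]:
    findings: list[str] = []
    seen: dict[str, str] = {}
    for uri in uris:
        shape = template_shape(uri)
        if shape in seen:
            findings.append(f"duplicate: {seen[shape]} vs {uri}")
        else:
            seen[shape] = uri
    statics = [u for u in seen.values() if "{" not in u]
    templates = [u for u in seen.values() if _PARAM.search(u)]
    for template in templates:
        matcher = template_regex(template)
        for static in statics:
            if matcher.match(static):
                findings.append(f"shadowed: {static} matches template {template}")
    return findings
```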
### Registration Architecture
Use one dedicated registration module that converts registry records into FastMCP resources.
Recommended API:
1. `register_docs_resources(mcp: FastMCP, registry: DocsRegistry) -> None`
Responsibilities of `register_docs_resources`:
1. register singleton catalog resources
2. register parameterized catalog/detail templates
3. register skill document and reference templates
4. register docs wildcard template
5. apply shared annotations and MIME defaults consistently
Separation of concerns:
1. Step 4 validates and normalizes docs state.
2. Step 5 only registers handlers and reads from validated registry state.
3. Request handlers do not re-discover filesystem/package structure.
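
A minimal sketch of `register_docs_resources` is shown below. It only assumes that `mcp.resource(uri, **options)` returns a decorator, which is how FastMCP's resource decorator is used, so the function itself needs no FastMCP import. The registry layout (a dict with `"skills"` and `"docs"` entries holding pre-validated content) is a simplification invented for the example, not the Step 4 registry type.

```python
from __future__ import annotations

import json

DOC_ANNOTATIONS = {"readOnlyHint": True, "idempotentHint": True}


def _lookup(mapping, key, kind):
    """Explicit not-found error instead of a silent fallback."""
    if key not in mapping:
        raise LookupError(f"unknown {kind}: {key}")
    return mapping[key]


def register_docs_resources(mcp, registry) -> None:
    """Register docs resources from an already validated registry (assumed shape)."""
    skills, docs = registry["skills"], registry["docs"]
    markdown = {"mime_type": "text/markdown", "annotations": DOC_ANNOTATIONS}
    as_json = {"mime_type": "application/json", "annotations": DOC_ANNOTATIONS}

    @mcp.resource("resource://catalog/skills_index", **as_json)
    def skills_index() -> str:
        # Summaries only, in sorted order for deterministic payloads.
        return json.dumps([skills[sid]["summary"] for sid in sorted(skills)])

    @mcp.resource("resource://catalog/skills/{skill_id}", **as_json)
    def skill_detail(skill_id: str) -> str:
        return json.dumps(_lookup(skills, skill_id, "skill_id")["summary"])

    @mcp.resource("resource://skills/{skill_id}/document", **markdown)
    def skill_document(skill_id: str) -> str:
        return _lookup(skills, skill_id, "skill_id")["document"]

    @mcp.resource("resource://skills/{skill_id}/references/{ref_id}", **markdown)
    def skill_reference(skill_id: str, ref_id: str) -> str:
        references = _lookup(skills, skill_id, "skill_id")["references"]
        return _lookup(references, ref_id, "ref_id")

    @mcp.resource("resource://docs/{path*}", **markdown)
    def docs_page(path: str) -> str:
        return _lookup(docs, path, "docs path")
```

In the real composition, `mcp` would be the FastMCP server constructed with `on_duplicate_resources="error"`, and the call would happen after the registry loads and before the server starts. A recording stand-in with the same `resource` method is enough to exercise this function in unit tests.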
### Handler Behavior Contract
Catalog handlers:
1. `skills_index` returns compact deterministic discovery payload (summary records only) and supports progressive follow-up links.
2. `skills/{skill_id}` returns one normalized detail record or not-found.
Skill document handlers:
1. `skills/{skill_id}/document` returns canonical SKILL markdown content.
2. MIME type is always `text/markdown`.
Reference handlers:
1. `skills/{skill_id}/references/{ref_id}` resolves via frontmatter manifest mapping.
2. MIME type is explicit from manifest or defaults to `text/markdown`.
Wildcard docs handler:
1. `docs/{path*}` serves markdown docs under canonical packaged docs tree.
2. traversal outside docs root is blocked.
### Integration Plan for Existing Modules
Primary composition updates:
1. Introduce registry-driven registration in [src/personal_mcp/mcp.py](src/personal_mcp/mcp.py).
2. Keep [src/personal_mcp/main.py](src/personal_mcp/main.py) responsible for startup wiring order (load registry first, then register resources).
3. Refactor [src/personal_mcp/catalog/server.py](src/personal_mcp/catalog/server.py) toward registry-backed handlers.
Lifecycle order (required):
1. load and validate registry (Step 4)
2. initialize FastMCP with duplicate error policy
3. register all Step 5 resources/templates
4. start server
### Testing Plan (Step 5 Scope)
Unit/integration tests:
1. resource registration succeeds with valid registry
2. duplicate resource registration fails at startup
3. `skills/{skill_id}` template resolves expected record
4. `skills/{skill_id}/document` returns markdown with correct MIME
5. `skills/{skill_id}/references/{ref_id}` resolves manifest-mapped file
6. `docs/{path*}` resolves nested docs paths and blocks traversal attempts
7. all registered docs resources include `readOnlyHint` and `idempotentHint`
8. catalog payload order is deterministic
9. filtered/paginated `skills_index{?q,tag,capability,cursor,limit}` responses are deterministic and schema-compatible with the singleton index response
10. catalog index payload excludes full markdown bodies and includes follow-up URIs for progressive reads
Smoke tests:
1. list resources includes singleton and template entries
2. read representative skill doc URI and reference URI successfully
3. read representative wildcard docs URI successfully
### Acceptance Criteria for Step 5 Completion
Step 5 is complete when all are true:
1. Resource registration is fully registry-driven (no per-skill hardcoded decorators required for core docs surfaces).
2. RFC6570 templates are used for parameterized URI families, including wildcard where needed.
3. All docs resources declare explicit MIME types and read-only/idempotent annotations.
4. `on_duplicate_resources="error"` is enabled and verified by tests.
5. Startup fails safely on registration conflicts.
### Non-goals for Step 5
1. No tool fallback discovery behavior implementation (Step 6).
2. No packaging build inclusion mechanics (Step 7).
3. No CI gate expansion details (Step 9).
4. No migration shims for legacy URI aliases beyond what is needed to preserve current behavior.
5. No ranking-strategy implementation for discovery tools beyond what is needed to preserve deterministic resource-first discovery contracts.