**Step 4 Results: Docs Registry Loader Design (importlib.resources + Fail-Fast Validation)**

This section finalizes Step 4 by defining a production-ready docs registry loader that reads packaged docs through Python resource APIs, parses `SKILL.md` frontmatter, validates schema and cross-links, and builds an immutable in-memory registry keyed by `skill_id`.

### Research Baseline (Python + Design Guidance)

Authoritative references used for this step:

1. Python `importlib.resources` docs (`files`, `as_file`, `Traversable` APIs)
2. Python `importlib.resources.abc` docs (`Traversable`, path traversal semantics, `joinpath` compatibility notes)
3. Pydantic v2 model/validation docs (`model_validate`, `ValidationError`, strictness and `extra` handling)
4. Python packaging guidance for including package data in wheels/sdists

Best-practice conclusions applied to this design:

1. Prefer `importlib.resources.files(<package>).joinpath(...)` over filesystem assumptions so that stdio deployments running from an installed wheel work.
2. Treat resources as potentially non-filesystem artifacts (for example, zip imports); only use `as_file(...)` when an actual OS path is required.
3. Validate metadata with explicit Pydantic models and fail startup on contract violations.
4. Keep registry loading deterministic (sorted traversal, stable error messages, no hidden fallback mutations).
5. Resolve references via manifest ids declared in frontmatter, not by global file conventions.

### Loader Responsibilities (Normative)

The Step 4 loader MUST:

1. Read canonical docs from package resources (not repo-root paths).
2. Discover all skill directories under `docs/skills/` in packaged resources.
3. For each skill, read and parse `SKILL.md` frontmatter.
4. Validate frontmatter using the Step 2 schema contract.
5. Validate directory/id invariants from Step 1 (directory name equals frontmatter id).
6. Validate URI/reference semantics from Step 3 assumptions.
7. Build a single in-memory registry keyed by `skill_id`.
8. Fail fast on any integrity error before FastMCP resource registration.
9. Precompute compact discovery projections so index resources can be served without reading full markdown bodies at request time.

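Responsibilities 1 to 3 can be sketched with only the standard library. This is a minimal sketch, not the project's loader: `skills_root` and `discover_skills` are hypothetical names, and it assumes the default `docs/skills` layout. Because `pathlib.Path` structurally provides the `Traversable` methods used here (`iterdir`, `is_dir`, `is_file`, `joinpath`, `read_text`, `name`), unit tests can pass a temporary directory instead of an installed package.

```python
from __future__ import annotations

from importlib.resources import files
from typing import TYPE_CHECKING

if TYPE_CHECKING:  # Traversable lives in importlib.resources.abc on Python 3.11+
    from importlib.resources.abc import Traversable


def skills_root(package_anchor: str, docs_root: str = "docs") -> Traversable:
    """Locate <docs_root>/skills inside an importable package (no Path(__file__))."""
    # Single-segment joinpath calls, a conservative choice for older resource readers.
    return files(package_anchor).joinpath(docs_root).joinpath("skills")


def discover_skills(root: Traversable) -> tuple[list[tuple[str, str]], list[str]]:
    """Return (skill_id, SKILL.md text) pairs in sorted order plus discovery problems."""
    found: list[tuple[str, str]] = []
    problems: list[str] = []
    if not root.is_dir():
        return found, [f"skills root not found: {root}"]
    # Sorting by name makes load order, and everything derived from it, deterministic.
    for entry in sorted(root.iterdir(), key=lambda item: item.name):
        if not entry.is_dir():
            continue  # this sketch ignores stray files at the skills root
        skill_md = entry.joinpath("SKILL.md")
        if not skill_md.is_file():
            problems.append(f"{entry.name}: missing required SKILL.md")
            continue
        found.append((entry.name, skill_md.read_text(encoding="utf-8")))
    return found, problems
```

Chaining single-segment `joinpath` calls follows the `joinpath` compatibility notes in the `importlib.resources.abc` docs; at this stage the directory name stands in for `skill_id` until identity validation confirms it.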
### Package Resource Contract

Runtime anchor:

1. The loader resolves content from an importable package anchor, for example `personal_mcp`.
2. The docs root is `files(anchor).joinpath("docs")` when docs are packaged at the package root, or an equivalent configured subpath otherwise.
3. The skill root is `skills/` under the docs root (`docs/skills` in the default layout).

Resource assumptions:

1. `SKILL.md` is UTF-8 text.
2. Reference files declared in frontmatter are UTF-8 markdown unless the manifest declares otherwise.
3. Path resolution always stays inside the same skill directory.

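The in-skill rule in resource assumption 3 can be enforced lexically before any resource is opened. A minimal sketch, assuming manifest paths are stored as POSIX strings and must live under `references/`; `check_reference_relpath` is a hypothetical helper, and existence/is-file checks are left to a separate step. The raw string is split on `/` because `PurePosixPath` silently drops `.` segments and repeated slashes, which would hide malformed input.

```python
from __future__ import annotations

from pathlib import PurePosixPath


def check_reference_relpath(relpath: str) -> str | None:
    """Return None if relpath is a safe in-skill reference path, else a reason."""
    if not relpath or "\\" in relpath:
        return "path must be a non-empty POSIX relative path"
    path = PurePosixPath(relpath)
    if path.is_absolute():
        return "path must be relative to the skill directory"
    # Check raw segments: PurePosixPath would normalize '.' and '//' away.
    if any(part in ("", ".", "..") for part in relpath.split("/")):
        return "path must not contain empty, '.' or '..' segments"
    if len(path.parts) < 2 or path.parts[0] != "references":
        return "path must point to a file under references/"
    return None
```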
### Registry Data Model

Build immutable runtime records with explicit structure:

1. `SkillRecord`
   - `skill_id`
   - `name`
   - `description`
   - `version`
   - `tags`
   - `capabilities`
   - `depends_on`
   - `document_uri`
   - `document_relpath` (canonical resource-relative path)
   - `references` map keyed by `ref_id`
2. `ReferenceRecord`
   - `ref_id`
   - `uri`
   - `relpath`
   - `mime_type`
   - `title`
3. `DocsRegistry`
   - `skills_by_id: dict[str, SkillRecord]`
   - `skills_in_load_order: list[str]` (deterministic ordering)
   - helper indexes for catalog payload generation
   - `skills_summary_in_load_order: list[SkillSummaryRecord]` for progressive discovery responses
   - filter indexes (for example by tag/capability) derived once at startup
4. `SkillSummaryRecord`
   - `skill_id`
   - `name`
   - `description`
   - `tags`
   - `capabilities`
   - `document_uri`
   - optional `version`

Immutability rule:

1. Once built, registry records are treated as read-only for the process lifetime.
2. No runtime mutation during requests; refresh only via process restart.

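One way to make the immutability rule enforceable rather than conventional is frozen dataclasses with tuples and `types.MappingProxyType` views. This is a sketch under that assumption: it substitutes tuples and read-only mappings for the `list`/`dict` types listed above, omits the helper and filter indexes, and `DocsRegistry.build` and `SkillRecord.summary` are hypothetical helpers.

```python
from __future__ import annotations

from dataclasses import dataclass
from types import MappingProxyType
from typing import Mapping, Sequence


@dataclass(frozen=True)
class ReferenceRecord:
    ref_id: str
    uri: str
    relpath: str  # POSIX form, relative to the skill directory
    mime_type: str
    title: str


@dataclass(frozen=True)
class SkillSummaryRecord:
    skill_id: str
    name: str
    description: str
    tags: tuple[str, ...]
    capabilities: tuple[str, ...]
    document_uri: str
    version: str | None = None


@dataclass(frozen=True)
class SkillRecord:
    skill_id: str
    name: str
    description: str
    version: str
    tags: tuple[str, ...]
    capabilities: tuple[str, ...]
    depends_on: tuple[str, ...]
    document_uri: str
    document_relpath: str  # canonical resource-relative path
    references: Mapping[str, ReferenceRecord]  # keyed by ref_id; pass a MappingProxyType

    def summary(self) -> SkillSummaryRecord:
        """Compact projection for discovery/index responses (no markdown body)."""
        return SkillSummaryRecord(
            skill_id=self.skill_id,
            name=self.name,
            description=self.description,
            tags=self.tags,
            capabilities=self.capabilities,
            document_uri=self.document_uri,
            version=self.version,
        )


@dataclass(frozen=True)
class DocsRegistry:
    skills_by_id: Mapping[str, SkillRecord]
    skills_in_load_order: tuple[str, ...]
    skills_summary_in_load_order: tuple[SkillSummaryRecord, ...]

    @classmethod
    def build(cls, records: Sequence[SkillRecord]) -> DocsRegistry:
        """Assumes records are already validated and in sorted load order."""
        return cls(
            # The proxy wraps a fresh dict that no other code holds a reference to.
            skills_by_id=MappingProxyType({r.skill_id: r for r in records}),
            skills_in_load_order=tuple(r.skill_id for r in records),
            skills_summary_in_load_order=tuple(r.summary() for r in records),
        )
```

Frozen dataclasses block attribute reassignment and `MappingProxyType` blocks item assignment; both protections are shallow, which is why nested collections are also tuples or proxies.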
### Frontmatter Parsing Contract

`SKILL.md` parse steps:

1. Read the full markdown text from the resource.
2. Extract the YAML frontmatter block at the start of the file (the lines between the opening `---` line and the next `---` line).
3. Parse the YAML with safe-loader semantics.
4. Validate the parsed object with the Step 2 Pydantic model(s).
5. Preserve the markdown body as the document content payload.

Parsing failure behavior:

1. Missing frontmatter block: startup error.
2. Invalid YAML: startup error with the skill path and YAML parser detail.
3. Missing required fields (`name`, `description`, `x-personal-mcp` contract fields): startup error.

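Steps 1 and 2 of the parse contract, plus the missing-frontmatter failure, fit in a small stdlib-only function. This sketch assumes the opening `---` must be the very first line (a leading UTF-8 BOM is tolerated) and that an unclosed block is also a startup error; `split_frontmatter` and `FrontmatterError` are hypothetical names.

```python
from __future__ import annotations


class FrontmatterError(ValueError):
    """Raised when SKILL.md does not start with a well-formed frontmatter block."""


def split_frontmatter(text: str) -> tuple[str, str]:
    """Split SKILL.md into (yaml_text, markdown_body); the YAML is not parsed here."""
    text = text.lstrip("\ufeff")  # tolerate a UTF-8 BOM left by some editors
    lines = text.splitlines(keepends=True)
    if not lines or lines[0].rstrip("\r\n") != "---":
        raise FrontmatterError("missing frontmatter: file must start with a '---' line")
    # The block ends at the next line consisting only of '---'.
    for index in range(1, len(lines)):
        if lines[index].rstrip("\r\n") == "---":
            yaml_text = "".join(lines[1:index])
            body = "".join(lines[index + 1:])
            return yaml_text, body
    raise FrontmatterError("unterminated frontmatter: no closing '---' line")
```

The returned YAML text would then go to a safe loader such as PyYAML's `yaml.safe_load`, and the result to the Step 2 model via `model_validate`. `yaml.safe_load` returns `None` for an empty block, so a non-mapping result should be reported as a schema error rather than passed on.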
### Validation Pipeline (Fail-Fast)

Validation happens in this order:

1. Structural discovery validation
   - skill directory exists under `docs/skills`
   - required `SKILL.md` exists for each discovered skill
2. Schema validation
   - Pydantic frontmatter validation for all required and constrained fields
3. Identity validation
   - frontmatter `name` equals `x-personal-mcp.id`
   - frontmatter id equals skill directory name
4. Reference manifest validation
   - unique `ref_id` keys per skill
   - each manifest path is relative, in-skill, and under `references/`
   - each manifest target exists and is a file
5. Dependency graph validation
   - every `depends_on` target exists in discovered skill set
   - no self-dependency
   - cycle detection enabled (hard error on cycle)
6. Capability sanity checks
   - required primary capability `resource://skills/{skill_id}/document` is present
7. Global uniqueness checks
   - no duplicate `skill_id`
   - no duplicate canonical resource URIs generated from registry
8. Discovery payload checks
   - summary fields required by catalog index are present and non-empty
   - summary generation does not require reading markdown body content during request handling

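Phase 5 maps directly onto the standard library's `graphlib.TopologicalSorter` (Python 3.9+), whose `prepare()` raises `CycleError` when the graph contains a cycle. A minimal sketch, assuming `depends_on` lists have already been extracted per skill; `check_dependencies` is a hypothetical name, and unknown targets and self-references are reported separately and kept out of the graph so each problem is reported once.

```python
from __future__ import annotations

from graphlib import CycleError, TopologicalSorter
from typing import Mapping, Sequence


def check_dependencies(depends_on: Mapping[str, Sequence[str]]) -> list[str]:
    """Return depends_on problems in a stable order; an empty list means valid.

    `depends_on` maps every discovered skill_id to the skill_ids it depends on.
    """
    problems: list[str] = []
    graph: dict[str, list[str]] = {}
    for skill_id in sorted(depends_on):
        known: list[str] = []
        for target in depends_on[skill_id]:
            if target == skill_id:
                problems.append(f"{skill_id}: depends_on must not reference itself")
            elif target not in depends_on:
                problems.append(f"{skill_id}: depends_on target '{target}' does not exist")
            else:
                known.append(target)
        # TopologicalSorter takes node -> predecessors; a dependency is a predecessor.
        graph[skill_id] = known
    try:
        TopologicalSorter(graph).prepare()
    except CycleError as exc:
        # exc.args[1] lists the nodes of one detected cycle (first node repeated at the end).
        members = sorted(set(exc.args[1]))
        problems.append("dependency cycle involving: " + ", ".join(members))
    return problems
```

`prepare()` stops at the first cycle it finds, so a graph with several independent cycles surfaces them one fix at a time.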
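Phases 3, 6 and 7 reduce to simple comparisons once frontmatter has passed schema validation. The sketch below assumes a hypothetical `ParsedSkill` view (directory name, `name`, `x-personal-mcp.id`, declared capabilities, and the canonical URIs the skill would expose); a real loader would emit `RegistryIssue` objects instead of plain strings.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ParsedSkill:
    """Hypothetical post-schema view of one SKILL.md."""
    dir_name: str
    name: str
    manifest_id: str                # value of x-personal-mcp.id
    capabilities: tuple[str, ...]
    resource_uris: tuple[str, ...]  # document URI plus reference URIs


def check_identity_and_uniqueness(skills: list[ParsedSkill]) -> list[str]:
    problems: list[str] = []
    for s in skills:
        # Phase 3: name == x-personal-mcp.id == directory name.
        if s.name != s.manifest_id:
            problems.append(f"{s.dir_name}: name '{s.name}' != x-personal-mcp.id '{s.manifest_id}'")
        if s.manifest_id != s.dir_name:
            problems.append(f"{s.dir_name}: x-personal-mcp.id '{s.manifest_id}' != directory name")
        # Phase 6: the primary document capability must be declared.
        required = f"resource://skills/{s.manifest_id}/document"
        if required not in s.capabilities:
            problems.append(f"{s.dir_name}: missing primary capability {required}")
    # Phase 7: global uniqueness, reported in sorted order for stable output.
    id_counts = Counter(s.manifest_id for s in skills)
    problems += [f"duplicate skill_id '{i}'" for i, n in sorted(id_counts.items()) if n > 1]
    uri_counts = Counter(u for s in skills for u in s.resource_uris)
    problems += [f"duplicate resource URI '{u}'" for u, n in sorted(uri_counts.items()) if n > 1]
    return problems
```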
### Error Model and Reporting

Error handling contract:

1. Collect errors per validation phase for clarity, then raise one startup exception containing all findings.
2. Error messages must include:
   - skill id (when known)
   - packaged relative path
   - violated rule
   - actionable fix hint
3. If any error exists, registry is not published and FastMCP resource registration does not proceed.

Recommended exception shape:

1. `DocsRegistryValidationError(errors: list[RegistryIssue])`
2. `RegistryIssue` fields: `code`, `message`, `skill_id`, `path`, `hint`

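The recommended exception shape can be written as a frozen dataclass plus one aggregate exception. This is a sketch: the rendering format, the `render` method, the `raise_if_issues` gate and any particular `code` strings are assumptions, not a defined contract. Issues keep the order in which the phases reported them, which is already stable because traversal is sorted.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class RegistryIssue:
    code: str             # stable rule identifier (naming scheme is an assumption)
    message: str          # the violated rule, stated plainly
    skill_id: str | None  # None when the failure precedes identity resolution
    path: str             # packaged, resource-relative POSIX path
    hint: str             # actionable fix

    def render(self) -> str:
        who = self.skill_id or "<unknown skill>"
        return f"[{self.code}] {who} ({self.path}): {self.message}. Fix: {self.hint}"


class DocsRegistryValidationError(Exception):
    """Single startup exception carrying every issue found across all phases."""

    def __init__(self, errors: Sequence[RegistryIssue]) -> None:
        self.errors = list(errors)  # keep phase order; traversal order is already sorted
        lines = [f"docs registry validation failed with {len(self.errors)} issue(s):"]
        lines += [f"  - {issue.render()}" for issue in self.errors]
        super().__init__("\n".join(lines))


def raise_if_issues(errors: Sequence[RegistryIssue]) -> None:
    """Gate before publishing the registry: any issue blocks startup."""
    if errors:
        raise DocsRegistryValidationError(errors)
```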
### Determinism and Runtime Safety

Determinism rules:

1. Traverse directories in sorted order.
2. Normalize all stored relative paths to POSIX form.
3. Normalize ids/tags exactly once at parse boundary.
4. Produce stable catalog ordering to reduce client churn.
5. Produce stable summary projections and filter indexes from the same normalized source records.

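Determinism rules 3 and 5 suggest normalizing once at parse time and deriving every filter index from those already-normalized values. A minimal sketch: the specific normalization (Unicode NFC, trimmed, lower-cased) is an assumption rather than a rule from Steps 1 to 3, and `normalize_tag` and `build_filter_index` are hypothetical names. The index builder deliberately does not re-normalize; it only groups and sorts, so the same function can serve tag and capability indexes.

```python
from __future__ import annotations

import unicodedata
from types import MappingProxyType
from typing import Iterable, Mapping


def normalize_tag(raw: str) -> str:
    """Parse-boundary normalization (assumed rules: NFC, trimmed, lower-cased)."""
    return unicodedata.normalize("NFC", raw).strip().lower()


def build_filter_index(
    pairs: Iterable[tuple[str, Iterable[str]]],
) -> Mapping[str, tuple[str, ...]]:
    """Map value -> skill_ids with sorted keys and sorted, de-duplicated ids.

    Inputs must already be normalized; this function only groups and sorts.
    """
    buckets: dict[str, set[str]] = {}
    for skill_id, values in pairs:
        for value in values:
            buckets.setdefault(value, set()).add(skill_id)
    # Building the dict from sorted items fixes iteration order for serialization.
    return MappingProxyType(
        {value: tuple(sorted(ids)) for value, ids in sorted(buckets.items())}
    )
```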
Runtime safety rules:

1. No dependence on `Path(__file__)` or repository root.
2. No ad-hoc fallback probing across multiple locations.
3. No lazy validation deferred until first request.

### Integration Plan for Existing Modules

Primary integration target:

1. Replace path-based logic in `src/personal_mcp/skills/document_loader.py` with package-resource-based registry loading.

Catalog integration:

1. Update `src/personal_mcp/catalog/server.py` to consume the shared in-memory registry instead of scanning `metadata.yaml` files.
2. Keep catalog payload normalization deterministic and sourced from registry records only.

Startup wiring:

1. Initialize registry once during app/server startup in `src/personal_mcp/main.py` or equivalent composition point.
2. Pass registry to resource registration step (Step 5).

### Proposed Loader API Surface

Use a small, testable API:

1. `load_docs_registry(*, package_anchor: str, docs_root: str = "docs") -> DocsRegistry`
2. `read_skill_document(registry: DocsRegistry, skill_id: str) -> DocumentPayload`
3. `read_skill_reference(registry: DocsRegistry, skill_id: str, ref_id: str) -> DocumentPayload`

Design constraints:

1. Loader functions depend only on package resources and their input arguments.
2. No global mutable singleton is required for unit tests.
3. Caching is explicit and owned by startup composition.

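A sketch of `read_skill_reference` under the design constraints above: it reads only through the registry passed in, resolves ids that the loader already validated, and keeps no module-level state. `MiniRegistry`, `RefEntry` and the `DocumentPayload` fields shown are hypothetical stand-ins for the records described in the data model, not a defined shape.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import TYPE_CHECKING, Mapping

if TYPE_CHECKING:
    from importlib.resources.abc import Traversable


@dataclass(frozen=True)
class DocumentPayload:
    uri: str
    mime_type: str
    text: str


@dataclass(frozen=True)
class RefEntry:
    """Hypothetical trimmed-down ReferenceRecord."""
    uri: str
    relpath: str  # validated, POSIX, relative to the skill directory
    mime_type: str


@dataclass(frozen=True)
class MiniRegistry:
    """Hypothetical trimmed-down DocsRegistry: skill_id -> ref_id -> entry."""
    skills_root: Traversable
    references: Mapping[str, Mapping[str, RefEntry]]


def read_skill_reference(registry: MiniRegistry, skill_id: str, ref_id: str) -> DocumentPayload:
    """Serve a reference by ids the loader already validated; raw paths are never accepted."""
    try:
        entry = registry.references[skill_id][ref_id]
    except KeyError:
        raise LookupError(f"unknown reference {skill_id!r}/{ref_id!r}") from None
    target = registry.skills_root.joinpath(skill_id)
    for part in entry.relpath.split("/"):  # one segment at a time
        target = target.joinpath(part)
    return DocumentPayload(entry.uri, entry.mime_type, target.read_text(encoding="utf-8"))
```

Because the function depends only on its arguments, any caching would be added explicitly by the startup composition root rather than inside the loader.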
### Test and Validation Plan (Step 4 Scope)

Unit tests:

1. valid multi-skill registry load from packaged test fixtures
2. duplicate id detection
3. missing `SKILL.md` detection
4. invalid frontmatter field constraints
5. broken reference target detection
6. invalid `depends_on` target detection
7. cycle detection in `depends_on` graph
8. deterministic output ordering across runs

Packaging/runtime tests:

1. install built wheel in isolated env
2. load registry via `importlib.resources.files(...)`
3. assert representative skill document/reference are readable

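For the packaging/runtime tests, a small helper can run inside the isolated environment where the built wheel is installed, so it exercises the same `importlib.resources.files(...)` path the server would use. A sketch, assuming the default `docs/skills` layout; the helper name and its arguments are hypothetical, and choosing which skill and reference count as "representative" is left to the test.

```python
from __future__ import annotations

from importlib.resources import files


def smoke_check_installed_docs(package_anchor: str, skill_id: str, ref_relpath: str) -> None:
    """Raise RuntimeError unless one skill document and one reference ship in the install.

    Meant to run where the built wheel is installed, so a package-data omission
    surfaces in tests instead of at server startup.
    """
    skill_dir = files(package_anchor).joinpath("docs").joinpath("skills").joinpath(skill_id)
    skill_md = skill_dir.joinpath("SKILL.md")
    if not skill_md.is_file():
        raise RuntimeError(f"SKILL.md for {skill_id!r} is not packaged")
    if not skill_md.read_text(encoding="utf-8").startswith("---"):
        raise RuntimeError(f"SKILL.md for {skill_id!r} has no frontmatter")
    reference = skill_dir
    for part in ref_relpath.split("/"):  # manifest paths are stored in POSIX form
        reference = reference.joinpath(part)
    if not reference.is_file():
        raise RuntimeError(f"reference {ref_relpath!r} for {skill_id!r} is not packaged")
    reference.read_text(encoding="utf-8")  # raises UnicodeDecodeError if not UTF-8
```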
Expected command path in this repo:

1. `uv run pytest -q`

### Acceptance Criteria for Step 4 Completion

Step 4 is complete when all of the following are true:

1. Registry loads exclusively from packaged resources.
2. All validations that depend on the Step 2 schema and the Step 3 URI/reference contracts are enforced at startup.
3. Invalid docs state blocks startup with actionable diagnostics.
4. Registry is deterministic and immutable for runtime use.
5. Catalog and later resource registration can consume the registry without direct filesystem scanning.

### Non-goals for Step 4

1. No FastMCP resource registration wiring details (Step 5).
2. No discovery-tool fallback behavior design (Step 6).
3. No final packaging/build-system migration mechanics (Step 7).
4. No backward-compat alias rollout mechanics beyond validation readiness.