Step 4 Results: Docs Registry Loader Design (importlib.resources + Fail-Fast Validation)
This section finalizes Step 4 by defining a production-ready docs registry loader that reads packaged docs through Python resource APIs, parses SKILL.md frontmatter, validates schema and cross-links, and builds an immutable in-memory registry keyed by skill_id.
Greenfield Framing (Normative)
This Step 4 design is for the greenfield target state:
- No legacy metadata sidecars (metadata.yaml) are part of the runtime contract.
- No dual-loader compatibility path is required.
- Registry loading from packaged resources is the only runtime source of truth.
- Compatibility shims are prohibited.
Research Baseline (Python + Design Guidance)
Authoritative references used for this step:
- Python importlib.resources docs (files, as_file, Traversable APIs)
- Python importlib.resources.abc docs (Traversable, path traversal semantics, joinpath compatibility notes)
- Pydantic v2 model/validation docs (model_validate, ValidationError, strictness and extra handling)
- Python packaging guidance for including package data in wheels/sdists
Best-practice conclusions applied to this design:
- Prefer importlib.resources.files(<package>).joinpath(...) over filesystem assumptions so stdio deployments from installed wheels work.
- Treat resources as potentially non-filesystem artifacts (zip-import compatible); only use as_file(...) when an actual OS path is required.
- Validate metadata with explicit Pydantic models and fail startup on contract violations.
- Keep registry load deterministic (sorted traversal, stable error messages, no hidden fallback mutations).
- Resolve references via manifest ids declared in frontmatter, not by global file conventions.
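The first two conclusions can be illustrated with a minimal sketch. It builds a throwaway package in a temporary directory so that it runs anywhere; the package name demo_anchor and the sample SKILL.md content are hypothetical stand-ins, not part of this design.

```python
import importlib
import sys
import tempfile
from importlib.resources import as_file, files
from pathlib import Path

# Build a throwaway package so the sketch runs anywhere. "demo_anchor" is a
# hypothetical stand-in for the real installed package anchor.
tmp_root = Path(tempfile.mkdtemp())
skill_dir = tmp_root / "demo_anchor" / "docs" / "skills" / "alpha"
skill_dir.mkdir(parents=True)
(tmp_root / "demo_anchor" / "__init__.py").write_text("", encoding="utf-8")
(skill_dir / "SKILL.md").write_text("---\nname: alpha\n---\n# Alpha\n", encoding="utf-8")
sys.path.insert(0, str(tmp_root))
importlib.invalidate_caches()

# Walk one path segment at a time: single-argument joinpath is the most
# portable form across Traversable implementations.
resource = files("demo_anchor")
for segment in ("docs", "skills", "alpha", "SKILL.md"):
    resource = resource.joinpath(segment)

# Reading text goes through the resource API, not through an OS path.
skill_text = resource.read_text(encoding="utf-8")

# as_file() is reserved for callers that genuinely need a filesystem path.
with as_file(resource) as os_path:
    size_on_disk = os_path.stat().st_size
```

Nothing in the read path touches Path(__file__) or the repository root, so the same code should behave identically whether the package is installed from a wheel or imported from a source checkout.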
Loader Responsibilities (Normative)
The Step 4 loader MUST:
- Read canonical docs from package resources (not repo-root paths).
- Discover all skill directories under docs/skills/ in packaged resources.
- For each skill, read and parse SKILL.md frontmatter.
- Validate frontmatter using the Step 2 schema contract.
- Validate directory/id invariants from Step 1 (directory name equals frontmatter id).
- Validate URI/reference semantics from Step 3 assumptions.
- Build a single in-memory registry keyed by skill_id.
- Fail fast on any integrity error before FastMCP resource registration.
- Precompute compact discovery projections so index resources can be served without reading full markdown bodies at request time.
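As a rough illustration of the last responsibility, a filter index can be derived once at startup by inverting a skill-to-values mapping. The function name build_filter_index and its input shape are assumptions made for this sketch, not names defined by the design.

```python
from collections import defaultdict
from types import MappingProxyType
from typing import Mapping


def build_filter_index(
    values_by_skill: Mapping[str, tuple[str, ...]],
) -> Mapping[str, tuple[str, ...]]:
    """Invert skill_id -> values (tags or capabilities) into value -> skill ids.

    Hypothetical helper: keys and skill ids are sorted so the resulting
    index is stable across runs, and the mapping is wrapped read-only.
    """
    buckets: dict[str, set[str]] = defaultdict(set)
    for skill_id, values in values_by_skill.items():
        for value in values:
            buckets[value].add(skill_id)
    return MappingProxyType(
        {value: tuple(sorted(ids)) for value, ids in sorted(buckets.items())}
    )


tag_index = build_filter_index({"beta": ("git", "python"), "alpha": ("python",)})
```

Request handlers can then answer tag or capability filters from the index alone, without opening any markdown body.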
Package Resource Contract
Runtime anchor:
- The loader resolves content from an importable package anchor, for example personal_mcp.
- Docs root is located as files(anchor).joinpath("docs") when docs are packaged at package root, or an equivalent configured subpath.
- Skill root is docs/skills.
Resource assumptions:
- SKILL.md is UTF-8 text.
- Reference files declared in frontmatter are UTF-8 markdown by default unless otherwise declared.
- Path resolution always remains inside the same skill directory.
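One way to enforce the in-skill path rule is a pure string-level check that runs before any resource lookup. The sketch below is illustrative only: the helper name normalize_reference_relpath is hypothetical, and the references prefix is taken from the reference manifest rule in the validation pipeline.

```python
from pathlib import PurePosixPath


def normalize_reference_relpath(raw: str, *, required_prefix: str = "references") -> str:
    """Return a normalized POSIX relpath, or raise ValueError if it escapes the skill.

    Hypothetical helper: rejects absolute paths, backslashes, '..' segments,
    and anything that is not a file path under the required prefix directory.
    """
    if "\\" in raw:
        raise ValueError(f"path must use POSIX separators: {raw!r}")
    path = PurePosixPath(raw)
    if path.is_absolute():
        raise ValueError(f"path must be relative: {raw!r}")
    parts = path.parts
    if ".." in parts:
        raise ValueError(f"path must not contain '..': {raw!r}")
    if len(parts) < 2 or parts[0] != required_prefix:
        raise ValueError(f"path must be a file under {required_prefix}/: {raw!r}")
    return path.as_posix()
```

Because the check works on pure paths, it behaves the same for filesystem-backed and zip-backed resources; existence of the target is a separate check against the resource tree.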
Registry Data Model
Build immutable runtime records with explicit structure:
- SkillRecord
  - skill_id
  - name
  - description
  - version
  - tags
  - capabilities
  - depends_on
  - document_uri
  - document_relpath (canonical resource-relative path)
  - references map keyed by ref_id
- ReferenceRecord
  - ref_id
  - uri
  - relpath
  - mime_type
  - title
- DocsRegistry
  - skills_by_id: dict[str, SkillRecord]
  - skills_in_load_order: list[str] (deterministic ordering)
  - helper indexes for catalog payload generation
    - skills_summary_in_load_order: list[SkillSummaryRecord] for progressive discovery responses
    - filter indexes (for example by tag/capability) derived once at startup
- SkillSummaryRecord
  - skill_id
  - name
  - description
  - tags
  - capabilities
  - document_uri
  - optional version
Immutability rule:
- Once built, registry records are treated as read-only for the process lifetime.
- No runtime mutation during requests; refresh only via process restart.
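A minimal sketch of how the records could be made read-only, assuming frozen dataclasses and MappingProxyType rather than any particular model library. The field types, the use of tuples in place of lists, and the freeze_registry helper are assumptions for illustration; the summary records and filter indexes are omitted for brevity.

```python
from dataclasses import dataclass
from types import MappingProxyType
from typing import Mapping


@dataclass(frozen=True)
class ReferenceRecord:
    ref_id: str
    uri: str
    relpath: str
    mime_type: str
    title: str


@dataclass(frozen=True)
class SkillRecord:
    skill_id: str
    name: str
    description: str
    version: str
    tags: tuple[str, ...]          # tuples instead of lists (assumption)
    capabilities: tuple[str, ...]
    depends_on: tuple[str, ...]
    document_uri: str
    document_relpath: str
    references: Mapping[str, ReferenceRecord]


@dataclass(frozen=True)
class DocsRegistry:
    skills_by_id: Mapping[str, SkillRecord]
    skills_in_load_order: tuple[str, ...]


def freeze_registry(skills: list[SkillRecord]) -> DocsRegistry:
    """Hypothetical helper: sort by skill_id and wrap the mapping read-only."""
    ordered = sorted(skills, key=lambda skill: skill.skill_id)
    return DocsRegistry(
        skills_by_id=MappingProxyType({skill.skill_id: skill for skill in ordered}),
        skills_in_load_order=tuple(skill.skill_id for skill in ordered),
    )
```

With this shape, attribute assignment on a record raises dataclasses.FrozenInstanceError and item assignment on skills_by_id raises TypeError, so accidental request-time mutation fails loudly.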
Frontmatter Parsing Contract
SKILL.md parse steps:
- Read full markdown text from resource.
- Parse YAML frontmatter block at file start (between the first two --- delimiters).
- Parse YAML with safe loader semantics.
- Validate parsed object with Step 2 Pydantic model(s).
- Preserve markdown body as document content payload.
Parsing failure behavior:
- Missing frontmatter block: startup error.
- Invalid YAML: startup error with skill path and YAML parser detail.
- Missing required fields (name, description, x-personal-mcp contract fields): startup error.
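The delimiter handling can be sketched with the standard library alone. The function name split_frontmatter is hypothetical, and the sketch stops before YAML parsing: the returned frontmatter text would still be handed to a safe YAML loader and then to the Step 2 models.

```python
def split_frontmatter(text: str) -> tuple[str, str]:
    """Split markdown text into (frontmatter_yaml, body).

    Hypothetical helper: the frontmatter must start on the first line and is
    delimited by the first two lines that consist solely of '---'.
    """
    lines = text.splitlines(keepends=True)
    if not lines or lines[0].strip() != "---":
        raise ValueError("missing frontmatter block: file must start with '---'")
    for index in range(1, len(lines)):
        if lines[index].strip() == "---":
            frontmatter = "".join(lines[1:index])
            body = "".join(lines[index + 1:])
            return frontmatter, body
    raise ValueError("unterminated frontmatter block: closing '---' not found")


frontmatter_text, body_text = split_frontmatter("---\nname: alpha\n---\n# Alpha\n")
```

Only the first two delimiter lines are consumed, so a later --- line in the body (for example a horizontal rule) stays part of the preserved document content.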
Validation Pipeline (Fail-Fast)
Validation happens in this order:
- Structural discovery validation
  - skill directory exists under docs/skills
  - required SKILL.md exists for each discovered skill
- Schema validation
  - Pydantic frontmatter validation for all required and constrained fields
- Identity validation
  - frontmatter name equals x-personal-mcp.id
  - frontmatter id equals skill directory name
- Reference manifest validation
  - unique ref_id keys per skill
  - each manifest path is relative, in-skill, and under references/
  - each manifest target exists and is a file
- Dependency graph validation
  - every depends_on target exists in discovered skill set
  - no self-dependency
  - cycle detection enabled (hard error on cycle)
- Capability sanity checks
  - required primary capability resource://skills/{skill_id}/document is present
- Global uniqueness checks
  - no duplicate skill_id
  - no duplicate canonical resource URIs generated from registry
- Discovery payload checks
  - summary fields required by catalog index are present and non-empty
  - summary generation does not require reading markdown body content during request handling
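The dependency graph phase is the least obvious of these checks, so here is a hedged sketch of one possible approach: a depth-first search with three node states. The function name, the input shape (skill_id mapped to its depends_on list), and the message wording are assumptions for illustration.

```python
def validate_dependency_graph(depends_on: dict[str, list[str]]) -> list[str]:
    """Return human-readable issues for unknown targets, self-deps, and cycles.

    Hypothetical helper: iteration is sorted so the issue list is stable.
    """
    issues: list[str] = []
    for skill_id in sorted(depends_on):
        for target in sorted(depends_on[skill_id]):
            if target == skill_id:
                issues.append(f"{skill_id}: self-dependency")
            elif target not in depends_on:
                issues.append(f"{skill_id}: unknown depends_on target '{target}'")

    # Depth-first search: UNSEEN -> ACTIVE (on the current path) -> DONE.
    UNSEEN, ACTIVE, DONE = 0, 1, 2
    state = {skill_id: UNSEEN for skill_id in depends_on}

    def visit(node: str, path: list[str]) -> None:
        state[node] = ACTIVE
        path.append(node)
        for target in sorted(depends_on[node]):
            if target == node or target not in depends_on:
                continue  # already reported above
            if state[target] == ACTIVE:
                cycle = path[path.index(target):] + [target]
                issues.append("dependency cycle: " + " -> ".join(cycle))
            elif state[target] == UNSEEN:
                visit(target, path)
        path.pop()
        state[node] = DONE

    for skill_id in sorted(depends_on):
        if state[skill_id] == UNSEEN:
            visit(skill_id, [])
    return issues


graph_issues = validate_dependency_graph(
    {"a": ["b"], "b": ["c"], "c": ["a"], "d": ["d", "zzz"]}
)
```

Returning issues instead of raising on the first one keeps this phase compatible with a reporting model that aggregates findings before failing startup.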
Error Model and Reporting
Error handling contract:
- Collect errors per validation phase for clarity, then raise one startup exception containing all findings.
- Error messages must include:
- skill id (when known)
- packaged relative path
- violated rule
- actionable fix hint
- If any error exists, registry is not published and FastMCP resource registration does not proceed.
Recommended exception shape:
- DocsRegistryValidationError(errors: list[RegistryIssue])
- RegistryIssue fields: code, message, skill_id, path, hint
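A minimal sketch of the recommended exception shape, assuming plain dataclasses. The message layout, the sort order, and the raise_if_issues helper are illustrative assumptions; only the class names and field names come from the contract above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RegistryIssue:
    code: str
    message: str
    skill_id: Optional[str]  # None when the skill id is not yet known
    path: str
    hint: str


class DocsRegistryValidationError(Exception):
    """Single startup exception that carries every collected issue."""

    def __init__(self, errors: list[RegistryIssue]) -> None:
        self.errors = list(errors)
        lines = [f"{len(self.errors)} docs registry issue(s):"]
        for issue in self.errors:
            owner = issue.skill_id or "<unknown skill>"
            lines.append(
                f"  [{issue.code}] {owner} ({issue.path}): "
                f"{issue.message} Hint: {issue.hint}"
            )
        super().__init__("\n".join(lines))


def raise_if_issues(issues: list[RegistryIssue]) -> None:
    """Hypothetical helper: raise once, with issues in a stable order."""
    if issues:
        raise DocsRegistryValidationError(
            sorted(issues, key=lambda issue: (issue.path, issue.code))
        )
```

Sorting by path and code before raising keeps the startup message stable between runs, which makes diagnostics easier to compare in CI logs.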
Determinism and Runtime Safety
Determinism rules:
- Traverse directories in sorted order.
- Normalize all stored relative paths to POSIX form.
- Normalize ids/tags exactly once at parse boundary.
- Produce stable catalog ordering to reduce client churn.
- Produce stable summary projections and filter indexes from the same normalized source records.
Runtime safety rules:
- No dependence on Path(__file__) or repository root.
- No ad-hoc fallback probing across multiple locations.
- No lazy validation deferred until first request.
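Sorted traversal over packaged resources can be sketched as follows. The helper name discover_skill_ids and the throwaway package demo_sorted_anchor are hypothetical; the sketch only shows ordering and directory filtering, not the SKILL.md existence check.

```python
import importlib
import sys
import tempfile
from importlib.resources import files
from pathlib import Path


def discover_skill_ids(package_anchor: str, docs_root: str = "docs") -> list[str]:
    """List skill directory names under <docs_root>/skills in sorted order."""
    skills_root = files(package_anchor).joinpath(docs_root).joinpath("skills")
    # iterdir() order is not guaranteed, so sort explicitly.
    return sorted(entry.name for entry in skills_root.iterdir() if entry.is_dir())


# Throwaway package for demonstration only (hypothetical name and layout).
tmp_root = Path(tempfile.mkdtemp())
package_dir = tmp_root / "demo_sorted_anchor"
for skill_id in ("zeta", "alpha", "mid"):
    (package_dir / "docs" / "skills" / skill_id).mkdir(parents=True)
(package_dir / "__init__.py").write_text("", encoding="utf-8")
(package_dir / "docs" / "skills" / "README.md").write_text("not a skill\n", encoding="utf-8")
sys.path.insert(0, str(tmp_root))
importlib.invalidate_caches()

skill_ids = discover_skill_ids("demo_sorted_anchor")
```

Every later stage (load order, summaries, filter indexes) can inherit its ordering from this one sorted list instead of re-sorting independently.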
Integration Plan for Existing Modules
Primary integration target:
- Implement the canonical package-resource-based registry loader in src/personal_mcp/skills/document_loader.py as the only supported runtime loader path.
Catalog integration:
- Update src/personal_mcp/catalog/server.py to consume the shared in-memory registry as the only catalog data source.
- Keep catalog payload normalization deterministic and sourced from registry records only.
Startup wiring:
- Initialize registry once during app/server startup in src/personal_mcp/main.py or equivalent composition point.
- Pass registry to resource registration step (Step 5).
Proposed Loader API Surface
Use a small, testable API:
- load_docs_registry(*, package_anchor: str, docs_root: str = "docs") -> DocsRegistry
- read_skill_document(registry: DocsRegistry, skill_id: str) -> DocumentPayload
- read_skill_reference(registry: DocsRegistry, skill_id: str, ref_id: str) -> DocumentPayload
Design constraints:
- Loader functions are pure with respect to package resources and input arguments: the same resources and arguments always produce the same result.
- No global mutable singleton required for unit tests.
- Caching is explicit and owned by startup composition.
Test and Validation Plan (Step 4 Scope)
Unit tests:
- valid multi-skill registry load from packaged test fixtures
- duplicate id detection
- missing SKILL.md detection
- invalid frontmatter field constraints
- broken reference target detection
- invalid depends_on target detection
- cycle detection in depends_on graph
- deterministic output ordering across runs
Packaging/runtime tests:
- install built wheel in isolated env
- load registry via importlib.resources.files(...)
- assert representative skill document/reference are readable
Expected command path in this repo:
uv run pytest -q
Acceptance Criteria for Step 4 Completion
Step 4 is complete when all are true:
- Registry loads exclusively from packaged resources.
- All Step 2 and Step 3 dependent validations are enforced at startup.
- Invalid docs state blocks startup with actionable diagnostics.
- Registry is deterministic and immutable for runtime use.
- Catalog and later resource registration can consume registry without direct filesystem scanning.
Non-goals for Step 4
- No FastMCP resource registration wiring details (Step 5).
- No discovery-tool fallback behavior design (Step 6).
- No final packaging/build-system migration mechanics (Step 7).
- No backward-compat alias rollout mechanics in the greenfield baseline.
- No compatibility layer of any kind (URI aliases, dual reads, adapter shims, or legacy schema bridges).