Metadata Schema Configuration Guide
This guide provides comprehensive documentation for the metadata-schema.yml file, which serves as the central configuration for all metadata operations in the Notebook Automation toolkit.
Overview
The metadata-schema.yml file defines the structure, validation rules, and behavior for all metadata processing operations. It replaces the legacy metadata.yaml approach with a unified, extensible schema system that supports:
- Template Type Definitions: Structured schema for different content types
- Field Value Resolvers: Dynamic field population through registered resolvers
- Inheritance System: Recursive inheritance from base types and universal fields
- Reserved Tag Logic: Protected system tags with automatic validation
- Type Mapping: Canonical type normalization and aliasing
Note: As of the schema-driven metadata pipeline, all processors (PDF, Video, etc.) build frontmatter via this schema and a registry of resolvers. See “Pipeline Integration” and “Required Resolvers” below.
Schema Loader and Registry Pattern
The MetadataSchemaLoader serves as the central component for schema-driven metadata automation, supporting:
- Unified Schema File: Single metadata-schema.yml configuration for all metadata operations
- Registry Pattern: Dynamic registration and runtime extension of field value resolvers
- Plugin Extensibility: Support for custom resolvers loaded from DLL plugins
- Inheritance System: Recursive template type inheritance with base types and universal fields
- Validation: Robust schema validation with reserved tag enforcement
// Load schema and register resolvers
var schemaLoader = new MetadataSchemaLoader("config/metadata-schema.yml", logger);
schemaLoader.LoadResolversFromDirectory("./resolvers");
// Access template definitions
var pdfSchema = schemaLoader.TemplateTypes["pdf-reference"];
// Resolve field values dynamically
var dateCreated = schemaLoader.ResolveFieldValue("pdf-reference", "date-created", context);
Pipeline Integration
The schema powers a unified metadata pipeline that all note processors use:
- IMetadataPipeline orchestrates optional AI/legacy frontmatter parsing (via IYamlHelper), context resolvers (hierarchy and course structure), file-type resolvers (PDF page-count, video duration), and the OneDrive share link resolver.
- IMetadataTemplateManager applies template definitions from the schema using a template key (e.g., pdf-reference, video-reference).
- Merge precedence is enforced: CLI overrides > existing frontmatter (AI/legacy) > resolver-derived values > schema defaults.
- Required fields are validated after merge; violations are logged and should be surfaced to the caller.
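The merge order and the post-merge required-field check can be sketched as follows. This is a minimal illustration over plain dictionaries, not the IMetadataPipeline implementation; the helper names MergeByPrecedence and FindMissingRequired and all sample values are assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;

// Merge metadata layers; layers are passed highest-precedence first.
static Dictionary<string, object> MergeByPrecedence(params Dictionary<string, object>[] layers)
{
    var merged = new Dictionary<string, object>();
    foreach (var layer in layers)
    {
        foreach (var pair in layer)
        {
            // TryAdd keeps the value from the first (highest-precedence) layer.
            merged.TryAdd(pair.Key, pair.Value);
        }
    }
    return merged;
}

// Return the required fields that are absent after the merge.
static List<string> FindMissingRequired(IEnumerable<string> requiredFields, Dictionary<string, object> mergedValues)
{
    var missing = new List<string>();
    foreach (var field in requiredFields)
    {
        if (!mergedValues.ContainsKey(field)) missing.Add(field);
    }
    return missing;
}

// Hypothetical sample layers, one per source named in the precedence rule.
var cliOverrides = new Dictionary<string, object> { ["status"] = "read" };
var existingFrontmatter = new Dictionary<string, object> { ["status"] = "in-progress", ["title"] = "Week 1 Reading" };
var resolverValues = new Dictionary<string, object> { ["title"] = "week-1", ["page-count"] = 12 };
var schemaDefaults = new Dictionary<string, object> { ["status"] = "unread", ["comprehension"] = 0 };

var frontmatter = MergeByPrecedence(cliOverrides, existingFrontmatter, resolverValues, schemaDefaults);
Console.WriteLine(frontmatter["status"]); // read (CLI override wins)
Console.WriteLine(frontmatter["title"]);  // Week 1 Reading (existing frontmatter beats resolver)

var missingFields = FindMissingRequired(new[] { "comprehension", "status", "tags" }, frontmatter);
Console.WriteLine(string.Join(", ", missingFields)); // tags
```

In this sketch a missing required field is only reported; how violations are logged and surfaced to the caller is left to the real pipeline.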
For a conceptual overview, see the Metadata Extraction System document.
Reserved Tags and Universal Fields
The metadata schema system enforces consistent metadata through reserved tags and universal fields:
- Universal Fields: Automatically inherited by all template types (e.g., auto-generated-state, date-created, publisher)
- Reserved Tags: Protected system tags that cannot be overridden (e.g., case-study, video, pdf)
- Field Inheritance: Reserved tags are automatically included as fields in all template types
- Validation: Automatic validation prevents accidental overrides and ensures data integrity
Schema Structure:
# NOTE: All top-level keys must use PascalCase for C# compatibility
TemplateTypes:
  pdf-reference:
    BaseTypes:
      - universal-fields
    Type: note/case-study
    RequiredFields:
      - comprehension
      - status
      - tags
    Fields:
      date-created:
        Resolver: DateCreatedResolver
      share-link:
        Resolver: OneDriveShareLinkResolver
      status:
        Default: unread
UniversalFields:
  - auto-generated-state
  - date-created
  - publisher
ReservedTags:
  - auto-generated-state
  - case-study
  - video
  - pdf
Migration from Legacy metadata.yaml
Breaking Change: The system has migrated from the legacy metadata.yaml to the new metadata-schema.yml format.
Key Changes:
- File extension changed from .yaml to .yml
- Schema structure unified under PascalCase top-level keys
- Template definitions restructured with inheritance support
- Reserved tag logic enforced automatically
Migration Steps:
- Update file references from metadata.yaml to metadata-schema.yml
- Convert template definitions to new schema structure
- Update code to use MetadataSchemaLoader instead of legacy template managers
- Test reserved tag inheritance and validation
For detailed migration instructions, see the Migration Guide.
File Structure
Top-Level Sections
The schema file is organized into four main sections, all using PascalCase keys:
# NOTE: All top-level keys must use PascalCase for C# compatibility
TemplateTypes: # Template type definitions with inheritance
UniversalFields: # Fields inherited by all template types
TypeMapping: # Template type to canonical type mapping
ReservedTags: # Protected system tags
Case Sensitivity Requirements
Critical: All top-level YAML keys must use PascalCase (e.g., TemplateTypes, UniversalFields, TypeMapping, ReservedTags) to match C# property names. The deserializer is case-sensitive and will fail with incorrect casing.
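A pre-load check for casing mistakes might look like the sketch below. It assumes the top-level keys have already been read from the YAML file as strings; FindCasingProblems is a hypothetical helper, not part of MetadataSchemaLoader, and treating any other key as unrecognized is an assumption of this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Report top-level keys that do not exactly match the four expected PascalCase names.
static List<string> FindCasingProblems(IEnumerable<string> topLevelKeys)
{
    string[] expected = { "TemplateTypes", "UniversalFields", "TypeMapping", "ReservedTags" };
    var problems = new List<string>();

    foreach (var key in topLevelKeys)
    {
        // Exact, case-sensitive match: nothing to report.
        if (expected.Contains(key)) continue;

        // Ignore case and separators to suggest the key that was probably intended.
        var normalized = key.Replace("_", "").Replace("-", "");
        var suggestion = expected.FirstOrDefault(
            e => string.Equals(e, normalized, StringComparison.OrdinalIgnoreCase));

        problems.Add(suggestion != null
            ? $"'{key}' should be '{suggestion}'"
            : $"'{key}' is not a recognized top-level key");
    }
    return problems;
}

var issues = FindCasingProblems(new[] { "templateTypes", "reserved_tags", "TypeMapping" });
foreach (var issue in issues)
{
    Console.WriteLine(issue);
}
// 'templateTypes' should be 'TemplateTypes'
// 'reserved_tags' should be 'ReservedTags'
```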
TemplateTypes Section
Defines the schema for each template type, including inheritance, required fields, and field definitions.
Structure
TemplateTypes:
  template-type-name:
    BaseTypes: # Optional: List of base types to inherit from
      - universal-fields
      - other-template-type
    Type: canonical-type-name # Canonical type for normalization
    RequiredFields: # List of required fields for validation
      - field-name-1
      - field-name-2
    Fields: # Field definitions with defaults and resolvers
      field-name:
        Default: default-value
        Resolver: ResolverName
Example: PDF Reference Template
TemplateTypes:
  pdf-reference:
    BaseTypes:
      - universal-fields
    Type: note/case-study
    RequiredFields:
      - comprehension
      - status
      - completion-date
      - authors
      - tags
    Fields:
      publisher:
        Default: University of Illinois at Urbana-Champaign
      status:
        Default: unread
      comprehension:
        Default: 0
      date-created:
        Resolver: DateCreatedResolver
      share-link:
        Resolver: OneDriveShareLinkResolver
      title:
        Resolver: TitleResolver
      tags:
        Default: [pdf, reference]
      page-count:
        Resolver: PdfPageCountResolver
Inheritance System
Template types support recursive inheritance through the BaseTypes property:
Base Type Resolution
- universal-fields: Inherits all fields from the UniversalFields section
- template-type-name: Inherits all fields from another template type
- Recursive: Base types are resolved recursively, supporting deep inheritance chains
Field Inheritance Rules
- Fields from base types are added only if they don't already exist in the derived type
- Universal fields are always included if not present
- Reserved tags are automatically injected as fields in all template types
- Field definitions in derived types override those in base types
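The rules above can be sketched as a small recursive function. This is an illustration under stated assumptions, not the loader's actual code: field definitions are reduced to plain strings, the derived type pdf-annotated is hypothetical, a cycle in BaseTypes is treated as an error, and when two base types define the same field the first one listed wins.

```csharp
using System;
using System.Collections.Generic;

// Resolve the effective field set for a template type.
static Dictionary<string, string> ResolveFields(
    string typeName,
    Dictionary<string, (string[] BaseTypes, Dictionary<string, string> Fields)> templates,
    IEnumerable<string> universalFields,
    IEnumerable<string> reservedTags,
    HashSet<string> visiting)
{
    // Assumption: a cycle in BaseTypes is a configuration error.
    if (!visiting.Add(typeName))
        throw new InvalidOperationException($"Circular BaseTypes reference at '{typeName}'");

    var template = templates[typeName];

    // Start from the type's own definitions so they override inherited ones.
    var resolved = new Dictionary<string, string>(template.Fields);

    foreach (var baseType in template.BaseTypes)
    {
        if (baseType == "universal-fields") continue; // handled below for every type

        // Recurse, then add only the fields the derived type does not define.
        foreach (var pair in ResolveFields(baseType, templates, universalFields, reservedTags, visiting))
            resolved.TryAdd(pair.Key, pair.Value);
    }

    // Universal fields and reserved tags are included if not already present.
    foreach (var field in universalFields) resolved.TryAdd(field, "(universal field)");
    foreach (var tag in reservedTags) resolved.TryAdd(tag, "(reserved tag)");

    visiting.Remove(typeName);
    return resolved;
}

var sampleTemplates = new Dictionary<string, (string[] BaseTypes, Dictionary<string, string> Fields)>
{
    ["pdf-reference"] = (new[] { "universal-fields" }, new Dictionary<string, string>
    {
        ["status"] = "Default: unread",
        ["page-count"] = "Resolver: PdfPageCountResolver",
    }),
    // Hypothetical derived type used only for this example.
    ["pdf-annotated"] = (new[] { "pdf-reference" }, new Dictionary<string, string>
    {
        ["status"] = "Default: annotated",
    }),
};
var universal = new[] { "auto-generated-state", "date-created", "publisher" };
var reserved = new[] { "auto-generated-state", "case-study", "video", "pdf" };

var effective = ResolveFields("pdf-annotated", sampleTemplates, universal, reserved, new HashSet<string>());
Console.WriteLine(effective["status"]);               // Default: annotated
Console.WriteLine(effective["page-count"]);           // Resolver: PdfPageCountResolver
Console.WriteLine(effective.ContainsKey("publisher")); // True
```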
Field Definitions
Each field in the Fields section can specify:
Default Values
Static default values used when no resolver is present:
Fields:
  status:
    Default: unread
  tags:
    Default: [pdf, reference]
  comprehension:
    Default: 0
Resolvers
Dynamic value resolution through registered resolvers:
Fields:
  date-created:
    Resolver: DateCreatedResolver
  page-count:
    Resolver: PdfPageCountResolver
  share-link:
    Resolver: OneDriveShareLinkResolver
Resolver Lookup
The system supports flexible resolver lookup:
- Exact match: Looks for resolver with exact name
- Suffix match: Looks for registered resolvers ending with the specified name
- Fallback: Uses default value if no resolver is found
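The three lookup steps can be sketched as follows. Resolvers are modeled here as simple delegates rather than IFieldValueResolver instances, and the registry key Plugins.DateCreatedResolver is a hypothetical name chosen to demonstrate the suffix match; both are assumptions of this example.

```csharp
using System;
using System.Collections.Generic;

// Look up a resolver by exact name, then by suffix, then fall back to the default.
static object ResolveValue(
    string resolverName,
    object defaultValue,
    Dictionary<string, Func<Dictionary<string, object>, object>> registry,
    Dictionary<string, object> context)
{
    // No resolver configured for the field: use the static default.
    if (string.IsNullOrEmpty(resolverName)) return defaultValue;

    // 1. Exact match on the registered name.
    if (registry.TryGetValue(resolverName, out var exact)) return exact(context);

    // 2. Suffix match: a registered name that ends with the requested name.
    foreach (var entry in registry)
    {
        if (entry.Key.EndsWith(resolverName, StringComparison.Ordinal))
            return entry.Value(context);
    }

    // 3. Fallback: no resolver found.
    return defaultValue;
}

var sampleRegistry = new Dictionary<string, Func<Dictionary<string, object>, object>>
{
    ["PdfPageCountResolver"] = ctx => 12,
    ["Plugins.DateCreatedResolver"] = ctx => "2024-01-15",
};
var sampleContext = new Dictionary<string, object>();

Console.WriteLine(ResolveValue("PdfPageCountResolver", 0, sampleRegistry, sampleContext)); // 12
Console.WriteLine(ResolveValue("DateCreatedResolver", "", sampleRegistry, sampleContext)); // 2024-01-15
Console.WriteLine(ResolveValue("TitleResolver", "Untitled", sampleRegistry, sampleContext)); // Untitled
```

If several registered names share the same suffix, this sketch returns whichever the dictionary enumerates first, so a real registry would need a deterministic tie-break.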
UniversalFields Section
Defines fields that are automatically inherited by all template types.
Structure (UniversalFields)
UniversalFields:
  - auto-generated-state
  - date-created
  - publisher
Behavior (UniversalFields)
- Automatic Inheritance: All fields in this list are automatically added to every template type
- Reserved Tag Integration: Reserved tags are automatically included as universal fields
- Overridable Definitions: A template type can override a universal field's definition (for example, its default or resolver) in its own Fields section
Example Usage
UniversalFields:
  - auto-generated-state # Present in all template types
  - date-created # Present in all template types
  - publisher # Present in all template types
TypeMapping Section
Provides mapping from template type names to canonical type names for normalization.
Structure (TypeMapping)
TypeMapping:
  template-type-name: canonical-type-name
Purpose
- Normalization: Maps custom or alias type names to canonical schema types
- Backwards Compatibility: Supports legacy type names while migrating to new schema
- Flexibility: Allows multiple template types to map to the same canonical type
Example (TypeMapping)
TypeMapping:
  pdf-reference: note/case-study
  video-reference: note/video-note
  resource-reading: note/reading
  note/instruction: note/instruction
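Normalization with this mapping amounts to a dictionary lookup, as in the sketch below. ToCanonicalType is a hypothetical helper, and returning the input unchanged for unmapped names is an assumption of this example rather than documented loader behavior.

```csharp
using System;
using System.Collections.Generic;

// Map a template type name to its canonical type.
static string ToCanonicalType(string templateType, Dictionary<string, string> typeMapping)
{
    // Assumption: names without a mapping are passed through unchanged.
    return typeMapping.TryGetValue(templateType, out var canonical) ? canonical : templateType;
}

var sampleMapping = new Dictionary<string, string>
{
    ["pdf-reference"] = "note/case-study",
    ["video-reference"] = "note/video-note",
    ["resource-reading"] = "note/reading",
    ["note/instruction"] = "note/instruction",
};

Console.WriteLine(ToCanonicalType("pdf-reference", sampleMapping));    // note/case-study
Console.WriteLine(ToCanonicalType("note/instruction", sampleMapping)); // note/instruction
```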
ReservedTags Section
Defines protected system tags that cannot be overridden by custom metadata.
Structure (ReservedTags)
ReservedTags:
  - tag-name-1
  - tag-name-2
Behavior (ReservedTags)
- Protection: Reserved tags cannot be overridden or used for custom metadata
- Automatic Injection: Reserved tags are automatically injected as fields in all template types
- Validation: System validates that reserved tags are not overridden in custom metadata
Example (ReservedTags)
ReservedTags:
  - auto-generated-state
  - case-study
  - live-class
  - reading
  - finance
  - operations
  - video
  - pdf
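One way to picture the validation step is a check that flags custom metadata keys colliding with reserved tags. The sketch below is an assumption about how such a check could work, including the case-insensitive comparison; FindReservedTagConflicts is a hypothetical helper.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Return the custom metadata keys that collide with reserved tags.
static List<string> FindReservedTagConflicts(IEnumerable<string> customKeys, IEnumerable<string> reservedTags)
{
    // Assumption: comparison ignores case so that "PDF" and "pdf" both conflict.
    var reservedSet = new HashSet<string>(reservedTags, StringComparer.OrdinalIgnoreCase);
    return customKeys.Where(key => reservedSet.Contains(key)).ToList();
}

var reservedTagList = new[] { "auto-generated-state", "case-study", "live-class", "video", "pdf" };
var customMetadataKeys = new[] { "instructor", "video", "PDF" };

var conflicts = FindReservedTagConflicts(customMetadataKeys, reservedTagList);
Console.WriteLine(string.Join(", ", conflicts)); // video, PDF
```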
## Required Resolvers
The following resolvers must be implemented and registered with the resolver registry used by the schema loader/pipeline:
- `DateCreatedResolver` — Provides `date-created` where not present
- `PdfPageCountResolver` — Populates `page-count` for PDFs
- `VideoDurationResolver` — Populates `video-duration` for videos
- `OneDriveShareLinkResolver` (required) — Populates `share-link` with a stable OneDrive sharing URL when files are under the OneDrive resources root
- Hierarchy and course structure adapters:
  - `ProgramResolver`, `CourseResolver`, `ClassResolver`
  - `ModuleResolver`, `LessonResolver`
If any required resolver is unavailable, validation should fail for templates that declare those fields.
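A startup check along these lines could compare the resolvers each template declares against the registered names. The sketch assumes the declarations have already been extracted into nested dictionaries (template, then field, then resolver name) and uses exact name matching only; FindMissingResolvers is a hypothetical helper.

```csharp
using System;
using System.Collections.Generic;

// List every template field whose declared resolver is not registered.
static List<string> FindMissingResolvers(
    Dictionary<string, Dictionary<string, string>> resolverByFieldByTemplate,
    HashSet<string> registeredResolvers)
{
    var missing = new List<string>();
    foreach (var template in resolverByFieldByTemplate)
    {
        foreach (var field in template.Value)
        {
            if (!registeredResolvers.Contains(field.Value))
                missing.Add($"{template.Key}.{field.Key} requires {field.Value}");
        }
    }
    return missing;
}

var declaredResolvers = new Dictionary<string, Dictionary<string, string>>
{
    ["pdf-reference"] = new Dictionary<string, string>
    {
        ["date-created"] = "DateCreatedResolver",
        ["page-count"] = "PdfPageCountResolver",
    },
    ["video-reference"] = new Dictionary<string, string>
    {
        ["video-duration"] = "VideoDurationResolver",
    },
};
// VideoDurationResolver is deliberately left unregistered in this example.
var registeredNames = new HashSet<string> { "DateCreatedResolver", "PdfPageCountResolver" };

var unresolved = FindMissingResolvers(declaredResolvers, registeredNames);
Console.WriteLine(string.Join("; ", unresolved));
// video-reference.video-duration requires VideoDurationResolver
```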
Complete Example
Here's a complete example of a metadata-schema.yml file:
# NOTE: All top-level keys must use PascalCase for C# compatibility
# Metadata Schema for Notebook Automation
# This file defines all template-types, type mappings, required fields, and reserved tags
TemplateTypes:
  pdf-reference:
    BaseTypes:
      - universal-fields
    Type: note/case-study
    RequiredFields:
      - comprehension
      - status
      - completion-date
      - authors
      - tags
    Fields:
      publisher:
        Default: University of Illinois at Urbana-Champaign
      status:
        Default: unread
      comprehension:
        Default: 0
      date-created:
        Resolver: DateCreatedResolver
      share-link:
        Resolver: OneDriveShareLinkResolver
      title:
        Default: "PDF Note"
      tags:
        Default: [pdf, reference]
      page-count:
        Resolver: PdfPageCountResolver
  video-reference:
    BaseTypes:
      - universal-fields
    Type: note/video-note
    RequiredFields:
      - comprehension
      - status
      - video-duration
      - author
      - tags
    Fields:
      publisher:
        Default: University of Illinois at Urbana-Champaign
      status:
        Default: unwatched
      comprehension:
        Default: 0
      date-created:
        Resolver: DateCreatedResolver
      share-link:
        Resolver: OneDriveShareLinkResolver
      title:
        Resolver: TitleResolver
      tags:
        Default: [video, reference]
      video-duration:
        Resolver: VideoDurationResolver
UniversalFields:
  - auto-generated-state
  - date-created
  - publisher
TypeMapping:
  pdf-reference: note/case-study
  video-reference: note/video-note
  resource-reading: note/reading
ReservedTags:
  - auto-generated-state
  - case-study
  - live-class
  - reading
  - finance
  - operations
  - video
  - pdf
Usage in Code
Loading the Schema
var schemaLoader = new MetadataSchemaLoader("config/metadata-schema.yml", logger);
Accessing Template Types
// Get template type schema
var pdfSchema = schemaLoader.TemplateTypes["pdf-reference"];
// Access template properties
var requiredFields = pdfSchema.RequiredFields;
var canonicalType = pdfSchema.Type;
var fields = pdfSchema.Fields;
Resolving Field Values
// Resolve field value with context
var context = new Dictionary<string, object> { ["user"] = "daniel" };
var dateCreated = schemaLoader.ResolveFieldValue("pdf-reference", "date-created", context);
// Get default value if no resolver
var defaultStatus = schemaLoader.ResolveFieldValue("pdf-reference", "status");
Registry Access
// Register custom resolver
schemaLoader.ResolverRegistry.Register("CustomResolver", new CustomFieldValueResolver());
// Load resolvers from directory
schemaLoader.LoadResolversFromDirectory("./resolvers");
// Get registered resolver
var resolver = schemaLoader.ResolverRegistry.Get("DateCreatedResolver");
Best Practices
Schema Design
- Use PascalCase for all top-level keys to ensure C# compatibility
- Define universal fields for metadata common to all template types
- Use reserved tags for system-critical fields that shouldn't be overridden
- Implement inheritance to reduce duplication and maintain consistency
- Provide meaningful defaults for all fields to ensure robust operation
Field Naming
- Use kebab-case for field names (e.g., date-created, page-count)
- Use descriptive names that clearly indicate the field's purpose
- Avoid conflicts with reserved tags and universal fields
- Be consistent with naming conventions across template types
Resolver Implementation
- Implement IFieldValueResolver for custom field logic
- Handle null context gracefully in resolver implementations
- Use descriptive resolver names for easy identification
- Register resolvers before using them in field definitions
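Handling a null context gracefully can be illustrated with a resolver body written as a plain function. The IFieldValueResolver signature is not shown in this guide, so the sketch does not implement it; the function name ResolveTitle and the context key file-path are assumptions made for the example.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.IO;

// Derive a title from a file path in the context; return null when nothing can be resolved.
static string? ResolveTitle(IReadOnlyDictionary<string, object>? context)
{
    // A null or incomplete context yields no value instead of throwing.
    if (context is null) return null;
    if (!context.TryGetValue("file-path", out var raw) || raw is not string path) return null;
    if (string.IsNullOrWhiteSpace(path)) return null;

    return Path.GetFileNameWithoutExtension(path);
}

Console.WriteLine(ResolveTitle(null) ?? "(no value)"); // (no value)
Console.WriteLine(ResolveTitle(new Dictionary<string, object> { ["file-path"] = "notes/week-1.pdf" })); // week-1
```

Returning null lets the caller fall back to the field's default value rather than failing the whole metadata run.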
Validation
- Test schema loading in unit tests to catch configuration errors
- Validate required fields are present in all template instances
- Check reserved tag inheritance to ensure system integrity
- Test resolver registration and lookup functionality
Common Pitfalls
Case Sensitivity Issues
❌ Wrong: Using camelCase or snake_case
templateTypes: # Will fail - camelCase
template_types: # Will fail - snake_case
✅ Correct: Using PascalCase
TemplateTypes: # Correct - PascalCase
Missing Required Fields
❌ Wrong: Forgetting to define required fields
TemplateTypes:
  pdf-reference:
    Type: note/case-study
    # Missing RequiredFields - validation will fail
✅ Correct: Defining required fields
TemplateTypes:
  pdf-reference:
    Type: note/case-study
    RequiredFields:
      - status
      - tags
Resolver Not Found
❌ Wrong: Using unregistered resolver
Fields:
  date-created:
    Resolver: NonExistentResolver # Will fail at runtime
✅ Correct: Register resolver before use
schemaLoader.ResolverRegistry.Register("DateCreatedResolver", new DateCreatedResolver());
For more information, see the Migration Guide and API Reference.