Content modeling manual

Audience: system administrators, developers

For a generic introduction to content modeling in Pocket Archive, see the content modeling primer.

Core schema and predefined content types

Some content types are part of the core functionality of Pocket Archive. They are defined by the configurations in core_schema, which ships with the Pocket Archive installation and should not be altered. The core types include the foundational types, such as Anything, Artifact, and File.

The core types can be extended in the user configuration by adding properties to them. A configuration that extends a core schema MUST set the core attribute to true and MUST NOT set any other top-level attribute except properties.
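The rule above is easy to check mechanically. The following is a minimal sketch, assuming a configuration has already been loaded into a Python dict; only the `core` and `properties` attribute names come from this manual, and the function `validate_core_extension` is hypothetical, not part of Pocket Archive.

```python
from typing import Any


def validate_core_extension(config: dict[str, Any]) -> list[str]:
    """Check a (hypothetical) dict form of a core type extension.

    Returns a list of error messages; an empty list means the rule is met.
    """
    errors: list[str] = []
    # The extension must explicitly declare that it extends a core type.
    if config.get("core") is not True:
        errors.append("'core' must be set to true")
    # Apart from 'core', only 'properties' may appear at the top level.
    allowed = {"core", "properties"}
    for key in sorted(set(config) - allowed):
        errors.append(f"top-level attribute '{key}' is not allowed")
    return errors


ok = {"core": True, "properties": {"title": {"type": "string"}}}
# 'label' is a hypothetical extra top-level attribute that breaks the rule.
bad = {"core": True, "label": "My File", "properties": {}}
print(validate_core_extension(ok))   # []
print(validate_core_extension(bad))  # ["top-level attribute 'label' is not allowed"]
```

Returning a list of messages rather than raising on the first problem lets a configuration author see every violation at once.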

Pocket Archive ships with a sample configuration that includes extensions of core content types. For some very simple archives, this may be enough to get started with little or no customization. Setups that need more numerous or more complex content types can define additional types. Please look at the default model configuration files that come with Pocket Archive.

Each content type is defined in its own configuration file. Not every possible type needs to be defined in detail. Pocket Archive provides some basic types, e.g. Anything (the superclass of all types), Artifact, File, and Part, which should not be radically altered because some basic functionality of the system relies on them. More specific definitions can be added as subtypes. A subtype inherits all the property definitions of its broader type and adds more specific behavior. An example classification could be: Anything -> File -> Image File -> Scientific Image. Each subtype only defines its own special properties, which add to, or replace, the properties of its broader types.
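The inheritance rule can be sketched as walking the chain of broader types from Anything down to the requested type and applying each type's properties in turn. This is a minimal sketch under stated assumptions: the in-memory `TYPES` dict, its `broader` and `properties` keys, and the property attributes are all hypothetical, and a more specific definition here replaces an inherited one wholesale rather than being merged with it.

```python
from typing import Any, Optional

# Hypothetical in-memory form of the type configurations. Each type names its
# broader (parent) type under "broader" and lists only its own properties.
TYPES: dict[str, dict[str, Any]] = {
    "Anything": {"broader": None, "properties": {"label": {"type": "string"}}},
    "File": {"broader": "Anything", "properties": {"size": {"type": "number"}}},
    "Image File": {
        "broader": "File",
        "properties": {"width": {"type": "number"}, "height": {"type": "number"}},
    },
    "Scientific Image": {
        "broader": "Image File",
        # Adds a new property and replaces the inherited "label" definition.
        "properties": {
            "instrument": {"type": "string"},
            "label": {"type": "string", "min": 1},
        },
    },
}


def lineage(type_name: str, types: dict[str, dict[str, Any]]) -> list[str]:
    """Return the chain of types from the most general one down to type_name."""
    chain: list[str] = []
    current: Optional[str] = type_name
    while current is not None:
        if current in chain:
            raise ValueError(f"cycle in type hierarchy at {current!r}")
        chain.append(current)
        current = types[current]["broader"]
    return list(reversed(chain))


def resolve_properties(type_name: str, types: dict[str, dict[str, Any]]) -> dict[str, Any]:
    """Collect the effective property definitions of a type."""
    resolved: dict[str, Any] = {}
    for name in lineage(type_name, types):
        # More specific types are applied last, so their definitions add new
        # properties or replace inherited ones with the same name.
        resolved.update(types[name]["properties"])
    return resolved


print(lineage("Scientific Image", TYPES))
# ['Anything', 'File', 'Image File', 'Scientific Image']
print(resolve_properties("Scientific Image", TYPES)["label"])
# {'type': 'string', 'min': 1}
```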

All resources in Pocket Archive must be assigned a content type. If a resource doesn't fit any of the predefined content types, it can be assigned the most specific type that applies; at worst, it can be put under Anything. Of course, if one starts dealing with many similar resources that don't fit any type, it is probably best to define a type for them, but that is not mandatory.

In addition to these three mandatory components, a property can have an optional description. When present, the description appears in the presentation as an "info" icon with a pop-up giving a concise explanation of the property's scope and purpose.

Constraints

Each metadata field can be restricted by constraints. These constraints can apply to:

  • Type: the data type for the field, e.g. string, number, resource (relationship), etc.
  • Cardinality: how many values can be set for a field, for each resource. These values can be adjusted to set mandatory fields, single-valued fields, etc.
  • Range: the range of values allowed. How this is interpreted depends on the data type: for a number it can be a min/max range, for a string a regular expression pattern, for a resource the type(s) of the resources pointed to, etc.

All of these constraints are optional. Fields that are not defined can take any number of values and are optional, so it is up to the repository manager to decide how specific or how free-form their archive should be.
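The three kinds of constraint, and the fact that each one is optional, can be illustrated with a small validator for the values of a single field. This is a minimal sketch, not Pocket Archive's implementation: the field specification keys (`type`, `min`, `max`, `range`), the `Resource` class, and `check_field` are all hypothetical. A number range is taken as a (min, max) pair, a string range as a regular expression that must match the whole value, and a resource range as a set of allowed content types.

```python
import re
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Resource:
    """Hypothetical stand-in for a link to another archived resource."""
    id: str
    content_type: str


# Python types accepted for each (hypothetical) data type name.
PY_TYPES = {"string": str, "number": (int, float), "resource": Resource}


def check_field(name: str, values: list[Any], spec: Optional[dict[str, Any]]) -> list[str]:
    """Validate the values of one field against its optional constraints."""
    if spec is None:
        # Undefined field: any number of values is accepted, and it is optional.
        return []
    errors: list[str] = []

    # Cardinality: 'min' and 'max' count values; a missing key means no limit.
    low, high = spec.get("min", 0), spec.get("max")
    if len(values) < low:
        errors.append(f"{name}: at least {low} value(s) required")
    if high is not None and len(values) > high:
        errors.append(f"{name}: at most {high} value(s) allowed")

    dtype, rng = spec.get("type"), spec.get("range")
    for v in values:
        # Type constraint.
        if dtype is not None and not isinstance(v, PY_TYPES[dtype]):
            errors.append(f"{name}: {v!r} is not a {dtype}")
            continue
        if rng is None:
            continue
        # Range constraint, interpreted according to the data type.
        if dtype == "number" and not rng[0] <= v <= rng[1]:
            errors.append(f"{name}: {v} is outside {rng[0]}..{rng[1]}")
        elif dtype == "string" and re.fullmatch(rng, v) is None:
            errors.append(f"{name}: {v!r} does not match {rng!r}")
        elif dtype == "resource" and v.content_type not in rng:
            errors.append(
                f"{name}: {v.id} has type {v.content_type}, expected one of {sorted(rng)}"
            )
    return errors


# A mandatory, single-valued number field with a min/max range.
year = {"type": "number", "min": 1, "max": 1, "range": (1800, 2100)}
print(check_field("year", [1750], year))  # ['year: 1750 is outside 1800..2100']
print(check_field("notes", ["free", 42], None))  # []
```

Collecting every violation instead of stopping at the first one mirrors how a metadata editor would typically report problems to a user.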

Note that fields not defined with at least a label may be hard to understand for users browsing the discovery interface.

There are subtle but important differences between an artifact and a file, and keeping an artifact separate from its related file(s) is key to a sustainable long-term archival strategy. A file (also called an opaque resource in Pocket Archive) is simply stored and cataloged by Pocket Archive, which knows nothing about its contents.