Proposal: Add file attributes

TL;DR – Some “derivatives” in the DDR are created automatically by the system, and some are manually uploaded. Currently, the file information recorded in the repository does not reflect these different practices, and that could lead to accidental loss of data/work.

See discussion in https://duldev.atlassian.net/browse/DDK-239 for some motivation.

Our file model tracks four attributes: digest (SHA1), original_filename, media_type (a.k.a. MIME type), and file_identifier, which is a reference to the storage location of the file.

My proposal is to add two optional attributes, source and creator, to supplement this information and make it easier to tell where a file came from and who or what created it.

source - string (optional) - The resource/file from which this file is derived

For files derived (either manually or automatically) from other repository files, the value of source could be a full reference to the repository file as a URN, for example:

urn:uuid:30ba2ca2-0ab1-4617-9c34-d5f8a9103c6f#content

(We already use the URN format to refer to resources in structural metadata; this usage just adds the file reference as a sub-location, i.e. the URN f-component after the “#”[1].)
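
To illustrate how a consumer might read such a value, here is a minimal Python sketch that splits a source reference into the resource UUID and the file name carried in the fragment. The names SourceRef and parse_source are hypothetical and not part of the existing codebase; the sketch also assumes that a value without a fragment is acceptable, which the proposal does not specify.

```python
import uuid
from typing import NamedTuple, Optional


class SourceRef(NamedTuple):
    """Hypothetical parsed form of a `source` value."""
    resource_id: uuid.UUID     # the repository resource that owns the file
    file_name: Optional[str]   # the fragment naming the file, if present


def parse_source(value: str) -> SourceRef:
    """Split a `urn:uuid:<uuid>#<file>` reference into its parts."""
    urn, _, fragment = value.partition("#")
    prefix = "urn:uuid:"
    if not urn.lower().startswith(prefix):
        raise ValueError(f"not a urn:uuid reference: {value!r}")
    # uuid.UUID raises ValueError if the remainder is not a valid UUID
    resource_id = uuid.UUID(urn[len(prefix):])
    return SourceRef(resource_id, fragment or None)


ref = parse_source("urn:uuid:30ba2ca2-0ab1-4617-9c34-d5f8a9103c6f#content")
print(ref.resource_id, ref.file_name)
```

Because source is described as a plain string, values that are not repository URNs (for example, a reference to an external file) would simply fail this parse; how to treat them is left open here.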

creator - string (optional) - The entity responsible for the creation of the file

This would be the key attribute with respect to automated derivative generation, so it’s important that there be a reserved value to represent the repository itself. In the context of preservation events, we have used the term SYSTEM.

In cases where the file is created manually outside the repository, it would be the curator’s responsibility to assign values to these attributes, if desired. Various batch ingest and update processes, as well as certain admin UI elements, may need to be updated to support this functionality.
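
As a rough sketch of how the extended file model could look and how the creator attribute might guard against overwriting manual work, consider the following Python example. The class name RepoFile, the helper safe_to_regenerate, and the rule it encodes are assumptions made for illustration; only the attribute names and the SYSTEM value come from the proposal above.

```python
from dataclasses import dataclass
from typing import Optional

# Reserved creator value for the repository itself, borrowing the term
# already used for preservation events.
SYSTEM = "SYSTEM"


@dataclass
class RepoFile:
    # Existing attributes
    digest: str
    original_filename: str
    media_type: str
    file_identifier: str
    # Proposed attributes, both optional
    source: Optional[str] = None
    creator: Optional[str] = None


def safe_to_regenerate(f: RepoFile) -> bool:
    """Assumed policy: only files the system created may be replaced
    automatically. A missing creator is treated as not safe."""
    return f.creator == SYSTEM


auto = RepoFile("da39a3ee", "page1.ptif", "image/tiff", "file:///derivs/1",
                source="urn:uuid:30ba2ca2-0ab1-4617-9c34-d5f8a9103c6f#content",
                creator=SYSTEM)
manual = RepoFile("5ba93c9d", "page1-edited.jpg", "image/jpeg", "file:///derivs/2")

print(safe_to_regenerate(auto))    # True
print(safe_to_regenerate(manual))  # False
```

Treating an unset creator as "not safe" is the conservative choice: existing files without the new attributes would never be overwritten automatically until someone records who created them.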


[1] https://datatracker.ietf.org/doc/html/rfc8141#section-2.3.3