TL;DR – Some “derivatives” in the DDR are created automatically by the system, and some are manually uploaded. Currently, the file information recorded in the repository does not reflect these different practices, and that could lead to accidental loss of data/work.
See the discussion in DDK-239 for some motivation.
Our file model tracks four attributes: digest (SHA1), original_filename, media_type (a.k.a. MIME type), and file_identifier, which is a reference to the storage location of the file.
My proposal is to add new attributes to supplement this information and help track files better.
source - string (optional) - The resource/file from which this file is derived
The value could be a full reference to the source as a URN:
urn:uuid:30ba2ca2-0ab1-4617-9c34-d5f8a9103c6f#content
We use the URN format to refer to resources in structural metadata already; this usage just adds the file reference as a sub-location[1].
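As a rough illustration of how a consumer might split such a value into its resource and file parts, here is a minimal Python sketch. The names SourceRef and parse_source are hypothetical, and the sketch assumes the part after # names a file on the referenced resource, as in the example above.

```python
import uuid
from typing import NamedTuple, Optional


class SourceRef(NamedTuple):
    """Hypothetical parsed form of a `source` value."""
    resource_id: uuid.UUID
    file_name: Optional[str]  # None when the URN carries no #fragment


def parse_source(value: str) -> SourceRef:
    """Split 'urn:uuid:<uuid>#<file>' into the resource UUID and file name."""
    urn, _, fragment = value.partition("#")
    prefix = "urn:uuid:"
    if not urn.lower().startswith(prefix):
        raise ValueError(f"not a urn:uuid reference: {value!r}")
    # uuid.UUID raises ValueError if the identifier itself is malformed.
    return SourceRef(uuid.UUID(urn[len(prefix):]), fragment or None)


ref = parse_source("urn:uuid:30ba2ca2-0ab1-4617-9c34-d5f8a9103c6f#content")
print(ref.resource_id, ref.file_name)
```

A value without a fragment would still parse under this sketch, with file_name left as None; whether the repository should accept such values is a separate question.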
creator - string (optional) - The entity responsible for the creation of the file
This would be the key attribute with respect to automated derivative generation, so it’s important that there be a reserved value to represent the repository itself. In the context of preservation events, we have used the term SYSTEM.
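To make the proposal concrete, the sketch below shows one way the extended file information could be modeled and checked, in Python. It is only an illustration under stated assumptions: the FileInfo class, the is_system_generated helper, and all sample values (digests, file identifiers, the "jdoe" creator) are hypothetical, and SYSTEM is assumed to be the reserved creator value.

```python
from dataclasses import dataclass
from typing import Optional

SYSTEM = "SYSTEM"  # assumed reserved creator value for the repository itself


@dataclass(frozen=True)
class FileInfo:
    # Attributes the file model tracks today
    digest: str
    original_filename: str
    media_type: str
    file_identifier: str
    # Proposed optional attributes
    source: Optional[str] = None
    creator: Optional[str] = None


def is_system_generated(info: FileInfo) -> bool:
    """True only when the repository itself is recorded as the creator."""
    return info.creator == SYSTEM


# Placeholder values for illustration only.
auto_thumb = FileInfo(
    digest="placeholder-sha1-a",
    original_filename="thumbnail.png",
    media_type="image/png",
    file_identifier="placeholder-location-a",
    source="urn:uuid:30ba2ca2-0ab1-4617-9c34-d5f8a9103c6f#content",
    creator=SYSTEM,
)
manual_thumb = FileInfo(
    digest="placeholder-sha1-b",
    original_filename="thumbnail.png",
    media_type="image/png",
    file_identifier="placeholder-location-b",
    creator="jdoe",
)

print(is_system_generated(auto_thumb), is_system_generated(manual_thumb))
```

A derivative regeneration job could then skip, or at least warn on, any file for which is_system_generated returns False. That includes files with no recorded creator, which is the conservative choice if the goal is to avoid overwriting manually uploaded work.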
[1] https://datatracker.ietf.org/doc/html/rfc8141#section-2.3.3