See discussion in DDK-239 for some motivation.
TL;DR: Some “derivatives” in the DDR are created automatically by the system, and some are uploaded manually. Currently, the file information recorded in the repository does not distinguish between these two cases, which could lead to accidental loss of data or work.
Our file model tracks four attributes: digest (SHA1), original_filename, media_type (a.k.a. MIME type), and file_identifier, which is a reference to the storage location of the file.
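To make the current model concrete, here is a minimal Python sketch of a record with these four attributes, assuming the digest is the hex-encoded SHA1 of the file's bytes. The names FileRecord and make_record, and the storage identifier format, are hypothetical illustrations, not the DDR's actual implementation.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class FileRecord:
    """Hypothetical stand-in for the four-attribute file model."""
    digest: str             # SHA1 hex digest of the file content
    original_filename: str  # name of the file as originally supplied
    media_type: str         # a.k.a. MIME type
    file_identifier: str    # reference to the storage location (format assumed)


def make_record(content: bytes, original_filename: str,
                media_type: str, file_identifier: str) -> FileRecord:
    # Compute the SHA1 digest from the raw bytes of the file.
    digest = hashlib.sha1(content).hexdigest()
    return FileRecord(digest, original_filename, media_type, file_identifier)
```

Note that nothing in this record says whether the file was generated by the system or uploaded by a person, which is the gap the proposal below addresses.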
My proposal is to add a new attribute to supplement this information and make it easier to track where files come from.
derived_from - string (optional)
The purpose of this attribute would be twofold: first, its presence would signal that a file was derived through an automated process from another file in the repository; second, its value would identify that source file. Although the source file is often the original “content” file, this is not always the case, so my thought is to be explicit by encoding a full reference to the source as a URN:
urn:uuid:30ba2ca2-0ab1-4617-9c34-d5f8a9103c6f#content
We use the URN format in structural metadata already; this usage just adds the file reference as a sub-location.[1]
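The following Python sketch shows how a derived_from value in this format might be built and parsed, and how its presence could be used to protect manually uploaded files. It assumes the part before "#" is always a UUID URN and the fragment names a file within that object (e.g. "content"). FileReference, build_derived_from, parse_derived_from, DerivedFile, and may_regenerate are hypothetical names, and the "only regenerate automated derivatives" rule is an illustrative policy, not something the proposal specifies.

```python
import uuid
from dataclasses import dataclass
from typing import NamedTuple, Optional

URN_UUID_PREFIX = "urn:uuid:"


class FileReference(NamedTuple):
    object_id: uuid.UUID  # repository object holding the source file
    file_key: str         # file within that object, e.g. "content" (assumed)


def build_derived_from(object_id: uuid.UUID, file_key: str) -> str:
    # uuid.UUID.urn yields "urn:uuid:<uuid>"; the file key is appended
    # as the fragment (RFC 8141 f-component).
    return f"{object_id.urn}#{file_key}"


def parse_derived_from(value: str) -> FileReference:
    base, sep, fragment = value.partition("#")
    if not sep or not fragment:
        raise ValueError(f"missing file fragment: {value!r}")
    if not base.lower().startswith(URN_UUID_PREFIX):
        raise ValueError(f"not a UUID URN: {value!r}")
    # uuid.UUID raises ValueError for a malformed UUID string.
    return FileReference(uuid.UUID(base[len(URN_UUID_PREFIX):]), fragment)


@dataclass
class DerivedFile:
    """Hypothetical file record extended with the proposed attribute."""
    original_filename: str
    media_type: str
    derived_from: Optional[str] = None  # absent for manually uploaded files

    @property
    def is_automated_derivative(self) -> bool:
        # A non-empty derived_from marks a system-generated file.
        return bool(self.derived_from)


def may_regenerate(existing: Optional[DerivedFile]) -> bool:
    # Hypothetical policy: regenerate only if no file exists yet or the
    # existing file was produced by the system; never overwrite a manual upload.
    return existing is None or existing.is_automated_derivative
```

For example, parsing "urn:uuid:30ba2ca2-0ab1-4617-9c34-d5f8a9103c6f#content" yields a FileReference whose file_key is "content", and passing its fields back to build_derived_from reproduces the original string.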
[1] https://datatracker.ietf.org/doc/html/rfc8141#section-2.3.3