Radio Haiti Project Plan
Project Summary
Objective | Primary: Launch Radio Haiti digital audio, video and tri-lingual metadata in DDR-Public. Secondary: Allow easy access to audio/video from smartphones (for the Haitian audience) |
---|---|
Dependencies | /wiki/spaces/DDR/pages/11599922, OHMS-like interface, hi-fi / lo-fi derivatives |
Out of Scope | Captioning |
Timeline | Requirements freeze by February 1, 2017; launch in June 2017 |
Size of Collection | Total project size: ~5360 audio and video files (master and derivative), 8.5 TB total. Total to be launched in June: TBD |
Working Description spreadsheet | https://docs.google.com/spreadsheets/d/1FDdviJHoLIR2ju3I-Akdoe23JLQSSty-EPLIIsln064/edit?usp=sharing |
Project Team | Craig Breaden and Laura Wagner, Co-Champions; Laura Wagner, processing collection and creating metadata; Molly Bragg, Project Manager; Ginny Boyer, Software Development Manager; Maggie Dickson, Metadata Architect; Content Ingest Specialists and Will Sexton, ingest files into DDR; Enterprise Services Development Team (Cory Lown, Sean Aery, Jim Coble, David Chandek-Stark, Ayse Durmaz, Jack Hill), Developers; Alex Marsh, Digitization Specialist, Video; Zeke Graves, Digitization Specialist, Audio; Cutting Corp (Aaron Coe), audio digitization vendor; National Endowment for the Humanities, granting agency and stakeholder |
Detailed Project Information
Content Analysis
Material to be digitized:
Collection(s): Radio Haiti Recordings, 1957-2003
Format: audio reels, audio cassettes, VHS, Betacam
Number of files:
~5300 master audio files
32 mov, 32 mp4
What is an item?
A file is an item (1 cassette may have more than 1 item)
Rights issues (see rights details below for more info):
Permission granted by Radio Haiti to digitize and publish Radio Haiti materials.
Recordings include a fair amount of content owned by third parties, and the project team has been exercising due diligence to investigate and clear the rights. Third-party content is often intermixed with other content.
The video content is the raw footage of The Agronomist (film by Jonathan Demme).
Will need to confer with Dave Hansen on rights and appropriate rights statements.
Scope:
All unique A/V items have been or will be digitized.
Some items will not be available for public access.
Notes:
Digitization and File Details
Audio
Vendor: Cutting Corporation
Data produced: 5 TB
Digitization began in 2015?
File naming convention: [collection number][rr or cs][item 4 digits]_[side 2 digits]
RL10059CS0001_01
Files from vendor include hyphens: RL10059-CS-0001_01 and can be converted.
Do we have a preference as to which format we use?
Naming should be consistent within the collection: either use hyphens throughout or not at all, matching the convention stated above.
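The conversion from the vendor's hyphenated names to the unhyphenated convention can be scripted. The following is a minimal Python sketch, assuming every vendor name follows the pattern of the single example above (an RL collection number, RR or CS, a 4-digit item number and a 2-digit side). The function name `normalize_name` and the regular expression are illustrative only and are not part of any existing project tooling.

```python
import re

# Assumed pattern, inferred from the one example RL10059-CS-0001_01:
# collection number, hyphen, RR or CS, hyphen, 4-digit item, underscore, 2-digit side.
VENDOR_NAME = re.compile(r"^(RL\d+)-(RR|CS)-(\d{4})_(\d{2})$", re.IGNORECASE)


def normalize_name(vendor_name: str) -> str:
    """Convert a hyphenated vendor file name (without extension) to the unhyphenated form."""
    match = VENDOR_NAME.match(vendor_name)
    if match is None:
        # Fail loudly so unexpected names are reviewed by hand rather than guessed at.
        raise ValueError(f"unexpected file name: {vendor_name!r}")
    collection, carrier, item, side = match.groups()
    return f"{collection.upper()}{carrier.upper()}{item}_{side}"


print(normalize_name("RL10059-CS-0001_01"))  # RL10059CS0001_01
```

Names that do not match the assumed pattern raise an error instead of being silently renamed.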
File Formats, specs and numbers:
Archival masters: 5300 WAV files at 24-bit, 96 kHz, stereo
Mezzanine: MP3s at 24-bit, 48 kHz, stereo
These are working copies only - will not be preserved or served up.
Not all files have a mezzanine copy.
Edited version with dead air removed.
Timecodes are all generated from the mezzanine.
Derivatives: 5300 MP3s at 64 kbps, mono
Some of the derivatives are edited versions of the masters.
Timecodes are generated from the derivatives.
- Location pre-ingest: Cifs 14 (masters there as of 11/1 - derivatives will be soon)
Video
Digitized in the DUL DPC; more details are available in the video project plan.
Data produced: 3.5 TB
Duration: September 2016
File naming convention: DPC file naming
File Formats, specs and numbers:
Archival masters: 32 uncompressed .mov files
Derivatives 1: 32 mp4 at 720 x 480, 2300 kbps (following DPC standard)
Derivatives 2: 32 mp4 at 320 x 240, 8 bit 1000 kbps
Only provide access to Derivatives 2?
Location pre-ingest: DPC-Work\derivatives\rhv
Total data footprint: ~8.5 TB
Relationships between items
Programs that span files/cassettes:
These items are identified as Is_Part_Of English/Creole/French in the google spreadsheet.
In the interface, related items will be clickable from the bottom of an item’s page and labeled as “related items”.
Per meeting 12/7/16, LES will implement mock-up v. 1 (with the blocks - see below). Expected behavior: on mouse-over display additional metadata for item.
The trial which spans multiple cassettes and dates:
These items are identified as Is_Part_Of English/Creole/French in the google spreadsheet
In the interface, display 3 related items (per v. 1 mockup), and show a link to the rest of the items as a search result. The results will display in order, assuming the following:
Laura will title the trial items consistently and numerically, ex: Title (1), Title (2)
Cory will fix numeric title sorting.
(will be changed to show only 3 results then a link)
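The expected ordering and the "3 items, then a link" behavior can be illustrated with a short Python sketch. It is hypothetical and is not the DDR-Public implementation: the function names `title_sort_key` and `related_preview` are invented here, and the titles follow the "Title (1), Title (2)" pattern described above.

```python
import re


def title_sort_key(title: str):
    """Split a title into text and number chunks so 'Title (10)' sorts after 'Title (2)'."""
    parts = re.split(r"(\d+)", title)
    # With a capturing group, odd positions are always the digit runs.
    return [int(p) if i % 2 else p.casefold() for i, p in enumerate(parts)]


def related_preview(titles, limit=3):
    """Return the first `limit` titles in numeric order and whether a 'more' link is needed."""
    ordered = sorted(titles, key=title_sort_key)
    return ordered[:limit], len(ordered) > limit


print(related_preview(["Title (10)", "Title (2)", "Title (1)", "Title (3)"]))
# (['Title (1)', 'Title (2)', 'Title (3)'], True)
```

A plain alphabetical sort would place "Title (10)" before "Title (2)", which is why the consistent numbering and the numeric sort fix are both needed.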
Items that are identical to other items:
These items are identified as “is format of” in the google spreadsheet.
Copies of items will display just as related items do except they will be labeled as “other versions”. See copies.png v. 1 below.
Ingest
- Ingest Lead(s): Susan Ivey and Moira Downey
- Timeline:
- 1st Batch - 349 items, ingest May 2017
- Subsequent batches - TBD
Accessibility
Low Bandwidth Accessibility:
Ideally, the collection will be accessible by smartphone over low-bandwidth connections for use by the Haitian public. The Haitian public uses smartphones and accesses content in 15-minute chunks, predominantly through apps, especially YouTube.
LES team to implement new tools for A/V in 2017. Once the new system is in place, Radio Haiti team will need to assess if accessibility to DDR-Public remains an issue.
If accessibility remains an issue, Radio Haiti team will discuss uploading the low-resolution derivatives to YouTube with Haitian metadata.
Closed Captions / transcripts
The collection will not be transcribed or have closed captions in the immediate future, owing to the cost of transcribing and translating Haitian-language materials.
Metadata
Timeline
Description Owner: Laura Wagner
Existing description for collection: https://docs.google.com/spreadsheets/d/1FDdviJHoLIR2ju3I-Akdoe23JLQSSty-EPLIIsln064/edit?usp=sharing
Timeline:
1st batch: ingested May 2017
Subsequent metadata batches: TBD
Special description features:
Metadata includes English, French and Creole (see below)
Time-coded metadata to be synced with media (see below)
Metadata Profile and Functionality
Title
Titles will be translated into English, French and Creole.
Craig and Laura will indicate which version of any given item’s title is the main title. The main title will display as the title, and the other 2 titles will display in alternate title fields.
See mock-up: alt-title-item and result-alt-title below
Alternative Title
Will include alternate translations of the main title
Description
Will include Program_Description_Creole, Program_Description_French, and Program_Description_English from the spreadsheet.
When all three languages are present, they will be displayed as 3 different blocks of text. Some items include timecodes for later use in an OHMS-like interface.
See alt-title-item for a mock up of the 3 descriptive paragraphs above.
Subject
Subject_Topic, Subject_Name, and Subject_Place from the spreadsheet will be mapped to subject.
Topics contain a mix of terms in various languages.
Topics and subjects are separate in the Radio Haiti Google sheet, but will be combined into one field in DDR.
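The mapping of the three spreadsheet columns into a single subject field can be sketched as follows. This is a minimal Python illustration under stated assumptions: each spreadsheet row is represented as a dictionary keyed by column name, multiple values within a cell are separated by semicolons, and duplicates are dropped. None of these details is confirmed by the project, and the sample values are placeholders.

```python
SUBJECT_COLUMNS = ("Subject_Topic", "Subject_Name", "Subject_Place")


def combined_subjects(row: dict, separator: str = ";") -> list:
    """Merge the three subject columns of one spreadsheet row into a single list."""
    subjects = []
    for column in SUBJECT_COLUMNS:
        # Treat a missing or empty cell as an empty string.
        for value in (row.get(column) or "").split(separator):
            value = value.strip()
            if value and value not in subjects:
                subjects.append(value)
    return subjects


# Placeholder values, not real collection metadata.
row = {
    "Subject_Topic": "topic one; topic two",
    "Subject_Name": "name one",
    "Subject_Place": "place one; topic one",
}
print(combined_subjects(row))  # ['topic one', 'topic two', 'name one', 'place one']
```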
Speaker
Speakers from the spreadsheet.
Stored in dc:creator field; displays as ‘Speaker’
Language
We will store the language codes but display the human-readable language terms.
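Storing codes while displaying names amounts to a lookup at display time. The sketch below is illustrative Python, assuming the stored values are ISO 639-2 three-letter codes (`eng`, `fre`, `hat`); the plan does not say which code scheme is used, and the labels and function name are chosen here for the example.

```python
# Assumed ISO 639-2 codes; the actual code scheme is not specified in this plan.
LANGUAGE_LABELS = {
    "eng": "English",
    "fre": "French",
    "hat": "Haitian Creole",
}


def display_language(code: str) -> str:
    """Return the human-readable label for a stored code, or the code itself if unknown."""
    return LANGUAGE_LABELS.get(code.strip().lower(), code)


print(display_language("hat"))  # Haitian Creole
```

Falling back to the raw code keeps unmapped values visible instead of hiding them.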
Rights
Rights information will be available in English, French and Creole.
Recordings rights fall into roughly 3 categories:
Radio Haiti exclusive rights
Radio Haiti not exclusive - rights have been cleared
Radio Haiti not exclusive - rights not cleared
Small amount of copyrighted material (just a few seconds)
Not yet determined
Determine whether we can use CC licenses for some items, and which rights statements apply to the other items.
Metadata wireframe
- Complete wireframe approved during 2017-03-06 Radio Haiti Metadata Profile Finalization meeting:
Search and Display functionality in DDR-Public
French and Creole language metadata should be searchable
Question: can the interface match words entered with and without accents? For example, the metadata is correct with accents, but users may not type them. Side note: Laura has found that Google searches in French work with and without accents.
Creole orthography (how it is spelled) has changed, so words that sound the same can be spelled in different ways and with different apostrophes:
M p ap ka ale -- now the correct version
M’ p’ap ka ale -- older version
M pap ka ale -- not correct, but this is how people spell things
Laura has had trouble with Google when she enters a song name that is actually written differently from the way she typed it.
DDR-Public can store and display multi-lingual metadata
Search functionality in DDR-Public enables searching with and without accents.
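The accent and apostrophe questions above can be illustrated with a small normalization sketch. This is plain Python using the standard `unicodedata` module and is not how DDR-Public search is implemented. The function names are invented for the example, and `loose_key` is deliberately crude: it drops all spaces and punctuation, which would over-match in a real index.

```python
import unicodedata


def fold_accents(text: str) -> str:
    """Lowercase the text and strip combining marks so 'Haïti' and 'haiti' compare equal."""
    decomposed = unicodedata.normalize("NFD", text.casefold())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def loose_key(text: str) -> str:
    """Fold accents, then keep only letters and digits to match spelling variants."""
    return "".join(ch for ch in fold_accents(text) if ch.isalnum())


print(fold_accents("Haïti"))  # haiti
variants = ["M p ap ka ale", "M’ p’ap ka ale", "M pap ka ale"]
print({loose_key(v) for v in variants})  # {'mpapkaale'}
```

All three Creole spellings listed above reduce to the same key, so a query in any one form could match metadata written in another.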
Time-coded Metadata
Some items will have sub-item description that will correspond to time-codes in the item.
Sub-item metadata is in the description field of the metadata interface. It is coded for easy parsing later.
Previously this data has been crosswalked into XML for OHMS. OHMS doesn’t work with SRT captions.
Displaying Time-coded metadata
Ultimately, this metadata should be displayed in an “OHMS-like interface”
OHMS = Oral History Metadata Synchronizer (http://www.oralhistoryonline.org/). It enables time-coded markers in A/V that can be linked to metadata or transcripts. The end result is that the user can navigate through an A/V item using the markers. Duke Digital Collections has implemented this feature in the H. Lee Waters digital collection: http://library.duke.edu/digitalcollections/ohms-viewer/viewer.php?cachefile=hleewaters/rl10075dbcam0011010.xml
The tool has some limitations: cannot work with SRT/WebVTT, not as visually appealing as other tools, unclear development road map. There are likely others.
NOTE: Project stakeholders would rather have the right “OHMS-like interface” even if it comes after June 2017. They do not want to repeat the work.
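Because SRT/WebVTT support is named above as a gap in OHMS, one possible direction is to express the time-coded sub-item descriptions as WebVTT cues. The Python sketch below only illustrates that idea and is not a project decision. It assumes the descriptions have already been parsed into `(start_seconds, text)` pairs sorted by start time (the actual encoding in the description field is not documented here), and that each segment runs until the next one begins.

```python
def vtt_timestamp(seconds: float) -> str:
    """Format a number of seconds as a WebVTT timestamp (hh:mm:ss.ttt)."""
    millis = round(seconds * 1000)
    hours, rest = divmod(millis, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{millis:03d}"


def segments_to_vtt(segments, duration):
    """Build WebVTT text from (start_seconds, text) pairs; `duration` ends the last cue."""
    lines = ["WEBVTT", ""]
    for index, (start, text) in enumerate(segments):
        # Assumption: a segment lasts until the next segment starts.
        end = segments[index + 1][0] if index + 1 < len(segments) else duration
        lines.append(f"{vtt_timestamp(start)} --> {vtt_timestamp(end)}")
        lines.append(text)
        lines.append("")
    return "\n".join(lines)


# Placeholder segment text, not real collection metadata.
print(segments_to_vtt([(0, "First segment"), (90, "Second segment")], duration=120))
```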
Portal Configuration
Configured - details to be added following 5/9/17 email to champions
Facets:
Program Name (category);
Speaker (creator);
Date (date);
Program Type (this will still be format - Maggie is not splitting them out);
Subject (subject);
Location (spatial)
Language (language - should be configured in time for a pre-RH release so we will display language name and not just code)
List of fields that should be displayed on items:
See metadata profile above
Customization of metadata for display:
See metadata profile above
Collection Configuration
Non-public items:
Some items in the collection should not be launched for public access.
Non-public items are indicated by [what] in the google spreadsheet.
Collection highlights (supplied by champion)
[list 2 or more images file names to display in the collection banner]
[list 4 or more image file names to highlight in a grid -- optional]
Collection Level Metadata (supplied by champion)
Summary Capsule: [should describe the part of the collection which is digitized - will be supplied by champion]
“About the collection”: will come from finding aid abstract
Collection Preview with Champion
Preview Date:
Attendees:
Changes required:
[note any last minute changes requested during collection preview]