This API provides methods for creating technical metadata for files in the DOR. It persists the technical metadata and allows it to be queried.
The metadata creation process runs Siegfried to identify each file's format and then runs the appropriate tools for that file type (e.g. exiftool, poppler).
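The identify-then-dispatch step can be pictured with a minimal Ruby sketch. It is illustrative only and is not the service's own code: the constants and the `tool_for` method are hypothetical names, and the MIME-type-to-tool mapping is an assumption based on the tools listed in this README (exiftool for images, poppler for PDFs, MediaInfo for A/V).

```ruby
# Hypothetical mapping from a MIME type (as reported by an identification step
# such as Siegfried) to a characterization tool named in this README.
TOOLS_BY_MIME_TYPE = {
  'application/pdf' => :poppler,
  'application/mp4' => :mediainfo
}.freeze

TOOLS_BY_MIME_PREFIX = {
  'image/' => :exiftool,
  'audio/' => :mediainfo,
  'video/' => :mediainfo
}.freeze

# Returns the tool for a MIME type, or nil when no dedicated tool applies.
def tool_for(mime_type)
  return TOOLS_BY_MIME_TYPE[mime_type] if TOOLS_BY_MIME_TYPE.key?(mime_type)

  _prefix, tool = TOOLS_BY_MIME_PREFIX.find { |prefix, _| mime_type.start_with?(prefix) }
  tool
end

%w[image/jpeg application/pdf video/webm text/plain].each do |mime_type|
  puts "#{mime_type} -> #{tool_for(mime_type).inspect}"
end
```

Exact matches are checked before prefixes so that types such as `application/mp4`, which do not share the `audio/` or `video/` prefix, can still be routed to an A/V tool.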
Before this service is invoked, the files must be on the /dor/workspace NFS mounts. The accessionWF technical-metadata robot then invokes this service by making a REST request. In the near term, the technical metadata service will directly update the workflow system once it has finished generating the technical metadata. After that update, the accessionWF can proceed and remove the files from the workspace. In the longer term, we would like to do this update via a messaging service so that it does not require the robots and is not tightly coupled to the workflow service.
This will only store technical metadata for files in the current version; technical metadata for files that were in earlier versions and are not in the current version will be deleted.
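As a rough illustration of the current-version-only rule, the sketch below compares the filenames that already have stored technical metadata with the filenames in the current version. The `reconcile` method and its return keys are hypothetical and do not reflect how the service implements this.

```ruby
# Illustrative only: decide which stored technical metadata rows to delete,
# which files need new rows, and which files appear in both lists.
def reconcile(stored_filenames, current_filenames)
  {
    delete: stored_filenames - current_filenames, # only in earlier versions
    create: current_filenames - stored_filenames, # new in the current version
    keep: stored_filenames & current_filenames    # present in both
  }
end

plan = reconcile(%w[0001.html bar.txt], %w[bar.txt brief.pdf])
puts plan.inspect
```

Here `0001.html` would be deleted because it is no longer in the current version, `brief.pdf` would be created, and `bar.txt` would be kept.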
In addition to the web service, technical metadata can also be generated using rake tasks. To generate technical metadata for an item, run:
$ bundler exec rake techmd:generate['druid:bc123df4567','spec/fixtures/test/0001.html spec/fixtures/test/bar.txt spec/fixtures/test/brief.pdf spec/fixtures/test/foo.jpg spec/fixtures/test/max.webm spec/fixtures/test/noam.ogg', 'true']
Success
This happens synchronously and will not update the workflow service.
To generate for an item from a Moab (from preservation storage):
$ bundler exec rake techmd:generate_for_moab['druid:bc123df4567', 'true']
Queued
Or from a list of druids (druid.txt):
$ bundler exec rake techmd:generate_for_moab_list
Queued druid:bc123df4567
Background processing is performed by Sidekiq.
Sidekiq can be monitored from /queues. For more information on configuring and deploying Sidekiq, see this doc.
Basic monitoring and statistics are available from /.
The service includes a Rake task that outputs CSV for files belonging to druids (as specified in an argument to the rake task) if and only if the file has a duration value in its audiovisual metadata. It outputs the druid, the filename, the MIME type, and the duration (in seconds):
$ RAILS_ENV=production bin/rake techmd:reports:media_durations[/tmp/druids.txt]
druid:bk586kk6146,cb147tv8205_pm.wav,audio/x-wav,1683.739
druid:bk586kk6146,cb147tv8205_sh.wav,audio/x-wav,1646.118
druid:bk586kk6146,cb147tv8205_sl.m4a,application/mp4,1646.179
druid:cm856pm4228,gt507vy5436_sl.mp4,application/mp4,3816.201
druid:ck227dm7693,bb761mb4522_FV4298_eng_sl.mp4,application/mp4,621.0
druid:ck227dm7693,bb761mb4522_FV4298_ger_sl.mp4,application/mp4,621.0
druid:ck227dm7693,bb761mb4522_FV4298_v1_sl.mp4,application/mp4,620.72
druid:ck227dm7693,bb761mb4522_FV4298_v2_sl.mov,video/quicktime,620.96
druid:ck227dm7693,bb761mb4522_FV4298_v3_sl.mp4,application/mp4,621.014
druid:ck227dm7693,bb761mb4522_FV4298_v4_sl.mp4,application/mp4,620.96
druid:nr582tm3161,Redivis_GMT20220303-205959_Recording_1920x1186.mp4,application/mp4,3322.912
druid:nr582tm3161,Redivis_GMT20220303-205959_Recording.mp4,application/mp4,3322.912
druid:pf759xf5671,qf378nj5000_sh.mpeg,video/mpeg,2261.04
druid:pf759xf5671,qf378nj5000_sl.mp4,application/mp4,2294.956
druid:rz125dy0428,bw689yg2740_sl.mp4,application/mp4,5080.485
where /tmp/druids.txt looks like:
druid:bk586kk6146
druid:cm856pm4228
foobar
druid:ck227dm7693
druid:nr582tm3161
druid:pf759xf5671
druid:rz125dy0428
druid:bf342vg1682
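Because the report is plain CSV, it is easy to post-process. The following is a minimal sketch, assuming the four-column layout shown above (druid, filename, MIME type, duration); `total_durations` is a hypothetical helper, not part of the service. It sums the duration per druid, using a few of the sample rows as input.

```ruby
# Sums the duration column per druid from lines shaped like
# "druid,filename,mime_type,duration". Only the first and last fields are
# read, so a comma inside a filename would not affect the totals.
def total_durations(lines)
  lines.each_with_object(Hash.new(0.0)) do |line, totals|
    fields = line.strip.split(',')
    next if fields.length < 4 # skip blank or malformed lines

    totals[fields.first] += Float(fields.last)
  end
end

report = <<~CSV
  druid:bk586kk6146,cb147tv8205_pm.wav,audio/x-wav,1683.739
  druid:bk586kk6146,cb147tv8205_sh.wav,audio/x-wav,1646.118
  druid:cm856pm4228,gt507vy5436_sl.mp4,application/mp4,3816.201
CSV

totals = total_durations(report.lines)
totals.each { |druid, seconds| puts format('%s %.3f', druid, seconds) }
```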
Siegfried (version 1.8.0+) is used for file identification.
To install on OS X:
brew install richardlehane/digipres/siegfried
Note that if you are using an earlier version, you may encounter problems as the output format has changed.
Exiftool is used for image characterization.
To install on OS X:
brew install exiftool
Poppler is used for PDF characterization.
To install on OS X:
brew install poppler
MediaInfo is used for A/V characterization.
To install on OS X:
brew install mediainfo
Spin up the database using docker-compose:
$ docker compose up db # use -d to run in background
Run the linters and the test suite:
$ bin/rake
Spin up all the docker-compose services for dev/testing:
$ docker compose up # use -d to run in background
Then create the accession workflow for the test object:
$ rails c
> client = Dor::Workflow::Client.new(url: 'http://localhost:3001')
> client.create_workflow_by_name('druid:bc123df4567', 'accessionWF', version: '1')
Get a JWT for authentication:
bundle exec rake generate_token
Hit the technical-metadata-service's HTTP API:
$ curl -i -H "Authorization: Bearer ${TOKEN}" -H 'Content-Type: application/json' --data '{"druid":"druid:bc123df4567","files":["file:///app/README.md","file:///app/openapi.yml"]}' http://localhost:3000/v1/technical-metadata
Verify that technical metadata was created:
$ docker compose exec app rails c
> DroFile.pluck(:druid, :filename, :mimetype, :filetype)
# should look like: [["druid:bc123df4567", "openapi.yml", "text/plain", "x-fmt/111"], ["druid:bc123df4567", "README.md", "text/markdown", "fmt/1149"]]
And that the object's workflow was updated:
$ rails c
> client = Dor::Workflow::Client.new(url: 'http://localhost:3001')
> client.workflow_status({druid: 'druid:bc123df4567', workflow: 'accessionWF', process: 'technical-metadata'})
# should be "completed"
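The curl request from the walkthrough above can also be built from Ruby using only the standard library. This is a sketch under assumptions: `build_techmd_request` is a hypothetical helper name, while the URL, headers and payload shape are copied from the curl example. The request is constructed but not sent, so the snippet runs without the service.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Builds (but does not send) the POST request shown in the curl example.
def build_techmd_request(token:, druid:, files:, base_url: 'http://localhost:3000')
  uri = URI.join(base_url, '/v1/technical-metadata')
  request = Net::HTTP::Post.new(uri)
  request['Authorization'] = "Bearer #{token}"
  request['Content-Type'] = 'application/json'
  request.body = JSON.generate({ druid: druid, files: files })
  [uri, request]
end

uri, request = build_techmd_request(
  token: ENV.fetch('TOKEN', 'example-token'), # token from `rake generate_token`
  druid: 'druid:bc123df4567',
  files: ['file:///app/README.md', 'file:///app/openapi.yml']
)
puts "#{request.method} #{uri}"
puts request.body

# To send it against a running service:
# response = Net::HTTP.start(uri.host, uri.port) { |http| http.request(request) }
# puts response.code
```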
First install foreman (foreman is not supposed to be in the Gemfile; see this wiki article):
gem install foreman
Then you can run
bin/dev
This starts CSS/JS bundling and the development server.
Note that this project's continuous integration build will automatically create and publish an updated image whenever there is a passing build from the main branch. If you do need to manually create and publish an image, do the following:
Build image:
docker build -t suldlss/technical-metadata-service:latest -f docker/app/Dockerfile .
Publish:
docker push suldlss/technical-metadata-service:latest
For details, see https://github.com/sul-dlss/technical-metadata-service/wiki/Generating-techmd-from-preservation-storage
- Reset the database:
bin/rails -e p db:reset