Rails app to build and maintain Solr collections for the Penn Libraries catalog. The application serves as the means for processing MARC and writing to the Solr infrastructure used by the Find catalog frontend.
Data is processed via three means:
- Alma Export processing - Bulk files of MARC are published by Alma via Publishing Profiles and moved onto a local SFTP location. An Alma webhook handled by this application triggers a ProcessAlmaExport job, which downloads the files and prepares a ProcessBatchFileJob for each downloaded file. This process builds a brand new Solr collection on each run.
- Bib Webhooks - Alma webhooks are handled for changes to Bib records. For each received and supported Bib event, an IndexByBibEvent job is initialized that creates or updates the record in the configured index. This does not run the Bibs through the Alma publishing enrichment process.
- Index by Identifier - A web form can receive a list of MMS IDs that will be retrieved from the Alma API and pushed to the configured index. This also does not run the Bibs through the Alma publishing enrichment process.
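The fan-out in the Alma Export flow (one parent job preparing one job per downloaded file) can be sketched in plain Ruby. This is a simplified illustration, not the application's code: the classes only borrow the job names used in this README, and the enqueue method, the in-memory queue, the file names and the collection name are hypothetical stand-ins for the real job backend, SFTP download and Solr collection.

```ruby
# Simplified sketch: nothing here talks to Alma, SFTP, Sidekiq or Solr.

# Hypothetical stand-in for the per-file job named in this README.
class ProcessBatchFileJob
  # Records the work to be done instead of handing it to a real job backend.
  def self.enqueue(queue, file, collection)
    queue << { job: 'ProcessBatchFileJob', file: file, collection: collection }
  end
end

# Hypothetical stand-in for the parent job started by the Alma webhook.
class ProcessAlmaExport
  def initialize(queue)
    @queue = queue
  end

  # Enqueue one ProcessBatchFileJob per downloaded file. Every file from a
  # single export run targets the same, newly built collection.
  def perform(files, collection)
    files.each { |file| ProcessBatchFileJob.enqueue(@queue, file, collection) }
    @queue
  end
end
```

Calling ProcessAlmaExport.new([]).perform with two file names and a collection name returns a queue holding two entries, one per file, both pointing at the same collection.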
The behavior of the application can be modified using the Settings area in the UI. The currently available parameters are:
- Adhoc Target Collection - The selected Solr collections will receive updates via the "Index by Identifier" process.
- Process Bib Webhooks - When this is "On", this app will handle Alma BIB webhooks.
- Process Job Webhook - When this is "On", this app will handle Alma JOB webhooks for jobs that match the Settings.alma.publishing_job.name value.
- Webhook Target Collections - The selected Solr collections will receive updates via the BIB webhook jobs.
- Incremental Target Collections - The selected Solr collections will receive incremental updates via JOB webhooks for jobs matching the Settings.alma.publishing_job.name value, indicating the presence of updated or deleted records.
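How these switches gate incoming webhooks can be illustrated with a small sketch. It is not the application's implementation: the settings hash and its keys, the route_webhook method, the example job and collection names, and the payload shape (an action plus a job name) are all assumptions made for illustration. Only the setting names and their described behavior come from the list above.

```ruby
# Hypothetical in-memory view of the UI settings described above.
SETTINGS = {
  process_bib_webhooks: true,
  process_job_webhooks: false,
  webhook_target_collections: ['catalog-a'],
  incremental_target_collections: ['catalog-a', 'catalog-b'],
  # Stands in for the Settings.alma.publishing_job.name value.
  publishing_job_name: 'Example Publishing Job'
}.freeze

# Decide what to do with a webhook payload.
# Returns [decision, target_collections].
# The payload keys (:action, :job_name) are assumed for this sketch.
def route_webhook(payload, settings = SETTINGS)
  case payload[:action]
  when 'BIB'
    return [:ignored, []] unless settings[:process_bib_webhooks]

    [:index_bib, settings[:webhook_target_collections]]
  when 'JOB'
    return [:ignored, []] unless settings[:process_job_webhooks]
    # Only jobs matching the configured publishing job name are handled.
    return [:ignored, []] unless payload[:job_name] == settings[:publishing_job_name]

    [:incremental_update, settings[:incremental_target_collections]]
  else
    [:unsupported, []]
  end
end
```

With the example settings, a BIB payload is routed to the webhook target collections, while a JOB payload is ignored until the job switch is turned on and the job name matches.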
To make these settings accessible from the interface, you must run the following rake task inside the application container:
rake tools:add_config_items
Our local development environment uses Vagrant to set up a consistent environment with the required services. Please see the root README for instructions on how to set up this environment.
- The Rails application will be available at https://catalog-indexing-dev.library.upenn.edu.
- The Sidekiq Web UI will be available at http://catalog-indexing-dev.library.upenn.edu/sidekiq.
- The Solr admin console for the first instance will be available at http://catalog-indexing-dev.library.upenn.int/solr1/#/. Log in with admin/password.
Once your local development environment is set up, you can SSH into the Vagrant box to interact with the application:
Enter the running Vagrant VM by running vagrant ssh in the /vagrant directory.
Start a shell in the catalog-indexing container:
docker exec -it catalog-indexing_catalog_indexing.1.{whatever} bash
When developing with Find, you may need to generate a configset or some sample Solr data from this app. Run these commands from the application container:
rake tools:package_configset
rake tools:generate_solr_json_from_set
Using this JSONL file, you can index records into your development instance of Find.
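One way to load such a file is to post it to Solr's JSON update handler in batches. The script below is a sketch under stated assumptions rather than part of this project's tooling: the method names, the batch size, and any file path, Solr URL, collection name or credentials you pass in are placeholders, and it assumes each non-blank line of the file is one JSON document.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Yield arrays of parsed documents, at most batch_size per array.
# Assumes one JSON document per non-blank line (JSONL).
def each_batch(io, batch_size)
  batch = []
  io.each_line do |line|
    line = line.strip
    next if line.empty?

    batch << JSON.parse(line)
    next if batch.size < batch_size

    yield batch
    batch = []
  end
  # Flush the final, possibly smaller, batch.
  yield batch unless batch.empty?
end

# Post one batch of documents as a JSON array to a collection's update handler.
def post_batch(docs, update_uri, user: nil, password: nil)
  request = Net::HTTP::Post.new(update_uri, 'Content-Type' => 'application/json')
  request.basic_auth(user, password) if user
  request.body = JSON.generate(docs)
  Net::HTTP.start(update_uri.host, update_uri.port,
                  use_ssl: update_uri.scheme == 'https') do |http|
    http.request(request)
  end
end

# Index a whole JSONL file, warning about any batch Solr does not accept.
def index_jsonl(path, update_uri, batch_size: 500, **auth)
  File.open(path) do |file|
    each_batch(file, batch_size) do |docs|
      response = post_batch(docs, update_uri, **auth)
      warn "Solr responded #{response.code}" unless response.is_a?(Net::HTTPSuccess)
    end
  end
end
```

To use it, call index_jsonl with the path to the generated file and a URI for the target collection's update handler, typically of the form /solr/COLLECTION/update?commit=true on your Solr host. Committing on every batch keeps the sketch simple; for large files a single commit at the end would be cheaper.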
To run the test suite (currently):
- Start a shell in the app container (see the instructions for interacting with the application, above).
- Run the rspec command: RAILS_ENV=test bundle exec rspec
This app uses the PennMARC gem to handle most of the MARC parsing logic.
Sometimes you might want to use an unpublished version of the PennMARC gem in development. Modify the Gemfile like so:
# Gemfile
gem 'pennmarc', git: 'https://gitlab.library.upenn.edu/dld/catalog/pennmarc.git', ref: 'some-remote-commit-sha'
# or
gem 'pennmarc', git: 'https://gitlab.library.upenn.edu/dld/catalog/pennmarc.git', branch: 'some-remote-branch'
You can instruct bundler to look at a local path for the pennmarc gem. When using this, running bundle install will update your Gemfile.lock file to point to the specified ref: or branch: in your local pennmarc repo. Be very careful to undo this when pushing to a remote branch.
bundle config set local.pennmarc ~/Projects/pennmarc/
# Gemfile
gem 'pennmarc', git: 'https://gitlab.library.upenn.edu/dld/catalog/pennmarc.git', branch: 'some-local-branch-name'
Running bundle install should then show:
Using pennmarc 1.0.0 from https://gitlab.library.upenn.edu/dld/catalog/pennmarc.git (at /home/mk/Projects/pennmarc@ee38309)
Rubocop is used to enforce style and formatting rules in our Ruby code. This application uses a custom set of rules contained within the upennlib-rubocop gem.
To check style and formatting, run:
bundle exec rubocop
To regenerate the .rubocop_todo.yml file, run:
bundle exec rubocop --auto-gen-config --auto-gen-only-exclude --exclude-limit 10000