From 646f227d7a1da26710759b218e19d9ab64463252 Mon Sep 17 00:00:00 2001 From: Rosie Le Faive Date: Wed, 20 Dec 2023 15:50:22 -0400 Subject: [PATCH 01/11] Create files with hyphens. --- docs/installation/component-overview.md | 68 +++ .../installation/manual/configuring-drupal.md | 176 ++++++ .../installing-composer-drush-and-drupal.md | 142 +++++ .../manual/installing-crayfish.md | 374 ++++++++++++ .../installing-fedora-syn-and-blazegraph.md | 548 ++++++++++++++++++ .../manual/installing-karaf-and-alpaca.md | 395 +++++++++++++ docs/installation/manual/installing-solr.md | 170 ++++++ .../installing-tomcat-and-cantaloupe.md | 157 +++++ .../manual/preparing-a-webserver.md | 114 ++++ .../adding-format-jsonld.md | 27 + .../adding_format_jsonld.md | 28 +- docs/technical-documentation/alpaca-tips.md | 113 ++++ docs/technical-documentation/resizing-vm.md | 30 + .../updating-drupal.md | 74 +++ docs/tutorials/create-update-views.md | 84 +++ docs/tutorials/switch-homepage-to-twig.md | 63 ++ docs/user-documentation/content-models.md | 208 +++++++ docs/user-documentation/content-types.md | 189 ++++++ docs/user-documentation/file-viewers.md | 67 +++ docs/user-documentation/linked-data.md | 340 +++++++++++ .../user-documentation/metadata-harvesting.md | 49 ++ .../recipes/alexa-search.md | 36 ++ docs/user-documentation/versioning.wiki | 24 + mkdocs.yml | 44 +- 24 files changed, 3471 insertions(+), 49 deletions(-) create mode 100644 docs/installation/component-overview.md create mode 100644 docs/installation/manual/configuring-drupal.md create mode 100644 docs/installation/manual/installing-composer-drush-and-drupal.md create mode 100644 docs/installation/manual/installing-crayfish.md create mode 100644 docs/installation/manual/installing-fedora-syn-and-blazegraph.md create mode 100644 docs/installation/manual/installing-karaf-and-alpaca.md create mode 100644 docs/installation/manual/installing-solr.md create mode 100644 docs/installation/manual/installing-tomcat-and-cantaloupe.md 
create mode 100644 docs/installation/manual/preparing-a-webserver.md create mode 100644 docs/technical-documentation/adding-format-jsonld.md create mode 100644 docs/technical-documentation/alpaca-tips.md create mode 100644 docs/technical-documentation/resizing-vm.md create mode 100644 docs/technical-documentation/updating-drupal.md create mode 100644 docs/tutorials/create-update-views.md create mode 100644 docs/tutorials/switch-homepage-to-twig.md create mode 100644 docs/user-documentation/content-models.md create mode 100644 docs/user-documentation/content-types.md create mode 100644 docs/user-documentation/file-viewers.md create mode 100644 docs/user-documentation/linked-data.md create mode 100644 docs/user-documentation/metadata-harvesting.md create mode 100644 docs/user-documentation/recipes/alexa-search.md create mode 100644 docs/user-documentation/versioning.wiki diff --git a/docs/installation/component-overview.md b/docs/installation/component-overview.md new file mode 100644 index 000000000..7e9e421a1 --- /dev/null +++ b/docs/installation/component-overview.md @@ -0,0 +1,68 @@ +# Component Overview + +A functioning Islandora Stack is made up of dozens of components working in synchronization with each other to store information in your repository, manage that information, and disseminate it intelligently to users. Whether running an installation using the provided Ansible playbook or installing the stack manually, it may be helpful to have a brief overview of all the components we're going to need, in the order we're going to install them, as well as a brief introduction to each component's installation and configuration process. 
+ +This list includes four different kinds of components: + +- Components which are hard-required (such as Drupal and the Islandora module) +- Components for which defaults are provided but which can be swapped out (such as the software managing databases, or the repository's storage system) +- Components that can't easily be swapped out but are not necessarily required (such as using Solr as the site's internal search engine) +- Components which do not have official alternatives and are not necessarily required, but will likely exist on the vast majority of Islandora installations (such as Alpaca and Crayfish) + +## The Webserver Stack - Apache, PHP, and MySQL/PostgreSQL + +Combined together, Apache, PHP, and MySQL/PostgreSQL comprise a LAMP or LAPP server used to provide end-user-facing components - namely, the website. + +**Apache** is the webserver that will serve up webpages to the public. It will also manage some internal functionality provided by Crayfish, and will expose Cantaloupe to the public. We’ll be making changes to the VirtualHost entry, enabling some modules, and modifying the ports configuration. The VirtualHost entry will eventually be modified when we need to expose other services like Cantaloupe to the public. + +**PHP** is the runtime interpreter for all the code Drupal and Crayfish need to be processed. By default, installing PHP 7.2 will give us a command-line interpreter, as well as an interpreter for Apache. We’re going to install several PHP modules required and/or useful for the components that make use of PHP. + +**MySQL** and **PostgreSQL** are database management systems that we will use to store information for many different components like Drupal and Fedora. By default, the Ansible playbook installs MySQL, though this can be switched to PostgreSQL. The manual installation guide recommends and walks through installing and using PostgreSQL. 
+ +## The Front-Facing CDM - Composer, Drush, and Drupal + +Composer will be used to install both Drupal and Drush simultaneously using Islandora's fork of the [drupal-project](https://github.com/Islandora/drupal-project) repository. + +**Composer** is an installer and dependency manager for PHP projects. We're going to need it to install components for any PHP code we need to make use of, including Drupal and Crayfish. + +**Drush** and **Drupal** are installed simultaneously using [drupal-project](https://github.com/Islandora/drupal-project). Drupal will serve up webpages and manage Islandora content, and Drush will help us get some things done from the command-line. + +## The Web Application Server - Tomcat and Cantaloupe + +Several applets will be deployed via their `.war` files into Tomcat, including Fedora and Cantaloupe. + +**Tomcat** serves up webpages and other kinds of content much like Apache, but is specifically designed to deploy Java applications as opposed to running PHP code. + +**Cantaloupe** is an image tileserver that Islandora will connect to and use to serve up extremely large images in a way that doesn't have an adverse effect on the overall system. + +## The Back-End File Management Repository - Fedora, Syn, and Blazegraph + +Fedora will be installed in its own section, rather than as part of the Tomcat installation, as the installation process is rather involved and requires some authorization pieces to be set up in order to connect them back to Drupal and other components. + +**Fedora** is the default backend repository that Islandora content will be synchronized with and stored in. A great deal of configuration will be required to get it up and running, including ensuring a database is created and accessible. + +**Syn** is the authorization piece that allows Fedora to connect to other components. + +**Blazegraph** will store representative graph data about the repository that can be queried using SPARQL. 
Some configuration will also be required to link it back to Fedora, as well as to ensure it is being properly indexed. + +## The Search Engine - Solr and search_api_solr + +The installation of Solr itself is rather straightforward, but a configuration will have to be generated and applied from the Drupal side. + +**Solr** will be installed as a standalone application. Nothing of particular importance needs to happen here; the configuration will be applied when `search_api_solr` is installed. + +**search_api_solr** is the Drupal module that implements the Solr API for Drupal-side searches. After installing and configuring the module, the `drush solr-gsc` command will be used to generate Solr configs, and these configs will be moved to the Solr configuration location. + +## The Asynchronous Background Services - Crayfish + +**Crayfish** is a series of microservices that perform different asynchronous tasks kicked off by Islandora. It contains a series of submodules that will be installed via Composer. Later, these configured components will be connected to Alpaca. + +## The Broker Connecting Everything - Karaf and Alpaca + +**Karaf**’s job is similar to Tomcat, except where Tomcat is a web-accessible endpoint for Java applets, Karaf is simply meant to be a container for system-level applets to communicate via its OSGI. Alpaca is one such applet; it will broker messages between Fedora and Drupal, and between Drupal and various derivative generation applications. + +**Alpaca** contains Karaf services to manage moving information between Islandora, Fedora, and Blazegraph as well as kicking off derivative services in Crayfish. These will be configured to broker between Drupal and Fedora using an ActiveMQ queue. + +## Finalized Drupal Configurations + +**Drupal configuration** exists as a series of .yaml files that can either be created in a feature, or exported from Drupal using the `content_sync` module. It can also be manually entered in via the UI. 
We're going to place configuration in a few different ways; some content will be synchronized onto the site, and some core configurations from the main Islandora module will need to be run in order to facilitate ingest. diff --git a/docs/installation/manual/configuring-drupal.md b/docs/installation/manual/configuring-drupal.md new file mode 100644 index 000000000..af4fdd929 --- /dev/null +++ b/docs/installation/manual/configuring-drupal.md @@ -0,0 +1,176 @@ +# Configuring Drupal + +!!! warning "Needs Maintenance" + The manual installation documentation is in need of attention. We are aware that some components no longer work as documented here. If you are interested in helping us improve the documentation, please see [Contributing](../../../contributing/CONTRIBUTING). + +After all of the above pieces are in place, installed, configured, started, and otherwise prepared, the last thing we need to do is configure the front-end Drupal instance to wire all the installed components together. + +## Drupal Pre-Configuration + +### `settings.php` + +!!! notice + By default, `settings.php` is read-only for all users. It should be made writable while this pre-configuration is being done, then set back to `444` afterwards. + +Some additional settings will need to be established in your default `settings.php` before Drupal-side configuration can occur. + +The below configuration will establish `localhost` as a trusted host pattern, but on production sites this will need to be expanded to include the actual host patterns used by the site. 
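The permission toggle described in the notice above (make `settings.php` writable, edit it, then return it to `444`) can be sketched as a small shell helper. This is only an illustration: `edit_settings` is a hypothetical function name, not something provided by Drupal or Drush, and it assumes you run it as a user who owns the file.

```shell
# Sketch only: edit_settings is a hypothetical helper, not part of Drupal or Drush.
# It makes a read-only file writable for its owner, runs the given command,
# and restores mode 444 afterwards, even if that command fails.
edit_settings() {
  local file="$1"
  shift
  chmod u+w "$file" || return 1
  local status=0
  "$@" || status=$?
  chmod 444 "$file"
  return "$status"
}

# Example (path taken from this guide; the editor is your choice):
# edit_settings /opt/drupal/web/sites/default/settings.php \
#   nano /opt/drupal/web/sites/default/settings.php
```

Restoring the mode inside the helper means the file does not stay writable if the edit is interrupted by an error.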
+ +`/opt/drupal/web/sites/default/settings.php` + +**Before** (at around line 789): +``` +'driver' => 'pgsql', +); +``` + +**After**: +``` +'driver' => 'pgsql', +); + +$settings['trusted_host_patterns'] = [ + 'localhost', +]; + +$settings['flysystem'] = [ + 'fedora' => [ + 'driver' => 'fedora', + 'config' => [ + 'root' => 'http://localhost:8080/fcrepo/rest/', + ], + ], +]; +``` + +Once this is done, refresh the cache to take hold of the new settings. + +```bash +cd /opt/drupal +drush -y cr +``` + +## Islandora + +!!! note "Skip this by using the Islandora Starter Site" + The Islandora Starter Site, which was presented as an option in the ["Installing Composer, Drush, and Drupal"](installing_composer_drush_and_drupal) step, + installs Islandora and other modules and configures them, allowing you to skip this section. You may want to use this manual method in the case where you want + to pick and choose which Islandora modules you use. + +### Downloading Islandora + +The Islandora Drupal module contains the core code to create a repository ecosystem in a Drupal environment. It also includes several submodules; of importance to us is `islandora_core_feature`, which contains the key configurations that allow you to use Islandora features. + +Take note of some of the other comments in the below bash script for an idea of what the other components are expected, and which may be considered optional. + +```bash +cd /opt/drupal +# Since islandora_defaults is near the bottom of the dependency chain, requiring +# it will get most of the modules and libraries we need to deploy a standard +# Islandora site. 
+sudo -u www-data composer require "drupal/flysystem:^2.0@alpha" +sudo -u www-data composer require "islandora/islandora:^2.4" +sudo -u www-data composer require "islandora/controlled_access_terms:^2" +sudo -u www-data composer require "islandora/openseadragon:^2" + +# These can be considered important or required depending on your site's +# requirements; some of them represent dependencies of Islandora submodules. +sudo -u www-data composer require "drupal/pdf:1.1" +sudo -u www-data composer require "drupal/rest_oai_pmh:^2.0@beta" +sudo -u www-data composer require "drupal/search_api_solr:^4.2" +sudo -u www-data composer require "drupal/facets:^2" +sudo -u www-data composer require "drupal/content_browser:^1.0@alpha" ## TODO do we need this? +sudo -u www-data composer require "drupal/field_permissions:^1" +sudo -u www-data composer require "drupal/transliterate_filenames:^2.0" + +# These tend to be good to enable for a development environment, or just for a +# higher quality of life when managing Islandora. That being said, devel should +# NEVER be enabled on a production environment, as it intentionally gives the +# user tools that compromise the security of a site. +sudo -u www-data composer require drupal/restui:^1.21 +sudo -u www-data composer require drupal/console:~1.0 +sudo -u www-data composer require drupal/devel:^2.0 +sudo -u www-data composer require drupal/admin_toolbar:^2.0 +``` + +### Enabling Downloaded Components + +Components we've now downloaded using `composer require` can be enabled simultaneously via `drush`, which will ensure they are installed in the correct dependent order. Enabling `islandora_defaults` will also ensure all content types and configurations are set up in Islandora. The installation process for all of these modules will likely take some time. + +!!! 
notice + This list of modules assumes that all of the above components were downloaded using `composer require`; if this is not the case, you may need to pare down this list manually. It also includes `devel`, which again, should not be enabled on production sites. + +```bash +cd /opt/drupal +drush -y en rdf responsive_image devel syslog serialization basic_auth rest restui search_api_solr facets content_browser pdf admin_toolbar controlled_access_terms_defaults islandora_breadcrumbs islandora_iiif islandora_oaipmh +# After all of this, rebuild the cache. +drush -y cr +``` + +### Adding a JWT Configuration to Drupal + +To allow our installation to talk to other services via Syn, we need to establish a Drupal-side JWT configuration using the keys we generated at that time. + +Log onto your site as an administrator at `/user`, then navigate to `/admin/config/system/keys/add`. Some of the settings here are unimportant, but pay close attention to the **Key type**, which should match the key we created earlier (an RSA key), and the **File location**, which should be the ultimate location of the key we created for Syn on the filesystem, `/opt/keys/syn_private.key`. + +![Adding a JWT RSA Key](../../assets/adding_a_jwt_rsa_key.png) + +Click **Save** to create the key. + +Once this key is created, navigate to `/admin/config/system/jwt` to select the key you just created from the list. Note that before the key will show up in the **Private Key** list, you need to select that key's type in the **Algorithm** section, namely `RSASSA-PKCS1-v1_5 using SHA-256 (RS256)`. + +![Configuring the JWT RSA Key for Use](../../assets/configuring_the_jwt_rsa_key_for_use.png) + +Click **Save configuration** to establish this as the JWT key configuration. + +### Configuring Islandora + +Navigate to the Islandora core configuration page at `/admin/config/islandora/core` to set up the core configuration to connect to Gemini. 
Of note here, the **Gemini URL** will need to be established to facilitate the connection to Fedora, and the appropriate **Bundles with Gemini URI pseudo field** types will need to be checked off. + +!!! notice + Any other Drupal content types you wish to synchronize with Fedora should also be checked off here. + +![Configuring Islandora](../../assets/configuring_islandora.png) + +### Configuring Islandora IIIF + +Navigate to `/admin/config/islandora/iiif` to ensure that Islandora IIIF is pointing to our Cantaloupe server. + +![Configuring Islandora IIIF](../../assets/configuring_iiif.png) + +Next, configure OpenSeadragon by navigating to `/admin/config/media/openseadragon` and ensuring everything is set up properly. + +![Configuring OpenSeadragon](../../assets/configuring_openseadragon.png) + +### Establishing Flysystem as the Default Download Method + +Navigate to `/admin/config/media/file-system` to set the **Default download method** to the one we created in our `settings.php`. + +![Configuring Flysystem to Use Fedora](../../assets/configuring_flysystem_to_use_fedora.png) + +### Giving the Administrative User the `fedoraAdmin` Role + +In order for data to be pushed back to Fedora, the site administrative user needs the `fedoraAdmin` role. + +``` +cd /opt/drupal +sudo -u www-data drush -y urol "fedoraadmin" islandora +``` + +### Running Feature Migrations + +Finally, to get everything up and running, run the Islandora Core Features and Islandora Defaults migrations. + +```bash +cd /opt/drupal +sudo -u www-data drush -y -l localhost --userid=1 mim --group=islandora +``` + +### Enabling EVA Views + +Some views provided by Islandora are not enabled by default. 
+ +```bash +cd /opt/drupal +drush -y views:enable display_media +``` diff --git a/docs/installation/manual/installing-composer-drush-and-drupal.md b/docs/installation/manual/installing-composer-drush-and-drupal.md new file mode 100644 index 000000000..78f16e611 --- /dev/null +++ b/docs/installation/manual/installing-composer-drush-and-drupal.md @@ -0,0 +1,142 @@ +# Installing Composer, Drush, and Drupal + +!!! warning "Needs Maintenance" + The manual installation documentation is in need of attention. We are aware that some components no longer work as documented here. If you are interested in helping us improve the documentation, please see [Contributing](../../../contributing/CONTRIBUTING). + +## In this section, we will install: + +- [Composer](https://getcomposer.org/) at its current latest version, the package manager that will allow us to install PHP applications +- Either the [Islandora Starter Site](https://github.com/Islandora/islandora-starter-site/), or the [Drupal recommended-project](https://www.drupal.org/docs/develop/using-composer/starting-a-site-using-drupal-composer-project-templates#s-drupalrecommended-project), which will install, among other things: + - [Drush 10](https://www.drush.org/) at its latest version, the command-line PHP application for running tasks in Drupal + - [Drupal 9](https://www.drupal.org/) at its latest version, the content management system Islandora uses for content modelling and front-end display + +## Install Composer + +### Download and install Composer 2.x + +Composer provides PHP code that we can use to install it. 
After downloading and running the installer, we’re going to move the generated executable to a place in `$PATH`, removing its extension: + +```bash +curl "https://getcomposer.org/installer" > composer-install.php +chmod +x composer-install.php +php composer-install.php +sudo mv composer.phar /usr/local/bin/composer +``` + + +## Download and Scaffold Drupal + +At this point, you have the option of using the [Islandora Starter Site](https://github.com/Islandora/islandora-starter-site/), with its pre-selected modules +and configurations which function "out of the box," or build a clean stock Drupal via the Drupal Recommended Project and install +Islandora modules as you desire. + +### Option 1: Create a project using the Islandora Starter Site + +Navigate to the folder where you want to put your Islandora project (in our case `/var/www`), and +create the Islandora Starter Site: + +```bash +cd /var/www +composer create-project islandora/islandora-starter-site +``` + +This will install all PHP dependencies, including Drush, and scaffold the site. + +Drush is not accessible via `$PATH`, but is available using the command `composer exec -- drush` + +### Option 2: Create a basic Drupal Recommended Project + +Navigate to the folder where you want to put your Drupal project (in our case `/var/www`), and +create the Drupal Recommended Project: + +```bash +cd /var/www +composer create-project drupal/recommended-project my-project +``` + + +## Make the new webroot accessible in Apache + +Before we can proceed with the actual site installation, we’re going to need to make our new Drupal installation the default web-accessible location Apache serves up. This will include an appropriate `ports.conf` file, and replacing the default enabled site. + +!!! 
notice + Out of the box, these files will contain support for SSL, which we will not be setting up in this guide (and are therefore removing with these overwritten configurations), but which is **absolutely indispensable** to a production site. This guide does not recommend any particular SSL certificate authority or installation method, but you may find [DigitalOcean's tutorial](https://www.digitalocean.com/community/tutorials/how-to-install-an-ssl-certificate-from-a-commercial-certificate-authority) helpful. + +`/etc/apache2/ports.conf | root:root/644` +``` +Listen 80 +``` + +Remove everything but the "Listen 80" line. You can leave the comments in if you want. + +`/etc/apache2/sites-enabled/000-default.conf | root:root/644` +```xml +<VirtualHost *:80> + ServerName SERVER_NAME + DocumentRoot "/opt/drupal/web" + <Directory "/opt/drupal/web"> + Options Indexes FollowSymLinks MultiViews + AllowOverride all + Require all granted + </Directory> + # Ensure some logging is in place. + ErrorLog "/var/log/apache2/localhost_error.log" + CustomLog "/var/log/apache2/localhost_access.log" combined +</VirtualHost> +``` +- `SERVER_NAME`: `localhost` + - For a development environment hosted on your own machine or a VM, `localhost` should suffice. Realistically, this should be the domain or IP address the server will be accessed at. + +Restart the Apache 2 service to apply these changes: + +```bash +sudo systemctl restart apache2 +``` +## Prepare the PostgreSQL database + +PostgreSQL roles are directly tied to users. We’re going to ensure a user is in place, create a role for them in PostgreSQL, and create a database for them that we can use to install Drupal. + +```bash +# Run psql as the postgres user, the only user currently with any PostgreSQL +# access. 
+sudo -u postgres psql +# Then, run these commands within psql itself: +create database DRUPAL_DB encoding 'UTF8' LC_COLLATE = 'en_US.UTF-8' LC_CTYPE = 'en_US.UTF-8' TEMPLATE template0; +create user DRUPAL_DB_USER with encrypted password 'DRUPAL_DB_PASSWORD'; +grant all privileges on database DRUPAL_DB to DRUPAL_DB_USER; +# Then, quit psql. +\q +``` +- `DRUPAL_DB`: `drupal9` + - This will be used as the core database that Drupal is installed into +- `DRUPAL_DB_USER`: `drupal` + - Specifically, this is the user that will connect to the PostgreSQL database being created, not the user that will be logging into Drupal +- `DRUPAL_DB_PASSWORD`: `drupal` + - This should be a secure password; it’s recommended to use a password generator to create this such as the one provided by [random.org](https://www.random.org/passwords/) + + +## Install Drupal using Drush + +The Drupal installation process can be done through the GUI in a series of form steps, or can be done quickly using Drush's `site-install` command. It can be invoked with the full list of parameters (such as `--db-url` and `--site-name`), but if parameters are missing, they will be asked of you interactively. + +### Option 1: Site install the Starter Site with existing configs + +Follow the instructions in the [README of the Islandora Starter Site](https://github.com/Islandora/islandora-starter-site/#usage). +The steps are not reproduced here to remove redundancy. When this installation is done, you'll have a starter site ready-to-go. Once you set up the external services in the next sections, you'll need to configure Drupal to know where they are. 
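Whichever option you choose, the `--db-url` value that `site-install` expects is assembled from the database placeholders defined above. A minimal sketch of that assembly follows; `build_db_url` is a hypothetical helper name (not a Drush command), and the default host and port are the ones this guide uses.

```shell
# Sketch only: build_db_url is a hypothetical helper, not a Drush command.
# Arguments: DRUPAL_DB_USER DRUPAL_DB_PASSWORD DRUPAL_DB [host] [port]
build_db_url() {
  local user="$1"
  local pass="$2"
  local db="$3"
  local host="${4:-127.0.0.1}"
  local port="${5:-5432}"
  printf 'pgsql://%s:%s@%s:%s/%s\n' "$user" "$pass" "$host" "$port" "$db"
}

# Using the example values from this guide:
build_db_url drupal drupal drupal9
# prints pgsql://drupal:drupal@127.0.0.1:5432/drupal9
```

The sketch does no escaping, so a generated password containing characters such as `@`, `:` or `/` would likely need to be percent-encoded before being placed in the URL.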
+ +### Option 2: Site install the basic Drupal Recommended Project + +```bash +cd /var/www/drupal +drush -y site-install standard --db-url="pgsql://DRUPAL_DB_USER:DRUPAL_DB_PASSWORD@127.0.0.1:5432/DRUPAL_DB" --site-name="SITE_NAME" --account-name=DRUPAL_LOGIN --account-pass=DRUPAL_PASS +``` +This uses the same parameters from the above step, as well as: + +- `SITE_NAME`: Islandora 2.0 + - This is arbitrary, and is simply used to title the site on the home page +- `DRUPAL_LOGIN`: `islandora` + - The Drupal administrative username to use +- `DRUPAL_PASS`: `islandora` + - The password to use for the Drupal administrative user + +Congratulations, you have a Drupal site! It currently isn’t really configured to do anything, but we’ll get those portions set up in the coming sections. diff --git a/docs/installation/manual/installing-crayfish.md b/docs/installation/manual/installing-crayfish.md new file mode 100644 index 000000000..af9e113fb --- /dev/null +++ b/docs/installation/manual/installing-crayfish.md @@ -0,0 +1,374 @@ +# Installing Crayfish + +!!! warning "Needs Maintenance" + The manual installation documentation is in need of attention. We are aware that some components no longer work as documented here. If you are interested in helping us improve the documentation, please see [Contributing](../../../contributing/CONTRIBUTING). + +## In this section, we will install: +- [Islandora/Crayfish](https://github.com/islandora/crayfish), the suite of microservices that power the backend of Islandora 2.0 +- Individual microservices underneath Crayfish + +## Crayfish 2.0 + +### Installing Prerequisites + +Some packages need to be installed before we can proceed with installing Crayfish; these packages are used by the microservices within Crayfish. These include: + +- Imagemagick, which will be used for image processing. We'll be using the LYRASIS build of imagemagick here, which supports JP2 files. 
+- Tesseract, which will be used for optical character recognition. Note that by default Tesseract can only understand English; other Tesseract language packs can be installed using `apt-get`, and a list of available packs can be found with `sudo apt-cache search tesseract-ocr` +- FFMPEG, which will be used for video processing +- Poppler, which will be used for generating PDFs + +```bash +sudo add-apt-repository -y ppa:lyrasis/imagemagick-jp2 +sudo apt-get update +sudo apt-get -y install imagemagick tesseract-ocr ffmpeg poppler-utils +``` + +**NOTICE:** If you get the error `sudo: apt-add-repository: command not found`, run `sudo apt-get install software-properties-common` in order to make the command available. + +### Cloning and Installing Crayfish + +We’re going to clone Crayfish to `/opt`, and individually run `composer install` against each of the microservice subdirectories. + +```bash +cd /opt +sudo git clone https://github.com/Islandora/Crayfish.git crayfish +sudo chown -R www-data:www-data crayfish +sudo -u www-data composer install -d crayfish/Homarus +sudo -u www-data composer install -d crayfish/Houdini +sudo -u www-data composer install -d crayfish/Hypercube +sudo -u www-data composer install -d crayfish/Milliner +sudo -u www-data composer install -d crayfish/Recast +``` + +### Preparing Logging + +Not much needs to happen here; Crayfish opts for a simple logging approach, with one `.log` file for each component. We’ll create a folder where each logfile can live. + +```bash +sudo mkdir /var/log/islandora +sudo chown www-data:www-data /var/log/islandora +``` + +### Configuring Crayfish Components + +Each Crayfish component requires one or more `.yaml` file(s) to ensure everything is wired up correctly. + +**NOTICE** + +The following configuration files represent somewhat sensible defaults; you should review the logging levels in use, as the appropriate level varies from installation to installation. 
Also note that in all cases, `http` URLs are being used, as this guide does not deal with setting up https support. In a production installation, this should not be the case. These files also assume a connection to a PostgreSQL database; use a `pdo_mysql` driver and the appropriate `3306` port if using MySQL. + +#### Homarus (Audio/Video derivatives) + +`/opt/crayfish/Homarus/cfg/config.yaml | www-data:www-data/644` +```yaml +--- +homarus: + executable: ffmpeg + mime_types: + valid: + - video/mp4 + - video/x-msvideo + - video/ogg + - audio/x-wav + - audio/mpeg + - audio/aac + - image/jpeg + - image/png + default: video/mp4 + mime_to_format: + valid: + - video/mp4_mp4 + - video/x-msvideo_avi + - video/ogg_ogg + - audio/x-wav_wav + - audio/mpeg_mp3 + - audio/aac_m4a + - image/jpeg_image2pipe + - image/png_image2pipe + default: mp4 +fedora_resource: + base_url: http://localhost:8080/fcrepo/rest +log: + level: NOTICE + file: /var/log/islandora/homarus.log +syn: + enable: true + config: /opt/fcrepo/config/syn-settings.xml +``` + +#### Houdini (Image derivatives) + +Currently the Houdini microservice uses a different system (Symfony) than the other microservices, so it requires a different configuration. + +`/opt/crayfish/Houdini/config/services.yaml | www-data:www-data/644` +```yaml +# This file is the entry point to configure your own services. +# Files in the packages/ subdirectory configure your dependencies. +# Put parameters here that don't need to change on each machine where the app is deployed +# https://symfony.com/doc/current/best_practices/configuration.html#application-related-configuration +parameters: + app.executable: /usr/local/bin/convert + app.formats.valid: + - image/jpeg + - image/png + - image/tiff + - image/jp2 + app.formats.default: image/jpeg + +services: + # default configuration for services in *this* file + _defaults: + autowire: true # Automatically injects dependencies in your services. 
+ autoconfigure: true # Automatically registers your services as commands, event subscribers, etc. + + # makes classes in src/ available to be used as services + # this creates a service per class whose id is the fully-qualified class name + App\Islandora\Houdini\: + resource: '../src/*' + exclude: '../src/{DependencyInjection,Entity,Migrations,Tests,Kernel.php}' + + # controllers are imported separately to make sure services can be injected + # as action arguments even if you don't extend any base controller class + App\Islandora\Houdini\Controller\HoudiniController: + public: false + bind: + $formats: '%app.formats.valid%' + $default_format: '%app.formats.default%' + $executable: '%app.executable%' + tags: ['controller.service_arguments'] + + # add more service definitions when explicit configuration is needed + # please note that last definitions always *replace* previous ones +``` + +`/opt/crayfish/Houdini/config/packages/crayfish_commons.yml | www-data:www-data/644` +```yaml +crayfish_commons: + fedora_base_uri: 'http://localhost:8080/fcrepo/rest' + syn_config: '/opt/fcrepo/config/syn-settings.xml' +``` + +`/opt/crayfish/Houdini/config/packages/monolog.yml | www-data:www-data/644` +```yaml +monolog: + + handlers: + + houdini: + type: rotating_file + path: /var/log/islandora/Houdini.log + level: DEBUG + max_files: 1 +``` + +The below files are two versions of the same file to enable or disable JWT token authentication. + +`/opt/crayfish/Houdini/config/packages/security.yml | www-data:www-data/644` + +Enabled JWT token authentication: +```yaml +security: + + # https://symfony.com/doc/current/security.html#where-do-users-come-from-user-providers + providers: + jwt_user_provider: + id: Islandora\Crayfish\Commons\Syn\JwtUserProvider + + firewalls: + dev: + pattern: ^/(_(profiler|wdt)|css|images|js)/ + security: false + main: + anonymous: false + # Need stateless or it reloads the User based on a token. 
+ stateless: true + + provider: jwt_user_provider + guard: + authenticators: + - Islandora\Crayfish\Commons\Syn\JwtAuthenticator + + # activate different ways to authenticate + # https://symfony.com/doc/current/security.html#firewalls-authentication + + # https://symfony.com/doc/current/security/impersonating_user.html + # switch_user: true + + + # Easy way to control access for large sections of your site + # Note: Only the *first* access control that matches will be used + access_control: + # - { path: ^/admin, roles: ROLE_ADMIN } + # - { path: ^/profile, roles: ROLE_USER } +``` + +Disabled JWT token authentication: +```yaml +security: + + # https://symfony.com/doc/current/security.html#where-do-users-come-from-user-providers + providers: + jwt_user_provider: + id: Islandora\Crayfish\Commons\Syn\JwtUserProvider + + firewalls: + dev: + pattern: ^/(_(profiler|wdt)|css|images|js)/ + security: false + main: + anonymous: true + # Need stateless or it reloads the User based on a token. + stateless: true +``` + +#### Hypercube (OCR) + +`/opt/crayfish/Hypercube/cfg/config.yaml | www-data:www-data/644` +```yaml +--- +hypercube: + tesseract_executable: tesseract + pdftotext_executable: pdftotext +fedora_resource: + base_url: http://localhost:8080/fcrepo/rest +log: + level: NOTICE + file: /var/log/islandora/hypercube.log +syn: + enable: true + config: /opt/fcrepo/config/syn-settings.xml +``` + +#### Milliner (Fedora indexing) + +`/opt/crayfish/Milliner/cfg/config.yaml | www-data:www-data/644` +```yaml +--- +fedora_base_url: http://localhost:8080/fcrepo/rest +drupal_base_url: http://localhost +modified_date_predicate: http://schema.org/dateModified +strip_format_jsonld: true +debug: false +db.options: + driver: pdo_pgsql + host: 127.0.0.1 + port: 5432 + dbname: CRAYFISH_DB + user: CRAYFISH_DB_USER + password: CRAYFISH_DB_PASSWORD +log: + level: NOTICE + file: /var/log/islandora/milliner.log +syn: + enable: true + config: /opt/fcrepo/config/syn-settings.xml +``` + +#### 
Recast (Drupal to Fedora URI re-writing) + +`/opt/crayfish/Recast/cfg/config.yaml | www-data:www-data/644` +```yaml +--- +fedora_resource: + base_url: http://localhost:8080/fcrepo/rest +drupal_base_url: http://localhost +debug: false +log: + level: NOTICE + file: /var/log/islandora/recast.log +syn: + enable: true + config: /opt/fcrepo/config/syn-settings.xml +namespaces: +- + acl: "http://www.w3.org/ns/auth/acl#" + fedora: "http://fedora.info/definitions/v4/repository#" + ldp: "http://www.w3.org/ns/ldp#" + memento: "http://mementoweb.org/ns#" + pcdm: "http://pcdm.org/models#" + pcdmuse: "http://pcdm.org/use#" + webac: "http://fedora.info/definitions/v4/webac#" + vcard: "http://www.w3.org/2006/vcard/ns#" +``` + +### Creating Apache Configurations for Crayfish Components + +Finally, we need appropriate Apache configurations for Crayfish; these will allow other services to connect to Crayfish components via their HTTP endpoints. + +Each endpoint we need to be able to connect to will get its own `.conf` file, which we will then enable. + +**NOTICE** + +These configurations would potentially have collisions with Drupal routes, if any are created in Drupal with the same name. If this is a concern, it would likely be better to reserve a subdomain or another port specifically for Crayfish. For the purposes of this installation guide, these endpoints will suffice. 
+ +`/etc/apache2/conf-available/Homarus.conf | root:root/644` +``` +Alias "/homarus" "/opt/crayfish/Homarus/src" + + FallbackResource /homarus/index.php + Require all granted + DirectoryIndex index.php + SetEnvIf Authorization "(.*)" HTTP_AUTHORIZATION=$1 + +``` + +`/etc/apache2/conf-available/Houdini.conf | root:root/644` +``` +Alias "/houdini" "/opt/crayfish/Houdini/public" + + FallbackResource /houdini/index.php + Require all granted + DirectoryIndex index.php + SetEnvIf Authorization "(.*)" HTTP_AUTHORIZATION=$1 + +``` + +`/etc/apache2/conf-available/Hypercube.conf | root:root/644` +``` +Alias "/hypercube" "/opt/crayfish/Hypercube/src" + + FallbackResource /hypercube/index.php + Require all granted + DirectoryIndex index.php + SetEnvIf Authorization "(.*)" HTTP_AUTHORIZATION=$1 + +``` + +`/etc/apache2/conf-available/Milliner.conf | root:root/644` +``` +Alias "/milliner" "/opt/crayfish/Milliner/src" + + FallbackResource /milliner/index.php + Require all granted + DirectoryIndex index.php + SetEnvIf Authorization "(.*)" HTTP_AUTHORIZATION=$1 + +``` + +`/etc/apache2/conf-available/Recast.conf | root:root/644` +``` +Alias "/recast" "/opt/crayfish/Recast/src" + + FallbackResource /recast/index.php + Require all granted + DirectoryIndex index.php + SetEnvIf Authorization "(.*)" HTTP_AUTHORIZATION=$1 + +``` + +### Enabling Each Crayfish Component Apache Configuration + +Enabling each of these configurations involves creating a symlink to them in the `conf-enabled` directory; the standardized method of doing this in Apache is with `a2enconf`. + +```bash +sudo a2enconf Homarus Houdini Hypercube Milliner Recast +``` + +### Restarting the Apache Service + +Finally, to get these new endpoints up and running, we need to restart the Apache service. 
+ +``` +sudo systemctl restart apache2 +``` diff --git a/docs/installation/manual/installing-fedora-syn-and-blazegraph.md b/docs/installation/manual/installing-fedora-syn-and-blazegraph.md new file mode 100644 index 000000000..4f1f6ebd7 --- /dev/null +++ b/docs/installation/manual/installing-fedora-syn-and-blazegraph.md @@ -0,0 +1,548 @@ +# Installing Fedora, Syn, and Blazegraph + +## In this section, we will install: + +- [Fedora 6](https://fedora.lyrasis.org/), the back-end repository that Islandora will use +- [Syn](https://github.com/Islandora/Syn), the authentication broker that will manage communication with Fedora +- [Blazegraph](https://blazegraph.com/), the resource index layer on top of Fedora for managing discoverability via RDF + +## Fedora 6 + +### Stop the Tomcat Service + +We're going to stop the Tomcat service while working on setting up Fedora to prevent any autodeploy misconfigurations. + +```bash +sudo systemctl stop tomcat +``` + +### Creating a Working Space for Fedora + +Fedora’s configuration and data won’t live with Tomcat itself; rather, we’re going to prepare a space for them to make them easier to manage. + +```bash +sudo mkdir -p /opt/fcrepo/data/objects +sudo mkdir /opt/fcrepo/config +sudo chown -R tomcat:tomcat /opt/fcrepo +``` + +### Creating a Database for Fedora + +The method for creating the database here will closely mimic the method we used to create our database for Drupal. + +```bash +sudo -u postgres psql +create database FEDORA_DB encoding 'UTF8' LC_COLLATE = 'en_US.UTF-8' LC_CTYPE = 'en_US.UTF-8' TEMPLATE template0; +create user FEDORA_DB_USER with encrypted password 'FEDORA_DB_PASSWORD'; +grant all privileges on database FEDORA_DB to FEDORA_DB_USER; +\q +``` + +- `FEDORA_DB`: `fcrepo` + - This will be used as the database Fedora will store the repository in. +- `FEDORA_DB_USER`: `fedora` +- `FEDORA_DB_PASSWORD`: `fedora` + - Again, this should be a secure password of some kind; leaving it as `fedora` is not recommended. 
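Since the values above are only placeholders, one way to produce a stronger `FEDORA_DB_PASSWORD` is to draw it from the system's random source. The following is a minimal sketch and not part of the Islandora tooling: the `generate_password` helper name is hypothetical, and it assumes a Linux system where `/dev/urandom`, `head`, `od`, and `tr` are available.

```bash
# Hypothetical helper: print a 32-character hexadecimal password.
# Hex output avoids quoting problems inside SQL strings and JDBC URLs.
generate_password() {
  # Read 16 random bytes, dump them as hex, and strip spaces and newlines.
  head -c 16 /dev/urandom | od -An -tx1 | tr -d ' \n'
  echo
}
```

Running `generate_password` prints one candidate value; use it in place of `FEDORA_DB_PASSWORD` in the `create user` statement, and keep a record of it for the Fedora configuration that follows.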
+ +### Adding a Fedora Configuration + +The Fedora configuration is going to come in a few different chunks that need to be in place before Fedora will be functional. We’re going to place several files outright, with mildly modified parameters according to our configuration. + +The basics of these configuration files have been pulled largely from the templates in Islandora-Devops/islandora-playbook [internal Fedora role](https://github.com/Islandora-Devops/islandora-playbook/tree/dev/roles/internal/Islandora-Devops.fcrepo); you may consider referencing the playbook’s templates directory for more details. + +#### Namespace prefixes + +`i8_namespaces.yml` is a list of namespaces used by Islandora that may not necessarily be present in Fedora; we add them here to ensure we can use them in queries. + +`/opt/fcrepo/config/i8_namespaces.yml | tomcat:tomcat/644` +```{ .yaml .copy } +# Islandora 8/Fedora namespaces +# +# This file contains ALL the prefix mappings, if a URI +# does not appear in this file it will be displayed as +# the full URI in Fedora. 
+acl: http://www.w3.org/ns/auth/acl# +bf: http://id.loc.gov/ontologies/bibframe/ +cc: http://creativecommons.org/ns# +dc: http://purl.org/dc/elements/1.1/ +dcterms: http://purl.org/dc/terms/ +dwc: http://rs.tdwg.org/dwc/terms/ +ebucore: http://www.ebu.ch/metadata/ontologies/ebucore/ebucore# +exif: http://www.w3.org/2003/12/exif/ns# +fedoraconfig: http://fedora.info/definitions/v4/config# +fedoramodel: info:fedora/fedora-system:def/model# +foaf: http://xmlns.com/foaf/0.1/ +geo: http://www.w3.org/2003/01/geo/wgs84_pos# +gn: http://www.geonames.org/ontology# +iana: http://www.iana.org/assignments/relation/ +islandorarelsext: http://islandora.ca/ontology/relsext# +islandorarelsint: http://islandora.ca/ontology/relsint# +ldp: http://www.w3.org/ns/ldp# +memento: http://mementoweb.org/ns# +nfo: http://www.semanticdesktop.org/ontologies/2007/03/22/nfo# +ore: http://www.openarchives.org/ore/terms/ +owl: http://www.w3.org/2002/07/owl# +premis: http://www.loc.gov/premis/rdf/v1# +prov: http://www.w3.org/ns/prov# +rdf: http://www.w3.org/1999/02/22-rdf-syntax-ns# +rdfs: http://www.w3.org/2000/01/rdf-schema# +rel: http://id.loc.gov/vocabulary/relators/ +schema: http://schema.org/ +skos: http://www.w3.org/2004/02/skos/core# +test: info:fedora/test/ +vcard: http://www.w3.org/2006/vcard/ns# +webac: http://fedora.info/definitions/v4/webac# +xml: http://www.w3.org/XML/1998/namespace +xmlns: http://www.w3.org/2000/xmlns/ +xs: http://www.w3.org/2001/XMLSchema +xsi: http://www.w3.org/2001/XMLSchema-instance +``` + +#### Allowed External Content Hosts + +We have Fedora provide metadata for some resources that are contained in Drupal. Fedora needs to know to allow access to these External Content hosts. + +We create a file `/opt/fcrepo/config/allowed_external_hosts.txt | tomcat:tomcat/644` +``` +http://localhost:8000/ +``` + +**Note**: the trailing slash is important here.
For more information on Fedora's External Content and configuring it, see the [Fedora Wiki pages](https://wiki.lyrasis.org/display/FEDORA6x/External+Content) + +#### Fedora configuration properties file + +Fedora 6 now allows you to put all your configuration properties into a single file. We use `0640` permissions as you will want to put your database credentials in here. + +`/opt/fcrepo/config/fcrepo.properties | tomcat:tomcat/640` +```{ .text .copy } +fcrepo.home=FCREPO_HOME +# External content using path defined above. +fcrepo.external.content.allowed=/opt/fcrepo/config/allowed_external_hosts.txt +# Namespace registry using path defined above. +fcrepo.namespace.registry=/opt/fcrepo/config/i8_namespaces.yml +fcrepo.auth.principal.header.enabled=true +# The principal header is the syn-setting.xml "config" element's "header" attribute +fcrepo.auth.principal.header.name=X-Islandora +# false to use manual versioning, true to create a version on each change +fcrepo.autoversioning.enabled=true +fcrepo.db.url=FCREPO_DB_URL +fcrepo.db.user=FCREPO_DB_USERNAME +fcrepo.db.password=FCREPO_DB_PASSWORD +fcrepo.ocfl.root=FCREPO_OCFL_ROOT +fcrepo.ocfl.temp=FCREPO_TEMP_ROOT +fcrepo.ocfl.staging=FCREPO_STAGING_ROOT +# Can be sha512 or sha256 +fcrepo.persistence.defaultDigestAlgorithm=sha512 +# Jms moved from 61616 to allow external ActiveMQ to use that port +fcrepo.dynamic.jms.port=61626 +# Same as above +fcrepo.dynamic.stomp.port=61623 +fcrepo.velocity.runtime.log=FCREPO_VELOCITY_LOG +fcrepo.jms.baseUrl=FCREPO_JMS_BASE +``` + +* `FCREPO_HOME` - The home directory for all Fedora generated output and state. Unless otherwise specified, all logs, metadata, binaries, and internally generated indexes, etc. It would default to the Tomcat starting directory. A good default would be `/opt/fcrepo` +* `FCREPO_DB_URL` - This parameter allows you to set the database connection url. 
In general the format is as follows: + + `jdbc:DATABASE_TYPE://HOST:PORT/DATABASE_NAME` + + Fedora currently supports H2, PostgreSQL 12.3, MariaDB 10.5.3, and MySQL 8.0 + + Using the default ports for the supported databases, these are the values we typically use: + + * PostgreSQL: `jdbc:postgresql://localhost:5432/fcrepo` + * MariaDB: `jdbc:mariadb://localhost:3306/fcrepo` + * MySQL: `jdbc:mysql://localhost:3306/fcrepo` + +* `FCREPO_DB_USERNAME` - The database username +* `FCREPO_DB_PASSWORD` - The database password +* `FCREPO_OCFL_ROOT` - Sets the root directory of the OCFL. Defaults to `FCREPO_HOME/data/ocfl-root` if not set. +* `FCREPO_TEMP_ROOT` - Sets the temp directory used by OCFL. Defaults to `FCREPO_HOME/data/temp` if not set. +* `FCREPO_STAGING_ROOT` - Sets the staging directory used by OCFL. Defaults to `FCREPO_HOME/data/staging` if not set. +* `FCREPO_VELOCITY_LOG` - The Fedora HTML template code uses Apache Velocity, which generates a runtime log called `velocity.log`. Defaults to `FCREPO_HOME/logs/velocity`. A good choice might be `/opt/tomcat/logs/velocity.log`. +* `FCREPO_JMS_BASE` - This specifies the baseUrl to use when generating JMS messages. You can specify the hostname with or without port and with or without path. If your system is behind a NAT firewall you may need this to avoid your message consumers trying to access the system on an invalid port. If this system property is not set, the host, port and context from the user's request will be used in the emitted JMS messages. If your Alpaca is on the same machine as your Fedora and you use the `islandora-indexing-fcrepo`, you could use http://localhost:8080/fcrepo/rest. + + +Check the Lyrasis Wiki to find all of [Fedora's properties](https://wiki.lyrasis.org/display/FEDORA6x/Properties) + +### Adding the Fedora Variables to `JAVA_OPTS` + +We need our Tomcat `JAVA_OPTS` to include references to our repository configuration.
+ +`/opt/tomcat/bin/setenv.sh` + +**Before**: +> 3 | export JAVA_OPTS="-Djava.awt.headless=true -server -Xmx1500m -Xms1000m" + +**After**: +> 3 | export JAVA_OPTS="-Djava.awt.headless=true -Dfcrepo.config.file=/opt/fcrepo/config/fcrepo.properties -DconnectionTimeout=-1 -server -Xmx1500m -Xms1000m" + +### Ensuring Tomcat Users Are In Place + +While not strictly necessary, we can use the `tomcat-users.xml` file to give us direct access to the Fedora endpoint. Fedora defines, out of the box, a `fedoraAdmin` and `fedoraUser` role that can be reflected in the users list for access. The following file will also include the base `tomcat` user. As always, these default passwords should likely not stay as the defaults. + +`/opt/tomcat/conf/tomcat-users.xml | tomcat:tomcat/600` +```{ .xml .copy } + + + + + + + + + +``` + +- `TOMCAT_PASSWORD`: `tomcat` +- `FEDORA_ADMIN_PASSWORD`: `islandora` +- `FEDORA_USER_PASSWORD`: `islandora` + +### Downloading and Placing the Latest Release + +Fedora `.war` files are packaged up as releases on the official GitHub repository. You should download the most recent stable release. + +```bash +sudo wget -O fcrepo.war FCREPO_WAR_URL +sudo mv fcrepo.war /opt/tomcat/webapps +sudo chown tomcat:tomcat /opt/tomcat/webapps/fcrepo.war +``` + +- `FCREPO_WAR_URL`: This can be found at the [fcrepo downloads page](https://github.com/fcrepo/fcrepo/releases); the file you're looking for is: + - Tagged in green as the 'Latest release' + - Named "fcrepo-webapp-VERSION.war" + +### Start the Tomcat Service + +As before, start the Tomcat service to get Fedora up and running. + +```bash +sudo systemctl start tomcat +``` + +**Note:** sometimes it takes a while for Fedora and Tomcat to start up, usually it shouldn't take longer than 5 minutes. + +Once it starts up, Fedora REST API should be available at http://localhost:8080/fcrepo/rest. The username is fedoraAdmin and we defined the password before as `FEDORA_ADMIN_PASSWORD` (default: "islandora"). 
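Because startup can take several minutes, it may be convenient to poll the endpoint instead of retrying by hand. The following is a minimal sketch, not an official Islandora script: the `wait_for_fedora` helper name and its arguments are hypothetical, and the defaults assume the URL and the default `fedoraAdmin` credentials described above.

```bash
# Hypothetical helper: poll the Fedora REST endpoint until it answers with
# HTTP 200, or give up after a number of attempts.
# Arguments: URL, user:password, maximum attempts, seconds between attempts.
wait_for_fedora() {
  url="${1:-http://localhost:8080/fcrepo/rest}"
  creds="${2:-fedoraAdmin:islandora}"
  attempts="${3:-30}"
  delay="${4:-10}"
  n=0
  while [ "$n" -lt "$attempts" ]; do
    # -w prints only the HTTP status code; the response body is discarded.
    code="$(curl -s -o /dev/null -w '%{http_code}' -u "$creds" "$url")"
    if [ "$code" = "200" ]; then
      echo "Fedora is up at $url"
      return 0
    fi
    n=$((n + 1))
    sleep "$delay"
  done
  echo "Fedora did not respond with HTTP 200 after $attempts attempts" >&2
  return 1
}
```

Calling `wait_for_fedora` with no arguments checks every 10 seconds for up to 30 attempts, which roughly matches the five-minute window mentioned above. If the password was changed from the default, pass it as the second argument in `user:password` form.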
+ +## Syn + +### Downloading the Syn JAR File + +A compiled JAR of Syn can be found on the [Syn releases page](https://github.com/Islandora/Syn/releases). We’re going to add this to the list of libraries accessible to Tomcat. + +```bash +sudo wget -P /opt/tomcat/lib SYN_JAR_URL +# Ensure the library has the correct permissions. +sudo chown -R tomcat:tomcat /opt/tomcat/lib +sudo chmod -R 640 /opt/tomcat/lib +``` + +- `SYN_JAR_URL`: The latest stable release of the Syn JAR from the [releases page](https://github.com/Islandora/Syn/releases). Specifically, the JAR compiled as `-all.jar` is required. + +### Generating an SSL Key for Syn + +For Islandora and Fedora to talk to each other, an SSL key needs to be generated for use with Syn. We’re going to make a spot where such keys can live, and generate one. + +```bash +sudo mkdir /opt/keys +sudo openssl genrsa -out "/opt/keys/syn_private.key" 2048 +sudo openssl rsa -pubout -in "/opt/keys/syn_private.key" -out "/opt/keys/syn_public.key" +sudo chown www-data:www-data /opt/keys/syn* +``` + +### Placing the Syn Settings + +Syn sites and tokens belong in a settings file that we’re going to reference in Tomcat. + +`/opt/fcrepo/config/syn-settings.xml | tomcat:tomcat/600` +```{ .xml .copy } + + + ISLANDORA_SYN_TOKEN + +``` + +- `ISLANDORA_SYN_TOKEN`: `islandora` + - This should be a secure generated token rather than this default; it will be configured on the Drupal side later. + +### Adding the Syn Valve to Tomcat + +Referencing the valve we’ve created in our `syn-settings.xml` involves creating a `<Valve>` entry in Tomcat’s `context.xml`. + +There are two options here: + +#### 1. Enable the Syn Valve for all of Tomcat. + +`/opt/tomcat/conf/context.xml` + +**Before**: +> 29 | `-->` + +> 30 | `` + +**After**: +> 29 | `-->` + +> 30 | `` + +> 31 | `` + +#### 2. Enable the Syn Valve for only Fedora.
+ +Create a new file at + +`/opt/tomcat/conf/Catalina/localhost/fcrepo.xml` + +```{ .xml .copy } + + + +``` + +Your Fedora web application needs to be deployed in Tomcat with the name `fcrepo.war`. Otherwise, change the name of the above XML file to match the deployed web application's name. + +### Restarting Tomcat + +Finally, restart tomcat to apply the new configurations. + +```bash +sudo systemctl restart tomcat +``` + +**Note:** sometimes it takes a while for Fedora and Tomcat to start up, usually it shouldn't take longer than 5 minutes. + +**Note:** after installing the Syn valve, you'll no longer be able to manually create/edit or delete objects via Fedora Web UI. All communication with Fedora will now be handled from the Islandora module in Drupal. + +### Redhat logging + +Redhat systems have stopped generating an all inclusive `catalina.out`, the `catalina..log` does not include web application's log statements. To get Fedora log statements flowing, you can create your own [LogBack](https://logback.qos.ch/) configuration file and point to it. + +`/opt/fcrepo/config/fcrepo-logback.xml | tomcat:tomcat/644` +```{ .xml .copy } + + + + + + %p %d{HH:mm:ss.SSS} [%thread] \(%c{0}\) %m%n + + + + + ${catalina.base}/logs/fcrepo.log + true + + ${catalina.base}/logs/fcrepo.%d{yyyy-MM-dd}.log.%i + 10MB + 30 + 2GB + + + %p %d{HH:mm:ss.SSS} [%thread] \(%c{0}\) %m%n + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +``` + +Then alter your `$JAVA_OPTS` like [above](#adding-the-fedora-variables-to-java_opts) to include +``` +-Dlogback.configurationFile=/opt/fcrepo/config/fcrepo-logback.xml +``` + +This will generate a log file at `${catalina.base}/logs/fcrepo.log` and will rotate each day or if the logs reaches 10MB. It will maintain 30 days of old logs, or 2GB whichever comes first. 
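For reference, the resulting line in `/opt/tomcat/bin/setenv.sh` might look like the sketch below. It assumes the `JAVA_OPTS` value from the earlier Fedora step and only adds the Logback option; the memory settings are the example values used above and may differ on your system.

```bash
# /opt/tomcat/bin/setenv.sh (sketch): the earlier Fedora options plus the
# Logback configuration file. Adjust -Xmx/-Xms to suit your server.
export JAVA_OPTS="-Djava.awt.headless=true -Dfcrepo.config.file=/opt/fcrepo/config/fcrepo.properties -DconnectionTimeout=-1 -Dlogback.configurationFile=/opt/fcrepo/config/fcrepo-logback.xml -server -Xmx1500m -Xms1000m"
```

Tomcat needs to be restarted before the new option takes effect.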
+ +## Blazegraph 2 + +### Creating a Working Space for Blazegraph + +Blazegraph needs a space for configurations and data; we’re going to create this space in `/opt`. + +```bash +sudo mkdir -p /opt/blazegraph/data +sudo mkdir /opt/blazegraph/conf +sudo chown -R tomcat:tomcat /opt/blazegraph +``` + +### Downloading and Placing the Blazegraph WAR + +The Blazegraph `.war` file can be found in a few different places, but to ensure we’re able to easily `wget` it, we’re going to use the [maven.org](https://search.maven.org/) repository link to grab it. + +```bash +cd /opt +sudo wget -O blazegraph.war BLAZEGRAPH_WAR_URL +sudo mv blazegraph.war /opt/tomcat/webapps +sudo chown tomcat:tomcat /opt/tomcat/webapps/blazegraph.war +``` + +- `BLAZEGRAPH_WAR_URL`: You can find a link to this at the [Maven repository for Blazegraph](https://repo1.maven.org/maven2/com/blazegraph/bigdata-war/); you’ll want to click the link for the latest version of Blazegraph 2.1.x, then get the link to the `.war` file within that version folder. + +Once this is downloaded, give it a moment to expand before moving on to the next step. + +### Configuring Logging + +We would like to have an appropriate logging configuration for Blazegraph, which can be useful for looking at incoming traffic and determining if anything has gone wrong with Blazegraph. Our logger isn’t going to be much different than the default logger; it can be made more or less verbose by changing the default `WARN` levels. There are several other loggers that can be enabled, like a SPARQL query trace or summary query evaluation log; if these are desired they should be added in. Consult the Blazegraph documentation for more details. + +`/opt/blazegraph/conf/log4j.properties | tomcat:tomcat/644` +```{ .text .copy } +log4j.rootCategory=WARN, dest1 + +# Loggers. +log4j.logger.com.bigdata=WARN +log4j.logger.com.bigdata.btree=WARN + +# Normal data loader (single threaded).
+#log4j.logger.com.bigdata.rdf.store.DataLoader=INFO + +# dest1 +log4j.appender.dest1=org.apache.log4j.ConsoleAppender +log4j.appender.dest1.layout=org.apache.log4j.PatternLayout +log4j.appender.dest1.layout.ConversionPattern=%-5p: %F:%L: %m%n +#log4j.appender.dest1.layout.ConversionPattern=%-5p: %r %l: %m%n +#log4j.appender.dest1.layout.ConversionPattern=%-5p: %m%n +#log4j.appender.dest1.layout.ConversionPattern=%-4r [%t] %-5p %c %x - %m%n +#log4j.appender.dest1.layout.ConversionPattern=%-4r(%d) [%t] %-5p %c(%l:%M) %x - %m%n + +# Rule execution log. This is a formatted log file (comma delimited). +log4j.logger.com.bigdata.relation.rule.eval.RuleLog=INFO,ruleLog +log4j.additivity.com.bigdata.relation.rule.eval.RuleLog=false +log4j.appender.ruleLog=org.apache.log4j.FileAppender +log4j.appender.ruleLog.Threshold=ALL +log4j.appender.ruleLog.File=/var/log/blazegraph/rules.log +log4j.appender.ruleLog.Append=true +log4j.appender.ruleLog.BufferedIO=false +log4j.appender.ruleLog.layout=org.apache.log4j.PatternLayout +log4j.appender.ruleLog.layout.ConversionPattern=%m +``` + +### Adding a Blazegraph Configuration + +Our configuration will be built from a few different files that we will eventually reference in `JAVA_OPTS` and directly apply to Blazegraph; these include most of the functional pieces Blazegraph requires, as well as a generalized configuration for the `islandora` namespace it will use. As with most large configurations like this, these should likely be tuned to your preferences, and the following files only represent sensible defaults. 
+ +`/opt/blazegraph/conf/RWStore.properties | tomcat:tomcat/644` +``` { .text .copy } +com.bigdata.journal.AbstractJournal.file=/opt/blazegraph/data/blazegraph.jnl +com.bigdata.journal.AbstractJournal.bufferMode=DiskRW +com.bigdata.service.AbstractTransactionService.minReleaseAge=1 +com.bigdata.journal.Journal.groupCommit=false +com.bigdata.btree.writeRetentionQueue.capacity=4000 +com.bigdata.btree.BTree.branchingFactor=128 +com.bigdata.journal.AbstractJournal.initialExtent=209715200 +com.bigdata.journal.AbstractJournal.maximumExtent=209715200 +com.bigdata.rdf.sail.truthMaintenance=false +com.bigdata.rdf.store.AbstractTripleStore.quads=true +com.bigdata.rdf.store.AbstractTripleStore.statementIdentifiers=false +com.bigdata.rdf.store.AbstractTripleStore.textIndex=false +com.bigdata.rdf.store.AbstractTripleStore.axiomsClass=com.bigdata.rdf.axioms.NoAxioms +com.bigdata.namespace.kb.lex.com.bigdata.btree.BTree.branchingFactor=400 +com.bigdata.namespace.kb.spo.com.bigdata.btree.BTree.branchingFactor=1024 +com.bigdata.journal.Journal.collectPlatformStatistics=false +``` + +`/opt/blazegraph/conf/blazegraph.properties | tomcat:tomcat/644` +```{ .text .copy } +com.bigdata.rdf.store.AbstractTripleStore.textIndex=false +com.bigdata.rdf.store.AbstractTripleStore.axiomsClass=com.bigdata.rdf.axioms.OwlAxioms +com.bigdata.rdf.sail.isolatableIndices=false +com.bigdata.rdf.store.AbstractTripleStore.justify=true +com.bigdata.rdf.sail.truthMaintenance=true +com.bigdata.rdf.sail.namespace=islandora +com.bigdata.rdf.store.AbstractTripleStore.quads=false +com.bigdata.namespace.islandora.lex.com.bigdata.btree.BTree.branchingFactor=400 +com.bigdata.journal.Journal.groupCommit=false +com.bigdata.namespace.islandora.spo.com.bigdata.btree.BTree.branchingFactor=1024 +com.bigdata.rdf.store.AbstractTripleStore.geoSpatial=false +com.bigdata.rdf.store.AbstractTripleStore.statementIdentifiers=false +``` + +`/opt/blazegraph/conf/inference.nt | tomcat:tomcat/644` +```{ .text .copy } + . + . 
+``` + +### Specifying the `RWStore.properties` in `JAVA_OPTS` + +In order to enable our configuration when Tomcat starts, we need to reference the location of `RWStore.properties` in the `JAVA_OPTS` environment variable that Tomcat uses. + +`/opt/tomcat/bin/setenv.sh` + +**Before**: +> 3 | export JAVA_OPTS="-Djava.awt.headless=true -Dfcrepo.config.file=/opt/fcrepo/config/fcrepo.properties -DconnectionTimeout=-1 -server -Xmx1500m -Xms1000m" + +**After**: +> 3 | export JAVA_OPTS="-Djava.awt.headless=true -Dfcrepo.config.file=/opt/fcrepo/config/fcrepo.properties -DconnectionTimeout=-1 -Dcom.bigdata.rdf.sail.webapp.ConfigParams.propertyFile=/opt/blazegraph/conf/RWStore.properties -Dlog4j.configuration=file:/opt/blazegraph/conf/log4j.properties -server -Xmx1500m -Xms1000m" + + +### Restarting Tomcat + +Finally, restart Tomcat to pick up the changes we’ve made. + +```bash +sudo systemctl restart tomcat +``` + +### Installing Blazegraph Namespaces and Inference + +The two other files we created, `blazegraph.properties` and `inference.nt`, contain information that Blazegraph requires in order to establish and correctly use the datasets Islandora will send to it. First, we need to create a dataset - contained in `blazegraph.properties` - and then we need to inform that dataset of the inference set we have contained in `inference.nt`. + +``` { .bash .copy } +curl -X POST -H "Content-Type: text/plain" --data-binary @/opt/blazegraph/conf/blazegraph.properties http://localhost:8080/blazegraph/namespace +``` +If this worked correctly, Blazegraph should respond with "CREATED: islandora" to let us know it created the islandora namespace. +``` { .bash .copy } +curl -X POST -H "Content-Type: text/plain" --data-binary @/opt/blazegraph/conf/inference.nt http://localhost:8080/blazegraph/namespace/islandora/sparql +``` +If this worked correctly, Blazegraph should respond with some XML letting us know it added the 2 entries from inference.nt to the namespace. 
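As an additional sanity check, the new namespace can be queried over the same SPARQL endpoint. This is a minimal sketch and not an official Islandora step: the `count_triples` helper name is hypothetical, and it assumes the endpoint is reachable at the default URL and accepts unauthenticated requests, as the `curl` commands above do.

```bash
# Hypothetical helper: ask the islandora namespace how many triples it holds.
# Sends a standard SPARQL SELECT as a GET request and prints the JSON response.
count_triples() {
  endpoint="${1:-http://localhost:8080/blazegraph/namespace/islandora/sparql}"
  # -G turns the url-encoded data into a query string on the GET request.
  curl -s -G "$endpoint" \
    -H 'Accept: application/sparql-results+json' \
    --data-urlencode 'query=SELECT (COUNT(*) AS ?n) WHERE { ?s ?p ?o }'
}
```

Run `count_triples` after loading `inference.nt`. The exact count depends on the axioms and inferred statements Blazegraph stores for the namespace, so any well-formed result, rather than a specific number, is the sign that the namespace can be queried.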
diff --git a/docs/installation/manual/installing-karaf-and-alpaca.md b/docs/installation/manual/installing-karaf-and-alpaca.md new file mode 100644 index 000000000..31f39d223 --- /dev/null +++ b/docs/installation/manual/installing-karaf-and-alpaca.md @@ -0,0 +1,395 @@ +# Installing Karaf and Alpaca + +!!! warning "Needs Maintenance" + The manual installation documentation is in need of attention. We are aware that some components no longer work as documented here. If you are interested in helping us improve the documentation, please see [Contributing](../../../contributing/CONTRIBUTING). + +## In this section, we will install: + +- [Apache ActiveMQ](https://activemq.apache.org/), a messaging server that will be used to handle communication between Alpaca and other components +- [Apache Karaf](https://karaf.apache.org/), the Java application runtime that Alpaca will be deployed in +- [Islandora/Alpaca](https://github.com/Islandora/Alpaca), a suite of Java middleware applications that will handle communication between various components of Islandora. + +## ActiveMQ 5 + +### Installing ActiveMQ + +In our case, the default installation method for ActiveMQ via `apt-get` will suffice. + +```bash +sudo apt-get -y install activemq +``` + +This will give us: + +- A base configuration at `/var/lib/activemq/conf` +- A data storage directory at `/var/lib/activemq/data` +- The base ActiveMQ installation at `/usr/share/activemq` +- An `activemq` service that will be run on boot +- A user, `activemq`, who will be in charge of the ActiveMQ service + +Take note of the version of ActiveMQ we're going to be installing. It needs to match a Karaf blueprint we'll create later. Check the version with + +```bash +sudo apt-cache policy activemq +``` + +Write down the version listed under `Installed: `. 
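To capture just the version string instead of reading it off the screen, the output can be filtered. A minimal sketch, assuming the usual `apt-cache policy` layout with an `Installed:` line; the `installed_version` helper name is hypothetical.

```bash
# Hypothetical helper: read `apt-cache policy` output on stdin and print
# only the value of the "Installed:" field.
installed_version() {
  awk '/Installed:/ { print $2 }'
}
```

It can be used as `sudo apt-cache policy activemq | installed_version`. The printed value is the package version, which may carry a distribution-specific suffix after the upstream ActiveMQ version number.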
+ +## Karaf 4 + +### Creating a Karaf User + +Karaf, as well as its processes and service, will be owned by a user in charge of ensuring this portion of the stack is segregated and that the service is running. + +```bash +sudo addgroup karaf +sudo adduser karaf --ingroup karaf --home /opt/karaf --shell /usr/bin +``` + +As always, you will be prompted for a password, which you should create at this time. All other options can be left blank. + +### Downloading and Placing Karaf + +Since there’s no `apt-get` installer for Karaf, we’re going to manually download and install it directly from its binary installer. + +```bash +cd /opt +sudo wget -O karaf.tar.gz KARAF_TARBALL_LINK +sudo tar -xzvf karaf.tar.gz +sudo chown -R karaf:karaf KARAF_DIRECTORY +sudo mv KARAF_DIRECTORY/* /opt/karaf +``` +- `KARAF_TARBALL_LINK`: It’s recommended to get the most recent version of Karaf 4.2.x. This will depend on the current version of Karaf, which can be found on the [Karaf downloads page](https://karaf.apache.org/download.html) under “Karaf Runtime”. Like Solr, you can’t directly `wget` these links, but clicking on the `.tar.gz` link for the binary distribution will bring you to a list of mirrors, as well as provide you with a recommended mirror you can use here. +- `KARAF_DIRECTORY`: This will depend on the exact version being used, but will likely be `/opt/apache-karaf-VERSION`, where `VERSION` is the current Karaf version number. + +### Configuring Karaf Logging + +We’re going to apply some basic logging to our Karaf installation that should suffice for an example. In a production installation, you may want to play around with some of these values for more personally useful logging.
+ +```bash +sudo mkdir /var/log/karaf +sudo chown karaf:karaf /var/log/karaf +``` + +`/opt/karaf/etc/org.ops4j.pax.logging.cfg | karaf:karaf/644` +``` +# Root logger +log4j.rootLogger=INFO, out, osgi:* +log4j.throwableRenderer=org.apache.log4j.OsgiThrowableRenderer + +# File appender +log4j.appender.out=org.apache.log4j.RollingFileAppender +log4j.appender.out.layout=org.apache.log4j.PatternLayout +log4j.appender.out.layout.ConversionPattern=%d{ISO8601} | %-5.5p | %-16.16t | %-32.32c{1} | %X{bundle.id} - %X{bundle.name} - %X{bundle.version} | %m%n +log4j.appender.out.file=/var/log/karaf/karaf.log +log4j.appender.out.append=true +log4j.appender.out.maxFileSize=1MB +log4j.appender.out.maxBackupIndex=10 + +# Camel Logger +log4j.appender.camel=org.apache.log4j.RollingFileAppender +log4j.appender.camel.layout=org.apache.log4j.PatternLayout +log4j.appender.camel.layout.ConversionPattern=%d{ISO8601} | %-5.5p | %-16.16t | %-32.32c{1} | %X{bundle.id} - %X{bundle.name} - %X{bundle.version} | %m%n +log4j.appender.camel.file=/var/log/karaf/camel.log +log4j.appender.camel.append=false +log4j.appender.camel.maxFileSize=1MB +log4j.appender.camel.maxBackupIndex=10 + +log4j.logger.org.apache.camel=INFO, camel + +# Islandora Logger +log4j.appender.islandora=org.apache.log4j.RollingFileAppender +log4j.appender.islandora.layout=org.apache.log4j.PatternLayout +log4j.appender.islandora.layout.ConversionPattern=%d{ISO8601} | %-5.5p | %-16.16t | %-32.32c{1} | %X{bundle.id} - %X{bundle.name} - %X{bundle.version} | %m%n +log4j.appender.islandora.file=/var/log/karaf/islandora.log +log4j.appender.islandora.append=false +log4j.appender.islandora.maxFileSize=1MB +log4j.appender.islandora.maxBackupIndex=10 + +log4j.logger.ca.islandora.camel=INFO, islandora +``` + +### Creating a `setenv.sh` Script for Karaf + +Similar to Tomcat, our Karaf service is going to rely on a `setenv` shell script to determine environment variables Karaf needs in place when running.
For now, this will simply be the path to `JAVA_HOME`, but this also accepts many other parameters you can find in the default `setenv` script. + +`/opt/karaf/bin/setenv | karaf:karaf/755` +``` +#!/bin/sh +export JAVA_HOME="PATH_TO_JAVA_HOME" +``` +- `PATH_TO_JAVA_HOME`: This will be the same `JAVA_HOME` we used when installing Tomcat , and can be found using the same method (i.e., still `/usr/lib/jvm/java-11-openjdk-amd64` if that's what it was before). + +### Initializing Karaf + +We’re going to start Karaf, then run the installer to put our configurations in place and generate a Karaf service. Once these are installed, we’re going to stop Karaf, as from there on out its start/stop management should be handled via that service. + +First we need to enable the default Karaf user in `/opt/karaf/etc/users.properties`: + +**Before**: +> 32 | # karaf = karaf,\_g\_:admingroup + +> 33 | # \_g\_\\:admingroup = group,admin,manager,viewer,systembundles,ssh + +**After**: +> 32 | karaf = karaf,\_g\_:admingroup + +> 33 | \_g\_\\:admingroup = group,admin,manager,viewer,systembundles,ssh + +Save the file and close it, then: + +```bash +sudo -u karaf /opt/karaf/bin/start +# You may want to wait a bit for Karaf to start. +# If you're not sure whether or not it's running, you can always run: +# ps aux | grep karaf +# to see if the server is up and running. +/opt/karaf/bin/client feature:install wrapper +/opt/karaf/bin/client wrapper:install +/opt/karaf/bin/stop +``` + +### Creating and Starting the Karaf Service + +Installing the Karaf wrapper generates several service files that can be used on different types of systems. For Debian and Ubuntu installation we want to enable the `karaf.service` service so that Karaf is properly started on boot. + +```bash +sudo systemctl enable /opt/karaf/bin/karaf.service +sudo systemctl start karaf +``` + +We can check if the service started correctly with: + +```bash +sudo systemctl status karaf +``` + +Press Q to close the status. 
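Both the manual start and the service start above involve waiting for Karaf to come up before moving on. A minimal sketch of a polling helper is shown here; the function name `wait_for` and the 30-second budget in the usage comment are illustrative assumptions, not part of Karaf or this guide.

```bash
# Hypothetical helper: poll a command once per second until it succeeds
# or the attempts run out.
# Usage: wait_for ATTEMPTS COMMAND [ARGS...]
wait_for() {
    attempts="$1"
    shift
    while [ "$attempts" -gt 0 ]; do
        # Run the given command; stop as soon as it exits successfully.
        if "$@"; then
            return 0
        fi
        attempts=$((attempts - 1))
        sleep 1
    done
    return 1
}

# Example (assumes the karaf service from this section exists):
# wait_for 30 systemctl is-active --quiet karaf && echo "karaf is active"
```

`systemctl is-active --quiet` exits with status 0 only when the unit is active, which makes it a convenient condition to poll instead of reading the full status output.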
+ +## Alpaca 1.0.x + +### Adding the Required Karaf Repositories + +Karaf features can be installed from several different types of sources, but the fastest and easiest way to do so is from existing repository URLs that we can just plug into Karaf to provide us feature lists prepared and ready for installation. Like most interactions with Karaf, we can add these repositories using its built-in `client`. + +!!! notice + These repositories are updated consistently, and their updates include revised dependency lists. Commonly, when repositories are out of date or otherwise mismatched, feature installation can result in an `Unable to resolve root: missing requirement` error; for this reason, this guide recommends using recently-updated versions of these repositories. That being said, if such errors occur despite installing the latest versions of these features, the maintainer of the features repository should be informed. + +For the Karaf features we’re going to install, we need a few different repositories to be added to the list: + +```bash +/opt/karaf/bin/client repo-add mvn:org.apache.activemq/activemq-karaf/ACTIVEMQ_KARAF_VERSION/xml/features +/opt/karaf/bin/client repo-add mvn:org.apache.camel.karaf/apache-camel/APACHE_CAMEL_VERSION/xml/features +/opt/karaf/bin/client repo-add mvn:ca.islandora.alpaca/islandora-karaf/ISLANDORA_KARAF_VERSION/xml/features +# XXX: This shouldn't be strictly necessary, but appears to be a missing +# upstream dependency for some fcrepo features. 
+/opt/karaf/bin/client repo-add mvn:org.apache.jena/jena-osgi-features/JENA_OSGI_VERSION/xml/features +``` +- `ACTIVEMQ_KARAF_VERSION`: The version of ActiveMQ we wrote down at the beginning of this chapter when installing ActiveMQ via `apt-get` +- `APACHE_CAMEL_VERSION`: The latest version of Apache Camel 2.x.x; you can find this listed at the [apache-camel repository page](https://mvnrepository.com/artifact/org.apache.camel.karaf/apache-camel) (e.g., 2.25.4 at the time of writing) +- `ISLANDORA_KARAF_VERSION`: The latest version of Islandora Karaf 1.x; you can find this listed at the [islandora-karaf repository page](https://mvnrepository.com/artifact/ca.islandora.alpaca/islandora-karaf) (e.g., 1.0.5 at the time of writing) +- `JENA_OSGI_VERSION`: The latest version of the Apache Jena 3.x OSGi features; you can find this listed at the [jena-osgi-features repository page](https://mvnrepository.com/artifact/org.apache.jena/jena-osgi-features) (e.g., 3.17.0 at the time of writing) + +### Configuring Karaf Features + +Our installed Karaf features require configuration files to know exactly where to route things coming and going from them. 
+ +`/opt/karaf/etc/ca.islandora.alpaca.http.client.cfg | karaf:karaf/644` +``` +token.value=ISLANDORA_SYN_TOKEN +``` +- `ISLANDORA_SYN_TOKEN`: This should be the same token that was established during the installation of Syn in your `syn-settings.xml` file + +`/opt/karaf/etc/org.fcrepo.camel.indexing.triplestore.cfg | karaf:karaf/644` +``` +input.stream=activemq:topic:fedora +triplestore.reindex.stream=activemq:queue:triplestore.reindex +triplestore.baseUrl=http://localhost:8080/blazegraph/namespace/islandora/sparql +``` + +`/opt/karaf/etc/ca.islandora.alpaca.indexing.triplestore.cfg | karaf:karaf/644` +``` +error.maxRedeliveries=10 +index.stream=activemq:queue:islandora-indexing-triplestore-index +delete.stream=activemq:queue:islandora-indexing-triplestore-delete +triplestore.baseUrl=http://localhost:8080/blazegraph/namespace/islandora/sparql +``` + +`/opt/karaf/etc/ca.islandora.alpaca.indexing.fcrepo.cfg | karaf:karaf/644` +``` +error.maxRedeliveries=5 +node.stream=activemq:queue:islandora-indexing-fcrepo-content +node.delete.stream=activemq:queue:islandora-indexing-fcrepo-delete +media.stream=activemq:queue:islandora-indexing-fcrepo-media +file.stream=activemq:queue:islandora-indexing-fcrepo-file +file.delete.stream=activemq:queue:islandora-indexing-fcrepo-file-delete +milliner.baseUrl=http://localhost/milliner +``` + +### Blueprinting Karaf Derivative Connectors + +For those services in Crayfish we have set up to provide derivatives to Islandora resources, we need connector blueprints to tell the derivative connector how to route incoming requests, run conversions, and return outgoing derivatives. + +Our blueprints are going to look largely similar between services, with only a few properties changing between them. Largely, these mainly just need to match the ActiveMQ queues we established in the previous configuration, and route to the correct Crayfish service. 
+ +`/opt/karaf/deploy/ca.islandora.alpaca.connector.ocr.blueprint.xml | karaf:karaf/644` +```xml + + + + + + + + + + + + + + + + + + ca.islandora.alpaca.connector.derivative + + + +``` + +`/opt/karaf/deploy/ca.islandora.alpaca.connector.houdini.blueprint.xml | karaf:karaf/644` +```xml + + + + + + + + + + + + + + + + + + ca.islandora.alpaca.connector.derivative + + + +``` + +`/opt/karaf/deploy/ca.islandora.alpaca.connector.homarus.blueprint.xml | karaf:karaf/644` +```xml + + + + + + + + + + + + + + + + + + ca.islandora.alpaca.connector.derivative + + + +``` + +`/opt/karaf/deploy/ca.islandora.alpaca.connector.fits.blueprint.xml | karaf:karaf/644` +```xml + + + + + + + + + + + + + + + + + + ca.islandora.alpaca.connector.derivative + + + +``` + +### Installing the Required Karaf Features + +Before we can configure the features we’re going to use, they need to be installed. Some of these installations may take some time. + +```bash +/opt/karaf/bin/client feature:install camel-blueprint +/opt/karaf/bin/client feature:install activemq-blueprint +/opt/karaf/bin/client feature:install fcrepo-service-activemq +# This again should not be strictly necessary, since this isn't the triplestore +# we're using, but is being included here to resolve the aforementioned +# missing link in the dependency chain. +/opt/karaf/bin/client feature:install jena +/opt/karaf/bin/client feature:install fcrepo-camel +/opt/karaf/bin/client feature:install fcrepo-indexing-triplestore +/opt/karaf/bin/client feature:install islandora-http-client +/opt/karaf/bin/client feature:install islandora-indexing-triplestore +/opt/karaf/bin/client feature:install islandora-indexing-fcrepo +/opt/karaf/bin/client feature:install islandora-connector-derivative +``` + +### Verifying Karaf Components are Running (Optional But Recommended) + +At this point, Karaf components should be up and running, but it's a good idea to double-check that this is the case. 
We can do this from within the Karaf client by taking a look at its component list. + +```bash +# Until this point, we've been running Karaf commands from outside; we can hop +# into the client, however, and run commands from directly within. +/opt/karaf/bin/client +# This takes us into the Karaf client so we can run commands. +la | grep islandora +la | grep fcrepo +# It may be a good idea to use this to look up the other components we +# installed. +logout +``` + +For the above `la | grep` commands, components that are running should be listed as `Active`. diff --git a/docs/installation/manual/installing-solr.md b/docs/installation/manual/installing-solr.md new file mode 100644 index 000000000..99c448a51 --- /dev/null +++ b/docs/installation/manual/installing-solr.md @@ -0,0 +1,170 @@ +# Installing Solr + +!!! warning "Needs Maintenance" + The manual installation documentation is in need of attention. We are aware that some components no longer work as documented here. If you are interested in helping us improve the documentation, please see [Contributing](../../../contributing/CONTRIBUTING). + +## In this section, we will install: +- [Apache Solr 8](https://lucene.apache.org/solr/), the search engine used to index and find Drupal content +- [search_api_solr](https://www.drupal.org/project/search_api_solr), the Solr implementation of Drupal's search API + +## Solr 8 + +### Downloading and Placing Solr + +The Solr binaries can be found at the [Solr downloads page](https://solr.apache.org/downloads.html); the most recent stable release of Solr 8 should be used. + +```bash +# While generally we download tarballs as .tar.gz files without version +# information, the Solr installer is a bit particular in that it expects a .tgz +# file with the same name as the extracted folder it contains. It's odd, and we +# can't really get around it.
+cd +wget SOLR_DOWNLOAD_LINK +tar -xzvf SOLR_TARBALL +``` +- `SOLR_DOWNLOAD_LINK`: **NOTICE**: This will depend on a few different things, not least of all the current version of Solr. The link to the `.tgz` for the binary on the downloads page will take you to a list of mirrors that Solr can be downloaded from, and provide you with a preferred mirror at the top. This preferred mirror should be used as the `SOLR_DOWNLOAD_LINK`. +- `SOLR_TARBALL`: The filename that was downloaded, e.g., `solr-8.9.0.tgz` + +### Running the Solr Installer + +Solr includes an installer that does most of the heavy lifting of ensuring we have a Solr user, a location where Solr lives, and configurations in place to ensure it’s running on boot. + +```bash +sudo UNTARRED_SOLR_FOLDER/bin/install_solr_service.sh SOLR_TARBALL +``` +- `UNTARRED_SOLR_FOLDER`: This will likely simply be `solr-VERSION`, where `VERSION` is the version number that was downloaded. + +The port that Solr runs on can potentially be configured at this point, but we'll expect it to be running on `8983`. + +Wait until the command output reaches: + +``` +Started Solr server on port 8983 (pid=****). Happy searching! +systemd[1]: Started LSB: Controls Apache Solr as a Service. +``` + +After which you can press `q` to quit the output (this won't kill Solr so it's safe). + +You can check if Solr is running correctly by going to http://localhost:8983/solr + + +### Increasing the Open File Limit (Optional) + +Solr's installation guide recommends that you increase the open file limit so that operations aren't disrupted while Solr is trying to access things in its index. This limit can be increased while the system is running, but doing so won't persist after a reboot. You can hard-increase this limit using your system's `sysctl` file: + +`/etc/sysctl.conf` + +Add the following line to the end of the file: + +``` +fs.file-max = 65535 +``` + +Then apply your new configuration. 
+ +```bash +sudo sysctl -p +``` + +### Creating a New Solr Core + +Initially, our new Solr core will contain a configuration copied from the example included with the installation, so that we have something to work with when we configure this on the Drupal side. We’ll later update this with generated configurations we create in Drupal. + +```bash +cd /opt/solr +sudo mkdir -p /var/solr/data/SOLR_CORE/conf +sudo cp -r example/files/conf/* /var/solr/data/SOLR_CORE/conf +sudo chown -R solr:solr /var/solr +sudo -u solr bin/solr create -c SOLR_CORE -p 8983 +``` +- `SOLR_CORE`: `islandora8` + +You should see an output similar to this: +``` +WARNING: Using _default configset with data driven schema functionality. NOT RECOMMENDED for production use. + To turn off: bin/solr config -c islandora8 -p 8983 -action set-user-property -property update.autoCreateFields -value false + +Created new core 'islandora8' +``` + +### Installing `search_api_solr` + +Rather than use an out-of-the-box configuration that won’t be suitable for our purposes, we’re going to use the Drupal `search_api_solr` module to generate one for us. This will also require us to install the module so we can create these configurations using Drush. + +```bash +cd /opt/drupal +sudo -u www-data composer require drupal/search_api_solr:^4.2 +drush -y en search_api_solr +``` + +You should see an output similar to this: +``` +The following module(s) will be enabled: search_api_solr, language, search_api + + // Do you want to continue?: yes. + + [success] Successfully enabled: search_api_solr, language, search_api + +``` + +### Configuring search_api_solr + +Before we can create configurations to use with Solr, the core we created earlier needs to be referenced in Drupal. + +Log in to the Drupal site at `/user` using the sitewide administrator username and password (if using defaults from previous chapters this should be `islandora` and `islandora`), then navigate to `/admin/config/search/search-api/add-server`. 
+ +Fill out the server addition form using the following options: + +![Adding a Solr Search Server](../../assets/adding_a_solr_search_server.png) + +![Configuring the Standard Solr Connector](../../assets/configuring_standard_solr_connector.png) + +![Setting the Solr Install Directory](../../assets/setting_the_solr_install_directory.png) + +- `SERVER_NAME`: `islandora8` + - This is completely arbitrary, and is simply used to differentiate this search server configuration from all others. **Write down** or otherwise pay attention to the `machine_name` generated next to the server name you type in; this will be used in the next step. + +As a recap for this configuration: + +- **Server name** should be an arbitrary identifier for this server +- **Enabled** should be checked +- **Backend** should be set to **Solr** +- Under **CONFIGURE SOLR BACKEND**, **Solr Connector** should be set to **Standard** +- Under **CONFIGURE STANDARD SOLR CONNECTOR**: + - **HTTP protocol** is simply set to **http** since we've set this up on the same machine Drupal lives on. On a production installation, Solr should likely be installed behind an HTTPS connection. + - **Solr host** can be set to **localhost** since, again, this is set up on the same machine Drupal lives on. On a production installation, this may vary, especially if parts of the installation live on different servers. + - **Solr port** should be set to the port Solr was installed on, which is **8983** by default + - **Solr path** should be set to the configured path to the instance of Solr; in a default installation, there is only one Solr instance, and it lives at **/** + - **Solr core** should be the name of the Solr core you created earlier, which is why it's listed as **SOLR_CORE** here +- Under **ADVANCED SERVER CONFIGURATION**, **solr.install.dir** should be set to the path where we installed Solr, which this guide has established at **/opt/solr** + +Click **Save** to create the server configuration.
+ +**NOTICE** + You can ignore the error about an incompatible Solr schema; we're going to set this up in the next step. In fact, if you refresh the page after restarting Solr in the next step, you should see the error disappear. + +### Generating and Applying Solr Configurations + +Now that our core is in place and our Drupal-side configurations exist, we’re ready to generate Solr configuration files to connect this site to our search engine. + +```bash +cd /opt/drupal +drush solr-gsc SERVER_MACHINE_NAME /opt/drupal/solrconfig.zip +unzip -d ~/solrconfig solrconfig.zip +sudo cp ~/solrconfig/* /var/solr/data/SOLR_CORE/conf +sudo systemctl restart solr +``` +- `SERVER_MACHINE_NAME`: This should be the `machine_name` that was automatically generated when creating the configuration in the above step. + +### Adding an Index + +In order for content to be indexed back into Solr, a search index needs to be added to our server. Navigate to `/admin/config/search/search-api/add-index` and check off the things you'd like to be indexed. + +**NOTICE** + You should come back here later and reconfigure this after completing the last step in this guide. The default indexing configuration is pretty permissive, and you may want to restrict, for example, indexed content to just Islandora-centric bundles. This guide doesn't set up the index's fields either, which are going to be almost wholly dependent on the needs of your installation. Once you complete that configuration later on, re-index Solr from the configuration page of the index we're creating here. + +![Adding a Search Index](../../assets/adding_a_search_index.png) + +![Specifying the Solr Server](../../assets/specifying_the_solr_server.png) + +Click **Save** to add your index and kick off indexing of existing items. 
diff --git a/docs/installation/manual/installing-tomcat-and-cantaloupe.md b/docs/installation/manual/installing-tomcat-and-cantaloupe.md new file mode 100644 index 000000000..f3c8a515d --- /dev/null +++ b/docs/installation/manual/installing-tomcat-and-cantaloupe.md @@ -0,0 +1,157 @@ +# Installing Tomcat and Cantaloupe + +!!! warning "Needs Maintenance" + The manual installation documentation is in need of attention. We are aware that some components no longer work as documented here. If you are interested in helping us improve the documentation, please see [Contributing](../../../contributing/CONTRIBUTING). + +## In this section, we will install: +- [Tomcat 9](https://tomcat.apache.org/download-90.cgi), the Java servlet container that will serve up some Java applications on various endpoints, including, importantly, Fedora +- [Cantaloupe 5](https://cantaloupe-project.github.io/), the image tileserver - running in Tomcat - that will be used to serve up large images in a web-accessible fashion + +## Tomcat 9 + +### Installing OpenJDK 11 + +Tomcat runs in a Java runtime environment, so we'll need one to continue. In our case, OpenJDK 11 is open-source, free to use, and can fairly simply be installed using `apt-get`: + +```bash +sudo apt-get -y install openjdk-11-jdk openjdk-11-jre +``` + +The installation of OpenJDK via `apt-get` establishes it as the de-facto Java runtime environment to be used on the system, so no further configuration is required. + +The resultant location of the java JRE binary (and therefore, the correct value of `JAVA_HOME` when it’s referenced) will vary based on the specifics of the machine it’s being installed on; that being said, you can find its exact location using `update-alternatives`: + +```bash +update-alternatives --list java +``` +Take a note of this path as we will need it later. 
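The path printed by `update-alternatives --list java` points at the `java` binary itself, while `JAVA_HOME` is the directory that contains its `bin` folder. A small sketch of that conversion follows; the function name `java_home_from_binary` is a hypothetical helper, and the sample path is the one this guide uses as its running example.

```bash
# Hypothetical helper: turn the path of a java binary into a JAVA_HOME
# candidate by stripping the trailing /bin/java component.
java_home_from_binary() {
    java_bin="$1"
    # ${var%pattern} removes the shortest matching suffix from the value.
    printf '%s\n' "${java_bin%/bin/java}"
}

# Example with the path used elsewhere in this guide:
java_home_from_binary /usr/lib/jvm/java-11-openjdk-amd64/bin/java
```

On a real system the argument would be whatever `update-alternatives --list java` reports, which may differ from the sample path.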
+ +### Creating a `tomcat` User + +Apache Tomcat, and all its processes, will be owned and managed by a specific user for the purposes of keeping parts of the stack segregated and accountable. + +```bash +sudo addgroup tomcat +sudo adduser tomcat --ingroup tomcat --home /opt/tomcat --shell /usr/bin +``` + +You will be prompted to create a password for the `tomcat` user; all the other information as part of the `adduser` command can be ignored. + +### Downloading and Placing Tomcat 9 + +Tomcat 9 itself can be installed in several different ways; while it’s possible to install via `apt-get`, this doesn’t give us a great deal of control over exactly how we’re going to run and manage it; as a critical part of the stack, it is beneficial for our purposes to have a good frame of reference for the inner workings of Tomcat. + +We’re going to download the latest version of Tomcat to `/opt` and set it up so that it runs automatically. Bear in mind that with the following commands, this is going to be entirely relative to the current version of Tomcat 9, which we’ll try to mitigate as we go. + +```bash +cd /opt +sudo wget -O tomcat.tar.gz TOMCAT_TARBALL_LINK +sudo tar -zxvf tomcat.tar.gz +sudo mv /opt/TOMCAT_DIRECTORY/* /opt/tomcat +sudo chown -R tomcat:tomcat /opt/tomcat +``` +- `TOMCAT_TARBALL_LINK`: No default can be provided here; you should navigate to the [Tomcat 9 downloads page](https://tomcat.apache.org/download-90.cgi) and grab the link to the latest `.tar.gz` file under the “Core” section of “Binary Distributions”. It is highly recommended to grab the latest version of Tomcat 9, as it will come with associated security patches and fixes. +- `TOMCAT_DIRECTORY`: This will also depend entirely on the exact version of tomcat downloaded - for example, `apache-tomcat-9.0.50`. Again, `ls /opt` can be used to find this. 
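As an alternative to scanning `ls /opt` for `TOMCAT_DIRECTORY`, the top-level folder name can be read from the tarball itself. This is only a sketch: `tarball_top_dir` is a hypothetical helper name, and it assumes the archive unpacks into a single top-level directory, as the Tomcat binary distributions typically do.

```bash
# Hypothetical helper: print the top-level directory name stored inside
# a gzip-compressed tarball.
tarball_top_dir() {
    # List the archive contents, take the first entry, and keep only the
    # first path component.
    tar -tzf "$1" | head -n 1 | cut -d/ -f1
}

# Example (assumes the tarball downloaded above):
# tarball_top_dir /opt/tomcat.tar.gz
```

The same helper would work for the Karaf tarball, since it only inspects the archive listing and does not extract anything.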
+ +### Creating a setenv.sh Script + +When Tomcat runs, some configuration needs to be pre-established as a series of environment variables that will be used by the script that runs it. + +`/opt/tomcat/bin/setenv.sh | tomcat:tomcat/755` +``` +export CATALINA_HOME="/opt/tomcat" +export JAVA_HOME="PATH_TO_JAVA_HOME" +export JAVA_OPTS="-Djava.awt.headless=true -server -Xmx1500m -Xms1000m" +``` +- `PATH_TO_JAVA_HOME`: This will vary a bit depending on the environment, but will likely live in `/usr/lib/jvm` somewhere (e.g., `/usr/lib/jvm/java-11-openjdk-amd64`). In an Ubuntu environment, `update-alternatives --list java` will give you the path to the JRE binary within the Java home, so for `PATH_TO_JAVA_HOME` delete the `/bin/java` at the end to get the Java home directory. It should look something like this: +``` +export JAVA_HOME="/usr/lib/jvm/java-11-openjdk-amd64" +``` + +### Creating the Tomcat Service + +Tomcat includes two shell scripts we’re going to make use of - `startup.sh` and `shutdown.sh` - which are light wrappers on top of a third script, `catalina.sh`, which manages spinning up and shutting down the Tomcat server. + +Debian and Ubuntu use `systemd` to manage services, controlled through the `systemctl` command; we’re going to create a .service file that can run these shell scripts. + +`/etc/systemd/system/tomcat.service | root:root/755` +``` +[Unit] +Description=Tomcat + +[Service] +Type=forking +ExecStart=/opt/tomcat/bin/startup.sh +ExecStop=/opt/tomcat/bin/shutdown.sh +SyslogIdentifier=tomcat + +[Install] +WantedBy=multi-user.target +``` + +### Enabling and Starting Tomcat + +We’re going to both `enable` and `start` Tomcat. Enabling Tomcat ensures that it starts on boot, as part of the target named by the `[Install]` section’s `WantedBy` statement (here, `multi-user.target`).
This is separate from starting it, which we need to do now in order to get Tomcat up and running without requiring a reboot. + +```bash +sudo systemctl enable tomcat +sudo systemctl start tomcat +``` + +We can check that Tomcat has started by running `sudo systemctl status tomcat | grep Active`; we should see that Tomcat is `active (running)`, which is the correct result of startup.sh finishing its run successfully. + +## Installing Cantaloupe 5 + +Since version 5, Cantaloupe is released as a standalone Java application and is no longer deployed in Tomcat via a .war file. Even so, we can still fine-tune how it runs and even install it as a service. + +### Downloading Cantaloupe + +Releases of Cantaloupe live on the [Cantaloupe release page](https://github.com/cantaloupe-project/cantaloupe/releases); the latest version can be found here as a `.zip` file. + +```bash +sudo wget -O /opt/cantaloupe.zip CANTALOUPE_RELEASE_URL +# Extract into /opt so the paths used later in this section resolve. +sudo unzip /opt/cantaloupe.zip -d /opt +``` +- `CANTALOUPE_RELEASE_URL`: It’s recommended we grab the latest version of Cantaloupe 5. This can be found on the above-linked release page, as the `.zip` version; for example, https://github.com/cantaloupe-project/cantaloupe/releases/download/v5.0.3/cantaloupe-5.0.3.zip - make sure **not** to download the source code zip file as that isn't compiled for running out-of-the-box. + +### Creating a Cantaloupe Configuration + +Cantaloupe pulls its configuration from a file called `cantaloupe.properties`; there are also some other files that can contain instructions for Cantaloupe while it’s running; specifically, we’re going to copy over the `delegates.rb` file, which can also contain custom configuration. We won’t make use of this file; we’re just copying it over for demonstration purposes. + +Creating these files from scratch is *not* recommended; rather, we’re going to take the default Cantaloupe configurations and copy them into their own folder so we can work with them.
+ +```bash +sudo mkdir /opt/cantaloupe_config +sudo cp /opt/CANTALOUPE_VER/cantaloupe.properties.sample /opt/cantaloupe_config/cantaloupe.properties +sudo cp /opt/CANTALOUPE_VER/delegates.rb.sample /opt/cantaloupe_config/delegates.rb +``` +- `CANTALOUPE_VER`: This will depend on the exact version of Cantaloupe downloaded; in the above example release, this would be `cantaloupe-5.0.3` + +The out-of-the-box configuration will work fine for our purposes, but it’s highly recommended that you take a look through the `cantaloupe.properties` and see what changes can be made; specifically, logging to actual logfiles isn’t set up by default, so you may want to take a peek at the `log.application.SyslogAppender` or `log.application.RollingFileAppender`, as well as changing the logging level. + +### Installing and configuring Cantaloupe as a service + +Since it is a standalone application, we can configure Cantaloupe as a systemd service like we did with Tomcat, so it can start on boot: + +`/etc/systemd/system/cantaloupe.service | root:root/755` +``` +[Unit] +Description=Cantaloupe + +[Service] +ExecStart=java -cp /opt/CANTALOUPE_VER/CANTALOUPE_VER.jar -Dcantaloupe.config=/opt/cantaloupe_config/cantaloupe.properties -Xmx1500m -Xms1000m edu.illinois.library.cantaloupe.StandaloneEntry +SyslogIdentifier=cantaloupe + +[Install] +WantedBy=multi-user.target +``` +- `CANTALOUPE_VER`: This will depend on the exact version of Cantaloupe downloaded; in the above example release, this would be `cantaloupe-5.0.3` + +We can now enable the service and run it: + +```bash +sudo systemctl enable cantaloupe +sudo systemctl start cantaloupe +``` + +We can check the service status with `sudo systemctl status cantaloupe | grep Active` and the splash screen of Cantaloupe should be available at http://localhost:8182 diff --git a/docs/installation/manual/preparing-a-webserver.md b/docs/installation/manual/preparing-a-webserver.md new file mode 100644 index 000000000..b4e3fc63e --- /dev/null +++
b/docs/installation/manual/preparing-a-webserver.md @@ -0,0 +1,114 @@ +# Preparing a LAPP Server + +!!! warning "Needs Maintenance" + The manual installation documentation is in need of attention. We are aware that some components no longer work as documented here. If you are interested in helping us improve the documentation, please see [Contributing](../../../contributing/CONTRIBUTING). + +## In this section, we will install: + +- [Apache 2](https://httpd.apache.org/), the webserver that will deliver webpages to end users +- [PHP 7](https://www.php.net/), the runtime code interpreter that Drupal will use to generate webpages and other services via apache, as well as that Drush and Composer will use to run tasks from the command line +- Several modules for PHP 7 which are required to run the PHP code that Drupal and other applications will be executing +- [PostgreSQL 10](https://www.postgresql.org/), the database that Drupal will use for storage (as well as other applications down the line) + +## Apache 2 + +### Install Apache 2 + +Apache can typically be installed and configured outright by your operating system’s package manager: + +```bash +sudo apt-get -y install apache2 apache2-utils +``` + +This will install: + +- A `systemd` service that will ensure Apache can be stopped and started, and will run when the machine is powered on +- A set of Apache configurations in `/etc/apache2`, including the basic configuration, ports configuration, enabled mods, and enabled sites +- An Apache webroot in `/var/www/html`, configured to be the provided server on port `:80` in `/etc/apache2/sites-enabled/000-default.conf`; we’ll make changes and additions to this file later +- A user and group, `www-data`, which we will use to read/write web documents. 
+ +### Enable Apache Mods + +We’re going to enable a couple of Apache mods that Drupal highly recommends installing, and which are de-facto considered required by Islandora: + +```bash +sudo a2enmod ssl +sudo a2enmod rewrite +sudo systemctl restart apache2 +``` + +### Add the Current User to the `www-data` Group + +Since the user we are currently logged in as is going to work quite a bit inside the Drupal directory, we want to give it group permissions to anything the `www-data` group has access to. When we run `composer`, `www-data` will also be caching data in our own home directory, so we want this group modification to go in both directions. + +**N.B.** This code block uses **backticks**, not single quotes; this is an important distinction as backticks have special meaning in `bash`. + +**Note** If doing this in the terminal, replace "whoami" with your username and remove the backticks + +```bash +sudo usermod -a -G www-data `whoami` +sudo usermod -a -G `whoami` www-data +# Immediately log back in to apply the new group. 
+sudo su `whoami` +``` + +## PHP 7.4 + +### Install PHP 7.4 + +If you're running Debian 11 you should be able to install PHP 7.4 from the apt packages directly: + +```bash +sudo apt-get -y install php7.4 php7.4-cli php7.4-common php7.4-curl php7.4-dev php7.4-gd php7.4-imap php7.4-json php7.4-mbstring php7.4-opcache php7.4-xml php7.4-yaml php7.4-zip libapache2-mod-php7.4 php-pgsql php-redis php-xdebug unzip +``` + +If you're running Debian 10, the repository for the PHP 7.4 packages needs to be installed first: + +```bash +sudo apt-get -y install lsb-release apt-transport-https ca-certificates +sudo wget -O /etc/apt/trusted.gpg.d/php.gpg https://packages.sury.org/php/apt.gpg +echo "deb https://packages.sury.org/php/ $(lsb_release -sc) main" | sudo tee /etc/apt/sources.list.d/php.list +sudo apt-get update +sudo apt-get -y install php7.4 php7.4-cli php7.4-common php7.4-curl php7.4-dev php7.4-gd php7.4-imap php7.4-json php7.4-mbstring php7.4-opcache php7.4-xml php7.4-yaml php7.4-zip libapache2-mod-php7.4 php-pgsql php-redis php-xdebug unzip +``` + +This will install a series of PHP configurations and mods in `/etc/php/7.4`, including: + +- A `mods-available` folder (from which everything is typically enabled by default) +- A configuration for PHP when run from Apache in the `apache2` folder +- A configuration for PHP when run from the command line - including when run via Drush - in the `cli` folder +- `unzip`, which is important for PHP’s zip module to function correctly despite it not being a direct dependency of the module. We will also need to unzip some things later, so this is convenient to have in place early in the installation process. + +## PostgreSQL 11 + +### Install PostgreSQL 11 + +PostgreSQL can generally be easily installed using your operating system’s package manager. It is typically sensible to install the version the system recognizes as up-to-date. 
We’re simply going to install the database software: + +```bash +sudo apt-get -y install postgresql +``` + +This will install: + +- A user at the system level named `postgres`; this will be the only user, by default, that has permission to run the `psql` binary and have access to Postgres configurations +- A binary executable at `/usr/bin/psql`, which anyone - even `root` - will get kicked out of the moment they run it, since only the `postgres` user has permission to run any Postgres commands +- A series of configurations that live in `/etc/postgresql/11/main` which can be used to modify how PostgreSQL works. + +### Configure PostgreSQL 11 For Use With Drupal + +A modification needs to be made to the PostgreSQL configuration in order for Drupal to properly install and function. This change can be made to the main configuration file at `/etc/postgresql/11/main/postgresql.conf`: + +**Before**: +> 558 | #bytea_output = 'hex' # hex, escape + +**After**: +> 558 | bytea_output = 'escape' + +(Remove the leading `#` and the trailing "# hex, escape" comment, and change the value from "hex" to "escape".) + +The `postgresql` service should be restarted to accept the new configuration: + +```bash +sudo systemctl restart postgresql +``` diff --git a/docs/technical-documentation/adding-format-jsonld.md b/docs/technical-documentation/adding-format-jsonld.md new file mode 100644 index 000000000..2dd5165e6 --- /dev/null +++ b/docs/technical-documentation/adding-format-jsonld.md @@ -0,0 +1,27 @@ +Drupal requires the use of a `_format` query parameter to get alternate representations of a node/media. + +By default, Islandora deploys with the [jsonld](https://github.com/Islandora/jsonld) module and the [Milliner](https://github.com/Islandora/Crayfish/tree/main/Milliner) microservice. These two components are configured to strip this `_format` query parameter off of the end of URIs. + +This means that when your content is indexed in Fedora, the triplestore, etc...
its URI will +be something like `http://localhost:8000/node/1` and not `http://localhost:8000/node/1?_format=jsonld`. + +## Pre-1.0 installations + +If you are using a __very__ early version of Islandora "8" (pre-release), then you may have URIs with `_format=jsonld` at the end of them. + +If you update to newer code, you will need to ensure that your site is configured to add `?_format=jsonld` +back to the URLs if you want to maintain consistency. + +If you **don't** do this, you can end up with two copies of your objects in your Fedora repository (one with and one without `?_format=jsonld`). You will also have two sets of triples in your triplestore. + +## Adding ?_format=jsonld to your URIs + +To turn the `?_format` parameter back on: + +- Go to `admin/config/search/jsonld` and confirm the *"Remove jsonld parameter from @ids"* checkbox is **unchecked**. +- Add `strip_format_jsonld: false` to your Milliner config. If you deployed using the default Islandora-playbook this file would be located at `/var/www/html/Crayfish/Milliner/cfg/config.yaml`. + +If you are using [Islandora-playbook](https://github.com/Islandora-Devops/Islandora-playbook) and are provisioning new environments for your older Islandora, you'll want to lock down the variables in your inventory that control this config. + +- `crayfish_milliner_strip_format_jsonld: true` +- `webserver_app_jsonld_remove_format: 1` diff --git a/docs/technical-documentation/adding_format_jsonld.md b/docs/technical-documentation/adding_format_jsonld.md index 2dd5165e6..ce4fe39be 100644 --- a/docs/technical-documentation/adding_format_jsonld.md +++ b/docs/technical-documentation/adding_format_jsonld.md @@ -1,27 +1 @@ -Drupal requires the use of a `_format` query parameter to get alternate representations of a node/media. - -By default, Islandora deploys with the [jsonld](https://github.com/Islandora/jsonld) module and the [Milliner](https://github.com/Islandora/Crayfish/tree/main/Milliner) microservice.
These two components are configured to strip this `_format` query parameter off of the end of URIs. - -This means that when your content is indexed in Fedora, the triplestore, etc... it's URI will -be something like `http://localhost:8000/node/1` and not `http://localhost:8000/node/1?_format=jsonld`. - -## Pre-1.0 installations. - -If you are using a __very__ early version of Islandora "8" (pre-release), then you may have URIs with `_format=jsonld` at the end of them. - -If you update to newer code, you will need to ensure that your site is configured to add `?_format=jsonld` -back to the URLs if you want to maintain consistency. - -If you **don't** do this, you can end up with two copies of your objects in your Fedora repository (one with and one without `?_format=jsonld`). You will also have two sets of triples in your triplestore. - -## Adding ?_format=jsonld to your URIs - -To turn the `?_format` parameter back on: - -- Go to `admin/config/search/jsonld` and confirm the *"Remove jsonld parameter from @ids"* checkbox is **unchecked**. -- Add `strip_format_jsonld: false` to your Milliner config. If you deployed using the default Islandora-playbook this file would be located at `/var/www/html/Crayfish/Milliner/cfg/config.yaml`. - -If you are using [Islandora-playbook](https://github.com/Islandora-Devops/Islandora-playbook) and are provisioning new environments for your older Islandora, you'll want to lock down the variables in your inventory that control this config. - -- `crayfish_milliner_strip_format_jsonld: true` -- `webserver_app_jsonld_remove_format: 1` +This content has been moved to [adding-format-jsonld.md](adding-format-jsonld.md). 
diff --git a/docs/technical-documentation/alpaca-tips.md b/docs/technical-documentation/alpaca-tips.md new file mode 100644 index 000000000..acf0fa15b --- /dev/null +++ b/docs/technical-documentation/alpaca-tips.md @@ -0,0 +1,113 @@ +# Alpaca Tips + +[Alpaca](https://github.com/Islandora/Alpaca) is event-driven middleware based on [Apache Camel](https://camel.apache.org/) for Islandora + +Currently, Alpaca ships with four event-driven components + +- [islandora-connector-derivative](#islandora-connector-derivative) +- [islandora-http-client](#islandora-http-client) +- [islandora-indexing-fcrepo](#islandora-indexing-fcrepo) +- [islandora-indexing-triplestore](#islandora-indexing-triplestore) + +## islandora-connector-derivative +This service receives requests from Drupal when it wants to create derivatives and passes that request along to a microservice in [Crayfish](https://github.com/Islandora/Crayfish). When it receives the derivative file back from the microservice, it passes the file back to Drupal. + +## islandora-http-client +This service overrides the default http client with Islandora specific configuration. + +## islandora-indexing-fcrepo +This service receives requests from Drupal in response to write operations on entities. These requests are passed along to [Milliner](https://github.com/Islandora/Crayfish/tree/dev/Milliner) microservice in [Crayfish](https://github.com/Islandora/Crayfish) to convert Drupal entities into Fedora resources and communicate with Fedora (via [Chullo](https://github.com/Islandora/chullo)). + +## islandora-indexing-triplestore +This service receives requests from Drupal on indexing and deleting in order to persist/delete content in the triplestore. + + +## Steps for developing with Alpaca +Alpaca now runs as a single executable jar which can enable none, some or all of the available services. + +To develop your own module, start by cloning the Alpaca code base. 
+ +Then create a new directory (for example `my-new-module`) alongside the `islandora-indexing-fcrepo` and `islandora-indexing-triplestore` directories. + +Add your new directory to the `settings.gradle` file, following the pattern of the others. +```shell + include ':islandora-support' + include ':islandora-indexing-triplestore' + include ':islandora-indexing-fcrepo' + include ':islandora-connector-derivative' + include ':islandora-http-client' + include ':islandora-alpaca-app' ++ include ':my-new-module' + + project(':islandora-alpaca-app').setProjectDir("$rootDir/islandora-alpaca-app" as File) + project(':islandora-support').setProjectDir("$rootDir/islandora-support" as File) + project(':islandora-indexing-triplestore').setProjectDir("$rootDir/islandora-indexing-triplestore" as File) + project(':islandora-indexing-fcrepo').setProjectDir("$rootDir/islandora-indexing-fcrepo" as File) + project(':islandora-connector-derivative').setProjectDir("$rootDir/islandora-connector-derivative" as File) + project(':islandora-http-client').setProjectDir("$rootDir/islandora-http-client" as File) ++ project(':my-new-module').setProjectDir("$rootDir/my-new-module" as File) +``` + +You can explore the `islandora-indexing-fcrepo` module to see the pattern to follow when developing your own module. + +This module contains three classes. + +You can ignore the `CommonProcessor` class; it is just some processing that is split out for reusability. + +The first class is `FcrepoIndexer`, which extends the Apache Camel `RouteBuilder` and requires a `configure` method that defines the processing elements of your workflow. This is the Camel "route". + +The second class is `FcrepoIndexerOptions`, which extends the Alpaca `PropertyConfig` base class to bring common configuration parameters into your module. It also contains any custom configuration parameters needed for your route.
+ +Lastly, it uses the `@Conditional(FcrepoIndexerOptions.FcrepoIndexerEnabled.class)` annotation to define when this module is enabled. + +`FcrepoIndexerOptions.FcrepoIndexerEnabled.class` refers to the static inner class. + +This class is inside of `FcrepoIndexerOptions` and works like this: +``` +[1] static class FcrepoIndexerEnabled extends ConditionOnPropertyTrue { +[2] FcrepoIndexerEnabled() { +[3] super(FcrepoIndexerOptions.FCREPO_INDEXER_ENABLED, false); +[4] } +[5] } +``` +Line 1 extends the class that will register (enable) this module when a defined property is "TRUE". + +Line 2 is the constructor for this static class. + +Line 3 passes two things to the parent constructor: + +1. the property name to check for enabling this module. +2. the default value to use if the property (above) is not found. + +So in this case we check for the property `fcrepo.indexer.enabled` and if we don't find it, we pass `false`. So this module is assumed to be "off" unless the property `fcrepo.indexer.enabled=true` is located. + +The last thing is to add your new module to the `islandora-alpaca-app` `build.gradle` file as a dependency, like the existing modules. +For example:
+``` +dependencies { + implementation "info.picocli:picocli:${versions.picocli}" + implementation "org.apache.camel:camel-spring-javaconfig:${versions.camel}" + implementation "org.slf4j:slf4j-api:${versions.slf4j}" + implementation "org.springframework:spring-context:${versions.spring}" + implementation project(':islandora-support') + implementation project(':islandora-connector-derivative') + implementation project(':islandora-indexing-fcrepo') + implementation project(':islandora-indexing-triplestore') ++ implementation project(':my-new-module') + + runtimeOnly "ch.qos.logback:logback-classic:${versions.logback}" + +} +``` + +Finally, from the top-level directory of Alpaca, execute: +``` +./gradlew clean build shadowJar +``` + +This tells Gradle to clean the modules, then build the modules and finally create a single jar with all needed code (the shadow jar). + +The final executable jar is: +``` +/islandora-alpaca-app/build/libs/islandora-alpaca--all.jar +``` diff --git a/docs/technical-documentation/resizing-vm.md b/docs/technical-documentation/resizing-vm.md new file mode 100644 index 000000000..ba7237c7b --- /dev/null +++ b/docs/technical-documentation/resizing-vm.md @@ -0,0 +1,30 @@ +## Resize vagrant machine +Use these steps to expand the virtual machine's hard drive for testing larger files. Once the VM has started, you'll need to `halt` the VM, download and run the script, tell it the new size (in MB), and then start the VM. +The last step `vagrant ssh --command "sudo resize2fs /dev/sda1"` is a check. It should report that there was nothing to do. If you already provisioned your VM you can skip the 2 steps with provisioning in them. + +```shell +# Skip this if your VM is already provisioned. +$ vagrant up --no-provision # Exclude if already running and provisioned. + +$ vagrant halt + +# Download and run. This will default to the correct name (just press enter) then give the size.
+# Example: `350000` is equal to 350GB + +$ wget https://gist.githubusercontent.com/DonRichards/6dc6c81ae9fc22cba8d7a57b90ab1509/raw/45017e07a3b93657f8822dfbbe4fc690169cdabc/expand_disk.py +$ chmod +x expand_disk.py +$ python expand_disk.py +$ vagrant up --no-provision + +# This step isn't needed but acts as a check to verify it worked. +$ vagrant ssh --command "sudo resize2fs /dev/sda1" + +# Skip this if your VM is already provisioned. +$ vagrant provision # Exclude if already provisioned. +``` + +### Troubleshooting expand_disk.py +You may need to remove the "resized" version. Assuming your VM location is `~/VirtualBox\ VMs`: +```shell +$ rm -rf ~/VirtualBox\ VMs/Islandora\ CLAW\ Ansible_resized +``` diff --git a/docs/technical-documentation/updating-drupal.md b/docs/technical-documentation/updating-drupal.md new file mode 100644 index 000000000..976409976 --- /dev/null +++ b/docs/technical-documentation/updating-drupal.md @@ -0,0 +1,74 @@ +# Updating Drupal + +## Introduction + +This section describes how to update Drupal and its modules using Composer. If you installed Islandora using the Islandora Playbook or ISLE, then your Drupal was installed by Composer, so it is best practice to continue using Composer for updates. The method in this section is not specific to Islandora, and does not (yet) include how to update Islandora Features. + +!!! tip "How to upgrade Drupal in ISLE" + For specific instructions on how to upgrade Drupal core and the Drupal modules installed within ISLE, please refer to the documentation page: [Maintaining Your Drupal Site](https://islandora.github.io/documentation/installation/docker-maintain-drupal/) + +### What is Composer +It is recommended by Drupal.org and the Islandora community to use Composer with Drupal for various tasks. + +"[Composer](https://getcomposer.org/) is a [dependency manager](https://en.wikipedia.org/wiki/Package_manager) for PHP.
Drupal core uses Composer to manage core dependencies like Symfony components and Guzzle." [[Source](https://www.drupal.org/docs/develop/using-composer/using-composer-with-drupal)] + +## Always create backups (DB and files) before updating + +**Before updating either Drupal core or Drupal modules:** + +* Back up both your files and database. Having a complete backup makes it easy to revert to the prior version if the update fails. +* Optionally, if you made manual modifications to files like .htaccess, composer.json, or robots.txt, copy them somewhere easy to find, because you will need to re-apply the changes after you've installed the new Drupal core. For example, Acquia Dev Desktop places a .htaccess file in the top-level directory and without it, only the homepage on your site will work. + +**Warning:** Always revert to a backup if you get a fatal error in the update process. + +## Updating Drupal Core +Over time new versions of Drupal “core” are released, and Islandora users are encouraged to install official Drupal core updates and security patches. On the other hand, “alpha” and “beta” versions of Drupal core should only be installed by advanced users for testing purposes. + +The Islandora community STRONGLY recommends that the "Composer" method of upgrading Drupal core be used with Islandora as mentioned [here](https://www.drupal.org/docs/8/update/update-core-via-composer). + +### Here is an overview of the steps for updating Drupal core using Composer + +!!! note "Back Up" + First make sure you have made database and file backups. + +1) First, verify that an update of Drupal core actually is available: + +`composer outdated "drupal/*"` + +If there is no line starting with drupal/core, Composer isn't aware of any update. If there is an update, continue with the commands below.
+ + +2) Assuming you are used to updating Drupal and know all the precautions that you should take, the update is as simple as: + +`composer update drupal/core --with-dependencies` + +If you want to know all packages that will be updated by the update command, use the --dry-run option first. + +!!! note "Alternate syntax for Islandora 8 needed" + If you are running the older Islandora 8 codebase that predates the Islandora 2 release, note that Islandora 8 is configured to use a fork of drupal-composer/drupal-project which requires this specific composer syntax compared to other Drupal 8+ sites: + + `composer update drupal/core webflo/drupal-core-require-dev "symfony/*" --with-dependencies` + + In addition, if you are upgrading from 8.5 to 8.7, you need to replace "~8.5.x" with "^8.7.0" for drupal/core and webflo/drupal-core-require-dev in composer.json. [[Source](https://www.drupal.org/docs/8/update/update-core-via-composer#s-one-step-update-instruction)] + +3) Apply any required database updates using ``drush updatedb``, or use the web admin user interface. + +`drush updatedb` + +4) Clear the cache using drush ``cache:rebuild``, or use the web admin user interface. + +`drush cache:rebuild` + +For stepwise update instructions visit this page: +https://www.drupal.org/docs/8/update/update-core-via-composer#s-stepwise-update-instructions + +## Updating Drupal Modules + +Islandora uses several general Drupal modules and some specialized Islandora Drupal modules, and over time new versions of these modules are released. There are two approaches to updating Drupal modules in Islandora: using Composer or updating modules individually. Islandora uses Composer to determine which Drupal module versions should be installed for each release of Islandora. Therefore if you update the Islandora specific Drupal modules using Composer you will also update any dependent general Drupal modules as well. The second method is to individually update Drupal modules. 
+ +For more information about how to update Drupal modules visit: + +https://www.drupal.org/docs/8/extending-drupal-8/updating-modules + +!!! note "Back Up" + First make sure you have made database and file backups. diff --git a/docs/tutorials/create-update-views.md b/docs/tutorials/create-update-views.md new file mode 100644 index 000000000..3f06d47d1 --- /dev/null +++ b/docs/tutorials/create-update-views.md @@ -0,0 +1,84 @@ +# Create or Update a View + +## Overview + +Views are powerful content filters that enable you to present Islandora (and other) content in interesting and exciting ways. For more documentation on views: + +- [Drupal.org documentation on Views](https://www.drupal.org/docs/8/core/modules/views) + + +## Before you start + +- The following How-To assumes that you are using the (optional) **[Islandora Starter Site](https://github.com/Islandora/islandora-starter-site)** configuration. This configuration is deployed automatically if you build your Islandora site using the [Ansible Playbook](../installation/playbook) or [ISLE with Docker-Compose](../installation/docker-compose), or if you are using the [sandbox or a Virtual Machine Image](https://sandbox.islandora.ca/). +- This How-To assumes familiarity with Drupal terms such as [Node](https://www.drupal.org/docs/7/nodes-content-types-and-fields/about-nodes), [Content Type](https://www.drupal.org/docs/7/nodes-content-types-and-fields/working-with-content-types-and-fields-drupal-7-and-later), and [Media](https://www.drupal.org/docs/8/core/modules/media). + +## How to modify an existing view + +Islandora Starter Site ships with some views already created and turned on. The Islandora home page displays content items that have been added to Islandora. This view is named _Frontpage_, and it lists items that meet the following _filter criteria_: + +- The item is in the _published_ state. +- The checkbox _Promoted to front page_ is selected.
+ +This view will display all content items added to Islandora, as the checkbox _Promoted to front page_ is on by default. + +As you develop your Islandora Website, it is likely that you will need to change the default behaviour of the _Frontpage_ View. As an example, the following describes how to edit the _Frontpage_ page view to only show content items and not collections. + +For this example, we added six collection items to Islandora. In total there are eight items in the repository. In addition to the six collection items, there is one audio item and one image item. + +1. Using your Web browser, open the Islandora front page. +2. To edit the front page view, hover over the view (_Frontpage_ view) and select **Edit view** when displayed. + + ![Frontpage view](../assets/frontpage_view_all_eight.png) + +3. Select **Add** under the _filter criteria_ section. + + ![Frontpage view add filter](../assets/frontpage_view_add_filter.png) + +4. We do not want to display collections, so we need to add a _filter criteria_ entry that excludes the Islandora model type 'Collection'. + 1. Select _Model_ from the list and then **Apply (all displays)**. + + ![Frontpage view filter select model](../assets/frontpage_view_add_filter_select_model.png) + + 2. Select _Islandora Model_ to select filters on Islandora model types and select **Apply and continue**. + + ![Frontpage view filter islandora model](../assets/frontpage_view_add_filter_select_model_islandora.png) + + 3. Select the operator _Is none of_ and the _Collection_ model (autocomplete should work here to help you). To finish, click **Apply (all displays)**. + + ![Frontpage views filter collection](../assets/frontpage_view_add_filter_collection.png) + + 4. **Save** the view. Now the 'Frontpage' View does not display collections. + + ![Frontpage views no collections](../assets/frontpage_view_no_collections.png) + +## How to create a new view + +For this example, we create a new view that only shows collections.
It will be created as a [Block](https://www.drupal.org/docs/core-modules-and-themes/core-modules/block-module/managing-blocks) (also see the tutorial on [Configuring Blocks](../tutorials/blocks.md)) that will only display on the front page. We will add the new collection list block below the existing frontpage view that lists items. + +1. Using your Web browser, open the Islandora front page. +2. Navigate to **Administration** >> **Structure** >> **Views**. +3. Create a new view by selecting **Add view**. +4. Name the view and select **Create a block**. Give the block a title and decide how you want it to display (Grid, Table, List, Paging). To progress, select **Save and edit**. + + ![Frontpage view collection list information](../assets/frontpage_view_collection_list_info.png) +5. Customize the view format and sorting as required. +6. Add a _filter criteria_ to only show the Islandora model type of 'Collection' and **Save** the view. + + ![Frontpage view collection list details](../assets/frontpage_view_collection_list_details.png) +7. To place the view on the front page, the new block must be added to the 'Main page content' area (using 'Block layout') and set to display on the front page. + 1. Navigate to **Administration** >> **Structure** >> **Block layout** (/admin/structure/block). Under _Main content_ select **Place block**. + + ![Frontpage view collection list place block](../assets/frontpage_view_collection_list_place_block.png) + 2. Find the new block, 'Collection List', and select **Place block**. + 3. Restrict the block to only display on the front page by adding the text `<front>` to the _Page_ vertical tab. Then select **Save block**. + + ![Frontpage view collection list place block configure](../assets/frontpage_view_collection_list_place_block_configure.png) + 4. Review the block placement and move if required. + + ![Frontpage view collection list block placement](../assets/frontpage_view_collection_list_block_placement.png) +8.
The 'Collection list' now only displays on the front page. It displays below the _Main page content_. + + ![Frontpage view collection list](../assets/frontpage_view_collection_list.png) + +!!! Tip "Islandora Quick Lessons" + Learn more with videos on [Basic Views](https://youtu.be/Ge14g8nBUBQ) and [Advanced Views](https://youtu.be/inPRZeQGnKI). diff --git a/docs/tutorials/switch-homepage-to-twig.md b/docs/tutorials/switch-homepage-to-twig.md new file mode 100644 index 000000000..7336bd22c --- /dev/null +++ b/docs/tutorials/switch-homepage-to-twig.md @@ -0,0 +1,63 @@ +# Format Homepage with TWIG + +## TWIG Debugging +It's helpful, though not required, to identify which TWIGs are available to use and where they're stored before using TWIGs to format the homepage. + +![Screenshot from 2021-12-10 11-22-31](https://user-images.githubusercontent.com/2738244/145607034-967cc164-9d24-4f6d-aac7-9b3b93c87c4e.png) + +```shell +# Copy the default service. +$ cp web/sites/default/default.services.yml web/sites/default/services.yml + +# fix permissions (just in case) +$ chown nginx:nginx web/sites/default/services.yml + +# I use nano to edit but you can pick whichever editor you want. +# For this example we'll install the editor +$ apk add nano + +# Now open the newly created service file and set these 3 values under the TWIG config section. +$ nano web/sites/default/services.yml + +...yml +twig.config: + debug: true + auto_reload: true + cache: true + + # Now save and exit (in NANO it's CTRL + x) + +``` +For a video tutorial on this, see [Enabling Twig Debugging in Drupal 8/9](https://youtu.be/6WMr5V_LQ1w). + +## Copying Templates +Copy the default TWIG into your theme's template directory. + +```shell +$ cp web/themes/contrib/bootstrap/templates/node/node.html.twig web/themes/contrib/solid/templates/node--6--full.html.twig + +# Clear cache +$ drush cr +``` +Now, if you view the home page's source code, you should see the `X` next to the loaded TWIG file.
Please note that the file name corresponds to the node number. Using the URL alias instead of the node ID requires additional work. [Here](https://www.lehelmatyus.com/1064/drupal-8-page-template-suggestion-by-path-alias)'s a tutorial on this topic. +```html + +``` + +Now edit the TWIG file (web/themes/contrib/solid/templates/node--6--full.html.twig) to say whatever you want, and it should show up immediately without needing to clear the cache. + +## Clean up +Don't forget to turn off TWIG debugging in the config file (web/sites/default/services.yml). Leaving it enabled will likely have unexpected consequences for production system performance. + +```yml +twig.config: + debug: false + auto_reload: false +``` \ No newline at end of file diff --git a/docs/user-documentation/content-models.md b/docs/user-documentation/content-models.md new file mode 100644 index 000000000..4314bb96d --- /dev/null +++ b/docs/user-documentation/content-models.md @@ -0,0 +1,208 @@ +# Content models in Islandora + +## Resource Nodes + +This section describes the Islandora concept of a Resource Node. For a step-by-step demonstration, see the tutorial [Create a resource node](../tutorials/create-a-resource-node.md). + +A resource node holds the descriptive metadata for an Islandora object, and groups together +the various files that are part of the object for preservation or display, such as the original file + and various derivative files generated from it. + +The model for exactly what constitutes an object in Islandora is flexible and can be adapted to the needs of specific users. For example, the Islandora Starter Site configuration considers an object as a resource node of the type "Repository Item" which contains descriptive metadata about the object. Attached to that Node are one or more Media, each representing a file that is part of this object, such as "Original File", "Thumbnail", "Preservation Master", etc. With this model, every original file uploaded into Islandora has its own resource node.
+ +Multi-file Media configurations also attach Media to a parent node, but allow for that node to be represented by multiple "Original File"s. In this model, a Media contains the original file as well as any derivative files created from it (thumbnail, service file, etc.). + +For an example of where these two different approaches could apply, the basic configuration might make sense for a book that has rich page-level metadata, so that each page would be its own Node with its own metadata record; the multi-file media configuration might be a better fit for a book that does not have page-level metadata (except an ordering or page numbers), so that each Media would represent one page, and all pages (Media) would be attached to a single parent Node/metadata record for the entire book. + + +As we learned in the [introduction](user-intro.md), objects in an Islandora repository are +represented as a combination of resource nodes, media, and files in Drupal. +Because of this, their metadata profile, display, form (and much more) are configurable through +the Drupal UI. This gives repository administrators a huge degree of control over their repository +without any need for coding. Much more so than ever before. And since we're using a core Drupal +solution for modeling our resource nodes and media, compatibility with third-party modules is virtually guaranteed. +This opens up a plethora of solutions from the Drupal community that will save you untold time +and effort when implementing your repository with Islandora. + +### Properties + +Resource nodes, as Drupal nodes, have some common basic properties regardless +of content type. These properties are not fields. This means that they +cannot be removed and have limited configurability. Their name, what type of +data they hold, etc... are all baked in. 
+Here's an example of the basic properties on nodes: + +``` +nid: 1 +uid: 1 +title: "I am an Islandora object" +created: 1550703004 +changed: 1550703512 +uuid: 02932f2c-e4c2-4b7e-95e1-4aceab78c638 +type: islandora_object +status: 1 +``` + +As you can see, it's all system data used at the Drupal level to track the basics. + +Property | Value +------------ | ------------- +nid | The local ID for the node +uid | The ID of the Drupal user who created the node +uuid | The global ID for any entity +title | The title for the node +created | Timestamp of when the node was created +changed | Timestamp of when the node was last updated +type | Content type (i.e. which group of fields is present on the node) +status | Published, unpublished, etc... + +!!! note "Compared to Islandora Legacy" + These node properties are analogous to the following Islandora Legacy object properties: + + Islandora Legacy | Islandora + ----------- | ----------- + owner | uid + dc.title | title + PID | uuid + status | status + +The small amount of configurability available for these properties is found on the +content type editing form where a user can choose to change the label of the +title field, whether to display author information on the node's page, +etcetera. These settings will only apply to nodes of that particular content type. + +![The Repository Item content type edit form.](../assets/resource_nodes_repo_item_edit_form.png) + +To view all of a node's property and field values, administrators can use the 'Devel' +tab's 'Load' section: + +![Screenshot of a Repository Item node's properties as seen on its 'Devel' tab.](../assets/resource_nodes_properties_devel.png) + +### Fields + +In addition to the basic node properties identified above, resource nodes (like all Drupal nodes) can have fields. +Most of what we would think of as descriptive metadata is stored as fields.
Resource nodes use 'content types' to define the specific set of required and optional +fields they have; we can think of content types as metadata profiles for our objects. +For example, you might have a content type for a set of repository objects that have very specialized metadata requirements but +another content type for generic repository objects that share a more general set of metadata fields. +A resource node's content type is set on its creation and is immutable. +The section on [metadata](metadata.md) describes in more detail how fields on Islandora objects work. + +Configuring fields (adding, editing, removing) is usually done through the Manage > Content types interface, as described in the tutorial, [Create/Update a Content Type](content_types.md). + +Islandora has a notion of a _content model_, which is used to identify what type of content is +being represented by a node (e.g. an image, a video, or a collection of other items). This is done +using a special field, _Model_, which accepts taxonomy terms from the _Islandora Models_ vocabulary. +When a term from the Islandora Models vocabulary is applied to a node, Islandora becomes aware +of how to handle the node in response to certain events, like choosing a viewer or generating derivatives. + +![Model tags](../assets/resource_nodes_model_tags.png) + +!!! note "Compared to Islandora Legacy" + Content models in Islandora Legacy were immutable and contained restrictions as to what + types of datastreams could be associated with an object. Islandora imposes no such + restrictions. Content models can be changed at any time, and they in no way dictate what + types of media can be associated with a node. + +### Media + +All resource nodes can be linked to any number of media. The media associated with a resource node can be managed using the "Media" tab when viewing a node. As on +the "Members" tab, Actions can be performed in bulk using the checkboxes and Actions dropdown.
+ +![Media tab](../assets/resource_nodes_media_tab.png) + +See [the media section](media.md) for more details. + +### Display modes + +Drupal uses "display modes" (also called "view modes") as alternative ways to present content to users. You may be familiar with the "full" and "teaser" versions of nodes, which are rendered using two corresponding display modes. Islandora makes use of display modes to control how media content is displayed. Islandora Starter Site provides two display modes for Media, one which renders the OpenSeadragon viewer and the other which renders the pdf.js viewer. These two display modes can be enabled by using "Display hints" in the node edit form, or you can configure Islandora to use a specific display mode for all media based on the file's Mime type. Both methods make use of [Contexts](context.md). + +To set the display mode on the resource node's edit form, select the display mode you want to use for that node in the _Display hints_ field: + +![Display hints](../assets/resource_nodes_display_hints.png) + +Due to the associated Context configurations ("OpenSeadragon" and "PDFjs Viewer") that are shipped with the Islandora Starter Site, the selected display mode will then be used when the resource node's page is rendered. + +At a global level, there are a couple of ways to tell Drupal to use the PDFjs viewer to render the content of the media field whenever the media has a Mime type of `application/pdf`. + +The first way is to edit the "PDFjs Viewer" Context. 
By default, this Context tells Drupal to use the PDFjs viewer if the node has the term "PDFjs" (yes, that's a taxonomy term): + +![Default PDFjs Context](../assets/resource_nodes_pdfjs_context_default.png) + +If you add the Condition "Node has Media with Mime type" and configure it to use `application/pdf` as the Mime type, like this: + +![PDFjs Context with Mimetype Condition](../assets/resource_nodes_pdfjs_context_with_mimetype.png) + +Context will use whichever Condition applies (as long as you don't check "Require all conditions"). That is, if the "PDFjs" display hint option in the node edit form is checked, *or* if the node's media has a Mime type of `application/pdf`, the media content will be rendered using the PDFjs viewer. + +The second way to use the media's Mime type to render its content with the PDFjs viewer is to create a separate Context that will detect the media's Mime type and use the configured display mode automatically. To do this, create a new Context. Add a "Node has Media with Mime type" condition and specify the Mime type, and then add a "Change View mode" Reaction that selects the desired display mode: + +![Display hints](../assets/resource_nodes_view_mode_context.png) + +Finally, save your Context. From that point on, whenever the media for a node has the configured Mime type, Drupal will render the media using the corresponding display mode. + +The node-level and global approaches are not mutually exclusive. One Context can override another depending on the order of execution. Contexts are applied in the order they are displayed on the Contexts page, which is editable through a drag-and-drop interface. Of the node-level Condition (which in this case is the "Node has term" condition) and the global Condition (which is "Node has Media with Mime type"), whichever appears last in the list of Contexts will override the other.
For example, to have the display mode specified in the node edit form intentionally override the display mode based on Mime type, you could configure media with the `image/jp2` Mime type to use the OpenSeadragon viewer, and manually select the OpenSeadragon display mode for nodes with JPEG media (for example, a very large JPEG image of a map, where OpenSeadragon's pan and zoom features would be useful). + +### Members + +Islandora has a notion of _membership_, which is used to create a parent/child relationship between +nodes. Membership is denoted using another special field, "Member Of". This is used to create the link +between members and their parent collection, pages and their book ("paged content"), or members of a +compound object and the compound object itself. + +Any two nodes can be related in this way, though typically, the parent node has a content +model of [_Collection_](../concepts/collection.md) or [_Paged Content_](paged-content.md) (see their respective pages for more details). +The "Member Of" field _can_ hold multiple references, so it is possible for a +single child to belong to multiple parents, but doing so may complicate the creation of breadcrumbs. + +!!! Note "Compared to Islandora Legacy" + In Islandora Legacy, there was a distinction between belonging to a collection and belonging to + a compound object. In Islandora, this distinction is not present. Since all nodes can have members, + essentially every node has the potential to be a compound object or collection. + +!!! Note "Child v. Member" + Islandora uses the "child" and "member" descriptors interchangeably for resource nodes that + store a reference to another resource node in the "Member Of" field. + Administrators will more often see the "member" terminology while + front-end users will usually see "child" terminology. + +For any node, its **Children** tab can be used to see all its members.
You can also perform Actions in +bulk on members using the checkboxes and the Actions dropdown, or click +the **Reorder Children** tab to adjust the order in which they display. + +![Members tab](../assets/paged_content_reorder_children_button.png) + +### More information + +The following pages expand on the concepts discussed above: + +- [Media](media.md) +- Content Types: [Metadata](metadata.md#content-types) -- [Create / Update a Content Type](content_types.md) + +### Copyright and Usage + +This document was originally developed by [Alex Kent](https://github.com/alexkent0) and has been adapted for general use by the Islandora community. + +[![CC BY-NC 4.0](https://mirrors.creativecommons.org/presskit/buttons/88x31/svg/by-nc.svg)](https://creativecommons.org/licenses/by-nc/4.0/) + +[^1]: In the Islandora Starter Site, this is the `field_model` field, which is populated by taxonomy terms in the `islandora_models` taxonomy vocabulary provided by the `islandora_core_feature` submodule of `Islandora/islandora`. + +[^2]: In the Islandora Starter Site, this is the `field_member_of` field. + +## Islandora Legacy Objects versus Islandora Resource Nodes + +The conventional Islandora Legacy definition of an object is a file loaded in the repository with associated derivatives. In Islandora Legacy, objects (video files, audio files, PDFs, etc.) are loaded through the user interface, and Datastreams are generated automatically. These consist of access and display copies, the metadata, OCR/HOCR, technical metadata, and more. All of these Datastreams are directly connected to the object and accessed through the admin interface. + +In Islandora, the traditional Islandora Legacy objects (video files, audio files, etc. that were represented in different content models) are now Drupal nodes.
Islandora object nodes are a special kind of Drupal node, distinct from nodes that exist for other content types such as a blog post, an article, a page (like the About page on a site), and others. These Islandora objects are still loaded through the interface and described with the data entry form, and derivatives are still generated. However, the Datastreams are no longer connected to the original object in the same immutable way. Each of these Datastreams can be manipulated through Drupal by non-developers. You can create a variety of ways to view this metadata and information related to the objects. Doing so requires knowledge of Drupal 8, but this essentially means that there are many ways to view the metadata and access the related objects in Islandora. + +In Islandora it is therefore helpful to think of objects as resource nodes. The term reflects the new nature of objects in Islandora. A resource node does not just refer to the individual object file, but encompasses multiple elements that all relate to each other, even if they are no longer directly connected like objects in Islandora Legacy. + +The typical elements of a resource node: + +- A content type defining metadata fields defined for the node. A content type may include any number of custom fields defined to store descriptive metadata about the object represented by the node. To function as an Islandora resource node, a content type must define two further fields: + - A field denoting the 'type' of thing represented by the node (image, book, newspaper, etc.). The value of this field is used by Islandora to control views, derivative processing, and other behavior.[^1] + - A field in which to record the node's [membership](content_models.md#members) in another node. If populated, this field creates a hierarchical relationship between parent (the node recorded in the field) and child (the node in which the parent is recorded). 
This may be left empty, but is required for building hierarchies for collections, subcollections, and members of collections, as well as objects (books, "compound objects", etc.) consisting of [paged content](paged-content.md).[^2] +- Media files (the actual files of JPEGs, MP3s, .zip, etc.) that get loaded through the form +- Derivative files (thumbnails, web-friendly service files, technical metadata, and more) + +These resource nodes are what the librarian, student, archivist, technician, or general non-developer creates through the data entry form. It is possible to configure all elements of a resource node in Islandora through Drupal. This fact allows control over how one accesses the node and how nodes are displayed and discovered online by non-developers. It also allows a repository to take full advantage of all third-party Drupal modules, themes, and distributions available. diff --git a/docs/user-documentation/content-types.md b/docs/user-documentation/content-types.md new file mode 100644 index 000000000..e012c7087 --- /dev/null +++ b/docs/user-documentation/content-types.md @@ -0,0 +1,189 @@ +# Creating and updating content types + +## Overview + +Since metadata in Islandora is stored as fields in Nodes, the standard Drupal Content Types system provides our 'ingest forms'. For more information about Content Types in general, please see [Content Types in Drupal](https://www.drupal.org/docs/administering-a-drupal-site/managing-content-0/working-with-content-types-and-fields). If you are already familiar with Drupal Field UI, you’re already well-equipped to create and modify your own ingest forms in Islandora. + +This page will address how to create and modify ingest forms by editing fields and form display settings on Content Types via the graphical user interface (GUI). This page will also cover editing the RDF mapping to accommodate changes to fields. 
+ +Islandora forms are Drupal forms, and for help working with forms via the API, please check out the _Further Reading_ section for links to more advanced Drupal documentation. + +## Before you start + +- The following How-To assumes that you are using the (optional) **[Islandora Starter Site](https://github.com/Islandora/islandora-starter-site)** configuration. This configuration is deployed automatically if you build your Islandora site using the [Ansible Playbook](https://www.islandora.ca/get-islandora), [ISLE with Docker-Compose](https://www.islandora.ca/get-islandora), or are using the [sandbox or a Virtual Machine Image](https://www.islandora.ca/get-islandora) +- This How-To assumes familiarity with Drupal terms such as [Node](https://www.drupal.org/docs/7/nodes-content-types-and-fields/about-nodes), [Content Type](https://www.drupal.org/docs/7/nodes-content-types-and-fields/working-with-content-types-and-fields-drupal-7-and-later), and [Media](https://www.drupal.org/docs/8/core/modules/media). + +## How to modify a Content Type + +If you have deployed your Islandora with the Islandora Starter Site configuration, you will already have a Repository Item content type available, with pre-configured fields and repository behaviours. + +1. In the Admin menu, go to **Structure** >> **Content Types** and find the _Repository Item_ content type. +1. Select *Manage Fields*. + +![a screenshot of the Add Content Type page](../assets/content_types_managefields.png) + +There are multiple tabs with different options to configure your Content Type: + +![a screenshot of the Add Content Type page](../assets/content_types_managefields_tabs.png) + +- _Manage Fields_: A list of the fields available in this form. This is where you can add new fields and make adjustments to existing fields, such as whether the field has access restrictions or is required. 
+- _Manage form display_: Set the order in which fields appear in a form, including nesting; set how the user will enter data into a field (i.e., text field, drop-down list, radio buttons, etc.); set fields to be hidden in the form. +- _Manage display_: Set how the data stored in the fields will be displayed on the Node. Custom display settings can be set for different "view modes." For instance, a different view mode is applied for items using the OpenSeadragon viewer, which includes a field that displays the Media in OpenSeadragon instead of the standard Drupal image viewer. + +!!! note "Changes not displaying?" + If you make changes under _Manage display_ and don't see them reflected in your Node, double check that you have edited the right _view mode_ +- _Devel_: This tab is generated by an optional module that is useful for development and troubleshooting; it can be ignored in this How-To. For more information, see [Devel](https://www.drupal.org/project/devel). + +### Add a field + +This example adds a new field where a user can indicate if the repository item needs to be reviewed: + +1. Click **Add Field** +1. In some cases an existing field may be available to use instead of creating a new one. The dropdown box labeled _Re-use an existing field_ has a list of available fields. For this example we will create a brand-new field. Since the example field is a “yes/no” decision (whether the item needs review or not), choose "Boolean" from the dropdown menu and give the Label field a name. [See the list of Drupal 8 FieldTypes, FieldWidgets, and FieldFormatters](https://www.drupal.org/docs/8/api/entity-api/fieldtypes-fieldwidgets-and-fieldformatters) for descriptions of the different types available by default. Additional modules, such as the controlled_access_terms module, can provide their own Field types to choose from as well. +1. Click **Save and continue.** +1. Next, configure how the field is stored in the Drupal database. 
For this field type you can select how many values will be allowed. The default settings, "Limited" in the dropdown box and "1" for the allowed number of values works for our example. +1. Click **Save field settings.** +1. Configure how the field is described (including its display label and the help text for when it appears on a form) and constraints on its use. In this screenshot, the field will be required for this Content Type, and will be set to “on” by default. In the _Default Value_ section, click the checkbox next to _Needs Review_ to indicate all new repository items need review by default. +1. Click **Save settings.** + +![a screenshot of the field settings page](../assets/content_types_fieldsettings.png) + +The new field has been added: + +![a screenshot of a "Needs Review?" field in the Drupal field UI](../assets/content_types_newfield.png) + +It appears in the ingest form when creating a new repository object. To test this, go to **Content** >> **Add content** >> **Repository item**: + +![a screenshot of a "Needs Review?" field appearing at the bottom of a new node form](../assets/content_types_newfieldinform.png) + +!!! note "RDF Mappings" + New fields, except for Typed Relation fields, are not automatically indexed in Fedora and the triple-store. Update the Content Type's RDF Mapping to enable indexing the field (see below). + +!!! note "Search" + New fields will not automatically be searchable. They need to be added to the Solr index configuration. See the ['Setup and Configure Search'](searching.md) page for more information. + +!!! note "Context" + To add new behavior based on the results of this new field, check out [Context](context.md). + +### Change the form display + +To change where in the form a field is displayed, go to the Admin menu, return to **Structure** >> **Content Types**, and find the _Repository Item_ content type again. Select _Manage form display_ from the dropdown menu or select the _Manage form display_ tab. + +1. 
All the fields in this content type are available, in a list, with a simple drag-and-place UI. Drag the new field to the top of the form. You can also change the way the Boolean options are displayed, with radio buttons as opposed to a single checkbox. Different display options will be available from the dropdown menu depending on field type. For more information, please check out [List of Drupal 8 FieldTypes, FieldWidgets, and FieldFormatters](https://www.drupal.org/docs/8/api/entity-api/fieldtypes-fieldwidgets-and-fieldformatters) +1. Click **Save**. + +When creating a new Repository Item, the new field appears at the top, as a set of radio buttons. + +### Change the content display + +Finally, change how the results of this example field are displayed. Initially the new field shows up at the bottom of repository object pages: + +![a screenshot of a "Needs Review?" field in the node display](../assets/content_types_fieldindisplay.png) + +In the Admin menu, return to **Structure** >> **Content Types** and find the _Repository Item_ content type again. Select _Manage display_ from the dropdown menu or select the _Manage display_ tab. + +1. Find the new field. You can change how the field title or label is displayed. +1. Click the dropdown menu to choose from inline/above/hidden/visually hidden. + - You can also replace the options displayed with variations on a binary choice. Click the gear to choose from the following: _On/Off_, _Yes/No_, _Enabled/Disabled_, _1/0_, _checkmark/X_, or hide the field completely. + - You can also drag the field into the _Disabled_ section so that neither its label nor its contents appear in the display, although the field is saved on the Node. +1. Drag the field to "Disabled" and save. +1. The contents of the field are no longer displayed on the Node, but it is available when editing the node. 
+ +## Create a Content Type + +To create your own custom content type from scratch, please refer to [this guide](https://www.drupal.org/docs/8/administering-drupal-8-site/managing-content-0/create-a-custom-content-type) on Drupal.org. + +Your custom content types can contain whatever fields you like, but there are two mandatory fields that all Islandora content types should contain: + +1. In order for a custom content type to be considered an Islandora Object, it needs to have the field "Member of" ('field_member_of'). This allows it to be included in contexts that have the "Node is an Islandora node" condition. Nodes that have this field will automatically be synced to Fedora and indexed by the triple store if you are using the context provided by the Islandora Starter Site. Having this field present in your content type also gives you tabs for adding children and media when viewing an item of that content type. + +2. The other mandatory field is "Model" ('field_model'). This is used in several of the contexts that the Islandora Starter Site provides. This field determines how Islandora objects are displayed, and how media derivatives are created. + +## Updating and creating an RDF Mapping + +RDF mapping aligns Drupal fields with RDF ontology properties. For example, the title field of a content model can be mapped to `dcterms:title` and/or `schema:title`. In Islandora, triples expressed by these mappings get synced to Fedora and indexed in the Blazegraph triplestore. RDF mappings are defined and stored in Drupal as [YAML](https://yaml.org/) files (to learn more about YAML, there are [several tutorials on the web](https://duckduckgo.com/?q=yaml+tutorial)). Currently, Drupal 8 does not have a UI to create/update RDF mappings to ontologies other than Schema.org. This requires repository managers to update the configuration files themselves.
Consider using the RDF mappings included in the [Islandora Starter Site](https://github.com/Islandora/islandora-starter-site) as templates by copying and modifying one to meet your needs. + +The Drupal 8 Configuration Synchronization export (e.g. `http://localhost:8000/admin/config/development/configuration/single/export`) and import (e.g. `http://localhost:8000/admin/config/development/configuration/single/import`) can be used to get a copy of the mappings for editing in a text editor before being uploaded again. Alternatively, a repository manager can update the configuration on the server and use [Features](https://www.drupal.org/project/features) to import the edits. + +An RDF mapping configuration file has two main areas: the mapping's metadata and the mapping itself. Most of the mapping's metadata should be left alone unless you are creating a brand-new mapping for a new Content Type or Taxonomy Vocabulary. A _partial_ example from [islandora_default's islandora_object (Repository Item)](https://github.com/Islandora/islandora-starter-site/blob/main/config/sync/rdf.mapping.node.islandora_object.yml) is included below: + +``` +langcode: en +status: true +dependencies: + config: + - node.type.islandora_object + enforced: + module: + - islandora_demo + module: + - node +id: node.islandora_object +targetEntityType: node +bundle: islandora_object +types: + - 'pcdm:Object' +fieldMappings: + title: + properties: + - 'dc:title' + field_alternative_title: + properties: + - 'dc:alternative' + field_edtf_date: + properties: + - 'dc:date' + datatype_callback: + callable: 'Drupal\controlled_access_terms\EDTFConverter::dateIso8601Value' + field_description: + properties: + - 'dc:description' +``` + +The required mapping metadata fields when creating a brand-new mapping include the `id`, `status`, `targetEntityType`, and `bundle`. (`uuid` and `_core`, not seen in the example above but may be present in exported copies, will be added by Drupal automatically.) 
`bundle` is the machine name for the Content Type or Taxonomy Vocabulary you are creating the mapping for. `targetEntityType` is `node` for Content Types or `taxonomy_term` for Taxonomy Vocabularies. The `id` configuration is the target entity type and the bundle joined with a period ('node' and 'islandora_object' in the example above, giving `node.islandora_object`). The `id` is also used to name the configuration file: e.g. `rdf.mapping.node.islandora_object.yml` is `rdf.mapping.` plus the id (`node.islandora_object`) and then `.yml`. + +The mapping itself consists of the `types` and `fieldMappings` configurations. + +All the mappings use RDF namespace prefixes instead of fully-qualified URIs. For example, the type for islandora_object is entered in the RDF config as `pcdm:Object` instead of `http://pcdm.org/models#Object`. The available namespaces are defined in module hooks (hook_rdf_namespaces) but can also be entered manually in a configuration interface. Repository managers wanting to add additional namespaces need to go to Configuration > Search and Metadata > JSONLD and enter their desired namespaces in the "Additional RDF Namespaces" box.
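
The naming and prefix rules above can be sketched in a few lines of Python. This is only an illustration, not code from Drupal or Islandora: the function names (`mapping_id`, `config_filename`, `expand`) and the `NAMESPACES` table are hypothetical, and the table holds just three prefixes copied from the supported-namespaces list on this page.

```python
# Illustrative helpers only; these are not part of Drupal or Islandora.

# A few prefixes copied from the supported-namespaces list on this page.
NAMESPACES = {
    "pcdm": "http://pcdm.org/models#",
    "dc": "http://purl.org/dc/terms/",
    "schema": "http://schema.org/",
}


def mapping_id(target_entity_type: str, bundle: str) -> str:
    """Join the target entity type and the bundle with a period."""
    return f"{target_entity_type}.{bundle}"


def config_filename(target_entity_type: str, bundle: str) -> str:
    """Build the configuration file name: 'rdf.mapping.' + id + '.yml'."""
    return f"rdf.mapping.{mapping_id(target_entity_type, bundle)}.yml"


def expand(prefixed_name: str, namespaces: dict = NAMESPACES) -> str:
    """Expand a prefixed name such as 'pcdm:Object' to a full URI."""
    prefix, sep, local_name = prefixed_name.partition(":")
    if not sep or prefix not in namespaces:
        raise KeyError(f"unknown namespace prefix in {prefixed_name!r}")
    return namespaces[prefix] + local_name


print(mapping_id("node", "islandora_object"))
print(config_filename("node", "islandora_object"))
print(expand("pcdm:Object"))
```

The three calls reproduce the values quoted in the text: the `id`, the configuration file name, and the full URI behind `pcdm:Object`. A prefix that is missing from the table raises an error, which loosely mirrors the need to register extra namespaces before using them in a mapping.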
+ +Namespaces currently supported (ordered by the module that supplies them) include: + +- rdf + - content: http://purl.org/rss/1.0/modules/content/ + - dc: http://purl.org/dc/terms/ + - foaf: http://xmlns.com/foaf/0.1/ + - og: http://ogp.me/ns# + - rdfs: http://www.w3.org/2000/01/rdf-schema# + - schema: http://schema.org/ + - sioc: http://rdfs.org/sioc/ns# + - sioct: http://rdfs.org/sioc/types# + - skos: http://www.w3.org/2004/02/skos/core# + - xsd: http://www.w3.org/2001/XMLSchema# +- islandora + - ldp: http://www.w3.org/ns/ldp# + - dc11: http://purl.org/dc/elements/1.1/ + - nfo: http://www.semanticdesktop.org/ontologies/2007/03/22/nfo/v1.1/ + - ebucore: http://www.ebu.ch/metadata/ontologies/ebucore/ebucore# + - fedora: http://fedora.info/definitions/v4/repository# + - owl: http://www.w3.org/2002/07/owl# + - ore: http://www.openarchives.org/ore/terms/ + - rdf: http://www.w3.org/1999/02/22-rdf-syntax-ns# + - islandora: http://islandora.ca + - pcdm: http://pcdm.org/models# + - use: http://pcdm.org/use# + - iana: http://www.iana.org/assignments/relation/ +- islandora-starter-site + - relators: http://id.loc.gov/vocabulary/relators/ +- controlled_access_terms + - wgs84_pos: http://www.w3.org/2003/01/geo/wgs84_pos# + - org: https://www.w3.org/TR/vocab-org/#org: + - xs: http://www.w3.org/2001/XMLSchema# + +The `types` corresponds to the `rdf:type` predicate (which corresponds to JSON-LD's `@type`) and can have multiple values. This type value will be applied to every node or taxonomy term using the mapped content type or vocabulary. + +In some cases a repository may want a node or taxonomy term's `rdf:type` to be configurable. For example, the Corporate Body Vocabulary (provided by the Controlled Access Terms Default Configuration module) has `schema:Organization` set as the default type in the RDF mapping. However, more granular types may apply to one organization and not another, such as `schema:GovernmentOrganization` or `schema:Corporation`. 
The `alter_jsonld_type` Context reaction allows Content Types and Taxonomy Vocabularies to add a field's values as `rdf:types` to its JSON-LD serialization (the format used to index a node or taxonomy term in Fedora and the triple-store). + +`fieldMappings` specifies the fields to be included, their RDF property mappings, and any necessary data converters (the `datatype_callback`). One field can be mapped to more than one RDF property by adding them to the field's properties list. The `datatype_callback` is defined by the 'callable' key and the fully qualified static method used to convert it to the desired data format. For example, fields of the Drupal datetime type need to be converted to ISO 8601 values, so we use the `Drupal\rdf\CommonDataConverter::dateIso8601Value` function to perform the conversion. + +!!! Tip "Islandora Quick Lessons" + Learn more with this video on [Customizing a Form](https://youtu.be/tOW27DZY9hs). diff --git a/docs/user-documentation/file-viewers.md b/docs/user-documentation/file-viewers.md new file mode 100644 index 000000000..a2b1d96ef --- /dev/null +++ b/docs/user-documentation/file-viewers.md @@ -0,0 +1,67 @@ +# File Viewers + +## What are viewers? + +[Viewers](../user-documentation/glossary#viewer) allow site builders to display files in interactive JavaScript-based widgets, that provide functionality like zooming in/out, turning pages, playing/pausing, viewing in full screen, etc. + +In Drupal, a common way to implement a viewer is through a [module](glossary.md#module) that provides a Drupal field formatter that interfaces with the appropriate JavaScript library. The field formatter will work with specific types of Drupal fields (e.g. file fields or image fields, some may even provide their own fields). Some viewer modules in Islandora also provide a block, that can display appropriate files based on the context. 
+ +Viewers that are known to work with Islandora include: + +* [OpenSeadragon](https://openseadragon.github.io/), via the Drupal module [OpenSeadragon](https://github.com/Islandora/openseadragon) (maintained by the Islandora community). +* [Mirador](https://projectmirador.org/), via the Drupal module [Islandora Mirador](https://github.com/Islandora/islandora_mirador/) (maintained by the Islandora community). +* [pdf.js](https://github.com/mozilla/pdf.js), via the Drupal contrib module [PDF](https://www.drupal.org/project/pdf) +* Islandora Image, via the Islandora module +* Audio with captions, via the Islandora module +* Video with captions, via the Islandora module + + +## Configuring Field Formatters as Viewers + +The simplest Drupal-y way of making a viewer appear is to configure a Media to render. You can do this by configuring a View Mode that shows the desired file field, displayed in a field formatter that invokes the desired viewer. + +In the Starter Site: + +1. On all Media Types, there is a "Source" view mode which is configured to show only the main ("source") file of that Media in a reasonable default viewer. +1. By default, on a node's page, a Block is configured to appear that shows an attached "Service File" Media, or an "Original File" if no Service File is present. This block displays the media in the "Source" view mode, i.e. in its default viewer. This block placement is done using a Context. The block itself is a rendering of a View. +1. On a node-by-node-basis, you can override the viewer used by setting the "Viewer Override" field to a different viewer (such as PDF.js). This will cause a different Context to be activated instead, which will render the Service File or Original File media in a different view mode, where a different viewer is used. + +!!! note + Formerly, this field was called "Display Hints". That field name has been retired in order to reduce confusion, since this uses a different mechanism. 
This mechanism no longer relies on Node View Modes, or EVA views. However, the basic EVA view still persists in the starter site as it is part of the Islandora Core Feature. Again, it will first look for a Service File, then fall back to the Original File. + + +### Changing a Viewer for all media of a media type + +With the above configuration: + +* Navigate to the "Manage Display" page for that media type +* Select the "Source" view mode (the secondary tabs along the top) +* Make sure that only the appropriate fields are being rendered +* For the "main" file field (it's named different things in different media types: `field_media_file`, `field_media_image`... as appropriate), select a different field formatter and configure it how you like it. + +### Configuring an "optional" viewer + +Suppose you have a new viewer available, for example, for zip files. You could either: + +* create a new media type specially for zip files, and configure this viewer in the "Source" view mode, or, +* configure an alternative viewer for the File media type. + +Either would work! The choice is yours to make. They're honestly both good. + +Should you choose the latter: + +* create a new Display mode for media at Structure > Display Modes > View modes. Make sure you select a "Media" view mode. +* Configure the relevant (File) Media Type to display your file in your viewer. In the File media type, go to Manage Display, and on the Default tab, enable this view mode for "Custom Display Settings" (it's all the way at the bottom). A tab for new display mode should have appeared. Go there and set up your field so that only the file field displays, and it displays using your viewer. +* in the "Media Display" view, create a new Block (or pair of Blocks) that (just for this Block) render the Media in your new view mode. If desired, create a pair with one selecting a Service File and one selecting a Original File, and use "No results behaviour" to place a fallback. 
+* in the Islandora Display taxonomy, add a new term, with an external URI. +* create a Context that finds Islandora Nodes that have a term with that URI. In that context, place the block you created in the Media Display view. +* Finally, edit the Default Media display Context to not be in effect if the node has a term with the URI that you set. + + +## Configuring Viewers that use Blocks + +Both OpenSeadragon and Mirador provide blocks that act as multi-page viewers. To configure one of these viewers: + +* Place the block on relevant pages. Usually this is a node page. In the Starter Site this is done by a Context ("Openseadragon Block - Multipaged items"). Other methods of placing blocks include the standard Block interface, and Layout Builder. +* While placing the block, it will ask you to configure the "IIIF manifest URL". In the Starter Site, we have a IIIF Manifest view configured to create a manifest based on the "original file" media attached to the pages (children) of a given node. In the view, it is configured with path `node/%node/book-manifest-original`; in the block, we enter this as node/[node:nid]/book-manifest-original. When the block is rendered on a node page, such as `node/18`, then the nid (18) will be passed into the view. +* If placing a block using Contexts, make sure that "Include blocks from block layout" is selected. (If you find yourself missing normal page elements, this may be why). diff --git a/docs/user-documentation/linked-data.md b/docs/user-documentation/linked-data.md new file mode 100644 index 000000000..67541f8a1 --- /dev/null +++ b/docs/user-documentation/linked-data.md @@ -0,0 +1,340 @@ +# Linked data in Islandora +The purpose of this page is to provide a guided reading list to anyone who wants to get up to speed on the basics of linked data within the Islandora community. 
Those who make their way through the readings will be able to talk competently about linked data and better understand the design decisions made in Islandora. The list starts with the fundamentals of linked data (RDF, SPARQL, serializations and ontologies) and moves toward more advanced topics specific to the use cases of a Fedora 4 based digital repository system. + +## Reading list + +### Basics of linked data +This section seeks to give the reader a foundational understanding of what linked data is, why it is useful, and a very superficial understanding of how it works. + +- [Tim Berners-Lee’s description of Linked Data](https://www.w3.org/DesignIssues/LinkedData.html) +- [Manu Sporny's "What is Linked Data?" YouTube Video](https://www.youtube.com/watch?v=4x_xzT5eF5Q) +- [Wikipedia article on Linked Data](https://en.wikipedia.org/wiki/Linked_data) +- [Wikipedia article on Semantic Web](https://en.wikipedia.org/wiki/Semantic_Web) +- [Wikipedia article on URIs](https://en.wikipedia.org/wiki/Uniform_Resource_Identifier) +- [Wikipedia article on the W3C](https://en.wikipedia.org/wiki/World_Wide_Web_Consortium) +- [W3C’s description of Linked Data](https://www.w3.org/standards/semanticweb/data) +- [W3C’s Linked Data Glossary](https://www.w3.org/TR/ld-glossary/) +- [W3C’s Architecture of the World Wide Web](https://www.w3.org/TR/webarch/) + +### Understanding RDF +This section is all about RDF, the Resource Description Framework, which defines the way linked data is structured. + +- [Wikipedia article on RDF](https://en.wikipedia.org/wiki/Resource_Description_Framework) +- [D-Lib’s Intro to RDF](http://www.dlib.org/dlib/may98/miller/05miller.html) +- [W3C’s RDF 1.1 Primer](https://www.w3.org/TR/rdf11-primer/) +- [W3C’s RDF 1.1 Concepts](https://www.w3.org/TR/rdf11-concepts/) + +### Querying linked data with SPARQL +This section takes a look at SPARQL, the query language that allows you to ask linked data very specific questions. 
The queryable nature of linked data is one of the things that makes it so special. Try some SPARQL queries on DBpedia's endpoint to get some hands-on experience. + +- [Wikipedia article on SPARQL](https://en.wikipedia.org/wiki/SPARQL) +- [W3C’s SPARQL 1.1 Overview](https://www.w3.org/TR/sparql11-overview/) +- [W3C’s SPARQL 1.1 Query Language](https://www.w3.org/TR/sparql11-query/) +- [DBpedia's SPARQL Endpoint](https://dbpedia.org/sparql) + +### RDF serialization formats +RDF data can be serialized into many different formats. RDF/XML is the original way that RDF data was shared, but there are far more human-friendly serialization formats, such as Turtle, which is great for beginners. JSON-LD is the easiest format for applications to use, and is the serialization format that Islandora uses internally. Make sure to check out the [JSON-LD Playground](http://json-ld.org/playground/) for an interactive learning experience. + +- [Wikipedia article on Serialization](https://en.wikipedia.org/wiki/Serialization) +- [W3C’s RDF/XML Syntax Specification](https://www.w3.org/TR/REC-rdf-syntax/) +- [W3C’s RDF 1.1 Turtle](https://www.w3.org/TR/turtle/) +- [W3C’s JSON-LD 1.0](https://www.w3.org/TR/json-ld/) +- [JSON-LD Website](http://json-ld.org/) +- [JSON-LD Playground](http://json-ld.org/playground/) + +### Ontology and vocabulary basics +Ontologies and vocabularies are created by communities of people to describe things, and once created, anyone can use an ontology or vocabulary to describe their resources. This section goes over some of the more popular ontologies & vocabularies in use. 
+ +- [Wikipedia article on Ontologies](https://en.wikipedia.org/wiki/Ontology_(information_science)) +- [W3C’s description of Ontologies/Vocabularies (sameish thing)](https://www.w3.org/standards/semanticweb/ontology) +- [Wikipedia article on Friend of a Friend (FOAF) ontology](https://en.wikipedia.org/wiki/FOAF_(ontology)) +- [FOAF 0.99 Vocabulary Specification](http://xmlns.com/foaf/spec/) +- [Socially Interconnected Online Communities Ontology (SIOC)](http://sioc-project.org/) +- [Dublin Core in RDF](http://dublincore.org/documents/dc-rdf/) + +### Building ontologies +One isn't limited to the ontologies & vocabularies that already exist in the world; anyone is free to create their own. This section goes over ontologies that exist to help those trying to create their own ontologies. + +- [Wikipedia article on RDF Schema (RDFS)](https://en.wikipedia.org/wiki/RDF_Schema) +- [W3C’s RDF Schema (RDFS) 1.1](https://www.w3.org/TR/rdf-schema/) +- [Wikipedia article on Simple Knowledge Organization System (SKOS)](https://en.wikipedia.org/wiki/Simple_Knowledge_Organization_System) +- [ALA’s SKOS: A Guide for Information Professionals](http://www.ala.org/alcts/resources/z687/skos) +- [Wikipedia article on Web Ontology Language (OWL)](https://en.wikipedia.org/wiki/Web_Ontology_Language) +- [W3C’s OWL 2 Primer](https://www.w3.org/TR/owl2-primer/) +- [W3C’s OWL 2 Quick Reference](https://www.w3.org/TR/owl2-quick-reference/) + +### Repository-specific ontologies +Most ontologies are very specific to certain use cases, and digital repository systems are no different. This section covers ontologies that are of specific interest to users of Islandora, or any Fedora 4 based digital repository system. 
+ +- [MODS RDF Namespace Document](http://www.loc.gov/standards/mods/modsrdf/v1/) +- [MODS RDF Ontology Primer](https://www.loc.gov/standards/mods/modsrdf/primer.html) +- [MODS RDF Ontology Primer 2: MODS XML to RDF Conversion](https://www.loc.gov/standards/mods/modsrdf/primer-2.html) +- [PREMIS RDF Namespace Document](http://id.loc.gov/ontologies/premis.html) +- [Linked Data Platform (LDP) 1.0 Primer](https://www.w3.org/TR/ldp-primer/) +- [LDP 1.0 Specification](https://www.w3.org/TR/ldp/) +- [Portland Common Data Model (PCDM) wiki](https://github.com/duraspace/pcdm/wiki) +- [PCDM ontologies list](http://pcdm.org/) +- [PCDM Models ontology (defines Collections, Objects & Files)](http://pcdm.org/2016/04/18/models) +- [Fedora ontologies](http://fedora.info/) + +## RDF generation +### Summary +In Islandora, the **JSON-LD Module** transforms nodes (or media, or taxonomy terms) into the RDF that is synced into Fedora and the Triplestore. It uses RDF mappings, a concept defined by the **RDF Module**, and exposes them through the **REST API** at `?_format=jsonld`. + +### Background + +A quick overview of JSON-LD, the RDF module, and the REST API. + +#### The JSON-LD syntax +[JSON-LD](https://www.w3.org/2013/dwbp/wiki/RDF_AND_JSON-LD_UseCases) is a syntax for expressing RDF (like Turtle or RDF/XML) that is written in JSON, because developers like JSON and it is web-friendly. The JSON-LD syntax was designed for including Linked Data within the HTML of web pages (similar to microdata or RDFa). Instead of nesting the RDF predicates within _existing_ HTML tags as RDFa does, JSON-LD lets you put a solid blob of Linked Data inside a `