From 1125c95dd1536c6d57b6b386e2266e8dd6c4753e Mon Sep 17 00:00:00 2001
From: Mark Sandford
Date: Wed, 9 Oct 2019 16:03:19 -0400
Subject: [PATCH 1/7] documents Colgate's AWS setup with ingest storage

---
 .../example-aws-configuration.md | 73 +++++++++++++++++++
 1 file changed, 73 insertions(+)
 create mode 100644 docs/cookbook-recipes/example-aws-configuration.md

diff --git a/docs/cookbook-recipes/example-aws-configuration.md b/docs/cookbook-recipes/example-aws-configuration.md
new file mode 100644
index 000000000..038daa87d
--- /dev/null
+++ b/docs/cookbook-recipes/example-aws-configuration.md
@@ -0,0 +1,73 @@
+
+
+# Example AWS configuration
+
+This is the current (as of October 2019) configuration used by Colgate University on AWS. It's purpose is to provide one example of a working ISLE environment, or as a starting point for institutions with similar needs.
+
+## Overview
+
+Colgate University Libraries' Digital Collections currently holds over 115000 individual objects/pages. The collection uses just under 5 Tb of storage, which includes high resolution TIFFs for the majority of the objects.
+
+## AWS configuration
+
+ - Production:
+ - m4.xlarge EC2 Reserved Instance **Note**: have a 3 year contract for the m4.xlarge. Amazon offers newer m5 instances for this tier which would be preferred.
+ - 75 GB EBS storage, type gp2, for the operating system, docker images, etc.
+ - 8 TB EBS storage, tpe st1, for the Fedora datastore. This is where all digital objects, derivates, and metadata are stored.
+ - 300 GB EBS storage, type gp2, used as a temporary holding location for objects to be ingested. After successful ingest, the objects are deleted from this volume.
+ - Staging:
+ - Staging differs from Production in 2 ways:
+ - m4.large instance rather than xlarge, as performance is less of a concern on staging. The system works, but can be sluggish compared to production.
+ - No 300 GB holding location permanently attached. This can be added fairly easily if a need to test a large ingest arose.
+
+Rather than a separate 300 EB volume, it would be possible to simply increase the size of the OS disk from 75 GB to something greater to allow room for object prior to ingestion. Hwoever, having it separate provides a few advantages:
+ - Volumes cannot be resized on the fly. If temporary storage needs exceed what is available, the server would need to be shut down, a new volume created, and the existing volume copied over to it.
+ - If we are not ingesting anything for a period of time, we can easily turn off the 300 GB volume. Because all data on it is meant to be temporary, we could delete it entirely to avoid being charged for storage we are not using. A new volume could be quickly and easily re-attached as needed at a later date with minimal interruption to the production site.
+
+## AWS Setup
+
+Adding volumes in AWS is a fairly simple process, and is well documented [on their site](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-using-volumes.html) Be sure to check the AWS site for current pricing. EBS volumes are billed based on the size allotted, not used.
+
+Colgate's ISLE host server' fstab has the following entry:
+
+ >UUID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx /mnt/tempstorage xfs defaults,nofail 0 2
+
+which mounts the 300 GB volume at /mnt/tempstorage
+
+In order for the Apache docker container to be able to access /mnt/tempstorage, it must be added as a bind mount. If ISLE is currently running, issue `docker-compose down` to shut it down before making changes to the confguration file.
+
+Colgate's docker-compose.production.yml contains the following line under the "apache" "volumes" section:
+
+ >\- /mnt/tempstorage:/mnt/ingest
+
+Note that either of these paths can point to any location as long as it is not otherwise in use (e.g do not bind this to /var/www/ on the docker container, as that directory already contains the Drupal files). /mnt is commonly used as the default directory for mounting volumes in Linux.
+
+Any files or directories added to /mnt/tempstorage/ on the host server will be immediately available to the Apache docker container aft er bringing ISLE back up with `docker-compose up -d`.
+
+## Workflow
+
+Colgate primarily uses the Islandora Multi Importer (IMI) module for ingesting new objects. The basic workflow is as follows:
+
+ - An archivist has a directory of recently scanned student newspapers in a directory on their computer
+ - TIFFs are uploaded via SFTP to /mnt/tempstorage/studentnews
+ - The metadata spreadsheet has a column for object location called "filepath" that refers to where Islandora will find it within the Docker container, e.g. /mnt/ingest/studentnews/page1.tif
+ - In the IMI GUI, the "local" location is selected, and object is mapped to the spreadsheet column "filepath"
+ - Files are ingested. Upon completion and verification that the ingest was successful, the archivist deletes /studentnews subdirectory. This can be done at any time after ingest so long as there is still capacity on the volume. Because AWS charges for the GB allocated rather than used, there is no cost savings for deleting the files quickly. Only deleting the volume entirely via the AWS console would avoid charges.
+
+## Removing or resizing the ingest volume
+
+**Note**: These instructions assume that the *only* data stored in the temporary volume is meant to be ephemeral and everything on it can be safely deleted.
+
+ - If no ingests are planned, the ingest volume can easily be removed to avoid unnecessary charges.
+ - As always, bring the docker containers down with `docker-compose down` before editing docker-compose.production.yml
+ - If you plan on provisioning the volume again later, comment out the lines added to /etc/fstab and docker-compose.production.yml by adding a \# to the front. ex:
+
+ >\#UUID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx /mnt/tempstorage xfs defaults,nofail 0 2
+
+ And
+
+ >\#\- /mnt/tempstorage:/mnt/ingest
+ - Or delete both lines if you are sure you will not need them again.
+ - Unmount the drive from the host server:
+ - `sudo umount -d /mnt/tempstorage` replacing /mnt/tempstorage with the path you used on the host server.
+ - See the AWS site for further instructions for [detaching](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-detaching-volume.html) then [deleting](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-deleting-volume.html) volumes.

From eecc2e85b2386d454b84115470a9a26641815ab2 Mon Sep 17 00:00:00 2001
From: Mark Sandford
Date: Wed, 9 Oct 2019 16:08:01 -0400
Subject: [PATCH 2/7] aspell ftw

---
 docs/cookbook-recipes/example-aws-configuration.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/cookbook-recipes/example-aws-configuration.md b/docs/cookbook-recipes/example-aws-configuration.md
index 038daa87d..aff947504 100644
--- a/docs/cookbook-recipes/example-aws-configuration.md
+++ b/docs/cookbook-recipes/example-aws-configuration.md
@@ -13,14 +13,14 @@ Colgate University Libraries' Digital Collections currently holds over 115000 in
  - Production:
  - m4.xlarge EC2 Reserved Instance **Note**: have a 3 year contract for the m4.xlarge. Amazon offers newer m5 instances for this tier which would be preferred.
  - 75 GB EBS storage, type gp2, for the operating system, docker images, etc.
- - 8 TB EBS storage, tpe st1, for the Fedora datastore. This is where all digital objects, derivates, and metadata are stored.
+ - 8 TB EBS storage, toe st1, for the Fedora datastore. This is where all digital objects, derives, and metadata are stored.
  - 300 GB EBS storage, type gp2, used as a temporary holding location for objects to be ingested. After successful ingest, the objects are deleted from this volume.
  - Staging:
  - Staging differs from Production in 2 ways:
  - m4.large instance rather than xlarge, as performance is less of a concern on staging. The system works, but can be sluggish compared to production.
  - No 300 GB holding location permanently attached. This can be added fairly easily if a need to test a large ingest arose.
 
-Rather than a separate 300 EB volume, it would be possible to simply increase the size of the OS disk from 75 GB to something greater to allow room for object prior to ingestion. Hwoever, having it separate provides a few advantages:
+Rather than a separate 300 EB volume, it would be possible to simply increase the size of the OS disk from 75 GB to something greater to allow room for object prior to ingestion. However, having it separate provides a few advantages:
  - Volumes cannot be resized on the fly. If temporary storage needs exceed what is available, the server would need to be shut down, a new volume created, and the existing volume copied over to it.
  - If we are not ingesting anything for a period of time, we can easily turn off the 300 GB volume. Because all data on it is meant to be temporary, we could delete it entirely to avoid being charged for storage we are not using. A new volume could be quickly and easily re-attached as needed at a later date with minimal interruption to the production site.
 
@@ -34,7 +34,7 @@ Colgate's ISLE host server' fstab has the following entry:
 
 which mounts the 300 GB volume at /mnt/tempstorage
 
-In order for the Apache docker container to be able to access /mnt/tempstorage, it must be added as a bind mount. If ISLE is currently running, issue `docker-compose down` to shut it down before making changes to the confguration file.
+In order for the Apache docker container to be able to access /mnt/tempstorage, it must be added as a bind mount. If ISLE is currently running, issue `docker-compose down` to shut it down before making changes to the configuration file.
 
 Colgate's docker-compose.production.yml contains the following line under the "apache" "volumes" section:
 

From 19e318245eae0fbd7c6185f68d7689af84776944 Mon Sep 17 00:00:00 2001
From: Mark Sandford
Date: Wed, 9 Oct 2019 16:08:55 -0400
Subject: [PATCH 3/7] aspell ftw

---
 docs/cookbook-recipes/example-aws-configuration.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/cookbook-recipes/example-aws-configuration.md b/docs/cookbook-recipes/example-aws-configuration.md
index aff947504..0009775b1 100644
--- a/docs/cookbook-recipes/example-aws-configuration.md
+++ b/docs/cookbook-recipes/example-aws-configuration.md
@@ -20,7 +20,7 @@ Colgate University Libraries' Digital Collections currently holds over 115000 in
  - m4.large instance rather than xlarge, as performance is less of a concern on staging. The system works, but can be sluggish compared to production.
  - No 300 GB holding location permanently attached. This can be added fairly easily if a need to test a large ingest arose.
 
-Rather than a separate 300 EB volume, it would be possible to simply increase the size of the OS disk from 75 GB to something greater to allow room for object prior to ingestion. However, having it separate provides a few advantages:
+Rather than a separate 300 EBS volume, it would be possible to simply increase the size of the OS disk from 75 GB to something greater to allow room for object prior to ingestion. However, having it separate provides a few advantages:
  - Volumes cannot be resized on the fly. If temporary storage needs exceed what is available, the server would need to be shut down, a new volume created, and the existing volume copied over to it.
  - If we are not ingesting anything for a period of time, we can easily turn off the 300 GB volume. Because all data on it is meant to be temporary, we could delete it entirely to avoid being charged for storage we are not using. A new volume could be quickly and easily re-attached as needed at a later date with minimal interruption to the production site.
 

From 3833a32d99be1d748f04c06054d2a03e94bda840 Mon Sep 17 00:00:00 2001
From: Mark Sandford
Date: Wed, 9 Oct 2019 16:09:57 -0400
Subject: [PATCH 4/7] aspell ftw

---
 docs/cookbook-recipes/example-aws-configuration.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/cookbook-recipes/example-aws-configuration.md b/docs/cookbook-recipes/example-aws-configuration.md
index 0009775b1..483563608 100644
--- a/docs/cookbook-recipes/example-aws-configuration.md
+++ b/docs/cookbook-recipes/example-aws-configuration.md
@@ -13,7 +13,7 @@ Colgate University Libraries' Digital Collections currently holds over 115000 in
  - Production:
  - m4.xlarge EC2 Reserved Instance **Note**: have a 3 year contract for the m4.xlarge. Amazon offers newer m5 instances for this tier which would be preferred.
  - 75 GB EBS storage, type gp2, for the operating system, docker images, etc.
- - 8 TB EBS storage, toe st1, for the Fedora datastore. This is where all digital objects, derives, and metadata are stored.
+ - 8 TB EBS storage, toe st1, for the Fedora datastore. This is where all digital objects, derivatives, and metadata are stored.
  - 300 GB EBS storage, type gp2, used as a temporary holding location for objects to be ingested. After successful ingest, the objects are deleted from this volume.
  - Staging:
  - Staging differs from Production in 2 ways:

From 45ca9c7b4f96d7ee3e3974d48c0c6599168e1927 Mon Sep 17 00:00:00 2001
From: Mark Sandford
Date: Wed, 9 Oct 2019 16:24:53 -0400
Subject: [PATCH 5/7] added a some more details

---
 .../example-aws-configuration.md | 18 +++++++++++++-----
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/docs/cookbook-recipes/example-aws-configuration.md b/docs/cookbook-recipes/example-aws-configuration.md
index 483563608..a86d87d56 100644
--- a/docs/cookbook-recipes/example-aws-configuration.md
+++ b/docs/cookbook-recipes/example-aws-configuration.md
@@ -11,14 +11,14 @@ Colgate University Libraries' Digital Collections currently holds over 115000 in
 ## AWS configuration
 
  - Production:
- - m4.xlarge EC2 Reserved Instance **Note**: have a 3 year contract for the m4.xlarge. Amazon offers newer m5 instances for this tier which would be preferred.
+ - m4.xlarge EC2 Reserved Instance **Note**: We have a 3 year contract for the m4.xlarge. Amazon offers newer m5 instances for this tier which would be preferred.
  - 75 GB EBS storage, type gp2, for the operating system, docker images, etc.
  - 8 TB EBS storage, toe st1, for the Fedora datastore. This is where all digital objects, derivatives, and metadata are stored.
- - 300 GB EBS storage, type gp2, used as a temporary holding location for objects to be ingested. After successful ingest, the objects are deleted from this volume.
+ - 300 GB EBS storage, type gp2, used as a temporary holding location for objects to be ingested. After successful ingest, the objects are deleted from this volume.
  - Staging:
  - Staging differs from Production in 2 ways:
  - m4.large instance rather than xlarge, as performance is less of a concern on staging. The system works, but can be sluggish compared to production.
- - No 300 GB holding location permanently attached. This can be added fairly easily if a need to test a large ingest arose.
+ - No 300 GB holding location permanently attached. This can be added fairly easily if a need to test a large ingest arose (see below).
 
 Rather than a separate 300 EBS volume, it would be possible to simply increase the size of the OS disk from 75 GB to something greater to allow room for object prior to ingestion. However, having it separate provides a few advantages:
  - Volumes cannot be resized on the fly. If temporary storage needs exceed what is available, the server would need to be shut down, a new volume created, and the existing volume copied over to it.
@@ -40,9 +40,9 @@ Colgate's docker-compose.production.yml contains the following line under the "a
 
  >\- /mnt/tempstorage:/mnt/ingest
 
-Note that either of these paths can point to any location as long as it is not otherwise in use (e.g do not bind this to /var/www/ on the docker container, as that directory already contains the Drupal files). /mnt is commonly used as the default directory for mounting volumes in Linux.
+Note that either of these paths can point to any location as long as it is not otherwise in use (e.g do not bind this to /var/www/ on the docker container, as that directory already contains the Drupal files). /mnt is commonly used as the default directory for mounting volumes in Linux. It is also possible to mount them in the same place, eg "/mnt/ingest:/mnt/ingest" but that may make it difficult to tell them apart later on.
 
-Any files or directories added to /mnt/tempstorage/ on the host server will be immediately available to the Apache docker container aft er bringing ISLE back up with `docker-compose up -d`.
+Any files or directories added to /mnt/tempstorage/ on the host server will be immediately available to the Apache docker container after bringing ISLE back up with `docker-compose up -d`.
 
 ## Workflow
 
@@ -54,6 +54,13 @@ Colgate primarily uses the Islandora Multi Importer (IMI) module for ingesting n
  - In the IMI GUI, the "local" location is selected, and object is mapped to the spreadsheet column "filepath"
  - Files are ingested. Upon completion and verification that the ingest was successful, the archivist deletes /studentnews subdirectory. This can be done at any time after ingest so long as there is still capacity on the volume. Because AWS charges for the GB allocated rather than used, there is no cost savings for deleting the files quickly. Only deleting the volume entirely via the AWS console would avoid charges.
 
+## Accessing the server
+
+The above workflow assumes the archivist has access to the AWS server
+ - AWS block all ports by default. A static IP address for anyone moving files to the server would be ideal. Barring that, limiting the range to a library staff vlan would be better than opening the SSH port to all of campus.
+ - SSH keys are required to connect to the AWS server. There are various tools to generate these for Windows and Mac. Amazon has [documentation](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-key-pairs.html) on this process.
+ - At Colgate, provisions were made for a remote worker without a static IP address by setting up an rsync script to move files from a server on campus that the worker did have access to, to the AWS server. This was preferred over whitelisting the entire VPN IP range, but setting that up is outside the scope of this document.
+
 ## Removing or resizing the ingest volume
 
 **Note**: These instructions assume that the *only* data stored in the temporary volume is meant to be ephemeral and everything on it can be safely deleted.
@@ -71,3 +78,4 @@ Colgate primarily uses the Islandora Multi Importer (IMI) module for ingesting n
  - Unmount the drive from the host server:
  - `sudo umount -d /mnt/tempstorage` replacing /mnt/tempstorage with the path you used on the host server.
  - See the AWS site for further instructions for [detaching](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-detaching-volume.html) then [deleting](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-deleting-volume.html) volumes.
+ - To resize the volume, create a new EBS volume of the desired size and add it to your AWS instance and fstab as described above. Note you will need to change the UUID in /etc/fstab to match the newly created volume. If you mount it to the same directory as the previous volume, you should not need to change the docker-compose.production.yml entry.

From da2ec7a5b2b143cf49bf4cfb070799fed0b86707 Mon Sep 17 00:00:00 2001
From: Mark Sandford
Date: Fri, 25 Oct 2019 11:17:49 -0400
Subject: [PATCH 6/7] Update example-aws-configuration.md

---
 docs/cookbook-recipes/example-aws-configuration.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/cookbook-recipes/example-aws-configuration.md b/docs/cookbook-recipes/example-aws-configuration.md
index a86d87d56..ecbe20eb3 100644
--- a/docs/cookbook-recipes/example-aws-configuration.md
+++ b/docs/cookbook-recipes/example-aws-configuration.md
@@ -13,7 +13,7 @@ Colgate University Libraries' Digital Collections currently holds over 115000 in
  - Production:
  - m4.xlarge EC2 Reserved Instance **Note**: We have a 3 year contract for the m4.xlarge. Amazon offers newer m5 instances for this tier which would be preferred.
  - 75 GB EBS storage, type gp2, for the operating system, docker images, etc.
- - 8 TB EBS storage, toe st1, for the Fedora datastore. This is where all digital objects, derivatives, and metadata are stored.
+ - 8 TB EBS storage, type st1, for the Fedora datastore. This is where all digital objects, derivatives, and metadata are stored.
  - 300 GB EBS storage, type gp2, used as a temporary holding location for objects to be ingested. After successful ingest, the objects are deleted from this volume.
  - Staging:
  - Staging differs from Production in 2 ways:

From 6d268488fde7a053f71993bff738e082a6b044f5 Mon Sep 17 00:00:00 2001
From: Mark Sandford
Date: Wed, 20 Nov 2019 15:49:13 -0500
Subject: [PATCH 7/7] edits

---
 docs/cookbook-recipes/example-aws-configuration.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/docs/cookbook-recipes/example-aws-configuration.md b/docs/cookbook-recipes/example-aws-configuration.md
index ecbe20eb3..1f616cca5 100644
--- a/docs/cookbook-recipes/example-aws-configuration.md
+++ b/docs/cookbook-recipes/example-aws-configuration.md
@@ -8,7 +8,7 @@ This is the current (as of October 2019) configuration used by Colgate Universit
 
 Colgate University Libraries' Digital Collections currently holds over 115000 individual objects/pages. The collection uses just under 5 Tb of storage, which includes high resolution TIFFs for the majority of the objects.
 
-## AWS configuration
+## AWS Configuration
 
  - Production:
  - m4.xlarge EC2 Reserved Instance **Note**: We have a 3 year contract for the m4.xlarge. Amazon offers newer m5 instances for this tier which would be preferred.
@@ -28,7 +28,7 @@ Rather than a separate 300 EBS volume, it would be possible to simply increase t
 
 Adding volumes in AWS is a fairly simple process, and is well documented [on their site](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-using-volumes.html) Be sure to check the AWS site for current pricing. EBS volumes are billed based on the size allotted, not used.
 
-Colgate's ISLE host server' fstab has the following entry:
+Colgate's ISLE host server's fstab has the following entry:
 
  >UUID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx /mnt/tempstorage xfs defaults,nofail 0 2
 
@@ -54,14 +54,14 @@ Colgate primarily uses the Islandora Multi Importer (IMI) module for ingesting n
  - In the IMI GUI, the "local" location is selected, and object is mapped to the spreadsheet column "filepath"
  - Files are ingested. Upon completion and verification that the ingest was successful, the archivist deletes /studentnews subdirectory. This can be done at any time after ingest so long as there is still capacity on the volume. Because AWS charges for the GB allocated rather than used, there is no cost savings for deleting the files quickly. Only deleting the volume entirely via the AWS console would avoid charges.
 
-## Accessing the server
+## Accessing the Server
 
 The above workflow assumes the archivist has access to the AWS server
  - AWS block all ports by default. A static IP address for anyone moving files to the server would be ideal. Barring that, limiting the range to a library staff vlan would be better than opening the SSH port to all of campus.
  - SSH keys are required to connect to the AWS server. There are various tools to generate these for Windows and Mac. Amazon has [documentation](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-key-pairs.html) on this process.
  - At Colgate, provisions were made for a remote worker without a static IP address by setting up an rsync script to move files from a server on campus that the worker did have access to, to the AWS server. This was preferred over whitelisting the entire VPN IP range, but setting that up is outside the scope of this document.
 
-## Removing or resizing the ingest volume
+## Removing or Resizing the Ingest Volume
 
 **Note**: These instructions assume that the *only* data stored in the temporary volume is meant to be ephemeral and everything on it can be safely deleted.
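
Reviewer note on the Workflow section of the patched document: with the bind mount `/mnt/tempstorage:/mnt/ingest`, a file uploaded to the host at `/mnt/tempstorage/studentnews/page1.tif` has to appear in the spreadsheet's "filepath" column as `/mnt/ingest/studentnews/page1.tif`. The following is a minimal Python sketch of that translation, not part of the patch series. The function name `to_container_path` and the two constants are hypothetical, and the sketch assumes the mount points match the docker-compose entry quoted in the document.

```python
from pathlib import PurePosixPath

# Assumed to match the bind mount "/mnt/tempstorage:/mnt/ingest"
# from docker-compose.production.yml; adjust if your paths differ.
HOST_ROOT = PurePosixPath("/mnt/tempstorage")
CONTAINER_ROOT = PurePosixPath("/mnt/ingest")


def to_container_path(host_path: str) -> str:
    """Translate a host path into the path the Apache container sees.

    Raises ValueError if host_path is not under the bind-mounted directory.
    """
    relative = PurePosixPath(host_path).relative_to(HOST_ROOT)
    return str(CONTAINER_ROOT / relative)


print(to_container_path("/mnt/tempstorage/studentnews/page1.tif"))
# prints: /mnt/ingest/studentnews/page1.tif
```

A helper like this could be used to fill the "filepath" column from a listing of uploaded files, so the spreadsheet never contains host-side paths that the container cannot resolve.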