From ecf939008843a53563786cff8b107d74639815fc Mon Sep 17 00:00:00 2001 From: Greg Tunink Date: Wed, 14 Jun 2023 12:22:44 -0500 Subject: [PATCH 1/5] Add batch management docs; update readme --- README.md | 55 ++----- docs/batch-management.md | 337 +++++++++++++++++++++++++++++++++++++++ 2 files changed, 349 insertions(+), 43 deletions(-) create mode 100644 docs/batch-management.md diff --git a/README.md b/README.md index 9f4db01..78f3909 100644 --- a/README.md +++ b/README.md @@ -1,16 +1,13 @@ -# open-oni_nebraska_theme -This is the new nebraska child theme for open-oni +# Open ONI Nebraska Theme +This is the Nebraska child theme for Open ONI ## Including in the Open ONI App -Clone the Open ONI repository's latest v1.0.x tag and add the Nebraska theme to the `themes` directory. -If you are upgrading from v0.1.x, you will need to remove -`sass_processor` from `onisite/settings_local.py`. +Clone Open ONI repository and add Nebraska theme to the `themes` directory -``` -git clone git@github.com:open-oni/open-oni.git open-oni +```bash +git clone git@github.com:open-oni/open-oni.git cd open-oni -git checkout tags/v0.10.0 git clone git@github.com:CDRH/open-oni_nebraska_theme.git themes/nebraska ``` @@ -22,7 +19,7 @@ The Nebraska theme depends upon several plugins: You will need to clone and configure these as well or else the theme's navigation will not work as intended. Follow instructions in those repositories for customizing the contents of the map and the featured content using the files in this theme's `conf/plugins` directory. 
-``` +```bash git clone git@github.com:open-oni/plugin_calendar.git onisite/plugins/calendar git clone git@github.com:open-oni/plugin_featured_content.git onisite/plugins/featured_content git clone git@github.com:open-oni/plugin_map.git onisite/plugins/map @@ -30,8 +27,7 @@ git clone git@github.com:open-oni/plugin_map.git onisite/plugins/map Copy in Nebraska configuration for the calendar plugin (these may differ if the plugin has been changed and our theme has not yet been updated): -``` -# calendar +```bash cp themes/nebraska/conf/plugins/calendar/config.py onisite/plugins/calendar/config.py ``` @@ -91,7 +87,7 @@ TODO Until such time as the LoC updates their Marc records from OCLC, you need to include this line in your `settings_local.py` file before you can load any of the Plattsmouth batches. All other batches will not load while this line is in your settings, so remember to comment it or remove it after loading Plattsmouth: -``` +```python MARC_RETRIEVAL_URLFORMAT = "https://raw.githubusercontent.com/CDRH/open-oni_nebraska_theme/master/marc/%s/marc.xml" ``` @@ -104,36 +100,9 @@ tool](https://sass-lang.com/install): sass static/css/main.scss static/css/main.css ``` -[See nebraska oni docs](docs/openoni.md#compile-static-assets) +For production, one must run additional commands to +[Compile Static Assets](docs/openoni.md#compile-static-assets) -## Load Batch Quick Reference +## Batch Management -Upload your batch files to `/var/local/newspapers/(batch_name)`. Then run: - -```bash -sudo chmod -R g+rwX /var/local/newspapers/(batch_name) -sudo chmod -R o+rX /var/local/newspapers/(batch_name) -``` - -Then run the following commands from the base of the Nebraska open-oni install as a regular user, NOT as root. 
- -```bash -source ENV/bin/activate - -./manage.py load_batch /var/local/newspapers/(batch_name)/ - -# For batches to be visible in /batches page, must be released -# Add --reset flag to clear release dates and recalculate them -# Release date and time come from: -# 1. bag-info.txt, if found in the batch source -# 2. Tab-delimited CSV file if provided in format: batch_name \t batch_date -# 3. http://chroniclingamerica.loc.gov/batches.xml -# 4. Current server datetime -./manage.py release -``` - -If you want to chain together multiple batches, or you want to prevent scripts from exiting if the terminal shell is closed / session disconnects, run `nohup` with a command or script. - -```bash -nohup (command) >> nohup.out -``` +[Batch Management documentation](docs/batch-management.md) diff --git a/docs/batch-management.md b/docs/batch-management.md new file mode 100644 index 0000000..4a36853 --- /dev/null +++ b/docs/batch-management.md @@ -0,0 +1,337 @@ +# Nebraska Newspapers Batch Management + +## One Time Setup Actions + +These are steps that only need to be run once on a new Nebraska Newspapers server + +Install cifs software to enable mounting network file shares + +```bash +sudo yum install cifs-utils +``` + +Install bagit.py globally (separate from Open ONI virtual environment) + +```bash +# Add to system-wide package list regularly updated +echo "bagit" | sudo tee -a /root/requirements.txt + +# Manually install now +sudo pip install bagit +``` + +## nohup Utility +Several commands run throughout this process will take from around 45 minutes up +to many hours to complete. If one's terminal or SSH connection is closed, the +process will normally be interrupted and require some cleanup and restarting the +command again. To avoid this we can prefix these long-running commands with the +`nohup` utility and start them running in the background of the Bash shell by +appending an `&` at the end of the command. 
+ +This is most useful for the load_batch command which loads a batches data into +Open ONI as this command usually takes multiple hours to complete and it's not +necessary to stay connected while this runs. For this, we take the normal +command + +```bash +./manage load_batch /var/local/newspapers/batch_nbu_(batch_name_ver##) > (batch_name).txt +``` + +and modify as follows + +```bash +nohup ./manage load_batch /var/local/newspapers/batch_nbu_(batch_name_ver##) > (batch_name).txt & +``` + +One may also wish to use this pattern with `rsync` and `bagit.py` commands which +also run longer in case one expects to interrupt their network connection or +sleep / shutdown their computer. + +We can also chain commands together by putting them in a bash script file +and running it as `nohup load_multiple_batches.sh &`. + +## Transfer Batch Files to Dev Server + +SSH into the server + +```bash +ssh (active-directory-name@unl.edu)@nebnewspapers-dev.unl.edu +# Example: ssh lweakly2@unl.edu@nebnewspapers-dev.unl.edu +``` + +Mount the network share from libr1901 to transfer files to the server + +```bash +# Make directory to use as mount point +cd [~] +mkdir libr1901-newspapers + +sudo mount -t cifs -o username=(active-directory-name but no "@unl.edu") \ + //libr1901.unl.edu/newspapers/ libr1901-newspapers/ +# Example: username=gtunink2 +# NOT username=gtunink2@unl.edu + +# This will prompt you for your Active Directory password twice: +# once for sudo, and once for connecting to libr1901 + +# This transfer will take approximately an hour for most batches + +# Batches on libr1901 have a directory structure like: +batch_nbu_(batch name)/ +┣━━━ (lccn) +┣━━━ ... +┣━━━ batch.xml +┗━━━ batch_1.xml + +# This rsync command is resumable if interrupted +# Explanation of rsync options: https://explainshell.com/explain?cmd=rsync+-ahuX+--del +# Note that omission and presence of trailing slashes here matters, +# so type carefully and double-check before submitting the command! 
+rsync -ahuX --del --info=progress2 --exclude="*.tif" libr1901-newspapers/batch_nbu_(batch_name) /var/local/newspapers/ +``` + +### Record Batch Storage Info +Record under the Size columns on the Newspaper Batch Information spreadsheet + +```bash +# Size including tiffs on libr1901 +du -sh libr1901-newspapers/batch_nbu_(batch_name) + +# Size without tiffs (because the batch was moved over without them) +du -sh /var/local/newspapers/batch_nbu_(batch_name) +``` + +### Unmount Network Share +Unmount after transfers complete. The `libr1901-newspapers` directory will +appear empty after successful unmount. + +```bash +sudo umount libr1901-newspapers + +ls libr1901-newspapers +# No output here indicates an empty directory and successful unmount +``` + +### Prepare Files for Ingest +Set an ONI-compatible directory name, permissions, and add a README file. + +```bash +cd /var/local/newspapers + +# Set ONI-compatible directory name +# Typically end with ver01 unless LoC corrections were made +# Note in "ver" column of Newspaper Batch Information spreadsheet if not "ver01" +mv batch_nbu_(batch_name) batch_nbu_(batch_name_ver##) + +# Set group-writeable, global-readable batch permissions +# These commands may run for 10-15 minutes on the long end. Unfortunately there +# is no way to know its progress, so please be patient. +chmod -R g+rwX batch_nbu_(batch_name_ver##) +chmod -R o+rX batch_nbu_(batch_name_ver##) +``` + +Create README.txt file with fields noted in example + +`nano batch_nbu_(batch_name_ver##)/README.txt`: + +``` +batch_name: nbu_(batch_name) +canonical: libr1901 +contents: (List the titles and their LCCNs here as + "Title (lccn)[, Title (lccn), ...]" + Before ingest, the LCCNs are the top level directories + but one has to look at the XML files to find the title text + in batch_nbu_(batch_name_ver##)/(lccn)/(reel #)/(reel #).xml and search + for the "" element.) +loc: ("yes" if NDNP batch, "no" otherwise) +description: (Most batches have nothing to note here. 
This is for unusual + circumstances, e.g. to articulate what LCCN changes were made, alterations + to dates, etc needed to recreate batch if copied from canonical location + again, or any other problems) +``` + +Example from `batch_nbu_keithsbear_ver03`: +``` +batch_name: nbu_keithsbear_ver03 +canonical: libr1901 +contents: The Frontier (2010270509), The Monitor (sn94055234), Omaha Monitor (sn94055235) +loc: yes +description: Corrected version received from LoC +``` + +### Create Checksums and Manifest +There should be no more editing of files after creating checksums and manifest +unless errors are being corrected and new checksums and manifest are created. + +This step moves files into a `data/` directory +and CANNOT BE RERUN without moving them back. +The batch directory should look like this afterwards: + +batch_nbu_(batch name_ver##)/ +┣━┳━ data/ +┃ ┣━━━ (lccn) +┃ ┣━━━ ... +┃ ┣━━━ batch.xml +┃ ┣━━━ batch_1.xml +┃ ┗━━━ README.txt +┣━━━ bag-info.txt +┣━━━ bagit.txt +┣━━━ manifest-md5.txt +┗━━━ tagmanifest-md5.txt + +```bash +bagit.py --quiet --md5 batch_nbu_(batch_name_ver##) +``` + +#### Reset Files to Recreate Checksums and Manifest +If the checksums and manifest creation is interrupted or errors need corrected +afterwards, we must delete them and move files out of the `data/` directory + +```bash +cd /var/local/newspapers/batch_nbu_(batch_name_ver##) + +# Delete all files in this directory but not (sub)directories +rm *.* + +# Move all files in the data/ directory to the current directory; remove data/ +mv data/* . +rmdir data +``` + +Now one may rerun the creation command above + +## Dev Server Ingest + +```bash +cd /var/local/www/django/openoni + +# Load the Open ONI Python virtual environment +. 
ENV/bin/activate + +# This command usually takes many hours to complete so we use nohup and run the +# command in the background so we can disconnect SSH and it will still run +nohup ./manage load_batch /var/local/newspapers/batch_nbu_(batch_name_ver##) > (batch_name).txt & + +# One can print the end of the log file to ensure the command is progressing +tail (batch_name).txt + +# Or follow log info as it is recorded, but note this does slow down the +# ingest process a little. Press Ctrl + C to quit following what's written. +tail -f (batch_name).txt + +# After completion, review the log for errors etc +nano (batch_name).txt +``` + +If an error occurred during batch ingest, ask a developer for help parsing the +log file. + +After batch ingest completes, compare the batch info in Open ONI and Chronicling +America if an NDNP batch: +- https://nebnewspapers-dev.unl.edu/batches +- https://nebnewspapers.unl.edu/batches +- https://chroniclingamerica.loc.gov/awardees/nbu/ + +Record the issue and page counts in the Newspaper Batch Information spreadsheet + +### Word Coordinates Files Permissions +Directories and files created during ingest process have permissions +which cause errors if additional or reworked issues/pages are ingested +in the same LCCN by a different user. We run this command to set permissions +which avoid these errors. It might have to be run with `sudo` by a developer if +this step is missed before a subsequent ingest. + +```bash +# This command can take 15-20 minutes to complete +chmod -R g+rwX /var/local/www/django/openoni/data/word_coordinates/lccn/ 2>/dev/null +``` + +## Production Server Transfer and Ingest + +SSH into the server + +```bash +ssh (active-directory-name@unl.edu)@nebnewspapers.unl.edu +# Example: ssh lweakly2@unl.edu@nebnewspapers.unl.edu + +cd /var/local/newspapers/ + +# Transfer batch from dev server (~45min) +rsync -ahuX --del --info=progress2 nebnewspapers-dev.unl.edu:/var/local/newspapers/batch_nbu_(batch_name_ver##) . 
+ +cd /var/local/newspapers + +# Validate batch checksums and manifest +# Initial fast validation; if validation errors occur, notify a developer +bagit.py --validate --fast batch_nbu_(batch_name_ver##) + +# Thorough validation after fast one succeeds +# May omit --quiet if you prefer to see every file printed as it is processed +# rather than only see errors, but this does slow the process down +bagit.py --validate --quiet batch_nbu_(batch_name_ver##) + +cd /var/local/www/django/openoni + +# Load the Open ONI Python virtual environment +. ENV/bin/activate + +# This command usually takes many hours to complete so we use nohup and run the +# command in the background so we can disconnect SSH and it will still run +nohup ./manage load_batch /var/local/newspapers/batch_nbu_(batch_name_ver##) > (batch_name).txt & + +# One can print the end of the log file to ensure the command is progressing +tail (batch_name).txt + +# Or follow log info as it is recorded, but note this does slow down the +# ingest process a little. Press Ctrl + C to quit following what's written. +tail -f (batch_name).txt + +# After completion, review the log for errors etc +nano (batch_name).txt +``` + +## Purge a Batch +Purging a batch is necessary if an error occurs or data needs correction + +```bash +cd /var/local/www/django/openoni + +# Load the Open ONI Python virtual environment +. ENV/bin/activate + +./manage purge_batch (batch_name) > purge_(batch_name).txt +# Example: ./manage purge_batch batch_nbu_keithsbear_ver02 > purge_keithsbear_ver02.txt + +# There is no progress output as the purge occurs, but we should review the log +# after the command completes +nano purge_(batch_name).txt +``` + +## Miscellaneous Tasks +Note these shouldn't be used without consulting a developer as they are risky +data modifying processes. + +### Change LCCN by Directory +Be careful with this one as it changes files in place. 
+Suggest one copy the entire directory first before running, +then change the batch files to reflect new LCCN. + +```bash +cd /var/local/newspapers/batch_nbu_(batch_name_ver##)/data/ +find ./(LCCN-wrong) -type f -name "*.xml" -exec sed -i '' 's/(LCCN-wrong)/(LCCN-fix)/g' {} \+ +``` + +### Copy Media Files Into Identical Directory Structure +`-n` to print out the files it will move without moving them (aka dry run). +Remove this after dry run to actually move the files. + +Need to be very sure that the directory structure matches +and that to and from paths are starting from the same place!!! + +```bash +rsync -ahuX --info=progress2 -n tiff/location/ new/location/ \ + --include="*/" \ + --include="*.jp2" \ + --include="*.pdf" \ + --exclude="*" +``` From 6929feaaf53f06be1301b9ea0387ffc1190641d2 Mon Sep 17 00:00:00 2001 From: Greg Tunink Date: Wed, 14 Jun 2023 13:34:54 -0500 Subject: [PATCH 2/5] Add time estimate and warning to rsync --- docs/batch-management.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/docs/batch-management.md b/docs/batch-management.md index 4a36853..d3209c4 100644 --- a/docs/batch-management.md +++ b/docs/batch-management.md @@ -83,7 +83,8 @@ batch_nbu_(batch name)/ ┣━━━ batch.xml ┗━━━ batch_1.xml -# This rsync command is resumable if interrupted +# This rsync command is resumable if interrupted and usually takes up to 45 min. +# The % progress and time estimates shown are often optimistically inaccurate. # Explanation of rsync options: https://explainshell.com/explain?cmd=rsync+-ahuX+--del # Note that omission and presence of trailing slashes here matters, # so type carefully and double-check before submitting the command! 
From 10b6dbbc4b39eb36e94bf3a7c3df7c5457928e58 Mon Sep 17 00:00:00 2001 From: Greg Tunink Date: Fri, 16 Jun 2023 16:10:10 -0500 Subject: [PATCH 3/5] Custom MARC no longer needed for Plattsmouth Plattsmouth LCCNs and MARC records all on chroniclingamerica.gov now Add some extra text around link to Batch Management --- README.md | 9 +-------- marc/2016270200/marc.xml | 1 - marc/2016270201/marc.xml | 1 - marc/2016270202/marc.xml | 1 - marc/2016270203/marc.xml | 1 - marc/2016270204/marc.xml | 1 - marc/2016270205/marc.xml | 1 - marc/2016270206/marc.xml | 1 - marc/2016270207/marc.xml | 1 - marc/2016270208/marc.xml | 1 - marc/sn84022825/marc.xml | 1 - 11 files changed, 1 insertion(+), 18 deletions(-) delete mode 100644 marc/2016270200/marc.xml delete mode 100644 marc/2016270201/marc.xml delete mode 100644 marc/2016270202/marc.xml delete mode 100644 marc/2016270203/marc.xml delete mode 100644 marc/2016270204/marc.xml delete mode 100644 marc/2016270205/marc.xml delete mode 100644 marc/2016270206/marc.xml delete mode 100644 marc/2016270207/marc.xml delete mode 100644 marc/2016270208/marc.xml delete mode 100644 marc/sn84022825/marc.xml diff --git a/README.md b/README.md index 78f3909..4a17c9f 100644 --- a/README.md +++ b/README.md @@ -83,14 +83,6 @@ The JS file containing the map locations does not need to be moved into the plug TODO - consider adding lat / lng to titles in the database, otherwise update the existing list `themes/nebraska/static/js/cities_list.js` as needed -## Load Plattsmouth Papers - -Until such time as the LoC updates their Marc records from OCLC, you need to include this line in your `settings_local.py` file before you can load any of the Plattsmouth batches. 
All other batches will not load while this line is in your settings, so remember to comment it or remove it after loading Plattsmouth: - -```python -MARC_RETRIEVAL_URLFORMAT = "https://raw.githubusercontent.com/CDRH/open-oni_nebraska_theme/master/marc/%s/marc.xml" -``` - ## Compile Assets Compile `main.scss` to `main.css` with [Sass command line @@ -105,4 +97,5 @@ For production, one must run additional commands to ## Batch Management +For loading newspaper data into the Open ONI see [Batch Management documentation](docs/batch-management.md) diff --git a/marc/2016270200/marc.xml b/marc/2016270200/marc.xml deleted file mode 100644 index 0b034e6..0000000 --- a/marc/2016270200/marc.xml +++ /dev/null @@ -1 +0,0 @@ -00000cas a2200000 i 4500934730295160114d18821892nbuwr ne 0 0eng c2016270200LDLengrdaLDLOCLCFPlattsmouth weekly herald.Plattsmouth, Nebraska :Plattsmouth Herald Pub. Co.volumesWeeklytexttxtrdacontentunmediatednrdamediavolumencrdacarrierBegan in 1882; ceased in 1892.Merger of: Nebraska herald, and: Plattsmouth enterprise.Other edition: Plattsmouth daily herald.Description based on: November 9, 1882; title from masthead.Latest issue consulted: Vol. 29, no. 
19 (August 4, 1892)Plattsmouth (Neb.)Newspapers.Newspapers.fast(OCoLC)fst01423814NebraskaPlattsmouth.fast(OCoLC)fst01222697United StatesNebraskaCassPlattsmouth.Plattsmouth daily herald(DLC)2016270201Nebraska herald(DLC)sn8400022825Plattsmouth enterprisePlattsmouth herald(DLC)2016270202 diff --git a/marc/2016270201/marc.xml b/marc/2016270201/marc.xml deleted file mode 100644 index 869d3ca..0000000 --- a/marc/2016270201/marc.xml +++ /dev/null @@ -1 +0,0 @@ -00000cas a2200000 i 4500934730009160114d188319uunbudn ne 0 0eng c2016270201LDLengrdaLDLOCLCFThe Plattsmouth daily herald.Title in publisher's box:Plattsmouth heraldPlattsmouth, Nebraska :[Plattsmouth Herald Publishing Company]volumesDaily (except Sunday)texttxtrdacontentunmediatednrdamediavolumencrdacarrierBegan in 1883.Suspended after <January> 1885; resumed with 1st year, no. 1 (September 12, 1887).Other editions.: Plattsmouth weekly herald, 1882-1892, and: Plattsmouth herald, 1892-1894.Description based on: Volume 1, number [11] (March 9, 1883); title from masthead.Latest issue consulted: 5th year, number 262 (July 30, 1892)Plattsmouth (Neb.)Newspapers.Cass County (Neb.)Newspapers.Newspapers.fast(OCoLC)fst01423814NebraskaCass County.fast(OCoLC)fst01221880NebraskaPlattsmouth.fast(OCoLC)fst01222697United StatesNebraskaCassPlattsmouth.Plattsmouth weekly herald(DLC)2016270200Plattsmouth herald(DLC)2016270202 diff --git a/marc/2016270202/marc.xml b/marc/2016270202/marc.xml deleted file mode 100644 index 76521e8..0000000 --- a/marc/2016270202/marc.xml +++ /dev/null @@ -1 +0,0 @@ -00000cas a2200000 i 4500934730139160114d18921894nbuwr ne 0 0eng c2016270202LDLengrdaLDLOCLCFPlattsmouth herald.Weekly heraldPlattsmouth, Nebraska :Knotts Bros.volumesWeeklytexttxtrdacontentunmediatednrdamediavolumencrdacarrierBegan in 1892; ceased in Dec. 1894.Merged with: Semi-weekly news (Plattsmouth, Neb.), to form: Semi-weekly news-herald.Description based on: Vol. 29, no. 
21 (August 18, 1892); title from masthead.Latest issue consulted: Vol. 30, no. 3 (April 13, 1893)Cass County (Neb.)Newspapers.Newspapers.fast(OCoLC)fst01423814NebraskaCass County.fast(OCoLC)fst01221880United StatesNebraskaCassPlattsmouth.Plattsmouth daily herald(DLC)2016270201Plattsmouth weekly herald(DLC)2016270200Semi-weekly news (Plattsmouth, Neb.)(DLC)2016270203Semi-weekly news-herald(DLC)2016270204 diff --git a/marc/2016270203/marc.xml b/marc/2016270203/marc.xml deleted file mode 100644 index 409a4ea..0000000 --- a/marc/2016270203/marc.xml +++ /dev/null @@ -1 +0,0 @@ -00000cas a2200000 i 4500934731090160114d189u1894nbucr ne 0 0eng c2016270203LDLengrdaLDLOCLCFSemi-weekly news (Plattsmouth, Neb.)Semi-weekly news.Plattsmouth, Nebraska :Polk Bros.volumesSemiweeklytexttxtrdacontentunmediatednrdamediavolumencrdacarrierCeased in Dec. 1894.Merged with: Plattsmouth herald, to form: Semi-weekly news-herald.Description based on: Vol. 2, no. 79 (June 2, 1894); title from masthead.Plattsmouth (Neb.)Newspapers.Newspapers.fast(OCoLC)fst01423814NebraskaPlattsmouth.fast(OCoLC)fst01222697United StatesNebraskaCassPlattsmouth.Evening news (Plattsmouth, Neb.)(DLC)sn2016270209Plattsmouth herald(DLC)2016270202Semi-weekly news-herald(DLC)2016270204 diff --git a/marc/2016270204/marc.xml b/marc/2016270204/marc.xml deleted file mode 100644 index 1cc68f9..0000000 --- a/marc/2016270204/marc.xml +++ /dev/null @@ -1 +0,0 @@ -00000cas a2200000 i 4500934731325160114d18951909nbucr ne 0 0eng c2016270204LDLengrdaLDLOCLCFSemi-weekly news-herald.Plattsmouth, Nebraska :News Pub. Co.,-1909.volumesSemiweeklytexttxtrdacontentunmediatednrdamediavolumencrdacarrierBegan in January 1895.-vol. 16, no. 65 (January 7, 1909).Continues numbering of: Semi-weekly news (Plattsmouth, Neb.).Formed by the union of: Plattsmouth herald, and: Semi-weekly news (Plattsmouth, Neb.).Description based on: Vol. 3, no. 62 (April 17, 1895); title from masthead.Latest issue consulted: Vol. 6, no. 
55 (June 4, 1898)Cass County (Neb.)Newspapers.Newspapers.fast(OCoLC)fst01423814NebraskaCass County.fast(OCoLC)fst01221880United StatesNebraskaCassPlattsmouth.Evening news (Plattsmouth, Neb.)(DLC)sn2016270209Plattsmouth herald(DLC)2016270202Semi-weekly news (Plattsmouth, Neb.)(DLC)sn2006270203News-herald (Plattsmouth, Neb.)(DLC)2016270205 diff --git a/marc/2016270205/marc.xml b/marc/2016270205/marc.xml deleted file mode 100644 index b48b53e..0000000 --- a/marc/2016270205/marc.xml +++ /dev/null @@ -1 +0,0 @@ -00000cas a2200000 i 4500934729533160114d19091911nbuwr ne 0 0eng c2016270205LDLengrdaLDLOCLCFNews-herald (Plattsmouth, Neb.)The news-herald.Plattsmouth, Neb. :A.L. Tidd,1909-volumesWeekly,December 30, 1910-<August 11, 1911>Semiweekly,January 11, 1909-December 23, 1910texttxtrdacontentunmediatednrdamediavolumencrdacarrierVol. 45, no. 65 (January 11, 1909)-Ceased in September 1911.Cass County (Neb.)Newspapers.Newspapers.fast(OCoLC)fst01423814NebraskaCass County.fast(OCoLC)fst01221880United StatesNebraskaCassPlattsmouth.Semi-weekly news-herald(DLC)2016270204Plattsmouth news-herald(DLC)2016270208 diff --git a/marc/2016270206/marc.xml b/marc/2016270206/marc.xml deleted file mode 100644 index f306c4a..0000000 --- a/marc/2016270206/marc.xml +++ /dev/null @@ -1 +0,0 @@ -00000cas a2200000 i 4500934730335160114c19019999nbucr ne 0 0eng c2016270206LDLengrdaLDLOCLCFLDLThe Plattsmouth journal.Plattsmouth semi-weekly journal1914-1942Plattsmouth, Nebraska :Charles B. Mann and W.K. Fox,1901-volumesSemiweekly,January 2, 1908-Weekly,July 12, 1901-December 26, 1907texttxtrdacontentunmediatednrdamediavolumencrdacarrierVol. 21, no. 28 (July 12, 1901)-Absorbed: Plattsmouth news-herald, 1912; Nehawka enterprise, 1952; Elmwood leader-echo, 1953; Weeping Water Republican, 1986; and Louisville messenger, 1986.Other editions.: Plattsmouth daily journal (Plattsmouth, Neb. 
: 1902), 1902; Evening journal (Plattsmouth, Neb.), 1902-<1913>; Plattsmouth evening journal, <1916-1942>; and Plattsmouth daily journal (Plattsmouth, Neb. : 1949), 1949.Description based on: Vol. 21, no. 28 (July 12, 1901); title from masthead.Latest issue consulted: Vol. no. 55 (Sept. 28, 1939)Cass County (Neb.)Newspapers.NebraskaCass County.fast(OCoLC)fst01221880Newspapers.fast(OCoLC)fst01423814United StatesNebraskaCassPlattsmouth.Plattsmouth daily journal (Plattsmouth, Neb. : 1902)Evening journal (Plattsmouth, Neb.)(DLC)sn2016270209Plattsmouth evening journalPlattsmouth daily journal (Plattsmouth, Neb. : 1949)Plattsmouth weekly journal(DLC)2016270207Plattsmouth news-herald(DLC)2016270208Nehawka enterpriseElmwood leader-echoWeeping Water RepublicanLouisville messenger diff --git a/marc/2016270207/marc.xml b/marc/2016270207/marc.xml deleted file mode 100644 index eab221c..0000000 --- a/marc/2016270207/marc.xml +++ /dev/null @@ -1 +0,0 @@ -00000cas a2200000 i 4500934729186160114d18811901nbuwr ne 0 0eng c2016270207LDLengrdaLDLOCLCFPlattsmouth weekly journal.Issues for <April 23 1897-May 14, 1897> published as:Plattsmouth journalPlattsmouth, Neb. :C.W. Sherman,1881-1901.21 volumesWeekly,<September 14, 1900>-July 5, 1901Weekly,December 21, 1881-<May 14, 1897>Semiweekly,<June 21, 1898>-<April 12, 1899>texttxtrdacontentunmediatednrdamediavolumencrdacarrierVol. 1, no. 1 (December 21, 1881)-vol. 21, no. 27 (July 5, 1901).Other editions.: Plattsmouth daily journal, Daily journal (Plattsmouth, Neb.), Plattsmouth daily journal (Plattsmouth, Neb. : 1888), and: Daily journal (Plattsmouth, Neb. : 1898).Published in conjunction with: Murray weekly journal, <January>-May 1897.Description based on: Vol. 9, no. 48 (November 20, 1890); title from masthead.Latest issue consulted: Vol. 21, no. 
27 (July 5, 1901)Plattsmouth (Neb.)Newspapers.Newspapers.fast(OCoLC)fst01423814NebraskaPlattsmouth.fast(OCoLC)fst01222697United StatesNebraskaCassPlattsmouth.Plattsmouth daily journalDaily journal (Plattsmouth, Neb.)Plattsmouth daily journal (Plattsmouth, Neb. : 1888)Daily journal (Plattsmouth, Neb. : 1898)Plattsmouth journal(DLC)2016270206Murray weekly journal diff --git a/marc/2016270208/marc.xml b/marc/2016270208/marc.xml deleted file mode 100644 index 3a56438..0000000 --- a/marc/2016270208/marc.xml +++ /dev/null @@ -1 +0,0 @@ -00000cas a2200000 i 4500934730154160114d19111912nbuwr ne 0 0eng c2016270208LDLengrdaLDLOCLCFThe Plattsmouth news-herald.Plattsmouth, Neb. :E.O. Mayfield,-1912.volumesWeeklytexttxtrdacontentunmediatednrdamediavolumencrdacarrierBegan in September 1911.-v. 49, no. 31 (November 15, 1912).Description based on: Vol. 48, no. 24 (September 29, 1911); title from masthead.Plattsmouth (Neb.)Newspapers.Cass County (Neb.)Newspapers.Newspapers.fast(OCoLC)fst01423814NebraskaCass County.fast(OCoLC)fst01221880NebraskaPlattsmouth.fast(OCoLC)fst01222697United StatesNebraskaCassPlattsmouth.News-herald (Plattsmouth, Neb.)(DLC)2016270205Plattsmouth journal(DLC)2016270206 diff --git a/marc/sn84022825/marc.xml b/marc/sn84022825/marc.xml deleted file mode 100644 index 18b8d45..0000000 --- a/marc/sn84022825/marc.xml +++ /dev/null @@ -1 +0,0 @@ -00000cas a22000007a 450010583989840402d18651882nbuwr ne 0 0eng dsn 84022825 NPXengNPXNPUNVGMGNOCLCQOCLCFOCLCOmscn-us-nbNebraska herald.Plattsmouth, N.T. [Neb.] :H.D. Hathawayv. ;65 cm.WeeklyBegan Apr. 12, 1865; ceased in 1882.Republican.Other ed.: Daily Nebraska herald, June 24-Nov. 27, 1872.Merged with: Plattsmouth enterprise, to form: Plattsmouth weekly herald.Description based on: Vol. 1, no. 2 (Apr. 
19, 1865).Plattsmouth (Neb.)Newspapers.Newspapers.fast(OCoLC)fst01423814NebraskaPlattsmouth.fast(OCoLC)fst01222697United StatesNebraskaCassPlattsmouth.Daily Nebraska heraldMicrofilmNebraska State Historical Society(DLC)sn 95069726Plattsmouth enterprisePlattsmouth weekly herald From 737ebf90d77ae6fcf8ddd376d94e3d218770b671 Mon Sep 17 00:00:00 2001 From: Greg Tunink Date: Fri, 16 Jun 2023 16:13:10 -0500 Subject: [PATCH 4/5] Refine batch management doc - Fix ./manage.py commands missing .py extension - Fix typos and add code formatting - Add section about log files - Improve examples for longer commands and those with multiple placeholders to be clearer and easier to copy, paste, and modify - Add link to Newspaper Batch Information spreadsheet --- docs/batch-management.md | 81 +++++++++++++++++++++++++++++----------- 1 file changed, 60 insertions(+), 21 deletions(-) diff --git a/docs/batch-management.md b/docs/batch-management.md index d3209c4..0f1387e 100644 --- a/docs/batch-management.md +++ b/docs/batch-management.md @@ -28,28 +28,50 @@ command again. To avoid this we can prefix these long-running commands with the `nohup` utility and start them running in the background of the Bash shell by appending an `&` at the end of the command. -This is most useful for the load_batch command which loads a batches data into +This is most useful for `load_batch` which loads a batch's data into Open ONI as this command usually takes multiple hours to complete and it's not necessary to stay connected while this runs. 
For this, we take the normal command ```bash -./manage load_batch /var/local/newspapers/batch_nbu_(batch_name_ver##) > (batch_name).txt +./manage.py load_batch /var/local/newspapers/batch_nbu_(batch_name_ver##) > (batch_name).txt ``` and modify as follows ```bash -nohup ./manage load_batch /var/local/newspapers/batch_nbu_(batch_name_ver##) > (batch_name).txt & +nohup ./manage.py load_batch /var/local/newspapers/batch_nbu_(batch_name_ver##) > (batch_name).txt & ``` -One may also wish to use this pattern with `rsync` and `bagit.py` commands which -also run longer in case one expects to interrupt their network connection or -sleep / shutdown their computer. - We can also chain commands together by putting them in a bash script file and running it as `nohup load_multiple_batches.sh &`. +## Log Files +The use of `> (batch_name).txt` at the end ("redirection" in command line lingo) +saves the output from the command normally printed to one's terminal in a text +file. This can be done regardless of whether `nohup` is used, but without this +the `nohup` utility will save the output in `nohup.out`. We prefer to use a file +name that describes what action the text file is recording. This is often an +abbreviated form of the batch name. For example we can use `manyissues` +rather than the entire name `batch_nbu_manyissues_ver01`. + +We assume the default action of these files is a load_batch command, but +one may also wish to use `nohup` with `rsync`, `bagit.py`, or `purge_batch` +commands which also run longer in case one expects to interrupt their network +connection by disconnect from the VPN or sleep / shutdown of their computer. +For these other actions, please prepend the log file name with the action. +For example a `purge_batch` log file could be named `purge_manyissues.txt`. + +If having to repeat an action during troubleshooting, one might also wish to +date or number the log file so past action log files aren't overwritten, +e.g. 
`purge_manyissues-20230615.txt` or `purge_manyissues-2.txt`. After logs +are reviewed and any questions answered, please move them to a `log/` directory +adjacent to where they were created: + +```bash +mv *.txt log/ +``` + ## Transfer Batch Files to Dev Server SSH into the server @@ -68,14 +90,12 @@ mkdir libr1901-newspapers sudo mount -t cifs -o username=(active-directory-name but no "@unl.edu") \ //libr1901.unl.edu/newspapers/ libr1901-newspapers/ -# Example: username=gtunink2 -# NOT username=gtunink2@unl.edu +# Example: +# sudo mount -t cifs -o username=gtunink2 //libr1901.unl.edu/newspapers/ libr1901-newspapers/ # This will prompt you for your Active Directory password twice: # once for sudo, and once for connecting to libr1901 -# This transfer will take approximately an hour for most batches - # Batches on libr1901 have a directory structure like: batch_nbu_(batch name)/ ┣━━━ (lccn) @@ -89,10 +109,13 @@ batch_nbu_(batch name)/ # Note that omission and presence of trailing slashes here matters, # so type carefully and double-check before submitting the command! 
rsync -ahuX --del --info=progress2 --exclude=""*.tif"" libr1901-newspapers/batch_nbu_(batch_name) /var/local/newspapers/ +# Example: +# rsync -ahuX --del --info=progress2 --exclude=""*.tif"" libr1901-newspapers/batch_nbu_manyissues /var/local/newspapers/ ``` ### Record Batch Storage Info -Record under the Size columns on the Newspaper Batch Information spreadsheet +Record under the Size columns on the [Newspaper Batch Information +spreadsheet](https://uofnelincoln.sharepoint.com/:x:/r/sites/UNL-UniversityLibraries/DISC/cdrh/Shared%20Documents/CDRH%20Projects/Project%20Folders/Newspapers/Newspapers_General/Newspaper%20Batch%20Information.xlsx?d=w0f0beecd41ca47d48c581b4618d37c77&csf=1&web=1&e=bcK4Hb) ```bash # Size including tiffs on libr1901 @@ -123,6 +146,7 @@ cd /var/local/newspapers # Typically end with ver01 unless LoC corrections were made # Note in "ver" column of Newspaper Batch Information spreadsheet if not "ver01" mv batch_nbu_(batch_name) batch_nbu_(batch_name_ver##) +# Example: mv batch_nbu_manyissues batch_nbu_manyissues_ver01 # Set group-writeable, global-readable batch permissions # These commands may run for 10-15 minutes on the long end. Unfortunately there @@ -168,6 +192,7 @@ This step moves files into a `data/` directory and CANNOT BE RERUN without moving them back. The batch directory should look like this afterwards: +``` batch_nbu_(batch name_ver##)/ ┣━┳━ data/ ┃ ┣━━━ (lccn) @@ -179,6 +204,7 @@ batch_nbu_(batch name_ver##)/ ┣━━━ bagit.txt ┣━━━ manifest-md5.txt ┗━━━ tagmanifest-md5.txt +``` ```bash bagit.py --quiet --md5 batch_nbu_(batch_name_ver##) @@ -197,6 +223,9 @@ rm *.* # Move all files in the data/ directory to the current directory; remove data/ mv data/* . rmdir data + +# Return to parent directory +cd .. ``` Now one may rerun the creation command above @@ -209,9 +238,14 @@ cd /var/local/www/django/openoni # Load the Open ONI Python virtual environment . 
ENV/bin/activate +# It's not necessary for our use, but one may unload the virtual environment +# by later running the command `deactivate` + # This command usually takes many hours to complete so we use nohup and run the # command in the background so we can disconnect SSH and it will still run -nohup ./manage load_batch /var/local/newspapers/batch_nbu_(batch_name_ver##) > (batch_name).txt & +nohup ./manage.py load_batch /var/local/newspapers/batch_nbu_(batch_name_ver##) > (batch_name).txt & +# Example: +# nohup ./manage.py load_batch /var/local/newspapers/batch_nbu_manyissues_ver01 > manyissues.txt & # One can print the end of the log file to ensure the command is progressing tail (batch_name).txt @@ -220,8 +254,9 @@ tail (batch_name).txt # ingest process a little. Press Ctrl + C to quit following what's written. tail -f (batch_name).txt -# After completion, review the log for errors etc +# After completion, review the log for errors etc, move to log directory nano (batch_name).txt +mv (batch_name).txt log/ ``` If an error occurred during batch ingest, ask a developer for help parsing the @@ -233,7 +268,8 @@ America if an NDNP batch: - https://nebnewspapers.unl.edu/batches - https://chroniclingamerica.loc.gov/awardees/nbu/ -Record the issue and page counts in the Newspaper Batch Information spreadsheet +Record the issue and page counts in the [Newspaper Batch Information +spreadsheet](https://uofnelincoln.sharepoint.com/:x:/r/sites/UNL-UniversityLibraries/DISC/cdrh/Shared%20Documents/CDRH%20Projects/Project%20Folders/Newspapers/Newspapers_General/Newspaper%20Batch%20Information.xlsx?d=w0f0beecd41ca47d48c581b4618d37c77&csf=1&web=1&e=bcK4Hb) ### Word Coordinates Files Permissions Directories and files created during ingest process have permissions @@ -259,8 +295,8 @@ cd /var/local/newspapers/ # Transfer batch from dev server (~45min) rsync -ahuX --del --info=progress2 nebnewspapers-dev.unl.edu:/var/local/newspapers/batch_nbu_(batch_name_ver##) .
- -cd /var/local/newspapers +# Example: +# rsync -ahuX --del --info=progress2 nebnewspapers-dev.unl.edu:/var/local/newspapers/batch_nbu_manyissues_ver01 . # Validate batch checksums and manifest # Initial fast validation; if validation errors occur, notify a developer @@ -278,7 +314,8 @@ cd /var/local/www/django/openoni # This command usually takes many hours to complete so we use nohup and run the # command in the background so we can disconnect SSH and it will still run -nohup ./manage load_batch /var/local/newspapers/batch_nbu_(batch_name_ver##) > (batch_name).txt & +nohup ./manage.py load_batch /var/local/newspapers/batch_nbu_(batch_name_ver##) > (batch_name).txt & +# Example: nohup ./manage.py load_batch /var/local/newspapers/batch_nbu_manyissues_ver01 > manyissues.txt & # One can print the end of the log file to ensure the command is progressing tail (batch_name).txt @@ -287,8 +324,9 @@ tail (batch_name).txt # ingest process a little. Press Ctrl + C to quit following what's written. tail -f (batch_name).txt -# After completion, review the log for errors etc +# After completion, review the log for errors etc, move to log directory nano (batch_name).txt +mv (batch_name).txt log/ ``` ## Purge a Batch @@ -300,12 +338,13 @@ cd /var/local/www/django/openoni # Load the Open ONI Python virtual environment . 
ENV/bin/activate -./manage purge_batch (batch_name) > purge_(batch_name).txt -# Example: ./manage purge_batch batch_nbu_keithsbear_ver02 > purge_keithsbear_ver02.txt +./manage.py purge_batch (batch_name) > purge_(batch_name).txt +# Example: ./manage.py purge_batch batch_nbu_manyissues_ver01 > purge_manyissues.txt # There is no progress output as the purge occurs, but we should review the log # after the command completes nano purge_(batch_name).txt +mv purge_(batch_name).txt log/ ``` ## Miscellaneous Tasks From f933cc933e9602259d17eb4fc9a8ab134d3b249e Mon Sep 17 00:00:00 2001 From: Greg Tunink Date: Wed, 18 Oct 2023 17:08:46 -0500 Subject: [PATCH 5/5] Augment for clarity and with examples --- docs/batch-management.md | 80 +++++++++++++++++++++++++--------------- 1 file changed, 51 insertions(+), 29 deletions(-) diff --git a/docs/batch-management.md b/docs/batch-management.md index 0f1387e..090d481 100644 --- a/docs/batch-management.md +++ b/docs/batch-management.md @@ -86,6 +86,7 @@ Mount the network share from libr1901 to transfer files to the server ```bash # Make directory to use as mount point cd [~] +# Only one time and can skip for future ingests mkdir libr1901-newspapers sudo mount -t cifs -o username=(active-directory-name but no "@unl.edu") \ @@ -96,12 +97,16 @@ sudo mount -t cifs -o username=(active-directory-name but no "@unl.edu") \ # This will prompt you for your Active Directory password twice: # once for sudo, and once for connecting to libr1901 -# Batches on libr1901 have a directory structure like: -batch_nbu_(batch name)/ -┣━━━ (lccn) -┣━━━ ... -┣━━━ batch.xml -┗━━━ batch_1.xml +# Batches on libr1901 should have a directory structure like: +libr1901-newspapers/batch_nbu_(batch name)/ +├─── (lccn) +│ └── (reel) +├─── ... 
+├─── batch.xml +└─── batch_1.xml + +# View tree of top directories and files to confirm expected structure +tree -L 2 libr1901-newspapers/batch_nbu_(batch name)/ # This rsync command is resumable if interrupted and usually takes up to 45 min. # The % progress and time estimates shown are often optimistically inaccurate. @@ -121,7 +126,7 @@ spreadsheet](https://uofnelincoln.sharepoint.com/:x:/r/sites/UNL-UniversityLibra # Size including tiffs on libr1901 du -sh libr1901-newspapers/batch_nbu_(batch_name) -# Size without tiffs (because the batch was moved over without them) +# Size without tiffs (because the batch was rsynced over without them) du -sh /var/local/newspapers/batch_nbu_(batch_name) ``` @@ -148,11 +153,21 @@ cd /var/local/newspapers mv batch_nbu_(batch_name) batch_nbu_(batch_name_ver##) # Example: mv batch_nbu_manyissues batch_nbu_manyissues_ver01 + # Set group-writeable, global-readable batch permissions # These commands may run for 10-15 minutes on the long end. Unfortunately there # is no way to know its progress, so please be patient. chmod -R g+rwX batch_nbu_(batch_name_ver##) chmod -R o+rX batch_nbu_(batch_name_ver##) + +# Check permissions include group write permissions now +tree -pL 2 batch_nbu_(batch_name_ver##) + +batch_nbu_(batch_name_ver##)/ +├── [-rwxrwxr-x] batch_1.xml +├── [-rwxrwxr-x] batch.xml +└── [drwxrwxr-x] (lccn) + └── [drwxrwxr-x] (reel) ``` Create README.txt file with fields noted in example @@ -184,35 +199,39 @@ loc: yes description: Corrected version received from LoC ``` -### Create Checksums and Manifest -There should be no more editing of files after creating checksums and manifest -unless errors are being corrected and new checksums and manifest are created. +### Create Checksums and Manifests +There should be no more editing of files after creating checksums and manifests +unless ingest errors are being corrected.
This step moves files into a `data/` +directory and SHOULD NOT BE RERUN without moving them back out before new +checksums and manifests are created. + +```bash +bagit.py --quiet --md5 batch_nbu_(batch_name_ver##) +``` -This step moves files into a `data/` directory -and CANNOT BE RERUN without moving them back. The batch directory should look like this afterwards: ``` -batch_nbu_(batch name_ver##)/ -┣━┳━ data/ -┃ ┣━━━ (lccn) -┃ ┣━━━ ... -┃ ┣━━━ batch.xml -┃ ┣━━━ batch_1.xml -┃ ┗━━━ README.txt -┣━━━ bag-info.txt -┣━━━ bagit.txt -┣━━━ manifest-md5.txt -┗━━━ tagmanifest-md5.txt +batch_nbu_(batch_name_ver##)/ +├── bag-info.txt +├── bagit.txt +├── data +│   ├── batch_1.xml +│   ├── batch.xml +│   ├── README.txt +│   └── (lccn) +├── manifest-md5.txt +└── tagmanifest-md5.txt ``` ```bash -bagit.py --quiet --md5 batch_nbu_(batch_name_ver##) +tree -L 2 batch_nbu_(batch_name_ver##) ``` -#### Reset Files to Recreate Checksums and Manifest -If the checksums and manifest creation is interrupted or errors need corrected -afterwards, we must delete them and move files out of the `data/` directory +#### Reset Files to Recreate Checksums and Manifests +**Only if** the checksums and manifest creation is **interrupted** or **errors +need to be corrected after a failed ingest**, we must delete them and move files out +of the `data/` directory ```bash cd /var/local/newspapers/batch_nbu_(batch_name_ver##) @@ -228,7 +247,8 @@ rmdir data cd .. ``` -Now one may rerun the creation command above +Return to [Create Checksums and Manifests](#create-checksums-and-manifests) above +to recreate ## Dev Server Ingest @@ -260,7 +280,9 @@ mv (batch_name).txt log/ ``` If an error occurred during batch ingest, ask a developer for help parsing the -log file. +log file. We will likely need to edit files to fix errors and then follow steps +in [Reset Files to Recreate Checksums and +Manifests](#reset-files-to-recreate-checksums-and-manifests).
After batch ingest completes, compare the batch info in Open ONI and Chronicling America if an NDNP batch: