Merge pull request #136 from collectionspace/gh-132-case-insensitive-vocab-term-lookup

Protections against introducing data integrity problems
kspurgin authored Jan 21, 2022
2 parents ed43713 + 232becc commit 7ee6974
Showing 95 changed files with 1,798 additions and 1,327 deletions.
3 changes: 2 additions & 1 deletion Gemfile
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
# frozen_string_literal: true

source 'https://rubygems.org'
-git_source(:github){|repo_name| "https://github.com/#{repo_name}" }
+git_source(:github){ |repo_name| "https://github.com/#{repo_name}" }

ruby '2.7.4'

@@ -10,4 +10,5 @@ gem 'facets', require: false
# Specify your gem's dependencies in collectionspace-mapper.gemspec
gem 'collectionspace-client', tag: 'v0.10.0', git: 'https://github.com/collectionspace/collectionspace-client.git'
gem 'collectionspace-refcache', tag: 'v0.7.7', git: 'https://github.com/collectionspace/collectionspace-refcache.git'

gemspec
16 changes: 9 additions & 7 deletions collectionspace-mapper.gemspec
@@ -1,6 +1,6 @@
# frozen_string_literal: true

-lib = File.expand_path('../lib', __FILE__)
+lib = File.expand_path('lib', __dir__)
$LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
require 'collectionspace/mapper/version'

@@ -10,12 +10,12 @@ Gem::Specification.new do |spec|
spec.authors = ['Kristina Spurgin']
spec.email = ['[email protected]']

-spec.summary = %q{Generic mapper turns hash of data into CollectionSpace XML}
+spec.summary = 'Generic mapper turns hash of data into CollectionSpace XML'
spec.homepage = 'https://github.com/lyrasis/collectionspace-mapper'
spec.license = 'MIT'

spec.required_ruby_version = '>= 2.7.4'

# Prevent pushing this gem to RubyGems.org. To allow pushes either set the 'allowed_push_host'
# to allow pushing to a single host or delete this section to allow pushing to any host.
if spec.respond_to?(:metadata)
@@ -26,12 +26,12 @@ Gem::Specification.new do |spec|
spec.metadata['changelog_uri'] = 'https://github.com/lyrasis/collectionspace-mapper'
else
raise 'RubyGems 2.0 or newer is required to protect against ' \
-'public gem pushes.'
+'public gem pushes.'
end

# Specify which files should be added to the gem when it is released.
# The `git ls-files -z` loads the files in the RubyGem that have been added into git.
-spec.files = Dir.chdir(File.expand_path('..', __FILE__)) do
+spec.files = Dir.chdir(File.expand_path(__dir__)) do
`git ls-files -z`.split("\x0").reject{ |f| f.match(%r{^(test|spec|features)/}) }
end
spec.bindir = 'exe'
@@ -45,12 +45,14 @@ Gem::Specification.new do |spec|
spec.add_dependency 'xxhash', '>= 0.4.0'

spec.add_development_dependency 'bundler', '>= 2.1.2'
spec.add_development_dependency 'byebug'
+spec.add_development_dependency 'pry'
+spec.add_development_dependency 'pry-byebug'
spec.add_development_dependency 'rake', '>= 13.0.1'
spec.add_development_dependency 'rspec', '~> 3.0'
spec.add_development_dependency 'rubocop', '~> 1.18.3'

# Uncomment these if you need to use the scripts in utils/benchmarking
-#spec.add_development_dependency 'ruby-prof', '~> 1.4.3'
-#spec.add_development_dependency 'time_up', '~> 0.0.7'
+# spec.add_development_dependency 'ruby-prof', '~> 1.4.3'
+# spec.add_development_dependency 'time_up', '~> 0.0.7'
end
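
The two path changes in this gemspec look behavior-preserving. A minimal sketch of why, assuming a hypothetical absolute path in place of `__FILE__` and using `File.dirname` as a stand-in for `__dir__` (both are illustration choices so the snippet runs outside a gemspec; the helper names are not from the repository):

```ruby
# Hypothetical location of the gemspec; stands in for __FILE__.
GEMSPEC_FILE = '/opt/src/collectionspace-mapper/collectionspace-mapper.gemspec'

# Old style: climb out of the file name with '..', then descend into lib.
def lib_dir_old(file)
  File.expand_path('../lib', file)
end

# New style: start from the containing directory (roughly what __dir__ gives).
def lib_dir_new(file)
  File.expand_path('lib', File.dirname(file))
end

# Old and new ways of computing the directory handed to Dir.chdir.
def root_dir_old(file)
  File.expand_path('..', file)
end

def root_dir_new(file)
  File.expand_path(File.dirname(file))
end

puts lib_dir_old(GEMSPEC_FILE) == lib_dir_new(GEMSPEC_FILE)
puts root_dir_old(GEMSPEC_FILE) == root_dir_new(GEMSPEC_FILE)
```

One caveat: `__dir__` resolves symlinks in the file path, so the two styles could differ if the gemspec were loaded through a symlink.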
17 changes: 0 additions & 17 deletions doc/batch_configuration.adoc
@@ -24,7 +24,6 @@ A JSON config hash may be passed to a new `Mapper::DataHandler` to control vario
"response_mode": "verbose",
"strip_id_values": true,
"multiple_recs_found": "fail",
-"check_terms" : true,
"check_record_status" : true,
"force_defaults": false,
"date_format": "month day year",
@@ -118,22 +117,6 @@ While it is possible to use this setting to batch update existing records that d
- *Data type*: string
- *Allowed values*: `fail`, `use_first`

-== check_terms
-
-If `true`, looks up each term via `collectionspace-refcache`. If found, uses existing refname. If not found, searches for term via cspace-services API and uses existing refname if found. If term not found in refcache or services API, builds a new refname, uses that in the record, adds it to refcache, and returns the term with `found=false` in `Response::Terms`.
-
-If `false`, never searches services API for the term. Uses refcache refname if it exists, otherwise builds a new refname and adds it to refcache. Returns all terms with `found=false` in `Response::Terms`.
-
-[NOTE]
-====
-Set this to false only if you are certain no terms from your data exist in CollectionSpace, and all of the terms need to be created as new. Otherwise, you may end up with duplicate terms being added to CollectionSpace, due to the fact that `collectionspace-mapper` does not generate exactly the same hashed short identifier value for use in the refname as the CollectionSpace application does.
-====
-
-- *Required?:* yes
-- *Defaults to:* true
-- *Data type*: boolean
-- *Allowed values*: `true`, `false`
-
== check_record_status

If `true`, looks up each record via cspace-services API and sets `Response.record_status` to `:exists` if the record is found, or `:new` if it is not.
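
The lookup order described in the `check_terms` section that this commit deletes from the docs can be sketched roughly as follows. This is an illustration only: `TermLookupSketch`, the Hash standing in for `collectionspace-refcache`, the lambda standing in for the cspace-services API, and the placeholder refname format are all assumptions, not the gem's code.

```ruby
# Hypothetical sketch of the documented lookup order; not the gem's code.
class TermLookupSketch
  # cache: Hash of term => refname (stand-in for collectionspace-refcache)
  # api:   callable returning a refname or nil (stand-in for cspace-services)
  def initialize(cache:, api:, check_terms: true)
    @cache = cache
    @api = api
    @check_terms = check_terms
  end

  # Returns a Hash with the refname to use and the reported found status.
  def lookup(term)
    cached = @cache[term]
    # With check_terms false, every term is reported with found=false.
    return {refname: cached, found: @check_terms} if cached

    if @check_terms
      existing = @api.call(term)
      return {refname: existing, found: true} if existing
    end

    # Not known anywhere: build a placeholder refname and remember it.
    refname = "sketch-refname-#{term.downcase}"
    @cache[term] = refname
    {refname: refname, found: false}
  end
end

api = ->(term) { term == 'Pine' ? 'api-refname-pine' : nil }
checker = TermLookupSketch.new(cache: {'Oak' => 'cached-refname-oak'}, api: api)
p checker.lookup('Oak')
p checker.lookup('Pine')
p checker.lookup('Elm')
```

With `check_terms: false` the API stand-in is never called, which is the situation the removed NOTE warned could produce duplicate terms.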
2 changes: 1 addition & 1 deletion lib/collectionspace/mapper.rb
@@ -36,6 +36,7 @@ module Errors
class UnprocessableDataError < StandardError
UnprocessableDataError = CollectionSpace::Mapper::Errors::UnprocessableDataError
attr_reader :input

def initialize(message, input)
super(message)
@input = input
@@ -76,6 +77,5 @@ def merge_default_values(data, batchconfig)
def term_key(term)
"#{term[:refname].type}-#{term[:refname].subtype}-#{term[:refname].display_name}"
end

end
end
4 changes: 1 addition & 3 deletions lib/collectionspace/mapper/authority_hierarchy_prepper.rb
@@ -54,9 +54,7 @@ def transform_terms
end

@response.split_data.each do |field, value|
-unless @response.transformed_data.key?(field)
-@response.transformed_data[field] = value
-end
+@response.transformed_data[field] = value unless @response.transformed_data.key?(field)
end
end

5 changes: 1 addition & 4 deletions lib/collectionspace/mapper/authority_transformer.rb
@@ -4,10 +4,8 @@

module CollectionSpace
module Mapper

# transforms authority display name into RefName
class AuthorityTransformer < Transformer

def initialize(opts)
super
@type = opts[:transform][0]
@@ -16,8 +14,7 @@ def initialize(opts)
@csclient = opts[:recmapper].csclient
end

-def transform(value)
-end
+def transform(value); end
end
end
end
4 changes: 1 addition & 3 deletions lib/collectionspace/mapper/behrensmeyer_transformer.rb
@@ -4,11 +4,9 @@

module CollectionSpace
module Mapper

# transforms digit into full Behrensmeyer scale vocabulary term
class BehrensmeyerTransformer < Transformer
-def transform(value)
-end
+def transform(value); end
end
end
end
1 change: 0 additions & 1 deletion lib/collectionspace/mapper/boolean_transformer.rb
@@ -4,7 +4,6 @@

module CollectionSpace
module Mapper

# transforms a variety of binary values into Boolean string values for CS
class BooleanTransformer < Transformer
def transform(value)
1 change: 1 addition & 0 deletions lib/collectionspace/mapper/column_mapping.rb
@@ -10,6 +10,7 @@ module Mapper
class ColumnMapping
attr_reader :recmapper, :data_type, :fieldname, :in_repeating_group, :is_group, :namespace, :opt_list_values,
:repeats, :source_type, :transforms, :xpath

def initialize(mapping_hash, recmapper)
@recmapper = recmapper
mapping_hash.each do |key, value|
1 change: 1 addition & 0 deletions lib/collectionspace/mapper/column_mappings.rb
@@ -10,6 +10,7 @@ class ColumnMappings
extend Forwardable

attr_reader :config

def_delegators :@all, :each, :length, :map, :reject!, :select

def initialize(opts = {})
1 change: 0 additions & 1 deletion lib/collectionspace/mapper/column_value.rb
@@ -2,7 +2,6 @@

module CollectionSpace
module Mapper

# represents a row of data from a CSV.
class ColumnValue
def initialize(column:, value:, recmapper:, mapping:)
30 changes: 14 additions & 16 deletions lib/collectionspace/mapper/config.rb
@@ -4,37 +4,38 @@

module CollectionSpace
module Mapper

# This is the default config, which is modified for object or authority hierarchy,
# or non-hierarchichal relationships via module extension
# :reek:InstanceVariableAssumption - instance variables are set during initialization
class Config
attr_reader :delimiter, :subgroup_delimiter, :response_mode, :strip_id_values, :multiple_recs_found, :force_defaults,
-:check_record_status, :check_terms, :date_format, :two_digit_year_handling, :transforms, :default_values,
-:record_type
-# todo: move default config in here
+:check_record_status, :date_format, :two_digit_year_handling, :transforms, :default_values,
+:record_type
+
+# TODO: move default config in here
include Tools::Symbolizable

DEFAULT_CONFIG = {delimiter: '|',
subgroup_delimiter: '^^',
response_mode: 'normal',
strip_id_values: true,
multiple_recs_found: 'fail',
-check_terms: true,
check_record_status: true,
force_defaults: false,
date_format: 'month day year',
-two_digit_year_handling: 'coerce'
-}
+two_digit_year_handling: 'coerce'}

class ConfigKeyMissingError < StandardError
attr_reader :keys

def initialize(message, keys)
super(message)
@keys = keys
end
end

class ConfigResponseModeError < StandardError; end

class UnhandledConfigFormatError < StandardError; end

def initialize(opts = {})
@@ -57,7 +58,7 @@ def initialize(opts = {})
end

def hash
-config = self.to_h
+config = to_h
config = symbolize(config)
transforms = config[:transforms]
return config unless transforms
@@ -97,13 +98,13 @@ def set_instance_variables(hash)
def validate
begin
has_required_attributes
-rescue ConfigKeyMissingError => err
-err.keys.each{ |key| instance_variable_set("@#{key}", DEFAULT_CONFIG[key]) }
+rescue ConfigKeyMissingError => e
+e.keys.each{ |key| instance_variable_set("@#{key}", DEFAULT_CONFIG[key]) }
end

begin
valid_response_mode
-rescue ConfigResponseModeError => err
+rescue ConfigResponseModeError => e
replacement_value = DEFAULT_CONFIG[:response_mode]
@response_mode = replacement_value
end
@@ -112,16 +113,14 @@ def validate
def valid_response_mode
valid = %w[normal verbose]
unless valid.any?(@response_mode)
-raise ConfigResponseModeError.new("Invalid response_mode value in config: #{@response_mode}")
+raise ConfigResponseModeError, "Invalid response_mode value in config: #{@response_mode}"
end
end

def has_required_attributes
required_keys = DEFAULT_CONFIG.keys
remaining_keys = required_keys - hash.keys
-unless remaining_keys.empty?
-raise ConfigKeyMissingError.new('Config missing key', remaining_keys)
-end
+raise ConfigKeyMissingError.new('Config missing key', remaining_keys) unless remaining_keys.empty?
end

def special_defaults
@@ -130,4 +129,3 @@ def special_defaults
end
end
end
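
The validation pattern visible in `config.rb` (raise on a missing key or an invalid `response_mode`, then rescue and fall back to the default) can be sketched in isolation. `ConfigSketch`, its trimmed `DEFAULTS`, and the `settings` Hash are simplifications assumed for illustration; the real class sets instance variables and has more keys.

```ruby
# Simplified, hypothetical version of the fall-back-to-defaults pattern.
class ConfigSketch
  DEFAULTS = {delimiter: '|', response_mode: 'normal', strip_id_values: true}.freeze

  # Carries the missing keys so the rescuer can fill them in.
  class KeyMissingError < StandardError
    attr_reader :keys

    def initialize(message, keys)
      super(message)
      @keys = keys
    end
  end

  class ResponseModeError < StandardError; end

  attr_reader :settings

  def initialize(opts = {})
    @settings = opts.dup
    validate
  end

  private

  def validate
    begin
      check_required_keys
    rescue KeyMissingError => e
      e.keys.each { |key| @settings[key] = DEFAULTS[key] }
    end

    begin
      check_response_mode
    rescue ResponseModeError
      @settings[:response_mode] = DEFAULTS[:response_mode]
    end
  end

  def check_required_keys
    missing = DEFAULTS.keys - @settings.keys
    # Two constructor arguments, so the explicit .new form is needed here.
    raise KeyMissingError.new('Config missing key', missing) unless missing.empty?
  end

  def check_response_mode
    return if %w[normal verbose].include?(@settings[:response_mode])

    # Single message argument, so the `raise Class, message` form works.
    raise ResponseModeError, "Invalid response_mode value in config: #{@settings[:response_mode]}"
  end
end

p ConfigSketch.new(response_mode: 'loud').settings
```

This also shows why the diff converts only one of the two `raise` calls: `raise Klass, message` passes a single message to the exception, while `ConfigKeyMissingError` takes a second argument.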

31 changes: 16 additions & 15 deletions lib/collectionspace/mapper/data_handler.rb
@@ -5,7 +5,6 @@

module CollectionSpace
module Mapper

# given a RecordMapper hash and a data hash, returns CollectionSpace XML document
class DataHandler
# this is an accessor rather than a reader until I refactor away the hideous
@@ -32,7 +31,7 @@ def process(data)
end

def prep(data)
-response = CollectionSpace::Mapper::setup_data(data, @mapper.batchconfig)
+response = CollectionSpace::Mapper.setup_data(data, @mapper.batchconfig)
if response.valid?
case @mapper.record_type
when 'authorityhierarchy'
@@ -74,7 +73,7 @@ def service_type
end

def validate(data)
-response = CollectionSpace::Mapper::setup_data(data, @mapper.batchconfig)
+response = CollectionSpace::Mapper.setup_data(data, @mapper.batchconfig)
validator.validate(response)
end

@@ -116,12 +115,12 @@ def xpath_hash

# populate parent of all non-top xpaths
h.each do |xpath, ph|
-if xpath['/']
-keys = h.keys - [xpath]
-keys = keys.select{ |k| xpath[k] }
-keys = keys.sort{ |a, b| b.length <=> a.length }
-ph[:parent] = keys[0] unless keys.empty?
-end
+next unless xpath['/']
+
+keys = h.keys - [xpath]
+keys = keys.select{ |k| xpath[k] }
+keys = keys.sort{ |a, b| b.length <=> a.length }
+ph[:parent] = keys[0] unless keys.empty?
end

# populate children
@@ -145,13 +144,15 @@ def xpath_hash
if v.size > 1
puts "WARNING: #{xpath} has fields with different :in_repeating_group values (#{v}). Defaulting to treating NOT as a group"
end
-ph[:is_group] =
-true if ct == 1 && v == ['as part of larger repeating group'] && ph[:mappings][0].repeats == 'y'
+if ct == 1 && v == ['as part of larger repeating group'] && ph[:mappings][0].repeats == 'y'
+ph[:is_group] =
+true
+end
end

# populate is_subgroup
subgroups = []
-h.each{ |k, v| subgroups << v[:subgroups] }
+h.each{ |_k, v| subgroups << v[:subgroups] }
subgroups = subgroups.flatten.uniq
h.keys.each{ |k| h[k][:is_subgroup] = true if subgroups.include?(k) }
h
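
The "populate parent" step in `xpath_hash` picks, for each xpath containing a slash, the longest other key that occurs inside it. A standalone sketch of that rule, assuming a hypothetical `parent_xpath` helper and made-up xpath strings (it uses `max_by` instead of the sort-and-take-first in the diff, which may differ on ties):

```ruby
# Hypothetical helper mirroring the parent-selection rule in xpath_hash.
def parent_xpath(xpath, all_xpaths)
  # String#[] with a String argument returns nil when the substring is absent,
  # so top-level xpaths (no slash) get no parent.
  return nil unless xpath['/']

  candidates = (all_xpaths - [xpath]).select { |other| xpath[other] }
  # The longest matching key is treated as the nearest ancestor.
  candidates.max_by(&:length)
end

# Illustrative xpaths only.
xpaths = [
  'collectionobjects_common',
  'collectionobjects_common/titleGroupList',
  'collectionobjects_common/titleGroupList/titleGroup'
]

xpaths.each { |xpath| puts "#{xpath} -> #{parent_xpath(xpath, xpaths).inspect}" }
```

Note that the match is a substring test rather than a prefix test, as in the original code.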
@@ -186,7 +187,7 @@ def set_record_status(response)
status = searchresult[:status]
response.record_status = status
return if status == :new

response.csid = searchresult[:csid]
response.uri = searchresult[:uri]
response.refname = searchresult[:refname]
@@ -202,10 +203,10 @@ def tag_terms(result)
return if terms.empty?

terms.select{ |t| !t[:found] }.each do |term|
-@new_terms[CollectionSpace::Mapper::term_key(term)] = nil
+@new_terms[CollectionSpace::Mapper.term_key(term)] = nil
end
terms.select{ |t| t[:found] }.each do |term|
-term[:found] = false if @new_terms.key?(CollectionSpace::Mapper::term_key(term))
+term[:found] = false if @new_terms.key?(CollectionSpace::Mapper.term_key(term))
end

result.terms = terms
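
The bookkeeping in `tag_terms` appears to remember every term first reported as not found, then flip later "found" reports of the same term back to `found = false`. A minimal sketch under that reading; `NewTermTracker` and the flat term Hashes are assumptions (the real `term_key` reads `type`, `subtype`, and `display_name` from a refname object):

```ruby
# Hypothetical, simplified version of the new-term bookkeeping in tag_terms.
class NewTermTracker
  def initialize
    @new_terms = {}
  end

  # Records unseen terms and re-flags later sightings of them as not found.
  def tag(terms)
    terms.reject { |term| term[:found] }.each { |term| @new_terms[key_for(term)] = nil }
    terms.select { |term| term[:found] }.each do |term|
      term[:found] = false if @new_terms.key?(key_for(term))
    end
    terms
  end

  private

  # Flat-hash stand-in for CollectionSpace::Mapper.term_key.
  def key_for(term)
    "#{term[:type]}-#{term[:subtype]}-#{term[:display_name]}"
  end
end

tracker = NewTermTracker.new
tracker.tag([{type: 'vocabularies', subtype: 'colors', display_name: 'Teal', found: false}])
later = tracker.tag([{type: 'vocabularies', subtype: 'colors', display_name: 'Teal', found: true}])
p later.first[:found]
```

A plausible motivation, not confirmed by the diff, is that a new term gets cached on first sight and would otherwise look like an existing term in later rows of the same batch.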