Commit

Merge pull request #171 from collectionspace/nullvalue-handling-opt
Add "empty" handling option for %NULLVALUE% field values
kspurgin authored Jan 26, 2024
2 parents 97a5000 + 929b646 commit 8363a10
Showing 16 changed files with 183 additions and 13 deletions.
3 changes: 3 additions & 0 deletions CHANGELOG.md
@@ -13,6 +13,9 @@ This project bumps the version number for any changes (including documentation u

## [Unreleased] - i.e. pushed to main branch but not yet tagged as a release

## [5.0.3] - 2024-01-26
- Add `null_value_string_handling` batch configuration option, with ability to switch to creating empty string nodes, rather than deleting nodes.

## [5.0.2] - 2023-12-19
- BUGFIX for [#148](https://github.com/collectionspace/collectionspace-mapper/issues/148)

17 changes: 16 additions & 1 deletion Gemfile.lock
@@ -12,7 +12,7 @@ GIT
PATH
remote: .
specs:
- collectionspace-mapper (5.0.2)
+ collectionspace-mapper (5.0.3)
activesupport (= 7.0.4.3)
chronic
collectionspace-client (~> 0.15.0)
@@ -48,6 +48,9 @@ GEM
concurrent-ruby (1.2.2)
crack (0.4.5)
rexml
debug (1.8.0)
irb (>= 1.5.0)
reline (>= 0.3.1)
diff-lcs (1.5.0)
docile (1.4.0)
dry-configurable (0.16.1)
@@ -66,6 +69,10 @@ GEM
multi_xml (>= 0.5.2)
i18n (1.12.0)
concurrent-ruby (~> 1.0)
io-console (0.6.0)
irb (1.8.0)
rdoc (~> 6.5)
reline (>= 0.3.6)
json (2.6.3)
language_server-protocol (3.17.0.3)
lint_roller (1.1.0)
@@ -85,12 +92,18 @@ GEM
pry (0.14.2)
coderay (~> 1.1)
method_source (~> 1.0)
psych (5.1.0)
stringio
public_suffix (5.0.1)
racc (1.7.1)
rainbow (3.1.1)
rake (13.0.6)
rdoc (6.5.0)
psych (>= 4.0.0)
redis (4.2.5)
regexp_parser (2.8.1)
reline (0.3.8)
io-console (~> 0.5)
rexml (3.2.6)
rspec (3.12.0)
rspec-core (~> 3.12.0)
@@ -149,6 +162,7 @@ GEM
standard-performance (1.2.0)
lint_roller (~> 1.1)
rubocop-performance (~> 1.19.0)
stringio (3.0.8)
tzinfo (2.0.6)
concurrent-ruby (~> 1.0)
unicode-display_width (2.4.2)
@@ -168,6 +182,7 @@ PLATFORMS
DEPENDENCIES
almost_standard!
collectionspace-mapper!
debug
pry (~> 0.14)
rake (~> 13.0)
rspec
1 change: 1 addition & 0 deletions collectionspace-mapper.gemspec
@@ -56,6 +56,7 @@ Gem::Specification.new do |spec|
spec.add_dependency "xxhash", ">= 0.4.0"
spec.add_dependency "zeitwerk", "~> 2.5"

spec.add_development_dependency "debug"
spec.add_development_dependency "pry", "~>0.14"
spec.add_development_dependency "rake", "~> 13.0"
spec.add_development_dependency "rspec"
13 changes: 13 additions & 0 deletions doc/batch_configuration.adoc
@@ -142,6 +142,19 @@ While it is possible to use this setting to batch update existing records that d
- *Data type*: string
- *Allowed values*: `fail`, `use_first`

== null_value_string_handling

Controls how fields containing `%NULLVALUE%` are handled.

The default is `delete`, which preserves the behavior the mapper has always had. If you are loading data into a repeating field group and all the values for a given row are `%NULLVALUE%`, that row is removed. This is generally desirable, as it prevents the creation of empty rows.

However, some more complex ingest processes may need the `empty` option. For example, suppose you are loading Associated date data into Objects and using `batch mode: date details` to load the structured date fields. Some objects have multiple date values in this field, and not every date value has an accompanying assocDateType or assocDateNote value. If you then run a normal ingest to populate those fields and empty rows are deleted, the date values associated with the empty rows are also deleted. If you instead treat ``%NULLVALUE%`` as an empty string, the empty rows (in the second ingest) for date values lacking types or notes are loaded, which prevents the deletion of dates.

- *Required?:* no
- *Defaults to:* `delete`
- *Data type*: string
- *Allowed values*: `delete`, `empty`

== response_mode

If `normal`, `Mapper::Response.orig_data` returns the original data hash, and `Mapper::Response.doc` returns the resulting XML document.
1 change: 1 addition & 0 deletions lib/collectionspace/mapper/batch_config.rb
@@ -16,6 +16,7 @@ class BatchConfig
date_format: ["month day year", "day month year"],
force_defaults: ["true", "false", true, false],
multiple_recs_found: %w[fail use_first],
null_value_string_handling: %w[delete empty],
response_mode: %w[normal verbose],
search_if_not_cached: ["true", "false", true, false],
status_check_method: %w[client cache],
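A minimal sketch of how an allowed-values table like the one in this hunk could be used to reject a bad `null_value_string_handling` value. `ALLOWED` and `invalid_settings` are hypothetical names used for illustration only; how `BatchConfig` actually validates settings is not shown in this diff.

```ruby
# Hypothetical allowed-values table, copying two entries from the hunk above.
ALLOWED = {
  multiple_recs_found: %w[fail use_first],
  null_value_string_handling: %w[delete empty]
}.freeze

# Hypothetical helper: returns the names of settings whose value is not
# among the allowed values. Settings without an entry in ALLOWED are ignored.
def invalid_settings(config)
  config
    .select { |key, value| ALLOWED.key?(key) && !ALLOWED[key].include?(value) }
    .keys
end

good = {delimiter: "|", null_value_string_handling: "empty"}
bad = {delimiter: "|", null_value_string_handling: "blank"}

invalid_settings(good) # no offending settings
invalid_settings(bad)  # reports :null_value_string_handling
```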
20 changes: 17 additions & 3 deletions lib/collectionspace/mapper/data_mapper.rb
@@ -20,7 +20,6 @@ def initialize(response, handler)
end
set_identifier_value
clean_doc
- defuse_bomb
add_namespaces
response.add_doc(doc)
end
@@ -78,9 +77,24 @@ def map(xpath)
end

def clean_doc
+ remove_blank_nodes
+ handle_null_value_strings
+ defuse_bomb
+ end
+
+ def remove_blank_nodes
+ doc.traverse { |node| node.remove unless node.text.match?(/\S/m) }
+ end
+
+ def handle_null_value_strings
doc.traverse do |node|
- node.remove if node.text == "%NULLVALUE%"
- node.remove unless node.text.match?(/\S/m)
+ case handler.config.batch.null_value_string_handling
+ when "delete"
+ node.remove if node.text == "%NULLVALUE%"
+ node.remove unless node.text.match?(/\S/m)
+ when "empty"
+ node.content = "" if node.text == "%NULLVALUE%"
+ end
end
end

1 change: 1 addition & 0 deletions lib/collectionspace/mapper/handler_full_record.rb
@@ -82,6 +82,7 @@ class HandlerFullRecord
setting :delimiter, default: "|", reader: true
setting :force_defaults, default: false, reader: true
setting :multiple_recs_found, default: "fail", reader: true
setting :null_value_string_handling, default: "delete", reader: true
setting :response_mode, default: "normal", reader: true
setting :search_if_not_cached, default: true, reader: true
setting :status_check_method, default: "client", reader: true
6 changes: 2 additions & 4 deletions lib/collectionspace/mapper/term_handler.rb
@@ -45,10 +45,8 @@ def handle_terms

def handle_term(val)
@value = val
- return "" if val.blank? || val == "%NULLVALUE%"
- if val == CollectionSpace::Mapper.bomb
- return CollectionSpace::Mapper.bomb
- end
+ return val if val.blank? || val == "%NULLVALUE%" ||
+ val == CollectionSpace::Mapper.bomb

added = false
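The guard clause change in this hunk means `%NULLVALUE%` now passes through term handling unchanged instead of being turned into an empty string, so the later cleanup step can decide what to do with it. A rough plain-Ruby sketch of that pass-through idea follows. `BOMB_SKETCH`, `passthrough?`, and `handle_term_sketch` are hypothetical stand-ins, the real value of `CollectionSpace::Mapper.bomb` is not shown here, and the refname lookup is replaced by a placeholder string.

```ruby
# Hypothetical stand-in for CollectionSpace::Mapper.bomb (real value unknown).
BOMB_SKETCH = "BOMB_PLACEHOLDER"
NULL_MARKER = "%NULLVALUE%"

# True when the value should be returned as-is rather than looked up.
# Plain-Ruby approximation of ActiveSupport's `blank?` for strings and nil.
def passthrough?(val)
  val.nil? || val.strip.empty? || val == NULL_MARKER || val == BOMB_SKETCH
end

# Returns special values untouched; everything else gets a placeholder
# standing in for the real term lookup, which is out of scope here.
def handle_term_sketch(val)
  return val if passthrough?(val)

  "refname-for(#{val})"
end

handle_term_sketch("%NULLVALUE%") # marker survives for later handling
handle_term_sketch("Swahili")     # placeholder lookup result
```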

2 changes: 1 addition & 1 deletion lib/collectionspace/mapper/version.rb
@@ -2,6 +2,6 @@

module CollectionSpace
module Mapper
- VERSION = "5.0.2"
+ VERSION = "5.0.3"
end
end
31 changes: 30 additions & 1 deletion spec/collectionspace/mapper/data_mapper_spec.rb
@@ -54,7 +54,6 @@
let(:mapper) { "core_7-2-0_collectionobject" }

context "overflow subgroup record with uneven subgroup values" do
- # skip: "subgroup complications" do
let(:customcfg) { {delimiter: "|"} }
let(:datahash_path) do
"spec/support/datahashes/core/collectionobject2.json"
@@ -88,6 +87,36 @@
it_behaves_like "Mapped"
end

context "with %NULLVALUE% field values" do
let(:datahash_path) do
"spec/support/datahashes/core/collectionobject7.json"
end

context "with %NULLVALUE% nodes deleted" do
let(:customcfg) do
{
delimiter: "|",
null_value_string_handling: "delete"
}
end
let(:fixture_path) { "core/collectionobject7_deleted.xml" }

it_behaves_like "MappedWithBlanks"
end

context "with %NULLVALUE% nodes empty" do
let(:customcfg) do
{
delimiter: "|",
null_value_string_handling: "empty"
}
end
let(:fixture_path) { "core/collectionobject7_empty.xml" }

it_behaves_like "MappedWithBlanks"
end
end

context "overflow subgroup record with even subgroup values" do
let(:datahash_path) do
"spec/support/datahashes/core/collectionobject3.json"
4 changes: 2 additions & 2 deletions spec/collectionspace/mapper/term_handler_spec.rb
@@ -47,7 +47,7 @@
it "result is the transformed value for mapping",
vcr: "term_handler_result_titletranslationlanguage" do
expected = [
- ["",
+ ["%NULLVALUE%",
"urn:cspace:c.core.collectionspace.org:vocabularies:name"\
"(languages):item:name(swa)'Swahili'"],
["",
@@ -74,7 +74,7 @@
"(citation):item:name(Arthur62605812848)'Arthur'",
"urn:cspace:c.core.collectionspace.org:citationauthorities:name"\
"(citation):item:name(Harding2510592089)'Harding'",
- ""
+ "%NULLVALUE%"
]
expect(result).to eq(expected)
end
54 changes: 54 additions & 0 deletions spec/support/cassettes/core_domain_check.yml

Some generated files are not rendered by default.

5 changes: 5 additions & 0 deletions spec/support/datahashes/core/collectionobject7.json
@@ -0,0 +1,5 @@
{
"objectNumber": "36548",
"assocDateType": "Print|%NULLVALUE%|Negative",
"assocDateNote": "Note1|%NULLVALUE%|Note2"
}
2 changes: 1 addition & 1 deletion spec/support/matchers/match_doc.rb
@@ -4,7 +4,7 @@
module MatchDocMatcher
class MatchDocMatcher
def initialize(fixture_path, handler, mode: :normal, blanks: :drop)
- delblank = blanks == :drop
+ delblank = (blanks == :drop)
@fixture_doc = get_xml_fixture(fixture_path, delblank)
@fixture_xpaths = test_xpaths(fixture_doc, handler.record.mappings)
@mode = mode
16 changes: 16 additions & 0 deletions spec/support/xml/core/collectionobject7_deleted.xml
@@ -0,0 +1,16 @@
<?xml version="1.0"?>
<document>
<ns2:collectionobjects_common xmlns:ns2="http://collectionspace.org/services/collectionobject" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<assocDateGroupList>
<assocDateGroup>
<assocDateType>Print</assocDateType>
<assocDateNote>Note1</assocDateNote>
</assocDateGroup>
<assocDateGroup>
<assocDateType>Negative</assocDateType>
<assocDateNote>Note2</assocDateNote>
</assocDateGroup>
</assocDateGroupList>
<objectNumber>36548</objectNumber>
</ns2:collectionobjects_common>
</document>
20 changes: 20 additions & 0 deletions spec/support/xml/core/collectionobject7_empty.xml
@@ -0,0 +1,20 @@
<?xml version="1.0"?>
<document>
<ns2:collectionobjects_common xmlns:ns2="http://collectionspace.org/services/collectionobject" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<assocDateGroupList>
<assocDateGroup>
<assocDateType>Print</assocDateType>
<assocDateNote>Note1</assocDateNote>
</assocDateGroup>
<assocDateGroup>
<assocDateType></assocDateType>
<assocDateNote></assocDateNote>
</assocDateGroup>
<assocDateGroup>
<assocDateType>Negative</assocDateType>
<assocDateNote>Note2</assocDateNote>
</assocDateGroup>
</assocDateGroupList>
<objectNumber>36548</objectNumber>
</ns2:collectionobjects_common>
</document>
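
The two fixtures above differ only in whether the middle `assocDateGroup` survives. A simplified plain-Ruby model of that difference is sketched below, using the values from `collectionobject7.json`. It works on arrays of hashes rather than an XML document, and `build_rows` and `handle_null_values` are hypothetical helpers for illustration, not part of the mapper.

```ruby
NULLVALUE = "%NULLVALUE%"

# Split each delimited field and regroup the values by position,
# producing one hash per repeating-group row.
def build_rows(data, fields, delimiter: "|")
  columns = fields.map { |field| data.fetch(field).split(delimiter) }
  columns.transpose.map { |values| fields.zip(values).to_h }
end

# "delete": drop %NULLVALUE% fields, then drop rows left with no fields.
# "empty":  keep every row, replacing %NULLVALUE% with an empty string.
def handle_null_values(rows, mode)
  case mode
  when "delete"
    rows
      .map { |row| row.reject { |_field, value| value == NULLVALUE } }
      .reject(&:empty?)
  when "empty"
    rows.map do |row|
      row.transform_values { |value| (value == NULLVALUE) ? "" : value }
    end
  else
    raise ArgumentError, "unknown mode: #{mode}"
  end
end

data = {
  "assocDateType" => "Print|%NULLVALUE%|Negative",
  "assocDateNote" => "Note1|%NULLVALUE%|Note2"
}
rows = build_rows(data, %w[assocDateType assocDateNote])

deleted = handle_null_values(rows, "delete") # two rows, as in collectionobject7_deleted.xml
emptied = handle_null_values(rows, "empty")  # three rows, the middle one with empty values
```

With `empty`, the row positions stay aligned with rows loaded by an earlier ingest, which is the reason the documentation gives for the option.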
