Metadata manipulator: NormalizeDate
This metadata manipulator converts dates in specific MODS <originInfo> child elements to w3cdtf (yyyy-mm-dd) formatted dates. It also logs invalid dates.
It can be used within any toolchain that uses a MODS metadata parser (i.e., it is not specific to CONTENTdm, CSV, etc.).
To register this manipulator in your toolchain, add an entry similar to the following to the "[MANIPULATORS]" section of your .ini file:
metadatamanipulators[] = "NormalizeDate|Date|dateIssued|m"
This manipulator takes three parameters:
- The first parameter (required) is the name of the date field in the raw metadata that you want to check for formatting. In the example above, that field is 'Date'. For CONTENTdm metadata, the field should usually be 'date' (lowercase d).
- What does 'usually' mean? In CONTENTdm toolchains, the field is identified by its internal name, or in CONTENTdm jargon, its 'nickname'. If the human-readable field name in CONTENTdm is 'Date', the nickname should be 'date'. Try this to see if it works. If it doesn't, look at the raw metadata that is written to your toolchain's temp directory, or use a utility like cdminspect to view all of the field nicknames for your collection.
- The second parameter (required) is the name of the child of <originInfo> that you want to add the reformatted date to. In the example above, that element is dateIssued. Allowed child elements are dateIssued, dateCreated, dateCaptured, dateValid, dateModified, copyrightDate, and dateOther. The value of this parameter must correspond to the snippet in the entry for the source metadata date element in your mappings file.
- The third parameter (optional) takes the value 'm' to indicate that for date patterns that may be interpreted as either having the day first or the month first (e.g., 10/12/1978), the month is to be interpreted as the first part of the date value. The default (i.e., when the 'm' parameter is not included) is to interpret the first part of the date as the day.
This metadata manipulator normalizes a date from the source metadata for use within MODS' originInfo dateIssued, dateCreated, dateCaptured, dateValid, dateModified, copyrightDate, and dateOther child elements.
The date string in the source metadata field is matched against two regular expressions, '/^(\d\d)\-(\d\d)\-(\d\d\d\d)$/' and '/^(\d\d\d\d)\s+(\d\d)\s+(\d\d)$/'. If a match occurs, the string is reformatted into the yyyy-mm-dd format and the MODS element containing the original date is replaced with an identical element containing the reformatted date.
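The matching and replacement steps can be sketched in Python as follows. This is an illustrative translation, not the PHP in NormalizeDate.php: the two patterns are the ones quoted above, the day-first/month-first switch follows the description of the third parameter, and the helper names (normalize_date, local_name, normalize_origin_info) are hypothetical. Matching elements by local name (ignoring the MODS namespace) and updating the element's text in place, rather than building a new element, are simplifying assumptions.

```python
import re
import xml.etree.ElementTree as ET
from typing import Optional

# Illustrative Python translation of the two documented patterns (not MIK's PHP code).
DD_DD_YYYY = re.compile(r"^(\d\d)\-(\d\d)\-(\d\d\d\d)$")     # e.g. 10-12-1978
YYYY_MM_DD = re.compile(r"^(\d\d\d\d)\s+(\d\d)\s+(\d\d)$")   # e.g. 1978 12 10


def normalize_date(value: str, month_first: bool = False) -> Optional[str]:
    """Return value as yyyy-mm-dd, or None if neither pattern matches."""
    match = DD_DD_YYYY.match(value)
    if match:
        first, second, year = match.groups()
        # Default reading is day-first; the 'm' option makes it month-first.
        month, day = (first, second) if month_first else (second, first)
        return f"{year}-{month}-{day}"
    match = YYYY_MM_DD.match(value)
    if match:
        year, month, day = match.groups()
        return f"{year}-{month}-{day}"
    return None  # no match: the caller leaves the original value alone


def local_name(tag: str) -> str:
    """Strip an ElementTree '{namespace}' prefix from a tag name."""
    return tag.rsplit("}", 1)[-1]


def normalize_origin_info(root: ET.Element, child: str, month_first: bool = False) -> int:
    """Rewrite matching <originInfo>/<child> dates in place; return how many changed."""
    changed = 0
    for element in root.iter():
        if local_name(element.tag) != "originInfo":
            continue
        for date_el in element:
            if local_name(date_el.tag) != child or date_el.text is None:
                continue
            normalized = normalize_date(date_el.text, month_first)
            if normalized is not None:
                date_el.text = normalized  # tag and attributes are kept as they were
                changed += 1
    return changed
```

For example, normalize_date("10-12-1978") returns "1978-12-10", while normalize_date("10-12-1978", month_first=True) returns "1978-10-12". A value that matches neither pattern, such as "December 1978", yields None, so normalize_origin_info leaves that element untouched.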
This manipulator also logs invalid dates after it performs its normalization. This behaviour is not configurable; it happens automatically. Invalid dates receive entries in the manipulator log like this:
[2016-04-09 08:18:07] config.WARNING: NormalizeDate {"Record key":1737,"Normalized date value is not a valid date":"1942-19-06"} []
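The validity check can be approximated with Python's datetime.strptime, which rejects impossible dates such as the month 19 in the sample entry above. This sketch assumes that only the full yyyy-mm-dd form counts as valid; MIK's actual check and its log line format may differ, and the logger name and helper names are hypothetical.

```python
import json
import logging
import re
from datetime import datetime

# Hypothetical logger name, chosen to echo the "config" channel in the sample entry.
logger = logging.getLogger("config")

# Assumption: only full yyyy-mm-dd dates count as valid in this sketch.
FULL_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")


def is_valid_full_date(value: str) -> bool:
    """True only for a real calendar date written as yyyy-mm-dd."""
    if not FULL_DATE.fullmatch(value):
        return False
    try:
        datetime.strptime(value, "%Y-%m-%d")  # rejects month 19, Feb 30, etc.
    except ValueError:
        return False
    return True


def log_if_invalid(record_key, value: str) -> bool:
    """Log a WARNING for an invalid normalized date; return True if one was logged."""
    if is_valid_full_date(value):
        return False
    context = {"Record key": record_key, "Normalized date value is not a valid date": value}
    logger.warning("NormalizeDate %s", json.dumps(context, separators=(",", ":")))
    return True
```

The regular expression guards the shape (strptime alone would also accept "1942-6-19"), and strptime guards the calendar, so "1942-19-06" and "1900-02-29" are both reported.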
Additional regular expressions and accompanying logic can be added to this manipulator's class file (src/metadatamanipulators/NormalizeDate.php) as needed. If you add more regular expressions, you probably should add some corresponding tests to tests/MetadataManipulatorTest.php.
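One way to keep pattern additions and their tests in step is to hold the patterns in a table, so each new pattern is one entry plus one test row. The Python sketch below illustrates that idea with unittest; it is not how NormalizeDate.php or tests/MetadataManipulatorTest.php are organized, it ignores the 'm' option for brevity (the first pattern is read day-first, the default), and the third pattern (yyyy/mm/dd) is a hypothetical addition, not part of NormalizeDate.

```python
import re
import unittest

# Table-driven variant: each entry pairs a pattern with a builder for yyyy-mm-dd.
# The first two mirror the documented patterns (day-first reading only);
# the third is a HYPOTHETICAL extra pattern for yyyy/mm/dd values.
PATTERNS = [
    (re.compile(r"^(\d\d)-(\d\d)-(\d\d\d\d)$"), lambda m: f"{m[3]}-{m[2]}-{m[1]}"),
    (re.compile(r"^(\d\d\d\d)\s+(\d\d)\s+(\d\d)$"), lambda m: f"{m[1]}-{m[2]}-{m[3]}"),
    (re.compile(r"^(\d\d\d\d)/(\d\d)/(\d\d)$"), lambda m: f"{m[1]}-{m[2]}-{m[3]}"),
]


def normalize(value):
    """Return the first pattern's yyyy-mm-dd rewrite, or None if nothing matches."""
    for pattern, build in PATTERNS:
        match = pattern.match(value)
        if match:
            return build(match)
    return None


class NormalizeTableTest(unittest.TestCase):
    # One row per pattern (plus a non-match): a new pattern gets a new row.
    CASES = [
        ("19-06-1942", "1942-06-19"),
        ("1942 06 19", "1942-06-19"),
        ("1942/06/19", "1942-06-19"),
        ("June 1942", None),
    ]

    def test_cases(self):
        for raw, expected in self.CASES:
            with self.subTest(raw=raw):
                self.assertEqual(normalize(raw), expected)
```

Run it with python -m unittest from the directory containing the file; a failing row is reported with its input value thanks to subTest.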
Content on the Move to Islandora Kit wiki is licensed under a Creative Commons Attribution 4.0 International License.