
Metadata manipulator: NormalizeDate

Mark Jordan edited this page Apr 9, 2016 · 10 revisions

Overview

This metadata manipulator converts dates in specific MODS <originInfo> child elements to W3CDTF (yyyy-mm-dd) formatted dates. It also logs invalid dates.

Toolchains

This manipulator can be used within any toolchain that uses a MODS metadata parser (i.e., it is not specific to CONTENTdm, CSV, etc.).

Configuration

To register this manipulator in your toolchain, add an entry similar to the following to the "[MANIPULATORS]" section of your .ini file:

metadatamanipulators[] = "NormalizeDate|Date|dateIssued|m"

Parameters

This manipulator takes three parameters:

  • The first parameter (required) is the name of the date field in the raw metadata that you want to check for formatting. In the example above, that field is 'Date'. For CONTENTdm metadata, the field should usually be 'date' (lowercase d).
    • What does 'usually' mean? In CONTENTdm toolchains, the field is identified by its internal name, or in CONTENTdm jargon, its 'nickname'. If the human-readable field name in CONTENTdm is 'Date', the nickname should be 'date'. Try this first to see if it works. If it doesn't, look at the raw metadata that is written to your toolchain's temp directory, or use a utility like cdminspect to view all of the field nicknames for your collection.
  • The second parameter (required) is the name of the child of <originInfo> that you want to add the reformatted date to. In the example above, that element is dateIssued. Allowed child elements are dateIssued, dateCreated, dateCaptured, dateValid, dateModified, copyrightDate, and dateOther. The value of this parameter must correspond to the snippet in the entry for the source metadata date element in your mappings file.
  • The third parameter (optional) takes the value 'm' to indicate that for date patterns that may be interpreted as either having the day first or the month first (e.g., 10/12/1978) the month is to be interpreted as the first part of the date value. The default (i.e., when the 'm' parameter is not included) is to interpret the first part of the date as the day.
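
As a rough illustration of how the example entry breaks down into these parameters, the following Python sketch splits the string on the '|' character and checks the second parameter against the allowed element names. The function name parse_manipulator_entry and the returned keys are hypothetical and are not part of the toolchain; the sketch only mirrors the parameter rules described above.

```python
# Illustrative only: the names below are assumptions, not toolchain code.
ALLOWED_CHILDREN = {
    "dateIssued", "dateCreated", "dateCaptured", "dateValid",
    "dateModified", "copyrightDate", "dateOther",
}


def parse_manipulator_entry(entry):
    """Split 'NormalizeDate|<field>|<element>[|m]' into its parts."""
    parts = entry.split("|")
    # Manipulator name + two required parameters + one optional parameter.
    if len(parts) not in (3, 4):
        raise ValueError("expected two required parameters and one optional one")
    name, source_field, target_element = parts[:3]
    if target_element not in ALLOWED_CHILDREN:
        raise ValueError(f"{target_element!r} is not an allowed <originInfo> child")
    # The optional third parameter 'm' means "the month comes first".
    month_first = len(parts) == 4 and parts[3] == "m"
    return {
        "manipulator": name,
        "source_field": source_field,
        "target_element": target_element,
        "month_first": month_first,
    }


example = parse_manipulator_entry("NormalizeDate|Date|dateIssued|m")
```

For the example entry, this yields 'Date' as the source field, dateIssued as the target element, and month-first interpretation switched on. Leaving off the trailing '|m' leaves the day-first default in place.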

Functionality

This metadata manipulator normalizes a date from the source metadata for use within MODS' originInfo dateIssued, dateCreated, dateCaptured, dateValid, dateModified, copyrightDate, and dateOther child elements.

The date string in the source metadata field is matched against two regular expressions ('/^(\d\d)\-(\d\d)\-(\d\d\d\d)$/' and '/^(\d\d\d\d)\s+(\d\d)\s+(\d\d)$/'). If a match occurs, the string is reformatted into the yyyy-mm-dd format, and the MODS element containing the original date is replaced with an identical element containing the reformatted date.
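
The matching and reordering described above can be sketched in Python as follows. This is an illustration only, not the manipulator's actual code: the function name normalize_date and the month_first flag (standing in for the 'm' parameter) are hypothetical, and the sketch assumes that dates matching the second pattern are already in year, month, day order.

```python
import re

# The two patterns quoted above, without the PHP delimiters.
TWO_TWO_FOUR = re.compile(r"^(\d\d)\-(\d\d)\-(\d\d\d\d)$")
FOUR_TWO_TWO = re.compile(r"^(\d\d\d\d)\s+(\d\d)\s+(\d\d)$")


def normalize_date(value, month_first=False):
    """Return value as yyyy-mm-dd, or None if neither pattern matches."""
    match = TWO_TWO_FOUR.match(value)
    if match:
        first, second, year = match.groups()
        # Default is day first; month_first mirrors the optional 'm' parameter.
        month, day = (first, second) if month_first else (second, first)
        return f"{year}-{month}-{day}"
    match = FOUR_TWO_TWO.match(value)
    if match:
        # Assumption: this pattern is year, month, day.
        year, month, day = match.groups()
        return f"{year}-{month}-{day}"
    return None
```

With these assumptions, normalize_date("10-12-1978") gives "1978-12-10", while normalize_date("10-12-1978", month_first=True) gives "1978-10-12". A string that matches neither pattern is left alone (the sketch signals this by returning None).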

This manipulator also logs invalid dates after it performs its normalization. This behaviour is not configurable; it happens automatically. Invalid dates receive entries in the manipulator log like this:

[2016-04-09 08:18:07] config.WARNING: NormalizeDate {"Record key":1737,"Normalized date value is not a valid date":"1942-19-06"} []
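
One way to detect a value such as 1942-19-06 (month 19 does not exist) is to try parsing the normalized string as a calendar date. The Python sketch below does that with the standard library; is_valid_date is a hypothetical name, and the manipulator's own check may work differently.

```python
import re
from datetime import datetime

STRICT_YMD = re.compile(r"\d{4}-\d{2}-\d{2}")


def is_valid_date(value):
    """Return True if value is a real calendar date in yyyy-mm-dd form."""
    # Require zero-padded fields, since strptime alone would accept '1942-6-9'.
    if not STRICT_YMD.fullmatch(value):
        return False
    try:
        datetime.strptime(value, "%Y-%m-%d")
    except ValueError:
        # Raised for impossible dates such as month 19 or 30 February.
        return False
    return True
```

Under this check, "1942-06-19" passes, while "1942-19-06" and "2015-02-30" are rejected and would be candidates for a log entry like the one shown above.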

Extending this metadata manipulator

Additional regular expressions and accompanying logic can be added to this manipulator's class file (src/metadatamanipulators/NormalizeDate.php) as needed. If you add more regular expressions, you probably should add some corresponding tests to tests/MetadataManipulatorTest.php.
