A reporting database for collating and doing stuff with ArchivesSpace and Preservica metadata
The tables in the database and how they are generated
This table is populated by running the archival_object_table.sql
query against the ArchivesSpace database.
Column | Description | Type |
---|---|---|
id | The archival object id | int |
repo_id | The repository id | int |
root_record_id | The parent resource id | int |
parent_id | The parent archival object | int |
ref_id | The ref id | varchar(255) |
component_id | The component unique id | varchar(255) |
title | The archival object title | varchar(8704) |
publish | 1 published, 0 unpublished | int |
level | Hierarchical level (i.e. file) | varchar(255) |
preservica_collection_id | Found in the note field | varchar(255) |
extent | Concatenated extent value | varchar(255) |
physical_containers | Concatenated container data | mediumtext |
create_time | ArchivesSpace creation time | timestamp |
m_time | ArchivesSpace last modified | timestamp |
This table is populated by running the digital_object_table.sql
query against the ArchivesSpace database.
Column | Description | Type |
---|---|---|
id | The digital object id | int |
digital_object_id | The Preservica DelUnit id | varchar(255) |
archival_object_id | The linked record id | int |
title | The digital object title | varchar(255) |
publish | 1 published, 0 unpublished | int |
has_content_link | 1 has link. 0 no link | int |
create_time | ArchivesSpace creation time | timestamp |
m_time | ArchivesSpace creation time | timestamp |
This table is populated by running the digital_object_component_table.sql
query against the ArchivesSpacee database.
Column | Description | Type |
---|---|---|
id | The digital object component id | int |
repo_id | The repository id | int |
root_record_id | The root digital object id | int |
parent_id | The parent digital object id | int |
component_id | The component unique id | varchar(255) |
title | The component title | varchar(8704) |
publish | 1 published, 0 unpublished | int |
create_time | ArchivesSpace creation time | timestamp |
m_time | ArchivesSpace creation time | timestamp |
This table is populated by running the restrictions_table.sql
query against the ArchivesSpace database. This query can only be run on a local/test version of YUL ArchivesSpace, as it requires SQL common table expressions (CTEs) that only exist in MySQL 8+. The production version of YUL ArchivesSpace should be upgraded to MySQL 8 in early 2021.
Column | Description | Type |
---|---|---|
id | The archival object id | int |
resource_note_text | The resource-level note | longtext |
lvl | The note level | int |
path | The restriction text hierarchy | longtext |
type_path | The restriction type hierarchy | longtext |
end_path | The restriction date hierarchy | longtext |
This table is populated by running the hierarchies.sql
query against the ArchivesSpace database. The hierarchies
table is a view that only exists in a local/test version of YUL ArchivesSpace, as it requires SQL common table expressions (CTEs) that only exist in MySQL 8+. The production version of YUL ArchivesSpace should be upgraded to MySQL 8 in early 2021.
Column | Description | Type |
---|---|---|
id | The archival object id | int |
root_record_id | The resource id | int |
repo_id | The repository id | int |
full_path | The full hierarchy | longtext |
lvl | The hierarchical level | int |
This table is populated by extracting data from Preservica collection XML files. The collection IDs are stored in notes within ArchivesSpace archival object records, and are extracted by running the get_collection_ids.sql
query against the ArchivesSpace database.
Column | Description | Type |
---|---|---|
id | Preservica collection id | varchar(255) |
parent_collection_id | Preservica parent collection | varchar(255) |
collection_code | Preservica collection code | varchar(255) |
security_tag | Preservica security tag | varchar(255) |
title | Preservica collection title | varchar(8704) |
create_time | Local database create time | timestamp |
m_time | Local database last modified | timestamp |
This table is populated by extracting data from Preservica deliverable unit XML files. The deliverable unit IDs are stored in digital object records in ArchivesSpace, and are extracted by running the get_deliverable_unit_ids.sql
query against the ArchivesSpace database.
Column | Description | Type |
---|---|---|
id | The digital object id | int |
collection_id | The Preservica collection id | varchar(255) |
parent_deliverable_unit | The parent deliverable unit id | varchar(255) |
root_parent_del_unit | The root deliverable unit id | varchar(255) |
deliverable_unit_level | The deliverable unit level | int |
digital_surrogate | yes or no | varchar(255) |
coverage_from | Start date | timestamp |
coverage_to | End date | timestamp |
security_tag | Security tag | varchar(255) |
create_time | Local database create time | timestamp |
m_time | Local database last modified | timestamp |
This table is populated by extracting data from Preservica deliverable unit XML files.
Column | Description | Type |
---|---|---|
id | Preservica manifestation id | varchar(255) |
deliverable_unit_id | Deliverable unit id | varchar(255) |
typeref | The manifestation type | varchar(255) |
summary | Preservica summary | varchar(8704) |
create_time | Local database create time | timestamp |
m_time | Local database last modified | timestamp |
This table is populated by extracting data from Preservica digital file XML files. The list of digital files is derived from Preservica deliverable unit XML files.
Column | Description | Type |
---|---|---|
id | The digital file id | int |
manifestation_id | The manifestation id | int |
file_set_id | The file set id | int |
filesize | The file size in bytes | int |
filemoddate | The file modification date | timestamp |
filename | The name of the file | varchar(255) |
working_path | The relative path to the file | varchar(255) |
format_puid | The format unique id | varchar(255) |
format_name | The format name | mediumtext |
mimetype | The format mimetype | varchar(255) |
create_time | Local database creation time | timestamp |
m_time | Local database last modified | timestamp |
How to extract data from ArchivesSpace and Preservica
Data is extracted from ArchivesSpace via several SQL queries
Data is extracted from Preservica by calling the Preservica API, as access to the Preservica database is restricted.
How to create and populate the database using extracted data from ArchivesSpace and Preservica
The first step in initializing the aspace_preservica_db
database is running the process_xml_for_db.py
script on extracted Preservica XML files.
Once the CSV spreadsheets are prepared for ingest, it is time to run the db_manager.py
script to create and populate the database.
After the data is ingested, a little bit of clean-up is necessary.
Some data was missed during the initial load:
- Collection data
- Child deliverable unit data
- Digital object component data
- Restriction data
Consider re-designing manifestation table, since there are duplicate manifestation IDs; or at least sort out those issues.