From 340a126f403fc9073763a1e19a7031171f02a044 Mon Sep 17 00:00:00 2001
From: David Eckhard
Date: Tue, 3 Aug 2021 15:45:59 +0200
Subject: [PATCH] docs: update OAI-PMH documentation

---
 docs/configs/oai-pmh.md | 124 ++++++++++++++++++++++++++++++++++++----
 1 file changed, 114 insertions(+), 10 deletions(-)

diff --git a/docs/configs/oai-pmh.md b/docs/configs/oai-pmh.md
index 7d166d7..1126358 100644
--- a/docs/configs/oai-pmh.md
+++ b/docs/configs/oai-pmh.md
@@ -1,6 +1,6 @@
 # OAI-PMH

-OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting) is a low-barrier mechanism for repository interoperability.
+OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting) is a low-barrier mechanism for repository interoperability. [Documentation](http://www.openarchives.org/OAI/openarchivesprotocol.html)

 **Data Providers** are repositories that expose structured metadata via OAI-PMH. **Service Providers** then make OAI-PMH service requests to harvest that metadata. OAI-PMH is a set of six verbs or services that are invoked within HTTP.

@@ -8,6 +8,7 @@ OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting) is a low-bar
 ### Public

 The following endpoints are public. The full documentation of the verbs can be found here: [Link](https://www.openarchives.org/OAI/openarchivesprotocol.html#ProtocolMessages)
+These endpoints/verbs are used by harvesters (other repositories) to request metadata of available records. Additionally, information about the repository and the available metadata formats is provided.

 |Verb |Description |Example URL|
 --- | --- |---
@@ -20,7 +21,7 @@ The following endpoints are public. The full documentation of the verbs can be f

 ### Non Public

-The following endpoints are only available as admin and all deal with sets of OAI-PMH.
+The following endpoints are only available to admins and all deal with OAI-PMH sets. These endpoints will be revamped in a later update.

 |Request Methods |Description |Example URL|
 --- | --- |---
@@ -35,12 +36,115 @@ The following endpoints are only available as admin and all deal with sets of OA
 |GET, HEAD, OPTIONS |Main page for sets |[https://127.0.0.1:5000/admin/oaiset/](https://127.0.0.1:5000/admin/oaiset/)|


-## Issues
+## Configuration

-- Records
-  - Missing the `"_oai"` field necessary for record and identifier retrieval
-- elasticsearch index
-  - It is possible to define the index used by elasticsearch for all relevant verbs via a config variable `OAISERVER_RECORD_INDEX='records'`
-  - Currently, it is set to use the 'records' index which does not exist
-  - The correct index for records on my machine looks like this `rdmrecords-records-record-v2.0.0-1621247047`
-  - Available indices can be retrieved by visiting [http://localhost:9200/_cat/indices?v&health=yellow&pretty](http://localhost:9200/_cat/indices?v&health=yellow&pretty)
+#### Elasticsearch index
+Elasticsearch index to be used. This will be set by `invenio-rdm-records`:
+```conf
+OAISERVER_RECORD_INDEX='records'
+```
+
+#### OAI ID Prefix
+The prefix that will be applied to the generated OAI-PMH ids. Should be the address of the repository (e.g. repository.tugraz.at):
+```conf
+OAISERVER_ID_PREFIX = 'repository.tugraz.at'
+```
+
+#### Admin Emails
+The e-mail addresses of the repository's administrators:
+```conf
+OAISERVER_ADMIN_EMAILS = [
+    'info@inveniosoftware.org',
+]
+```
+
+#### Available Metadata Formats
+Define the metadata formats available from a repository. These can be completely redefined or modified (extended, replaced, removed) as needed.
+```conf
+OAISERVER_METADATA_FORMATS = {
+    'oai_dc': {
+        'serializer': (
+            'invenio_oaiserver.utils:dumps_etree', {
+                'xslt_filename': pkg_resources.resource_filename(
+                    'invenio_oaiserver', 'static/xsl/MARC21slim2OAIDC.xsl'
+                ),
+            }
+        ),
+        'schema': 'http://www.openarchives.org/OAI/2.0/oai_dc.xsd',
+        'namespace': 'http://www.openarchives.org/OAI/2.0/oai_dc/',
+    },
+    'marc21': {
+        'serializer': (
+            'invenio_oaiserver.utils:dumps_etree', {
+                'prefix': 'marc',
+            }
+        ),
+        'schema': 'http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd',
+        'namespace': 'http://www.loc.gov/MARC21/slim',
+    }
+}
+```
+
+The serializer is defined by a function and additional arguments. The function will be called with two fixed arguments (pid, record) followed by the additional arguments specified.
+It must return an LXML element instance, to be used in the response of the OAI server.
+
+Adding another format can be achieved by putting the following code in a module's `init_app` function, which resides in the module's `ext.py` file:
+```python
+
+def my_metadata_serializer(pid, record, **kwargs):
+    # record['_source'] will hold the record data
+    return MyMetadataFormat().dump_xml(record['_source'])
+
+def init_app(self, app):
+    app.extensions['invenio-oaiserver']['OAISERVER_METADATA_FORMATS'].set('my_metadata_format', {
+        'serializer': ('my_module.ext:my_metadata_serializer', {}),
+        'schema': 'link_to_schema_definition_file',
+        'namespace': 'link_to_schema_definition',
+        }
+    )
+```
+
+After this, verbs supporting the `metadataFormat` attribute will be able to pick up the metadata format.
+
+
+#### Record Search Class
+Record search class for the `ListRecords` verb. This is an Elasticsearch class by default and it should return harvestable records.
+```conf
+OAISERVER_SEARCH_CLS = 'invenio_oaiserver.query:OAIServerSearch'
+```
+
+#### OAI ID Fetcher Function
+Will return the OAI ID of a record. This will take two arguments: `function(record_uuid, record_as_dict)`.
+```conf
+OAISERVER_ID_FETCHER = 'invenio_oaiserver.fetchers:oaiid_fetcher'
+```
+
+#### Record Updated Key
+Record dictionary key for information on when the record was last updated. Set by `invenio-rdm-records`.
+```conf
+OAISERVER_LAST_UPDATE_KEY = "_updated"
+```
+
+#### Record Created Key
+Record dictionary key for information on when the record was created.
+```conf
+OAISERVER_CREATED_KEY = "_created"
+```
+
+#### Record's Sets Fetcher Function
+Function to fetch the sets a record belongs to as a list. Takes one argument: `function(record_as_dict)`.
+```conf
+OAISERVER_RECORD_SETS_FETCHER = 'invenio_oaiserver.utils:record_sets_fetcher'
+```
+
+#### Record Retrieval Class
+Used when an `identifier` parameter is used in a verb and `OAISERVER_GETRECORD_FETCHER` is not overridden.
+```conf
+OAISERVER_RECORD_CLS = 'invenio_records.api:Record'
+```
+
+#### Single Record Fetcher Function
+Function to fetch a record and return it as a dict. Takes one argument: `function(record_uuid)`.
+```conf
+OAISERVER_GETRECORD_FETCHER = 'invenio_oaiserver.utils:getrecord_fetcher'
+```