Skip to content

Commit

Permalink
docs: update OAI-PMH documentation
Browse files Browse the repository at this point in the history
  • Loading branch information
rekt-hard committed Aug 3, 2021
1 parent b977277 commit 340a126
Showing 1 changed file with 114 additions and 10 deletions.
124 changes: 114 additions & 10 deletions docs/configs/oai-pmh.md
Original file line number Diff line number Diff line change
@@ -1,13 +1,14 @@
# OAI-PMH

OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting) is a low-barrier mechanism for repository interoperability.
OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting) is a low-barrier mechanism for repository interoperability. [Documentation](http://www.openarchives.org/OAI/openarchivesprotocol.html)
**Data Providers** are repositories that expose structured metadata via OAI-PMH. **Service Providers** then make OAI-PMH service requests to harvest that metadata. OAI-PMH is a set of six verbs or services that are invoked within HTTP.


## Endpoints

### Public
The following endpoints are public. The full documentation of the verbs can be found here: [Link](https://www.openarchives.org/OAI/openarchivesprotocol.html#ProtocolMessages)
These endpoints/verbs are used by harvesters (other repositories) to request metadata of available records. Additionally, informations about the repository and available data formats is provided.

|Verb |Description |Example URL|
--- | --- |---
Expand All @@ -20,7 +21,7 @@ The following endpoints are public. The full documentation of the verbs can be f


### Non Public
The following endpoints are only available as admin and all deal with sets of OAI-PMH.
The following endpoints are only available as admin and all deal with sets of OAI-PMH. These endpoints will be revamped in a later update.

|Request Methods |Description |Example URL|
--- | --- |---
Expand All @@ -35,12 +36,115 @@ The following endpoints are only available as admin and all deal with sets of OA
|GET, HEAD, OPTIONS |Main page for sets |[https://127.0.0.1:5000/admin/oaiset/](https://127.0.0.1:5000/admin/oaiset/)|


## Issues
## Configuration

- Records
- Missing the `"_oai"` field necessary for record and identifier retrieval
- elasticsearch index
- It is possible to define the index used by elasticsearch for all relevant verbs via a config variable `OAISERVER_RECORD_INDEX='records'`
- Currently, it is set to use the 'records' index which does not exist
- The correct index for records on my machine looks like this `rdmrecords-records-record-v2.0.0-1621247047`
- Available indices can be retrieved by visiting [http://localhost:9200/_cat/indices?v&health=yellow&pretty](http://localhost:9200/_cat/indices?v&health=yellow&pretty)
#### ElasticSearch index
Elastisearch index to be used. This will be set by `invenio-rdm-records`:
```conf
OAISERVER_RECORD_INDEX='records'
```

#### OAI ID Prefix
The prefix that will be applied to the generated OAI-PMH ids. Should be the address of the repository (f.e. repsoitory.tugraz.at):
```conf
OAISERVER_ID_PREFIX = 'repository.tugraz.at':
```

#### Admin Emails
The e-mail addresses of administrators of the repository
```conf
OAISERVER_ADMIN_EMAILS = [
'[email protected]',
]:
```

#### Available Metadata Formats
Define the metadata formats available from a repository. These can be completely redefined or modified (extended, replaced, removed) as need be.
```conf
OAISERVER_METADATA_FORMATS = {`
'oai_dc': {
'serializer': (
'invenio_oaiserver.utils:dumps_etree', {
'xslt_filename': pkg_resources.resource_filename(
'invenio_oaiserver', 'static/xsl/MARC21slim2OAIDC.xsl'
),
}
),
'schema': 'http://www.openarchives.org/OAI/2.0/oai_dc.xsd',
'namespace': 'http://www.openarchives.org/OAI/2.0/oai_dc/',
},
'marc21': {
'serializer': (
'invenio_oaiserver.utils:dumps_etree', {
'prefix': 'marc',
}
),
'schema': 'http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd',
'namespace': 'http://www.loc.gov/MARC21/slim',
}
}
```

The serializer is defined by a function and additional arguments. The function will be called with two fixed arguments (pid, record) and additional specified arguments.
It must return an LXML element instance, to be used in the response of the OAI server.

Adding another format can be achieved by putting following code in a module's `init_app` function, which resides in the module's `ext.py` file:
```python

def my_metadata_serializer(pid, record, **kwargs):
# record['_source'] will hold the record data
return = MyMetadataFormat().dump_xml(record['_source'])

def init_app(self, app):
app.extensions['invenio-oaiserver']['OAISERVER_METADATA_FORMATS'].set('my_metadata_format', {
'serializer': ('my_module.ext:my_metadata_serializer', {}),
'schema' : 'link_to_schema_definition_file',
'namespace' : 'link_to_schema_definition',
}
)
```

After this, verbs supporting the `metadataFormat` attribute will be able to pick up the metadata format.


#### Record Search Class
Record search class for the `ListRecords` verb. This is an ElasticSearch class by default and it should return harvestable records.
```conf
OAISERVER_SEARCH_CLS = 'invenio_oaiserver.query:OAIServerSearch'
```

#### OAI ID Fetcher Function
Will return the OAI ID of a record. This will take two arguments: `function(record_uuid, record_as_dict)`.
```conf
OAISERVER_ID_FETCHER = 'invenio_oaiserver.fetchers:oaiid_fetcher'
```

#### Record Updated Key
Record dictionary key for information on when the record was last updated. Set by `invenio-rdm-records`.
```conf
OAISERVER_LAST_UPDATE_KEY = "_updated"
```

#### Record Created Key
Record dictionary key for information on when the record was created.
```conf
OAISERVER_CREATED_KEY = "_created"
```

#### Record's Sets Fetcher Function
Function to fetch the sets a record belongs to as a list. Takes one argument: `function(record_as_dict)`
```conf
OAISERVER_RECORD_SETS_FETCHER = 'invenio_oaiserver.utils:record_sets_fetcher'
```

#### Record Retrieval Class
Used when an `identifier` parameter is used in a verb and `OAISERVER_GETRECORD_FETCHER` is not overridden.
```conf
OAISERVER_RECORD_CLS = 'invenio_records.api:Record'
```

#### Single Record Fetcher Function
Function to fetch a record and return as dict. Takes one argument: `function(record_uuid)`
```conf
OAISERVER_GETRECORD_FETCHER = 'invenio_oaiserver.utils:getrecord_fetcher'
```

0 comments on commit 340a126

Please sign in to comment.