Nabu is primarily a tool for reading from an S3 object store and writing to a triplestore. The object store can be any S3-compliant object store, such as AWS S3, Google Cloud Storage, Wasabi, or others. In most cases I use Minio, an open source S3 object store.
Similarly, the triplestore can be any standards-compliant triplestore. The primary standards it needs to implement are:
- SPARQL 1.1 with Update support
- SPARQL 1.1 over HTTP
As noted later in the documentation, if you use the bulk upload feature you need to set the METHOD and ContentType your triplestore expects.
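As a rough illustration of what "SPARQL 1.1 Update over HTTP" means in practice, the sketch below builds (but does not send) an update request in Python. It is not Nabu's own code. The helper name `build_sparql_update`, the graph `urn:example:graph1`, and the sample triple are assumptions made for the example; the endpoint is the one used in the sample configuration further down.

```python
from urllib.request import Request


def build_sparql_update(endpoint: str, update: str) -> Request:
    """Build (but do not send) a SPARQL 1.1 Update request.

    The SPARQL 1.1 Protocol allows an update to be POSTed directly
    with the Content-Type application/sparql-update.
    """
    return Request(
        endpoint,
        data=update.encode("utf-8"),
        headers={"Content-Type": "application/sparql-update"},
        method="POST",
    )


# Hypothetical graph and triple, for illustration only.
update = (
    "INSERT DATA { GRAPH <urn:example:graph1> { "
    '<http://example.org/s> <http://example.org/p> "o" } }'
)
req = build_sparql_update(
    "http://localhost/blazegraph/namespace/earthcube/sparql", update
)
# Send with urllib.request.urlopen(req) once a triplestore is listening.
```

A triplestore that accepts a request of this shape should satisfy the two requirements listed above.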
Nabu needs a configuration file. A template for this can be seen in example.yaml.
Commands are like the following:
nabu --help
The mode "prefix" in Nabu is used for loading an S3 object prefix path into a triplestore.
nabu --cfg example.yaml prune
nabu --cfg example.yaml prune --prefix summoned/amgeo
nabu --cfg example.yaml prefix
nabu --cfg example.yaml prefix --prefix summoned/amgeo
nabu prefix --cfg ../gleaner/configs/nabu
With a Gleaner-generated configuration:
nabu prefix --cfgPath directory --cfgName name
For example, using a generated configuration:
nabu prefix --cfgPath ../gleaner/configs --cfgName local
The mode "prune" in Nabu syncs a prefix to the graph: it removes graphs that are no longer in use and adds new ones.
Note that updated graphs become new objects, since the object name is the SHA256 of the object.
nabu prune --cfg file
With a Gleaner-generated configuration:
nabu prune --cfgPath directory --cfgName name
For example, using a generated configuration:
nabu prune --cfgPath ../gleaner/configs --cfgName local
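The sync described for "prune" can be pictured as a set difference between the object names under a prefix and the graph names already in the triplestore. The Python sketch below is a conceptual illustration, not Nabu's implementation: `object_name` and `plan_prune` are hypothetical helpers, and the sample payloads are invented. It also shows why an updated object appears as one removal plus one addition, because its content hash (and therefore its name) changes.

```python
import hashlib


def object_name(content: bytes) -> str:
    """Name an object by the SHA256 hex digest of its content."""
    return hashlib.sha256(content).hexdigest()


def plan_prune(object_names: set, graph_names: set) -> tuple:
    """Return (to_add, to_remove) so the graph set matches the object set."""
    to_add = object_names - graph_names      # in the object store, not yet in the graph
    to_remove = graph_names - object_names   # in the graph, no longer in the object store
    return to_add, to_remove


# Invented payloads: one object was updated, one is unchanged.
old = object_name(b'{"name": "dataset v1"}')
new = object_name(b'{"name": "dataset v2"}')
unchanged = object_name(b'{"name": "other dataset"}')

to_add, to_remove = plan_prune({new, unchanged}, {old, unchanged})
```

Here `to_add` holds only the name of the updated object and `to_remove` holds only the name of its previous version; the unchanged object is left alone.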
This command loads all the triples into the triplestore using the bulk load approach. This is a SPARQL UPDATE call, as opposed to the classic SPARQL command. Nabu writes all the triples to a temporary file and then uses that file to load into the triplestore. The file is removed after it is used.
Required configuration entry
sparql:
  endpoint: http://localhost/blazegraph/namespace/earthcube/sparql
  endpointBulk: http://coreos.lan:3030/testing/data
  endpointMethod: PUT
  contentType: application/n-quads
  authenticate: false
  username: ""
  password: ""
The bulk loading endpoint for many triplestores is different from the default SPARQL endpoint. Different vendors are also likely to require different methods and content types. These settings are only needed if you use the bulk command in Nabu. For example:
GraphDB example (reference)
endpointBulk: http://example.org:7200/repositories/testing/statements
endpointMethod: PUT
contentType: application/n-quads
Jena example (reference)
endpointBulk: http://example.org:3030/testing/data
endpointMethod: PUT
contentType: application/n-quads
Blazegraph example (reference)
endpointBulk: http://example.org:9090/blazegraph/namespace/kb/sparql
endpointMethod: POST
contentType: text/x-nquads
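To make the role of endpointBulk, endpointMethod, and contentType concrete, here is a minimal Python sketch of a bulk request built from those three settings. It is an illustration under stated assumptions, not Nabu's code: `build_bulk_request` is a hypothetical helper, the sample quad is invented, and the request is only constructed, never sent. The temporary file step mirrors the description above, where triples are staged in a file that is removed after use.

```python
import os
import tempfile
from urllib.request import Request


def build_bulk_request(sparql_cfg: dict, nquads: str) -> Request:
    """Stage N-Quads in a temporary file, then build the bulk request."""
    # Write the generated quads to a temporary file (binary mode keeps bytes exact).
    with tempfile.NamedTemporaryFile("wb", suffix=".nq", delete=False) as tmp:
        tmp.write(nquads.encode("utf-8"))
        path = tmp.name
    try:
        with open(path, "rb") as fh:
            body = fh.read()
    finally:
        os.remove(path)  # the temporary file is removed after it is used
    return Request(
        sparql_cfg["endpointBulk"],
        data=body,
        headers={"Content-Type": sparql_cfg["contentType"]},
        method=sparql_cfg["endpointMethod"],
    )


# Values copied from the Jena example above.
cfg = {
    "endpointBulk": "http://example.org:3030/testing/data",
    "endpointMethod": "PUT",
    "contentType": "application/n-quads",
}
quad = '<http://example.org/s> <http://example.org/p> "o" <urn:example:graph1> .\n'
req = build_bulk_request(cfg, quad)
# Send with urllib.request.urlopen(req) against a running triplestore.
```

Switching to another vendor should then only require changing the three configuration values, which is the point of keeping them in the configuration file.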
To bulk load a specified source from the objects-prefix node of the configuration file, use the --prefix flag to specify the source.
nabu bulk --cfg ./example.yaml --prefix summoned/providera
Bulk load all the sources defined in the objects-prefix node of the configuration file.
nabu bulk --cfg ./example.yaml
The release command is used to build release graphs. A release graph is the entire set of objects associated with a provider, rolled up into one file. These are written as N-Quads, with the named graph following the pattern defined in the ADR 0001-URN-decision.
To build a release graph for a specified source in the objects-prefix node of the configuration file, use the --prefix flag to specify the source.
nabu release --cfg ./example.yaml --prefix summoned/providera
To build release graphs for all the sources defined in the objects-prefix node of the configuration file:
nabu release --cfg ./example.yaml
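The roll-up performed by the release command can be sketched as follows in Python. This is only an illustration of turning per-object triples into one N-Quads file, not Nabu's implementation. The helpers `to_nquads` and `build_release` are hypothetical, the sketch assumes each object keeps its own named graph, and the `urn:example:providera:` prefix is a placeholder rather than the URN pattern defined in the ADR.

```python
def to_nquads(ntriples: str, graph_iri: str) -> str:
    """Append a graph IRI to each N-Triples statement, yielding N-Quads."""
    quads = []
    for line in ntriples.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if not line.endswith("."):
            raise ValueError(f"not an N-Triples statement: {line!r}")
        # Drop the terminating "." and re-add it after the graph term.
        quads.append(f"{line[:-1].rstrip()} <{graph_iri}> .")
    return "".join(q + "\n" for q in quads)


def build_release(objects: dict, graph_prefix: str) -> str:
    """Roll up every object for one provider into a single N-Quads document."""
    return "".join(
        to_nquads(triples, graph_prefix + name)
        for name, triples in sorted(objects.items())
    )


# Invented objects; the graph prefix is a placeholder URN.
objects = {
    "obj1": '<http://example.org/a> <http://example.org/p> "1" .\n',
    "obj2": '<http://example.org/b> <http://example.org/p> "2" .\n',
}
release = build_release(objects, "urn:example:providera:")
```

The resulting string holds one quad per line, each tagged with the graph of the object it came from.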
The mode "object" in Nabu is used for loading an S3 object path into a triplestore.
nabu object --cfg file objectId
For example:
nabu object --cfg ../gleaner/configs/nabu milled/opentopography/ffa0df033bb3a8fc9f600c80df3501fe1a2dbe93.rdf
With a Gleaner-generated configuration:
nabu object --cfgPath directory --cfgName name objectId
For example, using a generated configuration:
nabu object --cfgPath ../gleaner/configs --cfgName local milled/opentopography/ffa0df033bb3a8fc9f600c80df3501fe1a2dbe93.rdf