nocfl-js

An opinionated S3 storage library inspired by ocfl but simpler.

Repository and Documentation

Background

While working with the Oxford Common File Layout (OCFL) we came to realise that some quite serious compromises were required. This is not to say that OCFL is not a good specification; just that we needed something different.

The name of this library came from Peter Sefton.

Why not just use OCFL?

Developing this library / tests

This library has extensive tests. To run them: npm run test:watch. You will need Docker, as this command starts a local S3 service called MinIO. (npm run develop exists as a more semantically meaningful shortcut for test:watch.)

The MinIO credentials are root/rootpass and are defined in docker-compose.yml.

Releasing an update

  • Tag the release with npm version {major|minor|patch} as appropriate
  • Push tags to master with git push origin master --tags, which will trigger a GitHub Action to build the distributables and update the docs
  • Publish the release to npm with npm publish

About this library

This library is intended to simplify working with data in an S3 bucket. Its primary objective is to ease the creation and management of data objects in the bucket in a well defined way. Accordingly, the API is intentionally simple. You define some properties when creating a hook to the bucket and then get / put data from it.

Research Object Crate Metadata

The library will create a metadata file for you: a Research Object Crate (RO-Crate). By default, when you put a file into the object it will be registered in the hasPart property of the root dataset. When you remove a file, its entry will also be removed from the crate file.
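As a rough illustration of the hasPart bookkeeping described above, here is a minimal sketch that works on an in-memory RO-Crate 1.1 document. The helpers rootDataset, registerFile and unregisterFile are hypothetical names made up for this example and are not part of the nocfl-js API; the library's own crate handling may differ.

```javascript
// Hypothetical helpers (NOT the nocfl-js API): they only illustrate how a file
// can be registered in, and removed from, the root dataset's hasPart property.

// The root dataset of an RO-Crate is the entity with "@id": "./".
function rootDataset(crate) {
    return crate["@graph"].find((entity) => entity["@id"] === "./");
}

// Add a File entity and reference it from hasPart (no duplicates).
function registerFile(crate, target) {
    const root = rootDataset(crate);
    root.hasPart = root.hasPart ?? [];
    if (!root.hasPart.some((part) => part["@id"] === target)) {
        root.hasPart.push({ "@id": target });
        crate["@graph"].push({ "@id": target, "@type": "File" });
    }
    return crate;
}

// Remove both the hasPart reference and the File entity itself.
function unregisterFile(crate, target) {
    const root = rootDataset(crate);
    root.hasPart = (root.hasPart ?? []).filter((part) => part["@id"] !== target);
    crate["@graph"] = crate["@graph"].filter((entity) => entity["@id"] !== target);
    return crate;
}

// A minimal RO-Crate 1.1 skeleton to operate on.
const crate = {
    "@context": "https://w3id.org/ro/crate/1.1/context",
    "@graph": [
        {
            "@id": "ro-crate-metadata.json",
            "@type": "CreativeWork",
            conformsTo: { "@id": "https://w3id.org/ro/crate/1.1" },
            about: { "@id": "./" },
        },
        { "@id": "./", "@type": "Dataset", hasPart: [] },
    ],
};

registerFile(crate, "something.txt");
```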

Index files

Object storage has no concept of folders so when looking for objects, you have to walk all of the keys. This can be painful and slow so new items are automatically added to an index file on the storage. See Indexer for more information.

Versioning

This library can version files for you. Versioning is off by default but can be turned on per file PUT. When you version a file, say something.txt, the following happens:

  • the existing file (something.txt) is copied to something.v${Date as ISO String}.txt
  • the new version is uploaded to something.txt

Think of each versioned copy as the content of that file up until that point in time. You can retrieve the versions of a given file by calling the listFileVersions method with the base file name:

listFileVersions({ target: 'something.txt' })
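The naming scheme for versioned copies can be sketched as follows. versionedName is a hypothetical helper written for this illustration, not a function exported by the library, and it assumes the version stamp is inserted before the final file extension exactly as in the something.txt example above.

```javascript
// Hypothetical helper (NOT part of nocfl-js): derives the name a versioned copy
// would get under the something.v${Date as ISO String}.txt scheme.
function versionedName(target, date = new Date()) {
    const stamp = `v${date.toISOString()}`;
    const slash = target.lastIndexOf("/");
    const dot = target.lastIndexOf(".");
    // No extension in the final path segment (or a dotfile): append the stamp.
    if (dot <= slash + 1) return `${target}.${stamp}`;
    // Otherwise insert the stamp between the base name and the extension.
    return `${target.slice(0, dot)}.${stamp}${target.slice(dot)}`;
}

console.log(versionedName("something.txt", new Date(0)));
// prints "something.v1970-01-01T00:00:00.000Z.txt"
```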

Creating a new item

When creating a new item you need to:

  • pass in a prefix name (a domain name is a good option)
  • pass in the primary type of the data (e.g. Collection, Item, Person etc)
  • pass in the object identifier

Both id and type must start with a letter (upper or lowercase), followed by any number of letters (upper and lower), numbers and underscores. Any other character is rejected and results in an error. By default, path creation uses the first letter of the identifier as an intermediate path segment; the number of leading characters used is configurable by defining the splay property in the constructor. The prefix (domain) and type (class name) will be lowercased.

The following examples illustrate how paths are created from the identifier:

-   prefix: example.com, type: Item, id: test -> `(bucket)/example.com/item/t/test` (splay = default = 1)
-   prefix: eXamPLe.cOm, type: Item, id: test -> `(bucket)/example.com/item/t/test` (splay = default = 1)

-   prefix: example.com, type: Collection, id: test, splay: 2 -> `(bucket)/example.com/collection/te/test`
-   prefix: example.com, type: Collection, id: test, splay: 4 -> `(bucket)/example.com/collection/test/test`
-   prefix: example.com, type: Collection, id: test, splay: 10 -> `(bucket)/example.com/collection/test/test`
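The rules and examples above can be reproduced with a short sketch. objectPath and NAME_PATTERN are hypothetical names used only for illustration; the library's internal implementation and its error messages may well differ.

```javascript
// Hypothetical sketch (NOT the nocfl-js implementation) of the path rules above.

// id and type: a letter first, then any number of letters, digits or underscores.
const NAME_PATTERN = /^[a-zA-Z][a-zA-Z0-9_]*$/;

function objectPath({ prefix, type, id, splay = 1 }) {
    for (const [name, value] of Object.entries({ type, id })) {
        if (typeof value !== "string" || !NAME_PATTERN.test(value)) {
            throw new Error(`Invalid ${name}: ${value}`);
        }
    }
    // Prefix and type are lowercased; the first `splay` characters of the id
    // form the intermediate segment (the whole id if it is shorter than splay).
    return [prefix.toLowerCase(), type.toLowerCase(), id.slice(0, splay), id].join("/");
}

console.log(objectPath({ prefix: "eXamPLe.cOm", type: "Item", id: "test" }));
// prints "example.com/item/t/test"
console.log(objectPath({ prefix: "example.com", type: "Collection", id: "test", splay: 10 }));
// prints "example.com/collection/test/test"
```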

Load the library


// ES modules
import { Store } from "@coedl/nocfl-js";

// CommonJS
const { Store } = require("@coedl/nocfl-js");

Store

This is the workhorse class for interacting with the storage. It is how you get files from and put files to the storage, and generally work with them.

Create an item and put a file to it

// get a hook to the storage
const store = new Store({ prefix: "example.com", type: "item", id: "test", credentials });

// create the object
await store.createObject();

// upload a file to it
await store.put({ localPath: path.join(__dirname, file), target: file });

// download a file from the storage
await store.get({ target: file, localPath: path.join("/tmp", file) });

// get a pre signed link to a file
let link = await store.getPresignedUrl({ target: file });

See the tests for more usage examples.

Indexer

This class helps you create and manage file-based indices of the content on the storage. In object storage there is no such thing as a folder. It is just key / value pairs, where the key (the fully qualified filename you gave it) points to the file data. That means you can't do things like:

ls /data/folder1/today/my/files

even though it looks like you have just such a path. Practically, this means that whenever you want to look for something, you have to walk all of the keys. This becomes more and more painful as the amount of content in the storage grows. To shortcut this, you can create index files for the objects on the storage, and you do that via this class.

const indexer = new Indexer({credentials})
await indexer.createIndices({})

This will walk the storage and create an indices folder per prefix which contains a folder for each type it finds (collection, item, etc) and within those folders, an index file for each letter of the alphabet:

- domain1.example.com
    - indices
        - collection
            - a.json
            - b.json
            - ...
        - item
            - a.json
            - b.json
            - ...
- domain2.example.com
    - indices
        - collection
            - a.json
            - b.json
            - ...
        - item
            - a.json
            - b.json
            - ...

Then you can operate on those:

// list all indices in the domain
listIndices({ prefix: 'domain1.example.com' })

// list all indices of type in the domain
listIndices({ prefix: 'domain1.example.com', type: 'collection' })

// get a specific index
getIndex({ prefix: 'domain1.example.com', type: 'collection', file: 'a.json'})
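To make the layout above concrete, the following sketch computes the storage key of the index file an object would belong to. indexFileKey is a hypothetical helper, and it assumes that objects are bucketed by the lowercased first letter of their identifier; the text above only says there is one index file per letter, so treat that bucketing rule as an assumption rather than documented behaviour.

```javascript
// Hypothetical helper (NOT part of nocfl-js). Assumption: an object is listed
// in the index file named after the lowercased first letter of its id.
function indexFileKey({ prefix, type, id }) {
    const letter = id[0].toLowerCase();
    return `${prefix.toLowerCase()}/indices/${type.toLowerCase()}/${letter}.json`;
}

console.log(indexFileKey({ prefix: "domain1.example.com", type: "Collection", id: "Apples" }));
// prints "domain1.example.com/indices/collection/a.json"
```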

Walker

This class will walk the storage for you and emit a descriptor for each object it finds. You can use that descriptor with the Store class to attach to the object in the storage and operate on it.

const walker = new Walker({ credentials: this.credentials });
walker.on("object", (object) => {
    let { domain, className, id, splay } = object;

    // do something with object
})
await walker.walk({})