Switching to webcrypto (#8)
iherman authored Nov 23, 2023
1 parent 50d4dbb commit 6393696
Showing 75 changed files with 669 additions and 1,055 deletions.
4 changes: 4 additions & 0 deletions CHANGES.md
@@ -1,3 +1,7 @@
# Version 3.0.0

- As the `crypto-js` package has been discontinued, the implementation switches to the WebCrypto API for hashing (available in `node.js` version 21 and upwards). ***This is a backward-incompatible change***, because hashing in WebCrypto is asynchronous, and this "bubbles up" to the generic interface as well.

# Version 2.0.4

- Added SHA-384 to the list of available hash functions (missed it the last time)
48 changes: 33 additions & 15 deletions README.md
@@ -1,15 +1,23 @@
# RDF Canonicalization in TypeScript

This is an implementation of the [RDF Dataset Canonicalization](https://www.w3.org/TR/rdf-canon/) algorithm, also referred to as RDFC-1.0. (The algorithm is being specified by the W3C [RDF Dataset Canonicalization and Hash Working Group](https://www.w3.org/groups/wg/rch).)

> **The [specification](https://www.w3.org/TR/rdf-canon/) is not yet final. This implementation aims at reflecting _exactly_ the specification, which means it may evolve alongside the specification even if changes are editorial only.**
This is an implementation of the [RDF Dataset Canonicalization](https://www.w3.org/TR/rdf-canon/) algorithm, also referred to as RDFC-1.0. The algorithm has been published by the W3C [RDF Dataset Canonicalization and Hash Working Group](https://www.w3.org/groups/wg/rch).

## Requirements

### RDF packages and references

The implementation depends on the interfaces defined by the [RDF/JS Data model specification](http://rdf.js.org/data-model-spec/) for RDF terms, named and blank nodes, or quads. It also depends on an instance of an RDF Data Factory, specified by the aforementioned [specification](http://rdf.js.org/data-model-spec/#datafactory-interface). For TypeScript, the necessary type specifications are available through the [`@rdfjs/types` package](https://www.npmjs.com/package/@rdfjs/types); an implementation of the RDF Data Factory is provided by, for example, the [`n3` package](https://www.npmjs.com/package/n3) (but there are others), which also provides a Turtle/TriG parser and serializer to test the library.

By default (i.e., if not explicitly specified) the Data Factory of the [`n3` package](https://www.npmjs.com/package/n3) is used.

### Crypto

The implementation relies on the [Web Cryptography API](https://www.w3.org/TR/WebCryptoAPI/) as implemented by modern browsers, `deno` (version 1.3.82 or higher), or `node.js` (version 21 or higher). A side effect of using Web Crypto is that the canonicalization and hashing interface entries are all asynchronous and must be used, for example, through the `await` idiom of JavaScript/TypeScript.
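
To illustrate why an asynchronous digest "bubbles up", here is a minimal sketch (not the package's own code) of a helper built on `crypto.subtle.digest`. The names `sha256Hex` and `fingerprint` are illustrative assumptions, and a global `crypto` object is assumed to be provided by the runtime.

```typescript
// Hypothetical helper names; only crypto.subtle.digest and TextEncoder are standard APIs.
async function sha256Hex(input: string): Promise<string> {
    // Web Crypto works on bytes, so the string is first encoded in UTF-8.
    const bytes = new TextEncoder().encode(input);
    const digest = await crypto.subtle.digest("SHA-256", bytes);
    // Convert the resulting ArrayBuffer into a lowercase hexadecimal string.
    return Array.from(new Uint8Array(digest))
        .map((b) => b.toString(16).padStart(2, "0"))
        .join("");
}

// Any caller of an async function has to become async (or handle a Promise) itself.
async function fingerprint(lines: string[]): Promise<string> {
    return sha256Hex(lines.join("\n"));
}
```

A caller would then write `const fp = await fingerprint(lines);` inside an `async` function, which is the same pattern the package's interface now requires.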



## Usage

An input RDF Dataset may be represented by:

- A Set of [Quad instances](https://rdf.js.org/data-model-spec/#quad-interface); or
@@ -28,11 +36,11 @@ The canonicalization process can be invoked by
- A Set or an Array of Quad instances, if the input was a Set or an Array, respectively;
- A Set of Quad instances if the input was an N-Quads document.

The separate [testing folder](https://github.com/iherman/rdfjs-c14n/tree/main/testing) includes a tiny application that runs the official specification tests, and can be used as an example for the additional packages that are required.
The separate [testing folder](https://github.com/iherman/rdfjs-c14n/tree/main/testing) includes a tiny application that runs some specification tests, and can be used as an example for the additional packages that are required.

## Installation

The usual `npm` installation can be used:
For `node.js`, the usual `npm` installation can be used:

```
npm install rdfjs-c14n
@@ -42,7 +50,15 @@ The package has been written in TypeScript but is distributed in JavaScript; the

Also, using appropriate tools (e.g., [esbuild](https://esbuild.github.io/)) the package can be included into a module that can be loaded into a browser.

## Usage
For `deno` a simple

```
import { RDFC10, Quads } from "npm:rdfjs-c14n"
```

will do.

## Usage Examples

More detailed documentation of the classes and types is available [on GitHub](https://iherman.github.io/rdfjs-c14n/). The basic usage may be as follows:

@@ -64,16 +80,16 @@ main() {

// "normalized" is a dataset of quads with "canonical" blank node labels
// per the specification.
const normalized: Quads = rdfc10.c14n(input).canonicalized_dataset;
const normalized: Quads = (await rdfc10.c14n(input)).canonicalized_dataset;

// If you only care about the N-Quads result, you can make it simpler
const normalized_N_Quads: string = rdfc10.c14n(input).canonical_form;
const normalized_N_Quads: string = (await rdfc10.c14n(input)).canonical_form;

// Or even simpler, using a shortcut:
const normalized_N_Quads_bis: string = rdfc10.canonicalize(input);
const normalized_N_Quads_bis: string = await rdfc10.canonicalize(input);

// "hash" is the hash value of the canonical dataset, per specification
const hash: string = rdfc10.hash(normalized);
const hash: string = await rdfc10.hash(normalized);
}
```

@@ -93,10 +109,10 @@ main() {
const input: string = fetchYourNQuadsDocument();

// "normalized" is an N-Quads document with all blank nodes canonicalized
const normalized: string = rdfc10.canonicalize(input);
const normalized: string = await rdfc10.canonicalize(input);

// "hash" is the hash value of the canonical dataset, per specification
const hash = rdfc10.hash(normalized);
const hash = await rdfc10.hash(normalized);
}
```

@@ -105,7 +121,8 @@ main() {

#### Choice of hash

The [RDFC 1.0](https://www.w3.org/TR/rdf-canon/) algorithm is based on an extensive usage of hashing. By default, as specified by the document, the hash function is 'sha256'. This default hash function can be changed via the
The [RDFC 1.0](https://www.w3.org/TR/rdf-canon/) algorithm is based on an extensive usage of hashing. By default, as defined by the specification, the hash function is 'sha256'.
This default hash function can be changed via the

```js
rdfc10.hash_algorithm = algorithm;
@@ -117,7 +134,8 @@ attribute, where `algorithm` can be any hash function identification. Examples a
rdfc10.available_hash_algorithms;
```

which corresponds to any value that the underlying `npm/crypto-js` package (version 4.1.1., as of July 2023) accepts.
which corresponds to the values that are defined by the [Web Cryptography API specification](https://www.w3.org/TR/WebCryptoAPI/) (as of December 2013) and are usually also implemented,
namely 'sha1', 'sha256', 'sha384', and 'sha512'.
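
The setter's documentation states that names are case insensitive and accepted with or without the '-' character. A minimal sketch of how such names could be mapped to Web Crypto identifiers follows; the table, the function name `toWebCryptoName`, and the choice of `TypeError` are assumptions made for illustration, not the package's internals.

```typescript
// Illustrative mapping from user-facing names to Web Crypto identifiers.
// A Map is used so that inherited object properties can never match by accident.
const WEB_CRYPTO_NAMES = new Map<string, string>([
    ["sha1", "SHA-1"],
    ["sha256", "SHA-256"],
    ["sha384", "SHA-384"],
    ["sha512", "SHA-512"],
]);

// Hypothetical helper: normalizes a name and rejects unknown algorithms.
function toWebCryptoName(algorithm: string): string {
    // Case insensitive; "sha-256" and "sha256" are treated alike.
    const key = algorithm.toLowerCase().replace(/-/g, "");
    const name = WEB_CRYPTO_NAMES.get(key);
    if (name === undefined) {
        // The exception type is an assumption of this sketch.
        throw new TypeError(`Unsupported hash algorithm: ${algorithm}`);
    }
    return name;
}
```

The returned string is the form that `crypto.subtle.digest` expects as its first argument.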

#### Controlling the complexity level

@@ -178,7 +196,7 @@ Specific applications may want to add the possibility to let the user configure
```

where `null` stands for a possible `DataFactory` instance (or `null` if the default is used) and `getConfigData` stands for a callback returning the configuration data. An example [callback](https://github.com/iherman/rdfjs-c14n/blob/main/extras/nodeConfiguration.ts) (using a combination of environment variables and configuration files and relying on the node.js platform) is available, and can be easily adapted to other platforms (e.g., deno). (A [javascript version](https://github.com/iherman/rdfjs-c14n/blob/main/extras/nodeConfiguration.js) of the callback is also available.)
where `null` stands for a possible `DataFactory` instance (or `null` if the default is used) and `getConfigData` stands for a callback returning the configuration data. An example [callback](https://github.com/iherman/rdfjs-c14n/blob/main/extras/nodeConfiguration.ts) (using a combination of environment variables and configuration files and relying on the `node.js` platform) is available, and can be easily adapted to other platforms (e.g., `deno`). (A [javascript version](https://github.com/iherman/rdfjs-c14n/blob/main/extras/nodeConfiguration.js) of the callback is also available.)

---

34 changes: 23 additions & 11 deletions dist/index.d.ts
@@ -66,22 +66,30 @@ declare class RDFC10 {


/**
* The Hash algorithm. The value can be anything that the underlying `npm/crypto-js` package accepts. The default is "sha256".
*/
* Set the Hash algorithm. The default is "sha256".
* If the algorithm is not available, the value is ignored (and an exception is thrown).
*
* The name is considered to be case insensitive. Also, both the formats including, or not, the '-' characters
* are accepted (i.e., "sha256" and "sha-256").
*
* @param algorithm_in: the (case insensitive) name of the algorithm,
*/
set hash_algorithm(algorithm: string);
get hash_algorithm(): string;
get available_hash_algorithms(): string[]

/**
* Set the maximal level of recursion this canonicalization should use. Setting this number to a reasonably low number (say, 3),
* Set the maximal complexity number. This number, multiplied by the number of blank nodes in the dataset,
* sets the maximum number of calls the algorithm can make to the so-called "hash n degree quads" function.
* Setting this number to a reasonably low number (say, 30),
* ensures that some "poison graphs" would not result in an unreasonably long canonicalization process.
* See the [security consideration section](https://www.w3.org/TR/rdf-canon/#security-considerations) in the specification.
*
* The default value set by this implementation is 50; any number _greater_ than this number is ignored (and an exception is thrown).
*/
set maximum_recursion_level(level: number);
get maximum_recursion_level(): number;
get maximum_allowed_recursion_level(): number
set maximum_complexity_number(level: number);
get maximum_complexity_number(): number;
get maximum_allowed_complexity_number(): number

/**
* Canonicalize a Dataset into an N-Quads document.
@@ -92,11 +100,14 @@ declare class RDFC10 {
*
* @remarks
* Note that the N-Quads parser throws an exception in case of syntax error.
* @throws - RangeError, if the complexity of the graph goes beyond the set complexity number. See {@link maximum_complexity_number}
*
*
* @param input_dataset
* @returns - N-Quads document using the canonical ID-s.
* @async
*/
canonicalize(input_dataset: InputDataset): string;
canonicalize(input_dataset: InputDataset): Promise<string>;

/**
* Canonicalize a Dataset into a full set of information.
@@ -110,11 +121,13 @@ declare class RDFC10 {
*
* @remarks
* Note that the N-Quads parser throws an exception in case of syntax error.
* @throws - RangeError, if the complexity of the graph goes beyond the set complexity number. See {@link maximum_complexity_number}
*
* @param input_dataset
* @returns - Detailed results of the canonicalization
* @async
*/
c14n(input_dataset: InputDataset): C14nResult ;
c14n(input_dataset: InputDataset): Promise<C14nResult> ;

/**
* Serialize the dataset into a (possibly sorted) Array of nquads.
@@ -135,12 +148,11 @@ declare class RDFC10 {
*
* @param input_dataset
* @returns
* @async
*/
hash(input_dataset: InputDataset): Hash;
hash(input_dataset: InputDataset): Promise<Hash>;
}

declare class RDFCanon extends RDFC10 {}

/*****************************************************************************
Type and class declarations for logging; can be ignored if no logging is used
******************************************************************************/
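
The documentation of `maximum_complexity_number` above describes a limit equal to the complexity number multiplied by the number of blank nodes, with a `RangeError` when the limit is exceeded. A minimal sketch of such a call budget could look as follows; the class `CallBudget` and its method `spend` are hypothetical names used for illustration and are not part of the package.

```typescript
// Hypothetical sketch of a call budget; not the package's implementation.
class CallBudget {
    private calls = 0;
    private readonly limit: number;

    constructor(complexityNumber: number, blankNodeCount: number) {
        // Limit as described in the documentation comment:
        // complexity number times the number of blank nodes.
        this.limit = complexityNumber * blankNodeCount;
    }

    // Intended to be invoked at every entry into the expensive function.
    spend(): void {
        this.calls += 1;
        if (this.calls > this.limit) {
            throw new RangeError(`Call limit of ${this.limit} exceeded`);
        }
    }
}
```

With a complexity number of 30 and 4 blank nodes, for example, the 121st call to `spend()` would throw.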
33 changes: 14 additions & 19 deletions dist/index.js
@@ -8,7 +8,7 @@
* @packageDocumentation
*/
Object.defineProperty(exports, "__esModule", { value: true });
exports.RDFCanon = exports.RDFC10 = exports.LogLevels = void 0;
exports.RDFC10 = exports.LogLevels = void 0;
const n3 = require("n3");
const common_1 = require("./lib/common");
const config_1 = require("./lib/config");
@@ -76,8 +76,11 @@ class RDFC10 {
return logging_1.LoggerFactory.loggerTypes();
}
/**
* Set the Hash algorithm. The value can be anything that the underlying `npm/crypto-js` package accepts. The default is "sha256".
* If the algorithm is not listed as existing for `crypto-js`, the value is ignored (and an exception is thrown).
* Set the Hash algorithm. The default is "sha256".
* If the algorithm is not available, the value is ignored (and an exception is thrown).
*
* The name is considered to be case insensitive. Also, both the formats including, or not, the '-' characters
* are accepted (i.e., "sha256" and "sha-256").
*
* @param algorithm_in: the (case insensitive) name of the algorithm,
*/
@@ -144,9 +147,11 @@ class RDFC10 {
* @param input_dataset
* @returns - N-Quads document using the canonical ID-s.
*
* @async
*
*/
canonicalize(input_dataset) {
return this.c14n(input_dataset).canonical_form;
async canonicalize(input_dataset) {
return (await this.c14n(input_dataset)).canonical_form;
}
/**
* Canonicalize a Dataset producing the full set of information.
@@ -166,8 +171,10 @@ class RDFC10 {
*
* @param input_dataset
* @returns - Detailed results of the canonicalization
*
* @async
*/
c14n(input_dataset) {
async c14n(input_dataset) {
return (0, canonicalization_1.computeCanonicalDataset)(this.state, input_dataset);
}
/**
@@ -191,7 +198,7 @@ class RDFC10 {
* @param input_dataset
* @returns
*/
hash(input_dataset) {
async hash(input_dataset) {
if (typeof input_dataset === 'string') {
return (0, common_1.computeHash)(this.state, input_dataset);
}
@@ -201,15 +208,3 @@ class RDFC10 {
}
}
exports.RDFC10 = RDFC10;
/**
* Alternative name for {@link RDFC10}.
*
* @remark
* This is only for possible backward compatibility's sake; this was the old name of the class
* The WG has decided what the final name of the algorithm is (RDFC 1.0), hence the renaming of the core
* class.
*/
class RDFCanon extends RDFC10 {
}
exports.RDFCanon = RDFCanon;
;
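
The parentheses in `(await this.c14n(input_dataset)).canonical_form` matter, because property access binds more tightly than `await`. The following sketch shows the difference with a stand-in function; `fakeC14n`, `Result`, and `demo` are hypothetical names, not part of the package.

```typescript
interface Result {
    canonical_form: string;
}

// Stand-in for an asynchronous canonicalization call (hypothetical).
async function fakeC14n(input: string): Promise<Result> {
    return { canonical_form: input.trim() + "\n" };
}

async function demo(): Promise<[string, unknown]> {
    // Correct: wait for the result first, then read the property.
    const right = (await fakeC14n(" a b c . ")).canonical_form;
    // Pitfall: this reads `canonical_form` on the Promise itself, which is undefined.
    // TypeScript rejects it, hence the cast used for this demonstration only.
    const wrong = await (fakeC14n(" a b c . ") as any).canonical_form;
    return [right, wrong];
}
```

Without the cast the TypeScript compiler would flag the second form, but in plain JavaScript it silently yields `undefined`.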
17 changes: 11 additions & 6 deletions dist/lib/canonicalization.js
@@ -39,8 +39,10 @@ const createBidMap = (graph) => {
* @param state - the overall canonicalization state + interface to the underlying RDF environment
* @param input
* @returns - A semantically identical set of Quads, with canonical BNode labels. The exact format of the output depends on the format of the input. If the input is a Set or an Array, so will be the return. If it is an N-Quads document (string) then the return is a Set of Quads.
*
* @async
*/
function computeCanonicalDataset(state, input) {
async function computeCanonicalDataset(state, input) {
// Re-initialize the state information: canonicalization should always start with a clean state
state.bnode_to_quads = {};
state.hash_to_bnodes = {};
@@ -103,17 +105,20 @@ function computeCanonicalDataset(state, input) {
// Compute a hash value for each bnode (depending on the quads it appears in)
// In simple cases a hash value refers to one bnode only; in unlucky cases there
// may be more. Hence the usage of the hash_to_bnodes map.
Object.keys(state.bnode_to_quads).forEach((n) => {
// Step 3.1
const hfn = (0, hash1DegreeQuads_1.computeFirstDegreeHash)(state, n);
// The code below awaits the Promises one after the other, which is not ideal.
// However, although using Promise.all works, it messes up the order of the log entries for some reason.
// Because, deep underneath, the hash function operates on in-memory data in one block (as opposed to streaming),
// it probably does not really matter speed-wise...
for (const n of Object.keys(state.bnode_to_quads)) {
const hfn = await (0, hash1DegreeQuads_1.computeFirstDegreeHash)(state, n);
// Step 3.2
if (state.hash_to_bnodes[hfn] === undefined) {
state.hash_to_bnodes[hfn] = [n];
}
else {
state.hash_to_bnodes[hfn].push(n);
}
});
}
/* @@@ */ state.logger.pop();
/* @@@ */
state.logger.info("ca.3.2", "Calculated first degree hashes (4.4.3. (3))", {
@@ -194,7 +199,7 @@ function computeCanonicalDataset(state, input) {
// to make eslint happy
/* const bn = */ temporary_issuer.issueID(n);
// Step 5.2.4
const result = (0, hashNDegreeQuads_1.computeNDegreeHash)(state, n, temporary_issuer);
const result = await (0, hashNDegreeQuads_1.computeNDegreeHash)(state, n, temporary_issuer);
hash_path_list.push(result);
}
}
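
The change from `forEach` to `for...of` in the hunk above is needed because `Array.prototype.forEach` does not wait for an asynchronous callback. A minimal sketch of the sequential pattern, grouping nodes by an asynchronously computed hash, is shown below; `fakeHash` and `groupByHash` are hypothetical stand-ins, not the package's functions.

```typescript
// Hypothetical stand-in for an asynchronous per-node hash.
async function fakeHash(node: string): Promise<string> {
    return `h${node.length}`;
}

async function groupByHash(nodes: string[]): Promise<Map<string, string[]>> {
    const groups = new Map<string, string[]>();
    // for...of with await keeps the calls strictly sequential, so any
    // side effects (such as log entries) occur in input order.
    for (const n of nodes) {
        const h = await fakeHash(n);
        const bucket = groups.get(h);
        if (bucket === undefined) {
            groups.set(h, [n]);
        } else {
            bucket.push(n);
        }
    }
    return groups;
}
```

Running the hashes concurrently with `Promise.all` would also be possible, at the cost of no longer controlling the order in which side effects happen.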
28 changes: 19 additions & 9 deletions dist/lib/common.js
@@ -25,16 +25,23 @@ var Constants;
Various utility functions used by the rest of the code.
***********************************************************/
/**
* Return the hash of a string.
* Return the hash of a string (encoded in UTF-8).
*
* @param data
* This is the core of the various hashing functions. It is the interface to the Web Crypto API,
* which does the effective calculations.
*
* @param input
* @returns - hash value
*
* @async
*/
function computeHash(state, data) {
// The value of the state.hash_algorithm is checked at setting, so there is
// no reason to check it here.
const hash_value = config_1.AVAILABLE_HASH_ALGORITHMS[state.hash_algorithm](data);
return hash_value.toString();
async function computeHash(state, input) {
const encoder = new TextEncoder();
const data = encoder.encode(input);
const hashBuffer = await crypto.subtle.digest(config_1.AVAILABLE_HASH_ALGORITHMS[state.hash_algorithm], data);
const hashArray = Array.from(new Uint8Array(hashBuffer));
const hashHex = hashArray.map(b => b.toString(16).padStart(2, '0')).join('');
return hashHex;
}
exports.computeHash = computeHash;
/**
@@ -57,9 +64,10 @@ exports.concatNquads = concatNquads;
*
* @param nquads
* @returns - hash value
* @async
*
*/
function hashNquads(state, nquads) {
async function hashNquads(state, nquads) {
// Care should be taken that the final data to be hashed includes a single `\n`
// for every quad, before joining the quads into a string that must be hashed
return computeHash(state, concatNquads(nquads));
@@ -101,8 +109,10 @@ exports.quadsToNquads = quadsToNquads;
* @param quads
* @param sort - whether the quads must be sorted before hash. Defaults to `true`.
* @returns - hash value
*
* @async
*/
function hashDataset(state, quads, sort = true) {
async function hashDataset(state, quads, sort = true) {
const nquads = quadsToNquads(quads, sort);
return hashNquads(state, nquads);
}
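
The comment in `hashNquads` asks for exactly one newline per quad in the data that is hashed. One possible reading of that requirement is sketched below; `joinNquads` is a hypothetical function written for illustration, and the actual `concatNquads` in the package may be implemented differently.

```typescript
// Hypothetical sketch: ensure each N-Quads line ends with exactly one "\n"
// before concatenating, whether or not the input line already had one.
function joinNquads(nquads: string[]): string {
    return nquads
        .map((q) => (q.endsWith("\n") ? q : `${q}\n`))
        .join("");
}
```

The resulting string is what would be passed on to the hash function, so two serializations of the same quads that differ only in trailing newlines yield the same hash.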