From 869ef008604956c0d8c57a94edb77d97ec387a8a Mon Sep 17 00:00:00 2001
From: Daniel Garijo
Date: Tue, 30 Mar 2021 13:56:30 +0200
Subject: [PATCH] Fix #178

---
 docs/output.md | 23 +++++++++++++++++++++++
 docs/usage.md  |  8 ++++++++
 mkdocs.yml     |  1 +
 3 files changed, 32 insertions(+)
 create mode 100644 docs/output.md

diff --git a/docs/output.md b/docs/output.md
new file mode 100644
index 00000000..1b8c7ca3
--- /dev/null
+++ b/docs/output.md
@@ -0,0 +1,23 @@
+SOMEF supports three main output formats. Each of them contains different information at a different level of granularity. Below we list them from most granular to least granular:
+
+### JSON format
+Simple JSON representation that indicates, for each extracted metadata category, the technique used for its extraction and its confidence, in addition to the detected excerpt. The JSON snippet below shows an example for the Description category of a Python library.
+
+```json
+"description": [
+  {
+    "excerpt": "KGTK is a Python library ...",
+    "confidence": [0.8294290479925978],
+    "technique": "Supervised classification"
+  }
+]
+```
+The `confidence` depends on the `technique` used. In this case, the confidence is driven by the classifier that makes the prediction.
+
+The techniques can be of several types: `header analysis`, `supervised classification`, `file exploration`, `GitHub API` and `regular expression`. Among these, only `supervised classification` provides a confidence other than `1`.
+
+### Turtle format
+RDF representation using the [Software Description Ontology](https://w3id.org/okn/o/sd/). The snippet below shows a sample description of a software entry. The `excerpt` and `confidence` fields are omitted in this representation (every category with confidence above the threshold specified when running SOMEF will be included in the results).
+
+### Codemeta format
+JSON-LD representation following the [Codemeta specification](https://codemeta.github.io/) (which itself extends [Schema.org](https://schema.org/)). The `excerpt` and `confidence` fields are omitted in this representation (every category with confidence above the threshold specified when running SOMEF will be included in the results). In addition, any metadata category that is not defined in Codemeta is left out.
diff --git a/docs/usage.md b/docs/usage.md
index e0ce7c5a..c98fe91e 100644
--- a/docs/usage.md
+++ b/docs/usage.md
@@ -63,6 +63,14 @@ To obtain the same information as a JSON-LD file:
 somef describe -r https://github.com/dgarijo/Widoco/ -g test.jsonld -f json-ld -t 0.8
 ```
 
+If you prefer to export as [Codemeta](https://codemeta.github.io/) JSON-LD, run:
+
+```bash
+somef describe -r https://github.com/dgarijo/Widoco/ -c test.json
+```
+
+For more information about the output types supported by SOMEF, please see [the output format help page](https://somef.readthedocs.io/en/latest/output/).
+
 We recommend having a high value for the `threshold` parameter, 0.8 (default) or above.
 
 To see a live usage example, try our Binder Notebook: [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/KnowledgeCaptureAndDiscovery/somef/HEAD?filepath=notebook%2FSOMEF%20Usage%20Example.ipynb)
\ No newline at end of file
diff --git a/mkdocs.yml b/mkdocs.yml
index 70800510..e3e2801d 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -3,6 +3,7 @@ nav:
   - Home: index.md
   - Install: install.md
   - Usage: usage.md
+  - Output: output.md
 
 theme:
   name: material
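
Note on the JSON format documented in `docs/output.md`: since each entry carries its own `confidence` and `technique`, a consumer can apply a stricter cut-off after the fact. The sketch below is illustrative only and is not part of SOMEF. It assumes that the full output is a JSON object mapping each category name to a list of entries shaped like the `description` snippet, and that an entry without a `confidence` list can be treated as confidence `1`. The function name `filter_by_confidence` is hypothetical.

```python
import json

# Sample shaped like the snippet in docs/output.md. Wrapping it in a
# top-level object is an assumption made for this illustration.
SAMPLE = """
{
  "description": [
    {
      "excerpt": "KGTK is a Python library ...",
      "confidence": [0.8294290479925978],
      "technique": "Supervised classification"
    }
  ]
}
"""


def filter_by_confidence(metadata, threshold=0.8):
    """Keep entries whose highest confidence value is at least `threshold`.

    Hypothetical helper, not a SOMEF function. Entries with a missing or
    empty `confidence` list are treated as confidence 1.0 (assumption).
    """
    kept = {}
    for category, entries in metadata.items():
        selected = []
        for entry in entries:
            confidences = entry.get("confidence") or [1.0]
            if max(confidences) >= threshold:
                selected.append(entry)
        if selected:
            kept[category] = selected
    return kept


metadata = json.loads(SAMPLE)
high = filter_by_confidence(metadata, threshold=0.8)
strict = filter_by_confidence(metadata, threshold=0.9)
```

With the sample above, `high` keeps the `description` entry (its confidence is roughly 0.83), while `strict` is empty because nothing reaches 0.9.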