Merge pull request #184 from KnowledgeCaptureAndDiscovery/dev
Fix #178
dgarijo authored Mar 30, 2021
2 parents 1816a1e + 869ef00 commit e0e035f
Showing 3 changed files with 32 additions and 0 deletions.
23 changes: 23 additions & 0 deletions docs/output.md
@@ -0,0 +1,23 @@
SOMEF supports three main output formats, each carrying different information at a different level of granularity. They are listed below from most to least granular:

### JSON format
A simple JSON representation that records, for each extracted metadata category, the detected excerpt, the technique used to extract it, and the confidence of that extraction. The JSON snippet below shows an example for the Description category of a Python library.

```json
{
  "description": [
    {
      "excerpt": "KGTK is a Python library ...",
      "confidence": [0.8294290479925978],
      "technique": "Supervised classification"
    }
  ]
}
```
The `confidence` depends on the `technique` used. In this case, the confidence comes from the classifier that makes the prediction.

The technique is one of the following: `header analysis`, `supervised classification`, `file exploration`, `GitHub API` or `regular expression`. Of these, only `supervised classification` yields a confidence other than `1`.
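As a minimal sketch of how this output could be consumed, the Python snippet below parses a document shaped like the example above and lists the technique and confidence of each entry. The enclosing object, the `license` entry, and the helper name `summarize` are assumptions made for illustration; they are not taken from SOMEF itself.

```python
import json

# Hypothetical SOMEF-style JSON output. Only the "description" entry mirrors
# the example above; the "license" entry is made up for illustration.
SAMPLE = """
{
  "description": [
    {
      "excerpt": "KGTK is a Python library ...",
      "confidence": [0.8294290479925978],
      "technique": "Supervised classification"
    }
  ],
  "license": [
    {
      "excerpt": "MIT License",
      "confidence": [1.0],
      "technique": "GitHub API"
    }
  ]
}
"""


def summarize(output):
    """Return (category, technique, confidence) tuples for every entry."""
    rows = []
    for category, entries in output.items():
        for entry in entries:
            # Confidence is stored as a list; use its highest value.
            rows.append((category, entry["technique"], max(entry["confidence"])))
    return rows


rows = summarize(json.loads(SAMPLE))
for category, technique, confidence in rows:
    print(f"{category}: {technique} ({confidence:.2f})")
```

Taking the maximum of the `confidence` list is a design choice of this sketch; a consumer could equally keep the whole list.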

### Turtle format
RDF representation using the [Software Description Ontology](https://w3id.org/okn/o/sd/). The snippet below shows a sample description of a software entry. The `excerpt` and `confidence` fields are omitted in this representation (every category with a confidence above the threshold specified when running SOMEF is included in the results).

### Codemeta format
JSON-LD representation following the [Codemeta specification](https://codemeta.github.io/) (which itself extends [Schema.org](https://schema.org/)). The `excerpt` and `confidence` fields are omitted in this representation (every category with a confidence above the threshold specified when running SOMEF is included in the results). In addition, any metadata category that Codemeta does not define is left out.
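The threshold behaviour described for the Turtle and Codemeta formats can be approximated with the rough Python sketch below. It is not SOMEF's implementation: the function name `categories_above_threshold`, the `installation` entry, and the use of `>=` for the comparison are all assumptions made for illustration.

```python
def categories_above_threshold(output, threshold=0.8):
    """Keep only the excerpts whose confidence reaches the threshold.

    `output` is assumed to follow the JSON format shown earlier:
    a mapping from category name to a list of entries.
    """
    kept = {}
    for category, entries in output.items():
        # Assumption: an entry passes when its best confidence >= threshold.
        passing = [
            entry["excerpt"]
            for entry in entries
            if max(entry["confidence"]) >= threshold
        ]
        if passing:
            # The confidence and technique fields are dropped, as in the
            # Turtle and Codemeta representations.
            kept[category] = passing
    return kept


# Hypothetical input: the second entry is invented to show a rejected category.
sample = {
    "description": [
        {
            "excerpt": "KGTK is a Python library ...",
            "confidence": [0.8294290479925978],
            "technique": "Supervised classification",
        }
    ],
    "installation": [
        {
            "excerpt": "pip install ...",
            "confidence": [0.42],
            "technique": "Supervised classification",
        }
    ],
}

result = categories_above_threshold(sample, threshold=0.8)
print(result)
```

With a threshold of 0.8, only the `description` category survives in this sample; lowering the threshold lets the invented `installation` entry through as well.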
8 changes: 8 additions & 0 deletions docs/usage.md
@@ -63,6 +63,14 @@ To obtain the same information as a JSON-LD file:
somef describe -r https://github.com/dgarijo/Widoco/ -g test.jsonld -f json-ld -t 0.8
```

To export the results as [Codemeta](https://codemeta.github.io/) JSON-LD instead, run:

```bash
somef describe -r https://github.com/dgarijo/Widoco/ -c test.json
```
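For scripted use, the same command could be assembled from Python. The sketch below only builds and prints the argument list using the `-r` and `-c` flags shown above; the helper name `codemeta_command` is hypothetical, and actually running the command assumes SOMEF is installed and configured.

```python
import shlex


def codemeta_command(repo_url, output_path):
    """Build the argument list for a Codemeta export (hypothetical helper)."""
    return ["somef", "describe", "-r", repo_url, "-c", output_path]


cmd = codemeta_command("https://github.com/dgarijo/Widoco/", "test.json")
print(shlex.join(cmd))

# To run it for real (requires SOMEF on the PATH):
# import subprocess
# subprocess.run(cmd, check=True)
```

Passing the arguments as a list rather than a single string avoids shell quoting issues if the repository URL or the output path contains special characters.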

For more information about the output types supported by SOMEF, please see [the output format help page](https://somef.readthedocs.io/en/latest/output/).

We recommend a high value for the `threshold` parameter: 0.8 (the default) or above.

To see a live usage example, try our Binder Notebook: [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/KnowledgeCaptureAndDiscovery/somef/HEAD?filepath=notebook%2FSOMEF%20Usage%20Example.ipynb)
1 change: 1 addition & 0 deletions mkdocs.yml
@@ -3,6 +3,7 @@ nav:
- Home: index.md
- Install: install.md
- Usage: usage.md
- Output: output.md
theme:
name: material
