Merge pull request #172 from KnowledgeCaptureAndDiscovery/dev
Dev
dgarijo authored Mar 10, 2021
2 parents 01d4b48 + 31bf319 commit 70ddbe9
Showing 21 changed files with 26,499 additions and 28 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -9,3 +9,4 @@ __pycache__/
Lib/*
Scripts/*
.idea/*
test.json
35 changes: 34 additions & 1 deletion README.md
@@ -3,6 +3,39 @@ Software Metadata Extraction Framework: A command line interface for automatically

**Authors:** Daniel Garijo, Allen Mao, Haripriya Dharmala, Vedant Diwanji, Jiaying Wang and Aidan Kelley.

## Features
Given a readme file (or a GitHub repository) SOMEF will extract the following categories (if present):

- **Name**: Name identifying a software component
- **Full name**: Name + owner (owner/name)
- **Full title**: If the repository name is a short name, we will attempt to extract the longer version of it
- **Description**: A description of what the software does.
- **Citation**: Preferred citation (usually in `.bib` form) as the authors have stated in their readme file.
- **Installation instructions**: A set of instructions that indicate how to install a target repository
- **Invocation**: Execution command(s) needed to run a scientific software component
- **Usage examples**: Assumptions and considerations recorded by the authors when executing a software component, or examples on how to use it.
- **Documentation**: Where to find additional documentation about a software component.
- **Requirements**: Pre-requisites and dependencies needed to execute a software component.
- **Contributors**: Contributors to a software component
- **FAQ**: Frequently asked questions about a software component
- **Support**: Guidelines and links of where to obtain support for a software component
- **License**: License and usage terms of a software component
- **Contact**: Contact person responsible for maintaining a software component
- **Download URL**: URL where to download the target software (typically the installer, package or a tarball to a stable version)
- **DOI**: Digital Object Identifier associated with the software (if any)
- **DockerFile**: Build file to create a Docker image for the target software
- **Notebooks**: Jupyter notebooks included in a repository
- **Executable notebooks**: Jupyter notebooks ready for execution (e.g., through myBinder)
- **Owner**: Name of the user or organization in charge of the repository
- **Keywords**: set of terms used to commonly identify a software component
- **Source code**: Link to the source code (typically the repository where the readme can be found)
- **Releases**: Pointer to the available versions of a software component
- **Changelog**: Description of the changes between versions
- **Issue tracker**: Link where to open issues for the target repository
- **Programming languages**: Languages used in the repository

We use different supervised classifiers, header analysis, regular expressions and the GitHub API to retrieve all these fields (more than one technique may be used for each field).
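
To make two of these techniques concrete, here is a minimal sketch of header analysis and regular-expression matching on a readme. It is an illustration only and not SOMEF's actual implementation: the names `HEADER_KEYWORDS`, `classify_headers` and `extract_dois`, the keyword lists and the DOI pattern are all assumptions made for this example.

```python
import re

# Hypothetical keyword table mapping a category to header keywords.
# This is an assumption for illustration, not the table SOMEF uses.
HEADER_KEYWORDS = {
    "installation": ("install", "setup"),
    "citation": ("cite", "citation"),
    "license": ("license", "licence"),
}

# Markdown ATX headers such as "## Installation".
HEADER_RE = re.compile(r"^#{1,6}[ \t]+(.+?)[ \t]*$", re.MULTILINE)

# Simplified DOI pattern (assumed): "10.", a registrant code, "/", a suffix.
DOI_RE = re.compile(r"\b10\.\d{4,9}/[^\s\"<>)\]]+")


def classify_headers(readme_text):
    """Group header titles by category using a keyword match."""
    found = {}
    for match in HEADER_RE.finditer(readme_text):
        title = match.group(1)
        lowered = title.lower()
        for category, keywords in HEADER_KEYWORDS.items():
            if any(keyword in lowered for keyword in keywords):
                found.setdefault(category, []).append(title)
    return found


def extract_dois(readme_text):
    """Return every DOI-like string found in the text."""
    return DOI_RE.findall(readme_text)
```

A real extractor would combine matches like these with the classifier output and the GitHub API results, since a single technique rarely covers a field on its own.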

## Documentation
See full documentation at [https://somef.readthedocs.io/en/latest/](https://somef.readthedocs.io/en/latest/)

@@ -48,7 +81,7 @@ Commands:
version Show somef version.
```

## Installing Through Docker
## Installing through Docker
We provide a Docker image with SOMEF already installed. To run through Docker, you may build the Dockerfile provided in the repository by running:

```bash
4 changes: 3 additions & 1 deletion docs/index.md
@@ -13,7 +13,8 @@ SOMEF has currently been tested with GitHub repositories, but it can extract met
Given a readme file (or a GitHub repository) SOMEF will extract the following categories (if present):

- **Name**: Name identifying a software component
- **Full name**: Full name of the software (not abbreviated)
- **Full name**: Name + owner (owner/name)
- **Full title**: If the repository name is a short name, we will attempt to extract the longer version of it
- **Description**: A description of what the software does.
- **Citation**: Preferred citation (usually in `.bib` form) as the authors have stated in their readme file.
- **Installation instructions**: A set of instructions that indicate how to install a target repository
@@ -30,6 +31,7 @@ Given a readme file (or a GitHub repository) SOMEF will extract the following ca
- **DOI**: Digital Object Identifier associated with the software (if any)
- **DockerFile**: Build file to create a Docker image for the target software
- **Notebooks**: Jupyter notebooks included in a repository
- **Executable notebooks**: Jupyter notebooks ready for execution (e.g., through myBinder)
- **Owner**: Name of the user or organization in charge of the repository
- **Keywords**: set of terms used to commonly identify a software component
- **Source code**: Link to the source code (typically the repository where the readme can be found)