Commit
feat: Action only runs if changes were made to the input files
Showing 5 changed files with 304 additions and 29 deletions.
@@ -1,11 +1,11 @@
# rml-action

`rml-action` is a GitHub Action that converts a structured data source file (e.g. JSON, XML, CSV...) to RDF
[Resource Description Framework (RDF)](https://www.w3.org/RDF/). Multiple serialization formats are supported: `nquads` (default), `turtle`, `trig`, `trix`, `jsonld`, `hdt`.
`rml-action` is a GitHub Action that converts a structured data source file (e.g. JSON, XML, CSV...) to [Resource Description Framework (RDF)](https://www.w3.org/RDF/) rules.
Multiple serialization formats are supported: `nquads` (default), `turtle`, `trig`, `trix`, `jsonld`, `hdt`.

## Usage

Create a `.github/workflows/data.yaml` file in the repository where you want to fetch data. An example:
Create a `.github/workflows/data.yaml` file in the repository where you want to fetch and convert data. An example:

```yaml
name: Convert to RDF Workflow
@@ -26,45 +26,48 @@ jobs:
GLOBAL_PATTERN: "*.yml"
SERIALIZATION_FORMAT: turtle
OUTPUT_DIRECTORY: output
CONVERT_ALL: true
steps:
# Checks-out your repository
- uses: actions/checkout@v2

- name: Creates an output directory for RDF files (if doesn't exist)
run: mkdir -p output
shell: bash

- name: Converts YARRRML rules to RDF
uses: RMLio/rml-action@main
uses: RMLio/rml-action@v1.0.0
with:
# the global pattern for all YARRRML mappings
global-pattern: ${{ env.GLOBAL_PATTERN }}
# serialization format is optional; default - "nquads"
serialization-format: ${{ env.SERIALIZATION_FORMAT }}
# the name of the directory where all the output files will be stored
output-directory: ${{ env.OUTPUT_DIRECTORY }}
# convert-all is optional; default - "false"
# if convert-all is "true", the action will always convert all the files to
# RDF based on the yarrrml-files provided by `GLOBAL_PATTERN`, even if no
# changes were detected
convert-all: ${{ env.CONVERT_ALL }}

# Push the generated RDF files to the repository
- name: Commit and push the output
run: |
git config --global user.name 'your_username'
git config --global user.email '[email protected]'
git add .
set +e
git status | grep "nothing to commit, working tree clean"
if [ $? -eq 0 ]; then set -e; echo "No changes since last run"; else set -e; \
if [ $? -eq 0 ]; then set -e; echo "INFO: No changes since last run"; else set -e; \
git commit -m "feat: convert to RDF with Github Actions"; git push; fi
shell: bash
```

If you are using the example that was provided above:
If you are using the example workflow that was provided above, make sure to update it as follows:

- Make sure to check whether the conditions to trigger the action are set properly (change the name of the branch(-es) if needed etc.).
- Configure the input parameters for the action (`GLOBAL_PATTERN`, `SERIALIZATION_FORMAT` and `OUTPUT_DIRECTORY`).
- Verify whether the conditions to trigger the action are set properly (change the name of the branch(-es) if needed etc.).
- Configure the environment variables for the input parameters for the action under `jobs` > `build` > `env` (`GLOBAL_PATTERN`, `SERIALIZATION_FORMAT`, `OUTPUT_DIRECTORY` and `CONVERT_ALL`).
- In the "Commit and push the output" step, replace `user.name` and `user.email` from the example with your github username and email. You may also want to change the commit message that will be used to commit the files created by the action.

The `RMLio/rml-action` action will perform the following operations:

1. iterate over all files matching the provided global pattern (which are all expected to contain `YARRRML` rules)
1. iterate over all files matching the provided global pattern (which are all expected to contain `YARRRML` rules and have an extension `.yaml` or `.yml`)
2. convert `YARRRML` rules in all these files to RDF

**Note:** you need to follow the guidelines of the above workflow file example (step "Commit and push the output") to commit and push all of the generated data to your repository.
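The first operation (selecting the mapping files) can be tried locally with a small dry-run sketch. How the action itself expands the global pattern is not shown in this commit, so this assumes bash does the expansion from the repository root with `globstar` enabled; `list_mapping_files` is a hypothetical helper, not part of `rml-action`.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of rml-action): print the regular files that a
# global pattern selects and that end in .yml or .yaml, one per line.
# Assumptions: bash >= 4 expands the pattern relative to the current directory;
# "**" only recurses into subdirectories because globstar is enabled below.
shopt -s globstar nullglob

list_mapping_files() {
  local pattern="$1" f
  # $pattern is deliberately left unquoted so that bash performs glob expansion
  # (this also splits on whitespace, so patterns must not contain spaces);
  # with nullglob, a pattern that matches nothing expands to nothing.
  for f in $pattern; do
    if [[ -f "$f" && ( "$f" == *.yml || "$f" == *.yaml ) ]]; then
      printf '%s\n' "$f"
    fi
  done
}
```

Under these assumptions, the example's `GLOBAL_PATTERN: "*.yml"` selects only top-level `.yml` files; `.yaml` files and files in subdirectories would need a broader pattern such as `"**/*.y*ml"`.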
@@ -73,12 +76,25 @@ The `RMLio/rml-action` action will perform the following operations:

### `global-pattern`

The global pattern that matches all the mapping files that need to be converted.
The global pattern that matches all the mapping files that need to be converted (e.g. `"*.yml"`). The pattern has to be surrounded by quotes.

### `serialization-format` (optional)

The serialization format that needs to be used for convertion. Default: `nquads`.
The serialization format that needs to be used for conversion. Default: `nquads`. Possible values: `nquads`, `turtle`, `trig`, `trix`, `jsonld`, `hdt`.

### `output-directory`

The relative path from the root of your repository to a directory where the output files will be stored.
The relative path from the root of your repository to a directory where the output files will be stored (e.g. `output` (or `path_from_root/output_folder_name`), this will save all the output files to a folder named `output` (or `path_from_root/output_folder_name`) that can be found at the root of the repository).

### `convert-all` (optional)

An indicator as to whether or not the conversion should be run for all files. Default: `false`. Possible values: `true`, `false`.
If `convert-all` is set to `true`, all files will be converted, even if no changes were detected.
If the meta folder of the action (`rml_action_meta`) or some file in that folder is not present (e.g. it was deleted), again, all files will be converted, even if no changes were made to the input files.

## Important remarks

- Don't remove the meta folder for this action (`rml_action_meta`). This folder is created when the action runs for the first time and contains the information that is relevant for it. Removing this folder won't cause any errors - it will just be created again, but this will result in a performance loss, since all the files will be converted again.
- Changes to the output folder are not detected. This means that if you remove a part of or all of the files that were already generated and are stored in the output folder, they will not be generated again by default. In this case, you might want to set `convert-all` to `true` to convert all the files once again.
- If some files (yarrrml-files or data source files) have been added/removed or renamed, the action will run for all the files (all of them will be converted).
- If some files (yarrrml-files or data source files) have been modified, the action will only convert the modified files (if data source files were modified) or the files that are a part of yarrrml-files that were modified.
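The last remark describes how a modified file is traced back to the mapping files that must be re-run. The convert script in this commit does this with `rml_action_meta/yamlToDs.txt`, which, judging from how the script reads it (`cut -d ' ' -f1` for mapping files, `tr -s ' ' '\n'` for all files), holds one line per mapping file: the mapping file first, then its data source files, separated by spaces. A minimal sketch of that lookup under this assumption, with a hypothetical helper name:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print (deduplicated) the mapping files that must be
# converted again, given
#   $1 - a yamlToDs.txt-style file: "<mapping> <data source> <data source> ..."
#   $2 - a file listing changed paths, one per line
# A line is selected when its mapping file or any of its data sources changed.
files_to_convert() {
  local map_file="$1" changed="$2"
  # With an empty change list the NR == FNR idiom below would misfire, so stop early.
  [[ -s "$changed" ]] || return 0
  awk 'NR == FNR { changed[$0] = 1; next }   # first file: remember changed paths
       {
         for (i = 1; i <= NF; i++)
           if ($i in changed) { print $1; break }   # $1 is the mapping file
       }' "$changed" "$map_file" | sort -u
}
```

This variant compares whole fields. The script's `grep -F -f -` matches substrings instead, so a changed `a.csv` would also select a line that mentions `data.csv`; and because the mapping file is the first field, the sketch needs no separate `egrep "*.yml|*.yaml"` step (a leading `*` in an extended regular expression is undefined behaviour in POSIX).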
@@ -0,0 +1,47 @@
#!/bin/bash

# if "convert_all" input parameter was set to TRUE,
# convert all files anyway (e.g. if one of the output files or the output folder and
# all the files were deleted and need to be recovered)
if [[ $CONVERT_ALL == "true" ]]
then
echo "INFO: convert-all is true => running the action"
exit 1
fi

cd $WORKING_DIRECTORY
meta_dir="rml_action_meta"

# "rml_action_meta" is the directory that contains the metadata for the action
# "yamlToDs.md5" is the checksum of a list of all mapping files used for conversion
# and the data source files that need to be converted
# "contents.md5" is the checksum of contents of all mapping files and all the data source files
# if both checksums exist and are correct, finish the action without conversion

if [[ ! -f $meta_dir/yamlToDs.md5 || ! -f $meta_dir/contents.md5 ]]
then
# one of the checksums is not present => run the action
echo "INFO: one of the checksums is not present => running the action"
exit 1
fi

# check if the checksum for the list of filenames hasn't changed
# if it has, some files have been added/removed or renamed,
# so the action should be run further (conversion)
md5sum --status --check $meta_dir/yamlToDs.md5
FIRST_CHECKSUM_RESULT=$?
# check if the checksum for the contents of mapping files and data source files
# is still the same; if it's not, run the action further (conversion)
md5sum --status --check $meta_dir/contents.md5
SECOND_CHECKSUM_RESULT=$?

if [[ $FIRST_CHECKSUM_RESULT == 0 && $SECOND_CHECKSUM_RESULT == 0 ]]
then
# there are no changes, don't run the action
echo "INFO: No changes, stopping the action"
exit 0
fi

# the action needs to be run in this case
echo "INFO: Changes detected: running the action"
exit 1
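The checks above boil down to two `md5sum` manifests. The sketch below wraps the record and verify steps in two hypothetical functions so the behaviour can be tried outside the action. It assumes GNU coreutils `md5sum` (for `--check` and `--status`), paths without spaces, and the `yamlToDs.txt` layout inferred from the convert script. Note that `inputs_unchanged` returns success when nothing changed, the opposite of the script's convention, where exit status 1 means "run the conversion".

```bash
#!/usr/bin/env bash
# Hypothetical helpers illustrating the two checksum manifests kept in the meta
# folder. Call them from the same directory each time: md5sum stores the paths
# exactly as they were given.

# Write both manifests for the files listed in <meta_dir>/yamlToDs.txt.
record_checksums() {
  local meta_dir="$1"
  # 1) checksum of the file list itself: changes when files are added/removed/renamed
  md5sum "$meta_dir/yamlToDs.txt" > "$meta_dir/yamlToDs.md5"
  # 2) checksum of every mapping and data source file: changes when contents change
  #    (word splitting of the command substitution is intentional)
  md5sum $(tr -s ' ' '\n' < "$meta_dir/yamlToDs.txt") > "$meta_dir/contents.md5"
}

# Succeed (status 0) only if both manifests exist and still verify.
inputs_unchanged() {
  local meta_dir="$1"
  [[ -f "$meta_dir/yamlToDs.md5" && -f "$meta_dir/contents.md5" ]] || return 1
  md5sum --status --check "$meta_dir/yamlToDs.md5" || return 1
  md5sum --status --check "$meta_dir/contents.md5"
}
```

For example, after `record_checksums rml_action_meta`, editing any listed data source makes `inputs_unchanged rml_action_meta` fail until the manifests are recorded again.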
@@ -1,28 +1,113 @@
#!/bin/bash

while read filepath
do
if [[ "$filepath" == *".github"* ]]; then
continue
cd $WORKING_DIRECTORY
meta_dir="rml_action_meta"

# a list of files that will be used for conversion
temp_filenames="$meta_dir/temp_filenames.txt"
> $temp_filenames

# if the action was called, then either 'yamlToDs.md5' or 'contents.md5' has changed,
# or one of them is not present
# if 'yamlToDs.md5' has changed, some files have been added/removed or renamed
# if 'contents.md5' has changed, the contents of some files listed in 'yamlToDs' have changed

# check if the checksum for "yamlToDs.txt" has changed
md5sum --status --check $meta_dir/yamlToDs.md5
FIRST_CHECKSUM_RESULT=$?

# if convert-all input parameter was set to true or there is no "yamlToDs.md5" or
# the checksum "yamlToDs.md5" has changed, convert all files that were given by the
# global pattern
if [[ $CONVERT_ALL == "true" || ! -f $meta_dir/yamlToDs.md5 || \
! -f $meta_dir/contents.md5 || $FIRST_CHECKSUM_RESULT != 0 ]]
then
echo "INFO: either convert-all is true or one of the checksums is not present \
or the checksum for the list of filenames does not match"
# add all the filenames to a list of files that will be used for conversion, because either
# some files have been added/removed/renamed or the checksum of the contents is not present
cat $meta_dir/yamlToDs.txt | cut -d ' ' -f1 > $temp_filenames
if [[ ! -f $meta_dir/yamlToDs.md5 || $FIRST_CHECKSUM_RESULT != 0 ]]
then
echo "INFO: the checksum for the list of filenames is not present or doesn't match"
# recalculate the checksum if it does not exist or has changed
md5sum $meta_dir/yamlToDs.txt > $meta_dir/yamlToDs.md5
fi
else
# the checksum "contents.md5" has changed
# get all files that have changed, save yaml files to a list of files that will be used for
# conversion, map data source files to the mapping files and then save these mapping files
# to the same list in `$temp_filenames`
echo "INFO: the second checksum (contents) doesn't match"
md5sum --check $meta_dir/contents.md5 | grep -F "FAILED" | cut -f 1 -d ":" > changed_files.txt
echo "INFO: changed files are:"
cat changed_files.txt
echo
egrep "*.yml|*.yaml" changed_files.txt >> $temp_filenames
egrep -v "*.yml|*.yaml" changed_files.txt | grep -F -f - $meta_dir/yamlToDs.txt | \
cut -d " " -f1 >> $temp_filenames
rm -f changed_files.txt
fi

# (re-)calculate the checksum for the contents
md5sum $(cat $meta_dir/yamlToDs.txt | tr -s ' ' '\n') > $meta_dir/contents.md5

# determine the correct extension for the output file based on the chosen serialization format
EXTENSION=""
if [[ $SERIALIZATION_FORMAT == "nquads" ]]
then
EXTENSION="nq"
elif [[ $SERIALIZATION_FORMAT == "turtle" ]]
then
EXTENSION="ttl"
elif [[ $SERIALIZATION_FORMAT == "trig" ]]
then
EXTENSION="trig"
elif [[ $SERIALIZATION_FORMAT == "trix" ]]
then
EXTENSION="xml"
elif [[ $SERIALIZATION_FORMAT == "jsonld" ]]
then
EXTENSION="jsonld"
elif [[ $SERIALIZATION_FORMAT == "hdt" ]]
then
EXTENSION="hdt"
else
echo "ERROR: Unsupported serialization format" >> /dev/stderr
exit 1
fi

# get rid of the duplicates (e.g. in case multiple data source files
# for the same mapping file were modified)
sort $temp_filenames | uniq > unique_filenames.txt
cp unique_filenames.txt $temp_filenames
rm -f unique_filenames.txt

echo "INFO: Files for conversion are:"
cat $temp_filenames

while read filepath
do
# get a basename from the path
FILE_BASENAME=$(basename $filepath)
# get a filename without an extension
FILENAME=$(echo "$FILE_BASENAME" | sed -e 's/\..*//')
# filename for the output file containing RDF
OUTPUT_FILENAME="${FILENAME}_output.ttl"
OUTPUT_FILENAME="${FILENAME}_output.${EXTENSION}"
# get a directory name from the path
# and go to that directory
FILE_DIRNAME=$(dirname $filepath)
cd $FILE_DIRNAME
# convert YARRRML rules to RML
# convert YARRRML rules to RML rules
yarrrml-parser -i $FILE_BASENAME -o $WORKING_DIRECTORY/temp_rml_rules.rml.ttl
# convert RML rules to RDF and save it to the output folder
# convert RML rules to RDF and save the result to the output folder
java -jar $WORKING_DIRECTORY/rmlmapper.jar -m $WORKING_DIRECTORY/temp_rml_rules.rml.ttl \
-o $WORKING_DIRECTORY/$INPUTS_OUTPUT_DIRECTORY/$OUTPUT_FILENAME -s $SERIALIZATION_FORMAT
cd $WORKING_DIRECTORY
done
done < $temp_filenames

# remove the temporary file with RML rules
rm -f $WORKING_DIRECTORY/temp_rml_rules.rml.ttl
rm -f temp_rml_rules.rml.ttl

# remove the temporary file with all the filenames for the action
rm -f $temp_filenames
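The `if`/`elif` chain that picks `EXTENSION`, together with the `OUTPUT_FILENAME` construction, can also be written with a `case` statement. The sketch below is a behaviour-preserving restatement rather than code from this commit. `extension_for` and `output_filename` are hypothetical names, and the error goes to standard error with the conventional `>&2`.

```bash
#!/usr/bin/env bash
# Hypothetical restatement of the extension lookup in the convert script.
extension_for() {
  case "$1" in
    nquads) echo "nq" ;;
    turtle) echo "ttl" ;;
    trig)   echo "trig" ;;
    trix)   echo "xml" ;;
    jsonld) echo "jsonld" ;;
    hdt)    echo "hdt" ;;
    *)      echo "ERROR: Unsupported serialization format: $1" >&2; return 1 ;;
  esac
}

# Build "<name>_output.<ext>" for a mapping file path, where <name> is the base
# name up to its first dot (same result as the script's sed 's/\..*//').
output_filename() {
  local mapping_path="$1" format="$2" base ext
  ext=$(extension_for "$format") || return 1
  base=$(basename "$mapping_path")
  echo "${base%%.*}_output.${ext}"
}
```

As in the script, mapping files with the same base name in different directories (for example `a/people.yml` and `b/people.yml`) map to the same output file name and therefore overwrite each other in the output directory.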