diff --git a/.gitignore b/.gitignore
index e2f847d..c4ca12b 100644
--- a/.gitignore
+++ b/.gitignore
@@ -3,3 +3,7 @@ env
 output
 scripts
 synergy
+
+credentials.yml
+tmp*
+jobs.sh.part*
diff --git a/40-kubernetes.md b/40-kubernetes.md
index 12d2203..f31904b 100644
--- a/40-kubernetes.md
+++ b/40-kubernetes.md
@@ -70,6 +70,8 @@ All the `.yml` files that you need to run below are inside the `k8-config` folde
 The Dockerfiles and scripts are inside `code`.
 Remember to change to the correct folder as necessary.
 
+## Specific preparation
+
 First, follow the specific guides to setup your local computer or cluster:
 
 - [Single computer](41-kubernetes-single-computer.md)
@@ -84,20 +86,18 @@ Run the following command taken from [RabbitMQ Cluster Operator](https://www.rab
 kubectl apply -f "https://github.com/rabbitmq/cluster-operator/releases/latest/download/cluster-operator.yml"
 ```
 
-## Create a namespace for asreview things
+## Start RabbitMQ configuration
 
-The configuration files use the namespace `asreview-cloud` by default, so if you want to change it, you need to change in the file below and all other places that have `# namespace: asreview-cloud`.
+Run
 
 ```bash
-kubectl apply -f asreview-cloud-namespace.yml
+kubectl apply -f rabbitmq.yml
 ```
 
-## Start RabbitMQ configuration
-
-Run
+Check that the `rabbitmq-server-0` pod starts running after a minute or two:
 
 ```bash
-kubectl apply -f rabbitmq.yml
+kubectl -n asreview-cloud get pods
 ```
 
 ## S3 storage (_Optional step_)
@@ -134,12 +134,62 @@ To change that, edit [tasker.sh](code/tasker.sh).
 The [tasker.sh](code/tasker.sh) defines everything that will be executed by the tasker, and indirectly by the workers.
 The [tasker.Dockerfile](code/tasker.Dockerfile) will create the image that will be executed in the tasker pod.
 You can modify these as you see fit.
-After you are done, compile and push the image:
+
+The default commands used inside the tasker script and Dockerfile assume that you are:
+
+- simulating using data from a `data` folder.
+- running various settings, classifiers, and/or feature extractors.
+- running a custom ARFI template.
+- aggregating all `jobs.sh` files into a single one.
+
+### Data
+
+If you are providing the data, create a `data` folder inside the `code` folder and put your CSV files in there.
 
 > **Warning**
 >
-> The default tasker assumes that a data folder exists with your data.
-> Make sure to either provide the data or change the tasker and Dockerfile.
+> Don't skip this step: either create the `data` folder or change the tasker and Dockerfile accordingly.
+
+If, instead, you want to use the Synergy data set, edit [tasker.Dockerfile](code/tasker.Dockerfile) and look for the relevant lines.
+
+### Settings, classifiers and feature extractors
+
+Like we did for the use case ["Running many jobs.sh files one after the other"](30-many-jobs.md), each line of the file [makita-args.txt](code/makita-args.txt) contains a different setting that you can pass to the asreview command.
+
+By default, we are running `-m logistic -e tfidf` and `-m nb -e tfidf`.
+Edit the file if you want to change them or add more.
+
+### Custom ARFI template
+
+We also assume that we are running a custom ARFI template, [custom_arfi.txt.template](code/custom_arfi.txt.template).
+The template contains placeholder values related to the settings mentioned in the section above.
+The placeholder `SETTINGS_PLACEHOLDER` will be substituted with each line of the [makita-args.txt](code/makita-args.txt) file.
+The placeholder `SETTINGS_DIR` is used to create a folder one level above the data.
+By default, the value of `SETTINGS_DIR` is equal to `SETTINGS_PLACEHOLDER`, except that spaces are substituted with `_`.
+
+This template also removes some lines that are unnecessary for our case (such as creating images and aggregating the results).
+
+Furthermore, it adds a new command, `rm -f ...`, to remove the `.asreview` project file after use.
+This ensures that disk usage does not grow to absurd proportions.
+
+Finally, it chains three commands on the same line, to ensure that the same worker runs them in order:
+
+- simulate (which creates the project file);
+- create metrics using the project file;
+- delete the project file.
+
+### Aggregating all `jobs.sh` files into a single one
+
+Instead of following ["Running many jobs.sh files one after the other"](30-many-jobs.md), we want to parallelize across the different jobs files as well.
+To do that, we aggregate all `jobs.sh` files into a single one.
+Then, when we split the file, all of the simulation calls from all jobs are sent to the workers at the same time.
+This allows scaling the number of workers even further.
+
+To keep things organized, we create an additional folder level before the dataset, as described in the custom template section above.
+
+### Build and push
+
+After you are done with your modifications, build and push the image:
 
 ```bash
 docker build -t YOURUSER/tasker -f tasker.Dockerfile .
@@ -152,7 +202,7 @@ docker push YOURUSER/tasker
 ```
 
 ## Prepare the worker script and Docker image
 
-The [worker.sh](code/worker.sh) defines a very short list of tasks: running [worker-receiver.py](code/worker-receiver.py).
+The [worker.sh](code/worker.sh) script simply runs [worker-receiver.py](code/worker-receiver.py).
 You can do other things before that, but tasks that are meant to be run before **all** workers start working should go on [tasker.sh](code/tasker.sh).
 The [worker-receiver.py](code/worker-receiver.py) runs continuously, waiting for new tasks from the tasker.
@@ -161,10 +211,20 @@ docker build -t YOURUSER/worker -f worker.Dockerfile .
 docker push YOURUSER/worker
 ```
 
+> **Note**
+>
+> We have created a small script, [build-and-push.sh](code/build-and-push.sh), that builds and pushes both images.
+> You can run it with `bash build-and-push.sh YOURUSER`.
+
 ## Running the workers
 
 The file [worker.yml](k8-config/worker.yml) contains the configuration of the deployment of the workers.
 Change the `image` to reflect the path to the image that you pushed.
+
+> **Warning**
+>
+> Did you remember to change the `image`?
+
 You can select the number of `replicas` to change the number of workers.
 Pay attention to the resource limits, and change as you see fit.
@@ -198,6 +258,11 @@ Logging as ...
 
 Similarly, the [tasker.yml](k8-config/tasker.yml) allows you to run the tasker as a Kubernetes job.
 Change the `image`, and optionally add a `ttlSecondsAfterFinished` to auto delete the task - I prefer to keep it until I review the log.
+
+> **Warning**
+>
+> Did you remember to change the `image`?
+
 Run
 
 ```bash
@@ -206,6 +271,43 @@ kubectl apply -f tasker.yml
 ```
 
 Similarly, you should see a `tasker` pod, and you can follow its log.
 
+## Retrieving the output
+
+You can copy the `output` folder from the volume with
+
+```bash
+kubectl -n asreview-cloud cp asreview-worker-FULL-NAME:/app/workdir/output ./output
+```
+
+Also, check the `/app/workdir/issues` folder.
+It collects errors from the simulation runs, so it should be empty.
+If it is not, the offending lines will be listed there.
+
+### If you used NFS
+
+When you have an NFS server, you can mount it directly.
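+To find the pod's full name (the `nfs-server-FULL-NAME` part below), list the pods in the namespace first; this is plain `kubectl`, nothing specific to this setup:
+
+```bash
+kubectl -n asreview-cloud get pods | grep nfs-server
+```
+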
+Run the following command in a terminal:
+
+```bash
+kubectl -n asreview-cloud port-forward nfs-server-FULL-NAME 2049
+```
+
+In another terminal, run
+
+```bash
+mkdir asreview-storage
+sudo mount -v -o vers=4,loud localhost:/ asreview-storage
+```
+
+Copy things out as necessary.
+When you're done, run
+
+```bash
+sudo umount asreview-storage
+```
+
+And hit CTRL-C on the running `kubectl port-forward` command.
+
 ## Deleting and restarting
 
 If you plan to make modifications to the tasker or the worker, they have to be deleted, respectively.
diff --git a/41-kubernetes-single-computer.md b/41-kubernetes-single-computer.md
index e0c6760..c4ba84d 100644
--- a/41-kubernetes-single-computer.md
+++ b/41-kubernetes-single-computer.md
@@ -79,6 +79,14 @@ minikube start --cpus CPU_NUMBER --memory HOW_MUCH_MEMORY
 The `CPU_NUMBER` argument is the number of CPUs you want to dedicate to `minikube`.
 The `HOW_MUCH_MEMORY` argument is how much memory.
 
+## Create a namespace for asreview things
+
+The configuration files use the namespace `asreview-cloud` by default, so if you want to change it, you need to change it in the file below and in all other places that have `# namespace: asreview-cloud`.
+
+```bash
+kubectl apply -f asreview-cloud-namespace.yml
+```
+
 ## Create a volume
 
 To share data between the worker and taskers, and to keep that data after using it, we need to create a volume.
@@ -112,15 +120,3 @@ volumes:
     persistentVolumeClaim:
       claimName: asreview-storage
 ```
-
-### Retrieving the output
-
-You can copy the `output` folder from the volume with
-
-```bash
-kubectl cp asreview-worker-FULL-NAME:/app/workdir/output ./output
-```
-
-Also, check the `/app/workdir/issues` folder.
-It should be empty, because it contains errors while running the simulate code.
-If it is not empty, the infringing lines will be shown.
diff --git a/42-kubernetes-cloud-provider.md b/42-kubernetes-cloud-provider.md
index ba75d16..4bf474f 100644
--- a/42-kubernetes-cloud-provider.md
+++ b/42-kubernetes-cloud-provider.md
@@ -10,6 +10,14 @@ You can check the guide for [Single computer](41-kubernetes-single-computer.md),
 You have to configure access to the cluster, and since that depends on the cloud provider, I will leave that to you.
 Please remember that all commands will assume that you are connecting to the cluster, which might involve additional flags to pass your credentials.
 
+## Create a namespace for asreview things
+
+The configuration files use the namespace `asreview-cloud` by default, so if you want to change it, you need to change it in the file below and in all other places that have `# namespace: asreview-cloud`.
+
+```bash
+kubectl apply -f asreview-cloud-namespace.yml
+```
+
 ## Create a volume
 
 To share data between the worker and taskers, and to keep that data after using it, we need to create a volume.
@@ -51,28 +59,3 @@ volumes:
       server: NFS_SERVICE_IP
       path: "/"
 ```
-
-### Retrieving the output
-
-The easiest way to manipulate the output when you have an NFS server is to mount the NFS server.
-Run the following command in a terminal:
-
-```bash
-kubectl -n asreview-cloud port-forward nfs-server-FULL-NAME 2049
-```
-
-In another terminal, run
-
-```bash
-mkdir asreview-storage
-sudo mount -v -o vers=4,loud localhost:/ asreview-storage
-```
-
-Copy things out as necessary.
-When you're done, run
-
-```bash
-sudo umount asreview-storage
-```
-
-And hit CTRL-C on the running `kubectl port-forward` command.
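
Both preparation guides above apply `asreview-cloud-namespace.yml` without showing its contents. For reference, a namespace manifest is minimal; this is a sketch that assumes the file declares nothing beyond the namespace itself:

```yaml
# Assumed contents of k8-config/asreview-cloud-namespace.yml
apiVersion: v1
kind: Namespace
metadata:
  name: asreview-cloud
```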
diff --git a/USAGE.md b/USAGE.md deleted file mode 100644 index 401cd60..0000000 --- a/USAGE.md +++ /dev/null @@ -1,52 +0,0 @@ -# Examples of usage - -In this file, we will try to discuss a few common usages and what modifications would be necessary to achieve them. -If necessary, extra files will be provide in the folder [examples](examples). -Be sure to read the [README](README.md) first. -This file only supplements the README and it is not supposed to be enough to quickstart. - -By the way, if you are only testing and don't have a data set yet, you can download one from [asreview-makita](https://github.com/asreview/asreview-makita/blob/8272698e6114106c1f44cfcf7ed85c92ba50d13a/examples/arfi_example/data/van_de_Schoot_2018.csv). - -## Provide only data and run makita on the cloud - -This is the default usage inside the `tasker.sh`. -You only need to: - -- Create a folder `data` with your `.csv` files. -- Modify `tasker.sh` to run the makita template that you want. (Search for `asreview makita`). The default execution is ARFI. -- If necessary, modify `worker.Dockefile` to install more packages. - -## Custom ARFI (different models) and synergy data - -The files inside [examples/custom_arfi_synergy](examples/custom_arfi_synergy/) will allow you to run all data in the Synergy dataset and pass specific classifiers and feature extractors. - -Copy the files there to the root: - -```bash -cp -f examples/custom_arfi_synergy/* . -``` - -Then, modify `SETTINGS` in `tasker.sh` to your liking. - -> **Note** -> -> Since this model runs all files, it is advisable to test the execution with a single simulate call from each file. -> To do that, run `cp custom_arfi.txt.template.test custom_arfi.txt.template` before building your images. -> It might also help to remove the larger files Brouwer_2019.csv and Walker_2018csv. - -### More details - -The `custom_arfi.txt.template` file has the following modifications from the basic ARFI template: - -- it adds an argument `SETTINGS_PLACEHOLDER` to the simulate call. -- it runs `simulate`, `metrics`, and then removes the `.asreview` project file in a single execution. This is done to avoid filling up the disk with `.asreview` files. -- it does not produce a plot, because the `.asreview` files and not present anymore. - -The `tasker.sh` introduces a bash variable `SETTINGS`, which defaults to empty. -If you want to pass a different classifiers and/or feature extractor, change this variable. -An example is given. Then, Makita is called with the template argument to create a `jobs.sh` file with the `SETTINGS_PLACEHOLDER` argument. -Finally, using `sed`, we substitute the `SETTINGS_PLACEHOLDER` by the value of the `SETTINGS` variable. - -The `worker.Dockerfile` updates Makita and adds more packages to support extra models. - -The `tasker.Dockerfile` install the synergy dataset and downloads all files into the `/app/data` folder. diff --git a/code/build-and-push.sh b/code/build-and-push.sh new file mode 100644 index 0000000..57d83c9 --- /dev/null +++ b/code/build-and-push.sh @@ -0,0 +1,20 @@ +#!/bin/bash + +YOURUSER=$1 + +if [ -z "$YOURUSER" ]; then + echo "ERROR: Missing YOURUSER. Run 'bash build-and-push.sh YOURUSER'" + exit 1 +fi + +for f in worker tasker +do + if ! docker build -t "$YOURUSER/$f" -f $f.Dockerfile .; then + echo "ERROR building docker image" + exit 1 + fi + if ! 
docker push "$YOURUSER/$f"; then
+    echo "ERROR pushing docker image"
+    exit 1
+  fi
+done
diff --git a/code/custom_arfi.txt.template b/code/custom_arfi.txt.template
new file mode 100644
index 0000000..f080c39
--- /dev/null
+++ b/code/custom_arfi.txt.template
@@ -0,0 +1,29 @@
+---
+name: ARFI-settings
+name_long: All Relevant, Fixed Irrelevant, with settings
+
+scripts:
+  - get_plot.py
+  - merge_descriptives.py
+  - merge_metrics.py
+  - merge_tds.py
+
+docs:
+  - README.md
+
+---
+#!/bin/bash
+{# This is a template for the ARFI method #}
+# version {{ version }}
+
+{% for dataset in datasets %}
+mkdir -p {{ output_folder }}/simulation/SETTINGS_DIR/{{ dataset.input_file_stem }}/metrics
+mkdir -p {{ output_folder }}/simulation/SETTINGS_DIR/{{ dataset.input_file_stem }}/descriptives
+asreview data describe {{ dataset.input_file }} -o {{ output_folder }}/simulation/SETTINGS_DIR/{{ dataset.input_file_stem }}/descriptives/data_stats_{{ dataset.input_file_stem }}.json
+mkdir -p {{ output_folder }}/simulation/SETTINGS_DIR/{{ dataset.input_file_stem }}/state_files
+
+{% for prior in dataset.priors %}
+asreview simulate {{ dataset.input_file }} SETTINGS_PLACEHOLDER -s {{ output_folder }}/simulation/SETTINGS_DIR/{{ dataset.input_file_stem }}/state_files/sim_{{ dataset.input_file_stem }}_{{ prior[0] }}.asreview --prior_record_id {{ " ".join(prior) }} --seed {{ dataset.model_seed }} && asreview metrics {{ output_folder }}/simulation/SETTINGS_DIR/{{ dataset.input_file_stem }}/state_files/sim_{{ dataset.input_file_stem }}_{{ prior[0] }}.asreview -o {{ output_folder }}/simulation/SETTINGS_DIR/{{ dataset.input_file_stem }}/metrics/metrics_sim_{{ dataset.input_file_stem }}_{{ prior[0] }}.json && rm -f {{ output_folder }}/simulation/SETTINGS_DIR/{{ dataset.input_file_stem }}/state_files/sim_{{ dataset.input_file_stem }}_{{ prior[0] }}.asreview
+{% endfor %}
+
+{% endfor %}
diff --git a/code/makita-args.txt b/code/makita-args.txt
new file mode 100644
index 0000000..5220dde
--- /dev/null
+++ b/code/makita-args.txt
@@ -0,0 +1,2 @@
+-m logistic -e tfidf
+-m nb -e tfidf
diff --git a/code/tasker.Dockerfile b/code/tasker.Dockerfile
index 2bac083..689914c 100644
--- a/code/tasker.Dockerfile
+++ b/code/tasker.Dockerfile
@@ -1,22 +1,23 @@
-FROM ghcr.io/asreview/asreview:v1.2
+FROM ghcr.io/asreview/asreview:v1.2.1
 
 RUN apt-get update && \
-    apt-get install -y curl ca-certificates amqp-tools python \
+    apt-get install -y curl ca-certificates amqp-tools python3 \
     --no-install-recommends \
     && rm -rf /var/lib/apt/lists/* \
     && pip install pika
 
 #### Don't modify above this line
 
+# Alternative 1: Copy your data folder
 COPY data /app/data
+# Alternative 2: Install synergy-dataset and download all data
+# RUN pip install synergy-dataset
+# RUN mkdir -p /app/data
+# RUN synergy get -l -o ./app/data
 
-# This is necessary until a new asreview-makita is released and the asreview image is updated
-RUN apt-get update && \
-    apt-get install -y git \
-    --no-install-recommends \
-    && rm -rf /var/lib/apt/lists/* \
-    && pip install asreview-makita
 #### Don't modify below this line
+COPY ./custom_arfi.txt.template /app/custom_arfi.txt.template
+COPY ./makita-args.txt /app/makita-args.txt
 COPY ./split-file.py /app/split-file.py
 COPY ./tasker-send.py /app/tasker-send.py
 COPY ./tasker.sh /app/tasker.sh
diff --git a/code/tasker.sh b/code/tasker.sh
index 0cb8e70..5900f8d 100644
--- a/code/tasker.sh
+++ b/code/tasker.sh
@@ -13,6 +13,8 @@ rm -rf /app/workdir/*
 
 # Copy files from the parent folder for the workdir.
 cp ../*.sh ../*.py ./
 cp -r ../data ./
+cp ../custom_arfi.txt.template ./
+cp ../makita-args.txt ./
 
 # Create a logging function
 function log {
@@ -22,7 +24,16 @@ function log {
 
 # Run makita
 log "Running makita"
-asreview makita template arfi -f jobs.sh
+echo "" > all-jobs.sh
+while read -r SETTINGS
+do
+  SETTINGS_DIR=$(echo "$SETTINGS" | tr ' ' '_')
+  echo "A" | asreview makita template arfi --template custom_arfi.txt.template -f jobs.sh
+  sed -i "s/SETTINGS_PLACEHOLDER/$SETTINGS/g" jobs.sh
+  sed -i "s/SETTINGS_DIR/$SETTINGS_DIR/g" jobs.sh
+  cat jobs.sh >> all-jobs.sh
+done < makita-args.txt
+mv all-jobs.sh jobs.sh
 # Define the S3_PREFIX, using whatever you think makes sense.
 # This file is run exactly once, so it makes sense to use the date.
 # You could also use the settings, if any.
@@ -46,6 +57,3 @@ log "Sending part 3 to rabbitmq"
 python tasker-send.py jobs.sh.part3
 
 log "Done"
-
-# Send results someplace?
-# TODO
diff --git a/code/worker.Dockerfile b/code/worker.Dockerfile
index da88948..36aaa8a 100644
--- a/code/worker.Dockerfile
+++ b/code/worker.Dockerfile
@@ -1,7 +1,7 @@
-FROM ghcr.io/asreview/asreview:v1.2
+FROM ghcr.io/asreview/asreview:v1.2.1
 
 RUN apt-get update && \
-    apt-get install -y curl ca-certificates amqp-tools python \
+    apt-get install -y curl ca-certificates amqp-tools python3 \
     --no-install-recommends \
     && rm -rf /var/lib/apt/lists/* \
     && pip install boto3 pika
diff --git a/examples/custom_arfi_synergy/custom_arfi.txt.template b/examples/custom_arfi_synergy/custom_arfi.txt.template
deleted file mode 100644
index 654c82a..0000000
--- a/examples/custom_arfi_synergy/custom_arfi.txt.template
+++ /dev/null
@@ -1,51 +0,0 @@
----
-name: ARFI
-name_long: All Relevant, Fixed Irrelevant
-
-scripts:
-  - get_plot.py
-  - merge_descriptives.py
-  - merge_metrics.py
-  - merge_tds.py
-
-docs:
-  - README.md
-
----
-
-{# This is a template for the ARFI method #}
-
-# version {{ version }}
-
-# Create folder structure. By default, the folder 'output' is used to store output.
-mkdir {{ output_folder }} -mkdir {{ output_folder }}/simulation -{% for dataset in datasets %} - -################################## -### DATASET: {{ dataset.input_file_stem }} -################################## -# Create output folder -mkdir {{ output_folder }}/simulation/{{ dataset.input_file_stem }}/ -mkdir {{ output_folder }}/simulation/{{ dataset.input_file_stem }}/metrics - -# Collect descriptives about the dataset -mkdir {{ output_folder }}/simulation/{{ dataset.input_file_stem }}/descriptives -asreview data describe {{ dataset.input_file }} -o {{ output_folder }}/simulation/{{ dataset.input_file_stem }}/descriptives/data_stats_{{ dataset.input_file_stem }}.json - -# Generate wordcloud visualizations of all datasets -asreview wordcloud {{ dataset.input_file }} -o {{ output_folder }}/simulation/{{ dataset.input_file_stem }}/descriptives/wordcloud_{{ dataset.input_file_stem }}.png --width 800 --height 500 -asreview wordcloud {{ dataset.input_file }} -o {{ output_folder }}/simulation/{{ dataset.input_file_stem }}/descriptives/wordcloud_relevant_{{ dataset.input_file_stem }}.png --width 800 --height 500 --relevant -asreview wordcloud {{ dataset.input_file }} -o {{ output_folder }}/simulation/{{ dataset.input_file_stem }}/descriptives/wordcloud_irrelevant_{{ dataset.input_file_stem }}.png --width 800 --height 500 --irrelevant - -# Simulate runs, collect metrics and create plots -mkdir {{ output_folder }}/simulation/{{ dataset.input_file_stem }}/state_files -{% for prior in dataset.priors %} -asreview simulate {{ dataset.input_file }} SETTINGS_PLACEHOLDER -s {{ output_folder }}/simulation/{{ dataset.input_file_stem }}/state_files/sim_{{ dataset.input_file_stem }}_{{ prior[0] }}.asreview --prior_record_id {{ " ".join(prior) }} --seed {{ dataset.model_seed }} && asreview metrics {{ output_folder }}/simulation/{{ dataset.input_file_stem }}/state_files/sim_{{ dataset.input_file_stem }}_{{ prior[0] }}.asreview -o {{ output_folder }}/simulation/{{ dataset.input_file_stem }}/metrics/metrics_sim_{{ dataset.input_file_stem }}_{{ prior[0] }}.json && rm -f {{ output_folder }}/simulation/{{ dataset.input_file_stem }}/state_files/sim_{{ dataset.input_file_stem }}_{{ prior[0] }}.asreview -{% endfor %} -{% endfor %} - -# Merge descriptives and metrics -python {{ scripts_folder }}/merge_descriptives.py -python {{ scripts_folder }}/merge_metrics.py -python {{ scripts_folder }}/merge_tds.py diff --git a/examples/custom_arfi_synergy/custom_arfi.txt.template.test b/examples/custom_arfi_synergy/custom_arfi.txt.template.test deleted file mode 100644 index c52850a..0000000 --- a/examples/custom_arfi_synergy/custom_arfi.txt.template.test +++ /dev/null @@ -1,51 +0,0 @@ ---- -name: ARFI -name_long: All Relevant, Fixed Irrelevant - -scripts: - - get_plot.py - - merge_descriptives.py - - merge_metrics.py - - merge_tds.py - -docs: - - README.md - ---- - -{# This is a template for the ARFI method #} - -# version {{ version }} - -# Create folder structure. By default, the folder 'output' is used to store output. 
-mkdir {{ output_folder }} -mkdir {{ output_folder }}/simulation -{% for dataset in datasets %} - -################################## -### DATASET: {{ dataset.input_file_stem }} -################################## -# Create output folder -mkdir {{ output_folder }}/simulation/{{ dataset.input_file_stem }}/ -mkdir {{ output_folder }}/simulation/{{ dataset.input_file_stem }}/metrics - -# Collect descriptives about the dataset -mkdir {{ output_folder }}/simulation/{{ dataset.input_file_stem }}/descriptives -asreview data describe {{ dataset.input_file }} -o {{ output_folder }}/simulation/{{ dataset.input_file_stem }}/descriptives/data_stats_{{ dataset.input_file_stem }}.json - -# Generate wordcloud visualizations of all datasets -asreview wordcloud {{ dataset.input_file }} -o {{ output_folder }}/simulation/{{ dataset.input_file_stem }}/descriptives/wordcloud_{{ dataset.input_file_stem }}.png --width 800 --height 500 -asreview wordcloud {{ dataset.input_file }} -o {{ output_folder }}/simulation/{{ dataset.input_file_stem }}/descriptives/wordcloud_relevant_{{ dataset.input_file_stem }}.png --width 800 --height 500 --relevant -asreview wordcloud {{ dataset.input_file }} -o {{ output_folder }}/simulation/{{ dataset.input_file_stem }}/descriptives/wordcloud_irrelevant_{{ dataset.input_file_stem }}.png --width 800 --height 500 --irrelevant - -# Simulate runs, collect metrics and create plots -mkdir {{ output_folder }}/simulation/{{ dataset.input_file_stem }}/state_files - -asreview simulate {{ dataset.input_file }} SETTINGS_PLACEHOLDER -s {{ output_folder }}/simulation/{{ dataset.input_file_stem }}/state_files/sim_{{ dataset.input_file_stem }}_{{ dataset.priors[0][0] }}.asreview --prior_record_id {{ " ".join(dataset.priors[0]) }} --seed {{ dataset.model_seed }} && asreview metrics {{ output_folder }}/simulation/{{ dataset.input_file_stem }}/state_files/sim_{{ dataset.input_file_stem }}_{{ dataset.priors[0][0] }}.asreview -o {{ output_folder }}/simulation/{{ dataset.input_file_stem }}/metrics/metrics_sim_{{ dataset.input_file_stem }}_{{ dataset.priors[0][0] }}.json && rm -f {{ output_folder }}/simulation/{{ dataset.input_file_stem }}/state_files/sim_{{ dataset.input_file_stem }}_{{ dataset.priors[0][0] }}.asreview - -{% endfor %} - -# Merge descriptives and metrics -python {{ scripts_folder }}/merge_descriptives.py -python {{ scripts_folder }}/merge_metrics.py -python {{ scripts_folder }}/merge_tds.py diff --git a/examples/custom_arfi_synergy/tasker.Dockerfile b/examples/custom_arfi_synergy/tasker.Dockerfile deleted file mode 100644 index 8600fb1..0000000 --- a/examples/custom_arfi_synergy/tasker.Dockerfile +++ /dev/null @@ -1,30 +0,0 @@ -FROM ghcr.io/asreview/asreview:v1.2 - -RUN apt-get update && \ - apt-get install -y curl ca-certificates amqp-tools python \ - --no-install-recommends \ - && rm -rf /var/lib/apt/lists/* \ - && pip install pika - -#### Don't modify above this line -RUN pip install synergy-dataset -RUN mkdir -p /app/data -RUN synergy get -l -o ./app/data - -# Temporary, while a new release is not done -RUN apt-get update && \ - apt-get install -y git && \ - rm -rf /var/lib/apt/lists/* && \ - pip install --upgrade git+https://github.com/abelsiqueira/asreview-makita@patch-1 - -COPY ./custom_arfi.txt.template /app/custom_arfi.txt.template -#### Don't modify below this line - -COPY ./split-file.py /app/split-file.py -COPY ./tasker-send.py /app/tasker-send.py -COPY ./tasker.sh /app/tasker.sh - -ENV PYTHONUNBUFFERED=1 -WORKDIR /app/workdir - -ENTRYPOINT [ "/bin/bash", "/app/tasker.sh" ] 
diff --git a/examples/custom_arfi_synergy/tasker.sh b/examples/custom_arfi_synergy/tasker.sh deleted file mode 100644 index 4a27ff4..0000000 --- a/examples/custom_arfi_synergy/tasker.sh +++ /dev/null @@ -1,50 +0,0 @@ -#!/bin/bash - -# Just checking that we are in the right place. -if [ "$PWD" != "/app/workdir" ]; -then - echo "ERROR: I don't know where I am" - exit 1 -fi - -# Clean workdir. Ensures that we are always starting anew -rm -rf /app/workdir/* - -# Copy files from the parent folder for the workdir. -cp ../*.sh ../*.py ./ -cp -r ../data ./ -cp ../custom_arfi.txt.template ./ - -# Create a logging function -function log { - echo "[$0:$(date --iso=ns)] $1" -} - -# Run makita -log "Running makita" - -SETTINGS="" -# SETTINGS="-m nb -e tfidf" -asreview makita template arfi --template custom_arfi.txt.template -f jobs.sh -sed -i "s/SETTINGS_PLACEHOLDER/$SETTINGS/g" jobs.sh - -# Split the `jobs.sh` file -log "Splitting file" -python split-file.py jobs.sh - -# Run part 1 in the tasker pod -log "Running part 1" -bash jobs.sh.part1 - -# Send part 2, line by line, to the workers -log "Sending part 2 to rabbitmq" -python tasker-send.py jobs.sh.part2 - -# AFTER part 2 is done, send part 3, line by line, to the workers -log "Sending part 3 to rabbitmq" -python tasker-send.py jobs.sh.part3 - -log "Done" - -# Send results someplace? -# TODO diff --git a/examples/custom_arfi_synergy/worker.Dockerfile b/examples/custom_arfi_synergy/worker.Dockerfile deleted file mode 100644 index cac2787..0000000 --- a/examples/custom_arfi_synergy/worker.Dockerfile +++ /dev/null @@ -1,32 +0,0 @@ -FROM ghcr.io/asreview/asreview:v1.2 - -RUN apt-get update && \ - apt-get install -y curl ca-certificates amqp-tools python \ - --no-install-recommends \ - && rm -rf /var/lib/apt/lists/* \ - && pip install pika - -#### Don't modify above this line - -# For sbert: -# RUN pip install sentence-transformers~=2.2.2 - -# For doc2vec: -RUN pip install gensim~=4.2.0 - -# RUN pip install --upgrade asreview-makita~=0.6.3 -# RUN pip install https://github.com/jteijema/asreview-reusable-fe/archive/main.zip -# RUN pip install https://github.com/jteijema/asreview-XGBoost/archive/main.zip - -# For neural network -# RUN pip install tensorflow~=2.9.1 - -#### Don't modify below this line - -COPY ./worker-receiver.py /app/worker-receiver.py -COPY ./worker.sh /app/worker.sh - -ENV PYTHONUNBUFFERED=1 -WORKDIR /app/workdir - -ENTRYPOINT [ "/bin/bash", "/app/worker.sh" ] diff --git a/k8-config/tasker.yml b/k8-config/tasker.yml index 90683bf..134d775 100644 --- a/k8-config/tasker.yml +++ b/k8-config/tasker.yml @@ -11,7 +11,7 @@ spec: spec: containers: - name: c - image: abelsiqueira/tasker + image: YOURUSER/tasker volumeMounts: - name: asreview-storage mountPath: /app/workdir diff --git a/k8-config/worker.yml b/k8-config/worker.yml index 1bb0948..df253c2 100644 --- a/k8-config/worker.yml +++ b/k8-config/worker.yml @@ -15,7 +15,7 @@ spec: spec: containers: - name: c - image: abelsiqueira/worker + image: YOURUSER/worker resources: limits: memory: "4Gi"
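
For orientation, the full deployment sequence assembled from the documents above is roughly the following. This is a sketch: it assumes the default `asreview-cloud` namespace, that both images have been pushed, and that the worker deployment is applied with `kubectl apply -f worker.yml` as in its section of the guide.

```bash
# One-time cluster preparation
kubectl apply -f asreview-cloud-namespace.yml
kubectl apply -f "https://github.com/rabbitmq/cluster-operator/releases/latest/download/cluster-operator.yml"
kubectl apply -f rabbitmq.yml

# Start the workers, then the tasker job
kubectl apply -f worker.yml
kubectl apply -f tasker.yml

# Watch the pods come up
kubectl -n asreview-cloud get pods
```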