diff --git a/.env.example b/.env.example index f28ebccd..137e42cb 100644 --- a/.env.example +++ b/.env.example @@ -40,7 +40,7 @@ APACHE_PORT=80 APACHE_LOG_DIR=/var/log/apache #NGINX/APACHE -## Check CKAN__ROOT_PATH and CKANEXT__DCAT__BASE_URI. If you don't need to use domain locations, it is better to use the nginx configuration. Leave blank or use the root `/`. +## Check CKAN__ROOT_PATH and CKANEXT__DCAT__BASE_URI and CKANEXT__SCHEMING_DCAT_GEOMETADATA_BASE_URI. If you don't need to use domain locations, it is better to use the nginx configuration. Leave blank or use the root `/`. PROXY_SERVER_NAME=localhost PROXY_CKAN_LOCATION=/catalog PROXY_PYCSW_LOCATION=/csw @@ -89,6 +89,7 @@ CKAN_SITE_URL=http://localhost:81 CKAN__ROOT_PATH=/catalog/{{LANG}} CKAN_PORT=5000 CKAN__FAVICON=/catalog/base/images/ckan.ico +CKAN__SITE_LOGO=/images/default/ckan-logo.png CKAN___BEAKER__SESSION__SECRET=CHANGE_ME # See https://docs.ckan.org/en/latest/maintaining/configuration.html#api-token-settings CKAN___API_TOKEN__JWT__ENCODE__SECRET=string:CHANGE_ME @@ -97,6 +98,7 @@ CKAN_SYSADMIN_NAME=ckan_admin CKAN_SYSADMIN_PASSWORD=test1234 CKAN_SYSADMIN_EMAIL=your_email@example.com CKAN_STORAGE_PATH=/var/lib/ckan +CKAN_LOGS_PATH=/var/log CKAN_SMTP_SERVER=smtp.corporateict.domain:25 CKAN_SMTP_STARTTLS=True CKAN_SMTP_USER=user @@ -124,17 +126,19 @@ CKAN__LOCALE_ORDER="en es pt_BR ja it cs_CZ ca fr el sv sr sr@latin no sk fi ru CKAN__LOCALES_OFFERED="en es pt_BR ja it cs_CZ ca fr el sv sr sr@latin no sk fi ru de pl nl bg ko_KR hu sa sl lv" # Extensions -CKAN__PLUGINS="envvars stats text_view image_view webpage_view recline_view resourcedictionary datastore xloader harvest ckan_harvester spatial_metadata spatial_query spatial_harvest_metadata_api csw_harvester waf_harvester doc_harvester resource_proxy geo_view geojson_view wmts_view shp_view dcat dcat_rdf_harvester dcat_json_harvester dcat_json_interface scheming_dcat_datasets scheming_dcat_groups scheming_dcat_organizations scheming_dcat pdf_view 
pages fluent" +CKAN__PLUGINS="envvars stats text_view image_view webpage_view recline_view resourcedictionary datastore xloader harvest spatial_metadata spatial_query spatial_harvest_metadata_api csw_harvester waf_harvester doc_harvester resource_proxy geo_view geojson_view wmts_view shp_view dcat dcat_rdf_harvester dcat_json_harvester dcat_json_interface scheming_dcat_datasets scheming_dcat_groups scheming_dcat_organizations scheming_dcat scheming_dcat_ckan_harvester scheming_dcat_xls_harvester pdf_view pages fluent" # ckanext-harvest CKAN__HARVEST__MQ__TYPE=redis CKAN__HARVEST__MQ__HOSTNAME=redis CKAN__HARVEST__MQ__PORT=6379 CKAN__HARVEST__MQ__REDIS_DB=1 +# Clean-up mechanism for the harvest log table. The default is 30 days. +CKAN__HARVEST__LOG_TIMEFRAME=40 # ckanext-xloader CKANEXT__XLOADER__API_TOKEN=api_token -CKANEXT__XLOADER__JOBS__DB_URI=postgresql://ckan:ckan@db/ckan +CKANEXT__XLOADER__JOBS__DB_URI=postgresql://ckandbuser:ckandbpassword@db/ckandb # ckanext-dcat CKANEXT__DCAT__BASE_URI=${CKAN_URL} diff --git a/README.md b/README.md index 75cd0819..0b4b7879 100644 --- a/README.md +++ b/README.md @@ -26,7 +26,7 @@ ## Overview Contains Docker images for the different components of CKAN Cloud and a Docker compose environment (based on [ckan](https://github.com/ckan/ckan)) for development and testing Open Data portals. ->**Warning**:
+> [!IMPORTANT] >This is a **custom installation of Docker Compose** with specific extensions for spatial data and [GeoDCAT-AP](https://github.com/SEMICeu/GeoDCAT-AP)/[INSPIRE](https://github.com/INSPIRE-MIF/technical-guidelines) metadata [profiles](https://en.wikipedia.org/wiki/Geospatial_metadata). For official installations, please have a look: [CKAN documentation: Installation](https://docs.ckan.org/en/latest/maintaining/installing/index.html). ![CKAN Docker Platform](/doc/img/ckan-docker-services.png) @@ -69,7 +69,7 @@ The site is configured using environment variables that you can set in the `.env ### ckan-docker roadmap Information about extensions installed in the `main` image. More info described in the [Extending the base images](#extending-the-base-images) ->**Note**
+> [!NOTE] > Switch branches to see the `roadmap` for other projects: [ckan-docker/branches](https://github.com/mjanez/ckan-docker/branches) @@ -79,7 +79,7 @@ Information about extensions installed in the `main` image. More info described | Core + | [Datastore](https://github.com/mjanez/ckan-docker) | 2.9.9 | Completed | ✔️ | ✔️ | Stable installation (Production & Dev images) via Docker Compose. | | Core + | [~~Datapusher~~](https://github.com/mjanez/ckan-docker) | 0.0.19 | Deprecated | ❌ | ❌ | Updated to [xloader](https://github.com/ckan/ckanext-xloader), an express Loader - quickly load data into DataStore. | | Extension | [ckanext-xloader](https://github.com/ckan/ckanext-xloader) | 1.0.1 | Completed | ✔️ | ✔️ | Stable installation, a replacement for DataPusher because it offers ten times the speed and more robustness | -| Extension | [ckanext-harvest](https://github.com/ckan/ckanext-harvest) | 1.5.1 | Completed | ✔️ | ✔️ | Stable installation, necessary for the implementation of the Collector ([ogc_ckan](#recollector-ckan)) | +| Extension | [ckanext-harvest](https://github.com/ckan/ckanext-harvest) | v1.5.6 | Completed | ✔️ | ✔️ | Stable installation, necessary for the implementation of the Collector ([ogc_ckan](#recollector-ckan)) | | Extension | [ckanext-geoview](https://github.com/ckan/ckanext-geoview) | 0.0.20 | Completed | ✔️ | ✔️ | Stable installation. | | Extension | [ckanext-spatial](https://github.com/ckan/ckanext-spatial) | 2.0.0 | Completed | ✔️ | ✔️ | Stable installation, necessary for the implementation of the Collector ([ogc_ckan](#recollector-ckan)) | | Extension | [ckanext-dcat](https://github.com/mjanez/ckanext-dcat) | 1.1.0 | Completed | ✔️ | ✔️ | Stable installation, include DCAT-AP 2.1 profile compatible with GeoDCAT-AP. | @@ -103,7 +103,7 @@ To upgrade Docker Engine, first run sudo `apt-get update`, then follow the [inst To verify a successful Docker installation, run `docker run hello-world` and `docker version`. 
These commands should output versions for client and server. ->**Note**
+> [!NOTE] > Learn more about [Docker](#docker-basic-commands)/[Docker Compose](#docker-compose-basic-commands) basic commands. > @@ -128,10 +128,10 @@ Use this if you are a maintainer and will not be making code changes to CKAN or - **Apache HTTP Server**: Replace the [`.env`](/.env) with the [`/samples/.env.apache.example`](/samples/.env.apache.example) and modify the variables as needed. - >**Note**:
+ > [!NOTE] > Please note that when accessing CKAN directly (via a browser) ie: not going through Apache/NGINX you will need to make sure you have "ckan" set up to be an alias to localhost in the local hosts file. Either that or you will need to change the `.env` entry for `CKAN_SITE_URL` - >**Warning**:
+ > [!WARNING] > Using the default values on the `.env` file will get you a working CKAN instance. There is a sysadmin user created by default with the values defined in `CKAN_SYSADMIN_NAME` and `CKAN_SYSADMIN_PASSWORD` (`ckan_admin` and `test1234` by default). All envvars with `API_TOKEN` are automatically regenerated when CKAN is loaded, no editing is required. > >**This should be obviously changed before running this setup as a public CKAN instance.** @@ -141,7 +141,7 @@ Use this if you are a maintainer and will not be making code changes to CKAN or docker compose build ``` - >**Note**
+ > [!NOTE] > You can use a [deploy in 5 minutes](#quick-mode) if you just want to test the package. 4. Start the containers: @@ -153,11 +153,11 @@ This will start up the containers in the current window. By default the containe using a different colour. You could also use the -d "detach mode" option ie: `docker compose up -d` if you wished to use the current window for something else. ->**Note**
+> [!NOTE] > * Or `docker compose up --build` to build & up the containers. > * Or `docker compose -f docker-compose.apache.yml up -d --build` to use the Apache HTTP Server version. ->**Note**
+> [!NOTE] > Learn more about configuring this ckan docker: > - [Backup the CKAN Database](#ckan-backups) > - [Configuring a docker compose service to start on boot](#docker-compose-configure-a-docker-compose-service-to-start-on-boot) @@ -229,7 +229,7 @@ The Docker image config files used to build your CKAN project are located in the * Any custom changes to the scripts run during container start up can be made to scripts in the `setup/` directory. For instance if you wanted to change the port on which CKAN runs you would need to make changes to the Docker Compose yaml file, and the `start_ckan.sh.override` file. Then you would need to add the following line to the Dockerfile ie: `COPY setup/start_ckan.sh.override ${APP_DIR}/start_ckan.sh`. The `start_ckan.sh` file in the locally built image would override the `start_ckan.sh` file included in the base image ->**Note**
+> [!TIP] > If you get an error like ` doesn't have execute permissions`: > >```log @@ -309,7 +309,7 @@ ckan ``` ->**Note**:
+> [!NOTE] > Git diff is a command to output the changes between two sources inside the Git repository. The data sources can be two different branches, commits, files, etc. > * Show changes between working directory and staging area: > `git diff > [file.patch]` @@ -432,6 +432,12 @@ Available components: * **pycsw**: The pycsw app. An [OARec](https://ogcapi.ogc.org/records) and [OGC CSW](https://opengeospatial.org/standards/cat) server implementation written in Python. * **ckan2pycsw**: Software to achieve interoperability with the open data portals based on CKAN. To do this, ckan2pycsw reads data from an instance using the CKAN API, generates ISO-19115/ISO-19139 metadata using [pygeometa](https://geopython.github.io/pygeometa/), or a custom schema that is based on a customized CKAN schema, and populates a [pycsw](https://pycsw.org/) instance that exposes the metadata using CSW and OAI-PMH. +### Harvester consumers on a deployed CKAN +[ckanext-harvest supervisor](https://github.com/ckan/ckanext-harvest#setting-up-the-harvesters-on-a-production-server) allows you to harvest metadata from multiple sources on a production deployment. Here it is deployed [by worker consumers in the `ckan` container](./ckan/setup/workers/harvester.conf). The `ckanext-harvest` extension and other custom harvesters ([`ckanext-scheming_dcat`](https://github.com/mjanez/ckanext-scheming_dcat?tab=readme-ov-file#harvesters) or [`ckanext-dcat`](https://github.com/ckan/ckanext-dcat#rdf-dcat-harvester)) are also included in the CKAN Docker images. + +> [!TIP] +> To enable harvesters, set the `CKAN__PLUGINS` variable in the `.env` file so that it includes the `harvest` plugin: https://github.com/mjanez/ckan-docker/blob/a18e0c80d9f16b6d9b6471e3148d48fcb83712bd/.env.example#L126-L127 + ## ckan-docker tips ### CKAN. Backups @@ -474,7 +480,7 @@ PostgreSQL offers the command line tools [`pg_dump`](https://www.postgresql.org/ - `your_postgres_password`: The password for the PostgreSQL user. 
- `/path/to/your/backup/directory`: The path to the directory where you want to store the backup files. - >**Warning**
+ > [!WARNING] > If you have changed the values of the PostgreSQL container, database or user, change them too. > Check that `zip` package is installed: `sudo apt-get install zip` @@ -498,14 +504,14 @@ PostgreSQL offers the command line tools [`pg_dump`](https://www.postgresql.org/ 0 0 * * * /path/to/your/script/ckan_backup_custom.sh ``` - >**Info**
+ > [!NOTE] > Replace `/path/to/your/script` with the actual path to the `ckan_backup_custom.sh` script. 8. Save and close the file. The cronjob is now set up and will backup your CKAN PostgreSQL database daily at midnight using the custom format. The backups will be stored in the specified directory with the timestamp in the filename. ->**Info**
+> [!NOTE] > Sample scripts for backing up CKAN: [`doc/scripts`](doc/scripts) @@ -530,27 +536,30 @@ If need to use a backup, restore it: ### CKAN. Manage new users -1. Create a new user from the Docker host, for example to create a new user called 'admin' +1. Create a new user from the Docker host, for example to create a new user called `user_example` ```bash - docker exec -it <container-id> ckan -c ckan.ini user add admin email=admin@localhost + docker exec -it <container-id> ckan -c ckan.ini user add user_example email=user_example@localhost + + # Admin user + docker exec -it <container-id> ckan -c ckan.ini sysadmin add admin_example email=admin_example@localhost name=admin_example ``` - To delete the 'admin' user + To delete the 'user_example' user ```bash - docker exec -it <container-id> ckan -c ckan.ini user remove admin` + docker exec -it <container-id> ckan -c ckan.ini user remove user_example ``` 1. Create a new user from within the ckan container. You will need to get a session on the running container ```bash - ckan -c ckan.ini user add admin email=admin@localhost` + ckan -c ckan.ini user add user_example email=user_example@localhost ``` - To delete the 'admin' user + To delete the 'user_example' user ```bash - ckan -c ckan.ini user remove admin` + ckan -c ckan.ini user remove user_example ``` @@ -691,7 +700,7 @@ To have Docker Compose run automatically when you reboot a machine, you can foll ## CKAN API ->**Note**
+> [!NOTE] >`params`: Parameters to pass to the action function. The parameters are specific to each action function. >* `fl` (text): Fields of the dataset to return. The parameter controls which fields are returned in the solr query. `fl` can be `None` or a list of result fields, such as: `id,name,extras_custom_schema_field`. > diff --git a/ckan/Dockerfile b/ckan/Dockerfile index 8ae18b31..1afab4a7 100644 --- a/ckan/Dockerfile +++ b/ckan/Dockerfile @@ -11,34 +11,35 @@ WORKDIR ${APP_DIR} # requirements.txt files fixed until next releases COPY req_fixes req_fixes -# Extensions -### XLoader - 1.0.1 ### -### Harvester - v1.5.1 ### -### Geoview - v0.0.20 ### -### Spatial - v2.0.0 ### fixed requirements.txt -### DCAT - v1.2.0-geodcatap (GeoDCAT-AP/NTI-RISP extended version) ### -### Scheming - release-3.0.0 ### -### Resource dictionary - v1.0.1 ### -### Pages - v0.5.2 ### -### PDFView - 0.0.8 ### -### Fluent - v1.0.1 (Forked stable version) ### -### Scheming DCAT - v2.0.0 (GeoDCAT-AP/NTI-RISP extended version) ### -### SPARQL Interface - 2.0.1 ### +# CKAN configuration & extensions +## XLoader - 1.0.1 ## +## Harvest - v1.5.6 (Worker with supervisor) ## +## Geoview - v0.0.20 ## +## Spatial - v2.1.1 ## +## DCAT - v1.2.0-geodcatap (GeoDCAT-AP/NTI-RISP extended version) ## +## Scheming - release-3.0.0 ## +## Resource dictionary - v1.0.1 ## +## Pages - v0.5.2 ## +## PDFView - 0.0.8 ## +## Fluent - v1.0.1 (Forked stable version) ## +## Scheming DCAT - v2.1.0 (GeoDCAT-AP/NTI-RISP extended version) ## RUN echo ${TZ} > /etc/timezone && \ - if ! [ /usr/share/zoneinfo/${TZ} -ef /etc/localtime ]; then cp /usr/share/zoneinfo/${TZ} /etc/localtime ; fi && \ + if ! 
[ /usr/share/zoneinfo/${TZ} -ef /etc/localtime ]; then cp /usr/share/zoneinfo/${TZ} /etc/localtime; fi && \ + # Remove apk cache + rm -rf /var/cache/apk/* && \ # Install CKAN extensions echo "ckan/ckanext-xloader" && \ pip3 install --no-cache-dir -e git+https://github.com/ckan/ckanext-xloader.git@1.0.1#egg=ckanext-xloader && \ pip3 install --no-cache-dir -r ${APP_DIR}/src/ckanext-xloader/requirements.txt && \ pip3 install --no-cache-dir -U requests[security] && \ echo "ckan/ckanext-harvest" && \ - pip3 install --no-cache-dir -e git+https://github.com/ckan/ckanext-harvest.git@v1.5.1#egg=ckanext-harvest && \ - pip3 install --no-cache-dir -r ${APP_DIR}/src/ckanext-harvest/pip-requirements.txt && \ + pip3 install --no-cache-dir -e git+https://github.com/ckan/ckanext-harvest.git@v1.5.6#egg=ckanext-harvest && \ + pip3 install --no-cache-dir -r ${APP_DIR}/src/ckanext-harvest/requirements.txt && \ echo "ckan/ckanext-geoview" && \ pip3 install --no-cache-dir -e git+https://github.com/ckan/ckanext-geoview.git@v0.0.20#egg=ckanext-geoview && \ echo "ckan/ckanext-spatial" && \ - pip3 install --no-cache-dir -e git+https://github.com/ckan/ckanext-spatial.git@v2.0.0#egg=ckanext-spatial && \ - pip3 install --no-cache-dir -r ${APP_DIR}/req_fixes/ckanext-spatial_requirements.txt && \ + pip3 install --no-cache-dir -e git+https://github.com/ckan/ckanext-spatial.git@v2.1.1#egg=ckanext-spatial && \ + pip3 install --no-cache-dir -r ${APP_DIR}/src/ckanext-spatial/requirements.txt && \ echo "mjanez/ckanext-dcat (GeoDCAT-AP extended version)" && \ pip3 install --no-cache-dir -e git+https://github.com/mjanez/ckanext-dcat.git@v1.2.0-geodcatap#egg=ckanext-dcat && \ pip3 install --no-cache-dir -r ${APP_DIR}/src/ckanext-dcat/requirements.txt && \ @@ -53,8 +54,8 @@ RUN echo ${TZ} > /etc/timezone && \ echo "mjanez/ckanext-fluent" && \ pip3 install --no-cache-dir -e git+https://github.com/mjanez/ckanext-fluent.git@v1.0.1#egg=ckanext-fluent && \ echo "mjanez/ckanext-scheming_dcat" && \ - pip3 install 
--no-cache-dir -e git+https://github.com/mjanez/ckanext-scheming_dcat.git@v2.0.0#egg=ckanext_scheming_dcat && \ - pip3 install --no-cache-dir -r https://raw.githubusercontent.com/mjanez/ckanext-scheming_dcat/v2.0.0/requirements.txt + pip3 install --no-cache-dir -e git+https://github.com/mjanez/ckanext-scheming_dcat.git@v2.1.0#egg=ckanext_scheming_dcat && \ + pip3 install --no-cache-dir -r https://raw.githubusercontent.com/mjanez/ckanext-scheming_dcat/v2.1.0/requirements.txt # Used to configure the container environment by setting environment variables, creating users, running initialization scripts, .etc COPY docker-entrypoint.d/* /docker-entrypoint.d/ @@ -66,11 +67,20 @@ COPY setup/who.ini ./ COPY patches patches RUN for d in $APP_DIR/patches/*; do \ - if [ -d $d ]; then \ + if [ -d $d ]; then \ for f in `ls $d/*.patch | sort -g`; do \ - cd $SRC_DIR/`basename "$d"` && echo "$0: Applying patch $f to $SRC_DIR/`basename $d`"; patch -p1 < "$f" ; \ - done ; \ - fi ; \ + cd $SRC_DIR/`basename "$d"` && echo "$0: Applying patch $f to $SRC_DIR/`basename $d`" && patch -p1 < "$f"; \ + done; \ + fi; \ done +# Workers +## Update start_ckan.sh with custom workers +COPY setup/start_ckan.sh.override ${APP_DIR}/start_ckan.sh +RUN chmod +x ${APP_DIR}/start_ckan.sh + +## Load workers supervisor configuration +COPY setup/workers/* /etc/supervisord.d/ + +# Start CKAN CMD ["/bin/sh", "-c", "$APP_DIR/start_ckan.sh"] \ No newline at end of file diff --git a/ckan/Dockerfile.dev b/ckan/Dockerfile.dev index 4ec5c849..97bbbf8d 100644 --- a/ckan/Dockerfile.dev +++ b/ckan/Dockerfile.dev @@ -1,14 +1,18 @@ FROM ghcr.io/mjanez/ckan-base-spatial:ckan-2.9.9-dev +LABEL maintainer="mnl.janez@gmail.com" # Set up environment variables ENV APP_DIR=/srv/app \ TZ=UTC \ SRC_EXTENSIONS_DIR=/srv/app/src_extensions +# Set working directory +WORKDIR ${APP_DIR} RUN echo ${TZ} > /etc/timezone && \ - set -ex && apk --no-cache add sudo && \ - # Make sure both files are not exactly the same - if ! 
[ /usr/share/zoneinfo/${TZ} -ef /etc/localtime ]; then cp /usr/share/zoneinfo/${TZ} /etc/localtime ; fi + if ! [ /usr/share/zoneinfo/${TZ} -ef /etc/localtime ]; then cp /usr/share/zoneinfo/${TZ} /etc/localtime; fi && \ + apk --no-cache add sudo && \ + # Remove apk cache + rm -rf /var/cache/apk/* # Install any extensions needed by your CKAN instance # - Make sure to add the plugins to CKAN__PLUGINS in the .env file @@ -50,26 +54,29 @@ RUN echo ${TZ} > /etc/timezone && \ COPY docker-entrypoint.d/* /docker-entrypoint.d/ # Update who.ini with PROXY_CKAN_LOCATION -COPY setup/who.ini ${APP_DIR}/ +COPY setup/who.ini ./ # Override start_ckan.sh with DEV sh -COPY setup/start_ckan_development.sh.override ${APP_DIR}/start_ckan_development.sh -RUN chmod +x ${APP_DIR}/start_ckan_development.sh +COPY setup/start_ckan_development.sh.override ./start_ckan_development.sh +RUN chmod +x ./start_ckan_development.sh + +## Load workers supervisor configuration +COPY setup/workers/* /etc/supervisord.d/ # Apply any patches needed to CKAN core or any of the built extensions (not the # runtime mounted ones) -COPY patches ${APP_DIR}/patches +COPY patches patches RUN for d in $APP_DIR/patches/*; do \ - if [ -d $d ]; then \ - for f in `ls $d/*.patch | sort -g`; do \ - if [ -d $SRC_DIR/`basename "$d"` ]; then \ - cd $SRC_DIR/`basename "$d"` && \ - echo "$0: Applying patch $f to $SRC_DIR/`basename $d`" && \ - patch -p1 < "$f" ; \ - else \ - echo "$0: Skipping patch $f because directory $SRC_DIR/`basename $d` does not exist. Built the extension: `basename $d`" ; \ - fi \ - done ; \ - fi ; \ -done \ No newline at end of file + if [ -d $d ]; then \ + for f in `ls $d/*.patch | sort -g`; do \ + if [ -d $SRC_DIR/`basename "$d"` ]; then \ + cd $SRC_DIR/`basename "$d"` && \ + echo "$0: Applying patch $f to $SRC_DIR/`basename $d`" && \ + patch -p1 < "$f" ; \ + else \ + echo "$0: Skipping patch $f because directory $SRC_DIR/`basename $d` does not exist. 
Built the extension: `basename $d`" ; \ + fi \ + done ; \ + fi ; \ + done \ No newline at end of file diff --git a/ckan/Dockerfile.ghcr b/ckan/Dockerfile.ghcr index 92614036..98430ee1 100644 --- a/ckan/Dockerfile.ghcr +++ b/ckan/Dockerfile.ghcr @@ -38,4 +38,13 @@ RUN for d in $APP_DIR/patches/*; do \ fi ; \ done +# Workers +## Update start_ckan.sh with custom workers +COPY setup/start_ckan.sh.override ${APP_DIR}/start_ckan.sh +RUN chmod +x ${APP_DIR}/start_ckan.sh + +## Load workers supervisor configuration +COPY setup/workers/* /etc/supervisord.d/ + +# Start CKAN CMD ["/bin/sh", "-c", "$APP_DIR/start_ckan.sh"] \ No newline at end of file diff --git a/ckan/docker-entrypoint.d/00_update_who.sh b/ckan/docker-entrypoint.d/00_update_who.sh index 7b767294..b214ffc6 100644 --- a/ckan/docker-entrypoint.d/00_update_who.sh +++ b/ckan/docker-entrypoint.d/00_update_who.sh @@ -1,7 +1,7 @@ #!/bin/bash # Update who.ini when exists PROXY_CKAN_LOCATION -echo "Update who.ini" +echo "[docker-entrypoint.00_update_who] Update who.ini" if [ -n "$PROXY_CKAN_LOCATION" ] && [ "$PROXY_CKAN_LOCATION" != "/" ]; then sed -i "s|\${WHO_LOCATION}|$PROXY_CKAN_LOCATION|g" "${APP_DIR}/who.ini"; else diff --git a/ckan/docker-entrypoint.d/01_setup_xloader.sh b/ckan/docker-entrypoint.d/01_setup_xloader.sh index 4f5e3e58..9d0e8946 100644 --- a/ckan/docker-entrypoint.d/01_setup_xloader.sh +++ b/ckan/docker-entrypoint.d/01_setup_xloader.sh @@ -9,22 +9,26 @@ TOKEN_IDS=$(ckan -c $CKAN_INI user token list ckan_admin | grep "$TOKEN_NAME" | # Revoke each previous token of xloader for TOKEN_ID in $TOKEN_IDS do - ckan -c $CKAN_INI user token revoke $TOKEN_ID + if [ -z "$TOKEN_ID" ]; then + echo "[docker-entrypoint.01_setup_xloader] No API Token to revoke" + continue + fi + ckan -c $CKAN_INI user token revoke -- $TOKEN_ID if [ $? 
-eq 0 ]; then - echo "API Token $TOKEN_ID has been revoked" + echo "[docker-entrypoint.01_setup_xloader] API Token $TOKEN_ID has been revoked" fi done # Add ckanext.xloader.api_token to the CKAN config file -echo "Loading ckanext-xloader settings in the CKAN config file" +echo "[docker-entrypoint.01_setup_xloader] Loading ckanext-xloader settings in the CKAN config file" ckan config-tool $CKAN_INI \ "ckanext.xloader.api_token=xxx" \ "ckanext.xloader.jobs_db.uri=$CKANEXT__XLOADER__JOBS__DB_URI" # Create ckanext-xloader API_TOKEN -echo "Set up ckanext.xloader.api_token in the CKAN config file" +echo "[docker-entrypoint.01_setup_xloader] Set up ckanext.xloader.api_token in the CKAN config file" ckan config-tool $CKAN_INI "ckanext.xloader.api_token=$(ckan -c $CKAN_INI user token add ckan_admin xloader | tail -n 1 | tr -d '\t')" #TODO: Setup worker background -#echo "Set up CKAN jobs worker" +#echo "[docker-entrypoint.01_setup_xloader] Set up CKAN jobs worker" #ckan -c $CKAN_INI jobs worker default \ No newline at end of file diff --git a/ckan/docker-entrypoint.d/02_setup_scheming.sh b/ckan/docker-entrypoint.d/02_setup_scheming.sh index e56877a6..018a04aa 100644 --- a/ckan/docker-entrypoint.d/02_setup_scheming.sh +++ b/ckan/docker-entrypoint.d/02_setup_scheming.sh @@ -1,10 +1,10 @@ #!/bin/bash # Update ckanext-scheming and ckanext-scheming_dcat settings defined in the env var -echo "Set up ckanext-scheming_dcat. 
Clear index" +echo "[docker-entrypoint.02_setup_scheming] Clear index" ckan -c $CKAN_INI search-index clear -echo "Loading ckanext-scheming and ckanext-scheming_dcat settings into ckan.ini" +echo "[docker-entrypoint.02_setup_scheming] Loading ckanext-scheming and ckanext-scheming_dcat settings into ckan.ini" ckan config-tool $CKAN_INI \ "scheming.dataset_schemas=$CKANEXT__SCHEMING_DCAT_DATASET_SCHEMA" \ "scheming.group_schemas=$CKANEXT__SCHEMING_DCAT_GROUP_SCHEMAS" \ @@ -15,5 +15,5 @@ ckan config-tool $CKAN_INI \ "scheming_dcat.group_custom_facets=$CKANEXT__SCHEMING_DCAT_GROUP_CUSTOM_FACETS" \ "scheming_dcat.geometadata_base_uri=$CKANEXT__SCHEMING_DCAT_GEOMETADATA_BASE_URI" -echo "ckanext-scheming_dcat. Rebuild index" +echo "[docker-entrypoint.02_setup_scheming] Rebuild index" ckan -c $CKAN_INI search-index rebuild \ No newline at end of file diff --git a/ckan/docker-entrypoint.d/03_setup_dcat.sh b/ckan/docker-entrypoint.d/03_setup_dcat.sh index 2a1efb9a..65d095de 100644 --- a/ckan/docker-entrypoint.d/03_setup_dcat.sh +++ b/ckan/docker-entrypoint.d/03_setup_dcat.sh @@ -1,7 +1,7 @@ #!/bin/bash # Add ckanext-dcat settings to the CKAN config file -echo "Loading ckanext-dcat settings in the CKAN config file" +echo "[docker-entrypoint.03_setup_dcat] Loading ckanext-dcat settings in the CKAN config file" ckan config-tool $CKAN_INI \ "ckanext.dcat.base_uri = $CKANEXT__DCAT__BASE_URI" \ "ckanext.dcat.catalog_endpoint = $CKANEXT__DCAT__DEFAULT_CATALOG_ENDPOINT" \ diff --git a/ckan/docker-entrypoint.d/04_setup_preview.sh b/ckan/docker-entrypoint.d/04_setup_preview.sh index 246a1ede..295f096e 100644 --- a/ckan/docker-entrypoint.d/04_setup_preview.sh +++ b/ckan/docker-entrypoint.d/04_setup_preview.sh @@ -3,7 +3,7 @@ #TODO: Correct views. 
# Add CKAN Resource views to the CKAN config file -echo "Loading resource views in the CKAN config file" +echo "[docker-entrypoint.04_setup_preview] Loading resource views in the CKAN config file" ckan config-tool $CKAN_INI \ "ckan.views.default_views = $CKAN__VIEWS__DEFAULT_VIEWS" \ "ckan.preview.json_formats = $CKAN__PREVIEW__JSON_FORMATS" \ @@ -12,7 +12,7 @@ ckan config-tool $CKAN_INI \ "ckan.preview.loadable = $CKAN__PREVIEW__LOADABLE" # Add CKAN Resource geoviews to the CKAN config file -echo "Loading geoviews in the CKAN config file" +echo "[docker-entrypoint.04_setup_preview] Loading geoviews in the CKAN config file" ckan config-tool $CKAN_INI \ "ckanext.geoview.ol_viewer.formats = $CKANEXT__GEOVIEW__OL_VIEWER__FORMATS" \ "ckanext.geoview.shp_viewer.srid = $CKANEXT__GEOVIEW__SHP_VIEWER__SRID" \ diff --git a/ckan/docker-entrypoint.d/05_setup_pages.sh b/ckan/docker-entrypoint.d/05_setup_pages.sh index 7f11cc0e..4f64e163 100644 --- a/ckan/docker-entrypoint.d/05_setup_pages.sh +++ b/ckan/docker-entrypoint.d/05_setup_pages.sh @@ -1,7 +1,7 @@ #!/bin/bash # Add pages CKAN config file (https://github.com/ckan/ckanext-pages#configuration) -echo "Loading pages config in the CKAN config file" +echo "[docker-entrypoint.05_setup_pages] Loading pages config in the CKAN config file" ckan config-tool $CKAN_INI \ "ckan.pages.allow_html = $CKANEXT__PAGES__ALOW_HTML" \ "ckanext.pages.organization = $CKANEXT__PAGES__ORGANIZATION" \ diff --git a/ckan/patches/ckanext-harvest/00_translates.patch b/ckan/patches/ckanext-harvest/00_translates.patch new file mode 100644 index 00000000..976ed463 --- /dev/null +++ b/ckan/patches/ckanext-harvest/00_translates.patch @@ -0,0 +1,94 @@ +diff --git a/ckanext/harvest/templates/source/new.html b/ckanext/harvest/templates/source/new.html +index b7feb3d..b773a44 100644 +--- a/ckanext/harvest/templates/source/new.html ++++ b/ckanext/harvest/templates/source/new.html +@@ -24,12 +24,18 @@ +
+

+ {% trans %} +- Harvest sources allow importing remote metadata into this catalog. +- Remote sources can be other catalogs such as other CKAN instances, CSW +- servers or Web Accessible Folders (WAF) (depending on the actual +- harvesters enabled for this instance). ++ Harvest sources allow importing remote metadata into this catalog. Remote sources can be other catalogs such as other CKAN instances, CSW servers, XML metadata files, XLSX with metadata records or Web Accessible Folder (WAF). + {% endtrans %} +

++ ++

++ {{ _('Depending on the actual harvesters enabled for this instance. eg: ') }} ++

++

+
+ + {% endblock %} +diff --git a/ckanext/harvest/templates/source/new_source_form.html b/ckanext/harvest/templates/source/new_source_form.html +index 324d012..37358fc 100644 +--- a/ckanext/harvest/templates/source/new_source_form.html ++++ b/ckanext/harvest/templates/source/new_source_form.html +@@ -8,7 +8,7 @@ + + {% call form.input('url', id='field-url', label=_('URL'), value=data.url, error=errors.url, classes=['control-full', 'control-large']) %} + +- {{ _('This should include the http:// part of the URL') }} ++ {{ _('This should include the http:// part of the URL') }} + + {% endcall %} + +@@ -26,7 +26,7 @@ + {{ form.markdown('notes', id='field-notes', label=_('Description'), value=data.notes, error=errors.notes) }} + +
+- ++ +
+ {% for harvester in h.harvesters_info() %} + {% set checked = False %} +@@ -46,7 +46,11 @@ + {{ form.select('frequency', id='field-frequency', label=_('Update frequency'), options=h.harvest_frequencies(), selected=data.frequency, error=errors.frequency) }} + + {% block extra_config %} +- {{ form.textarea('config', id='field-config', label=_('Configuration'), value=data.config, error=errors.config) }} ++ {% call form.textarea('config', id='field-config', label=_('Configuration'), value=data.config, error=errors.config) %} ++ ++ {{ _('You can validate the JSON at: ') }} {{ _('JSONLint') }} ++ ++ {% endcall %} + {% endblock extra_config %} + + {# if we have a default group then this wants remembering #} +diff --git a/ckanext/harvest/templates/source/search.html b/ckanext/harvest/templates/source/search.html +index d9ceeea..44d118b 100644 +--- a/ckanext/harvest/templates/source/search.html ++++ b/ckanext/harvest/templates/source/search.html +@@ -44,7 +44,26 @@ + + + +-{% block secondary_content %} ++ {% block secondary_content %} ++
++

{{ _('Harvest sources') }}

++
++

++ {% trans %} ++ Harvest sources allow importing remote metadata into this catalog. Remote sources can be other catalogs such as other CKAN instances, CSW servers, XML metadata files, XLSX with metadata records or Web Accessible Folder (WAF). ++ {% endtrans %} ++

++ ++

++ {{ _('Depending on the actual harvesters enabled for this instance. eg: ') }} ++

++

++
++
+ {% for facet in c.facet_titles %} + {{ h.snippet('snippets/facet_list.html', title=c.facet_titles[facet], name=facet, alternative_url=h.url_for('{0}.search'.format(c.dataset_type))) }} + {% endfor %} diff --git a/ckan/req_fixes/ckanext-spatial_requirements.txt b/ckan/req_fixes/ckanext-spatial_requirements.txt index 0c15f3c0..b86d5173 100644 --- a/ckan/req_fixes/ckanext-spatial_requirements.txt +++ b/ckan/req_fixes/ckanext-spatial_requirements.txt @@ -3,12 +3,10 @@ lxml>=2.3 argparse pyparsing>=2.1.10 requests>=1.1.0 -six - -# requirements pyproj fix: https://github.com/pyproj4/pyproj/issues/1321 +cython==0.29.36; python_version < '3.9' pyproj==2.6.1; python_version < '3.9' pyproj==3.6.1; python_version >= '3.9' Shapely==2.0.1 OWSLib==0.28.1 -geojson==3.0.1 \ No newline at end of file +geojson==3.0.1 diff --git a/ckan/setup/start_ckan.sh.override b/ckan/setup/start_ckan.sh.override index ce6eebde..84656952 100644 --- a/ckan/setup/start_ckan.sh.override +++ b/ckan/setup/start_ckan.sh.override @@ -1,22 +1,18 @@ #!/bin/sh -# Add ckan.datapusher.api_token to the CKAN config file (updated with corrected value later) -ckan config-tool $CKAN_INI ckan.datapusher.api_token=xxx - # Set up the Secret key used by Beaker and Flask # This can be overriden using a CKAN___BEAKER__SESSION__SECRET env var if grep -E "beaker.session.secret ?= ?$" ckan.ini then echo "Setting beaker.session.secret in ini file" ckan config-tool $CKAN_INI "beaker.session.secret=$(python3 -c 'import secrets; print(secrets.token_urlsafe())')" - ckan config-tool $CKAN_INI "WTF_CSRF_SECRET_KEY=$(python3 -c 'import secrets; print(secrets.token_urlsafe())')" JWT_SECRET=$(python3 -c 'import secrets; print("string:" + secrets.token_urlsafe())') ckan config-tool $CKAN_INI "api_token.jwt.encode.secret=${JWT_SECRET}" ckan config-tool $CKAN_INI "api_token.jwt.decode.secret=${JWT_SECRET}" fi # Run the prerun script to init CKAN and create the default admin user -sudo -u ckan -EH python3 prerun.py +python3 prerun.py 
# Run any startup scripts provided by images extending this one if [[ -d "/docker-entrypoint.d" ]] @@ -31,6 +27,14 @@ then done fi +# Create Harvester logs directory and change its ownership +mkdir -p $CKAN_LOGS_PATH/harvester +chown -R ckan:ckan $CKAN_LOGS_PATH/harvester + +# Create xloader logs directory and change its ownership +mkdir -p $CKAN_LOGS_PATH/xloader +chown -R ckan:ckan $CKAN_LOGS_PATH/xloader + # Set the common uwsgi options UWSGI_OPTS="--plugins http,python \ --socket /tmp/uwsgi.sock \ @@ -46,9 +50,18 @@ UWSGI_OPTS="--plugins http,python \ if [ $? -eq 0 ] then # Start supervisord + echo "[prerun.workers] Loading the CKAN workers with supervisord..." supervisord --configuration /etc/supervisord.conf & + + # Workers + ## Add harvester background process to crontab + echo "[prerun.workers] Add harvester background processes to crontab" + crontab -l | { cat; echo "*/15 * * * * /usr/bin/supervisorctl start ckan_harvester_run"; } | crontab - + ## Clean-up mechanism for the harvest log table, controlled by 'ckan.harvest.log_timeframe'. The default time frame is 30 days + crontab -l | { cat; echo "0 5 */30 * * /usr/bin/supervisorctl start ckan_harvester_clean_log"; } | crontab - + # Start uwsgi - sudo -u ckan -EH uwsgi $UWSGI_OPTS + uwsgi $UWSGI_OPTS else echo "[prerun] failed...not starting CKAN." 
-fi \ No newline at end of file +fi diff --git a/ckan/setup/start_ckan_development.sh.override b/ckan/setup/start_ckan_development.sh.override index 8dcc7465..b481c807 100644 --- a/ckan/setup/start_ckan_development.sh.override +++ b/ckan/setup/start_ckan_development.sh.override @@ -45,16 +45,12 @@ done echo "Enabling debug mode" ckan config-tool $CKAN_INI -s DEFAULT "debug = true" -# Add ckan.datapusher.api_token to the CKAN config file (updated with corrected value later) -ckan config-tool $CKAN_INI ckan.datapusher.api_token=xxx - # Set up the Secret key used by Beaker and Flask # This can be overriden using a CKAN___BEAKER__SESSION__SECRET env var if grep -E "beaker.session.secret ?= ?$" ckan.ini then echo "Setting beaker.session.secret in ini file" ckan config-tool $CKAN_INI "beaker.session.secret=$(python3 -c 'import secrets; print(secrets.token_urlsafe())')" - ckan config-tool $CKAN_INI "WTF_CSRF_SECRET_KEY=$(python3 -c 'import secrets; print(secrets.token_urlsafe())')" JWT_SECRET=$(python3 -c 'import secrets; print("string:" + secrets.token_urlsafe())') ckan config-tool $CKAN_INI "api_token.jwt.encode.secret=${JWT_SECRET}" ckan config-tool $CKAN_INI "api_token.jwt.decode.secret=${JWT_SECRET}" @@ -74,7 +70,7 @@ ckan config-tool $SRC_DIR/ckan/test-core.ini \ "ckan.redis.url = $TEST_CKAN_REDIS_URL" # Run the prerun script to init CKAN and create the default admin user -sudo -u ckan -EH python3 prerun.py +python3 prerun.py # Run any startup scripts provided by images extending this one if [[ -d "/docker-entrypoint.d" ]] @@ -89,8 +85,22 @@ then done fi +# Create Harvester logs directory and change its ownership +mkdir -p $CKAN_LOGS_PATH/harvester +chown -R ckan:ckan $CKAN_LOGS_PATH/harvester + +# Create xloader logs directory and change its ownership +mkdir -p $CKAN_LOGS_PATH/xloader +chown -R ckan:ckan $CKAN_LOGS_PATH/xloader + # Start supervisord -supervisord --configuration /etc/supervisord.conf & +#supervisord --configuration /etc/supervisord.conf & + +# Start 
the development server as the ckan user with automatic reload +su ckan -c "/usr/bin/ckan -c $CKAN_INI run -H 0.0.0.0" -# Start the development server with automatic reload -sudo -u ckan -EH ckan -c $CKAN_INI run -H 0.0.0.0 \ No newline at end of file +# Workers +# To start the Harvester worker +# ckan harvester run +# Clean-up mechanism for the harvest log table +# ckan harvester clean-harvest-log \ No newline at end of file diff --git a/ckan/setup/workers/harvester.conf b/ckan/setup/workers/harvester.conf new file mode 100644 index 00000000..cb1c2e3d --- /dev/null +++ b/ckan/setup/workers/harvester.conf @@ -0,0 +1,51 @@ +[program:ckan_gather_consumer] +command=ckan harvester gather-consumer +user=ckan +numprocs=1 +stdout_logfile=/var/log/harvester/gather_consumer.log +stdout_logfile_maxbytes=50MB +stderr_logfile=/var/log/harvester/gather_consumer.log +stderr_logfile_maxbytes=50MB +autostart=true +autorestart=true +startsecs=10 +priority=1 + +[program:ckan_fetch_consumer] +command=ckan harvester fetch-consumer +user=ckan +numprocs=1 +stdout_logfile=/var/log/harvester/fetch_consumer.log +stdout_logfile_maxbytes=50MB +stderr_logfile=/var/log/harvester/fetch_consumer.log +stderr_logfile_maxbytes=50MB +autostart=true +autorestart=true +startsecs=10 +priority=2 + +[program:ckan_harvester_run] +command=ckan harvester run +user=ckan +numprocs=1 +stdout_logfile=/var/log/harvester/ckan_harvester.log +stdout_logfile_maxbytes=25MB +stderr_logfile=/var/log/harvester/ckan_harvester.log +stderr_logfile_maxbytes=25MB +autostart=true +autorestart=false +startsecs=2 +priority=3 + +[program:ckan_harvester_clean_log] +command=ckan harvester clean-harvest-log +user=ckan +numprocs=1 +stdout_logfile=/var/log/harvester/ckan_harvester_clean_log.log +stdout_logfile_maxbytes=25MB +stderr_logfile=/var/log/harvester/ckan_harvester_clean_log.log +stderr_logfile_maxbytes=25MB +autostart=false +autorestart=false +startsecs=2 +priority=4 \ No newline at end of file diff --git 
a/ckan/setup/workers/xloader.conf b/ckan/setup/workers/xloader.conf new file mode 100644 index 00000000..c7749ddb --- /dev/null +++ b/ckan/setup/workers/xloader.conf @@ -0,0 +1,12 @@ +[program:ckan_xloader] +command=ckan jobs worker default +user=ckan +numprocs=1 +stdout_logfile=/var/log/xloader/ckan_xloader.log +stdout_logfile_maxbytes=100MB +stderr_logfile=/var/log/xloader/ckan_xloader.log +stderr_logfile_maxbytes=100MB +autostart=true +autorestart=true +startsecs=4 +priority=1 \ No newline at end of file diff --git a/docker-compose.yml b/docker-compose.yml index 33ffedf1..78e93a74 100644 --- a/docker-compose.yml +++ b/docker-compose.yml @@ -3,6 +3,7 @@ version: "3" volumes: ckan_storage: + ckan_logs: pg_data: solr_data: @@ -57,6 +58,7 @@ services: condition: service_healthy volumes: - ckan_storage:/var/lib/ckan + - ckan_logs:/var/log restart: unless-stopped healthcheck: test: ["CMD", "wget", "-qO", "/dev/null", "http://localhost:${CKAN_PORT}"] diff --git a/samples/.env.apache.example b/samples/.env.apache.example index 8615b731..bc9de228 100644 --- a/samples/.env.apache.example +++ b/samples/.env.apache.example @@ -89,6 +89,7 @@ CKAN_SYSADMIN_NAME=ckan_admin CKAN_SYSADMIN_PASSWORD=test1234 CKAN_SYSADMIN_EMAIL=your_email@example.com CKAN_STORAGE_PATH=/var/lib/ckan +CKAN_LOGS_PATH=/var/log CKAN_SMTP_SERVER=smtp.corporateict.domain:25 CKAN_SMTP_STARTTLS=True CKAN_SMTP_USER=user diff --git a/samples/.env.localhost b/samples/.env.localhost index 880f2a9a..25434e53 100644 --- a/samples/.env.localhost +++ b/samples/.env.localhost @@ -97,6 +97,7 @@ CKAN_SYSADMIN_NAME=ckan_admin CKAN_SYSADMIN_PASSWORD=test1234 CKAN_SYSADMIN_EMAIL=your_email@example.com CKAN_STORAGE_PATH=/var/lib/ckan +CKAN_LOGS_PATH=/var/log CKAN_SMTP_SERVER=smtp.corporateict.domain:25 CKAN_SMTP_STARTTLS=True CKAN_SMTP_USER=user diff --git a/samples/.env.nginx.example b/samples/.env.nginx.example index c80c8e8e..fcd5ad36 100644 --- a/samples/.env.nginx.example +++ b/samples/.env.nginx.example @@ 
-89,6 +89,7 @@ CKAN_SYSADMIN_NAME=ckan_admin CKAN_SYSADMIN_PASSWORD=test1234 CKAN_SYSADMIN_EMAIL=your_email@example.com CKAN_STORAGE_PATH=/var/lib/ckan +CKAN_LOGS_PATH=/var/log CKAN_SMTP_SERVER=smtp.corporateict.domain:25 CKAN_SMTP_STARTTLS=True CKAN_SMTP_USER=user diff --git a/samples/custom/.env.es.example b/samples/custom/.env.es.example index 1ae16c2b..10c9efc3 100644 --- a/samples/custom/.env.es.example +++ b/samples/custom/.env.es.example @@ -97,6 +97,7 @@ CKAN_SYSADMIN_NAME=ckan_admin CKAN_SYSADMIN_PASSWORD=test1234 CKAN_SYSADMIN_EMAIL=your_email@example.com CKAN_STORAGE_PATH=/var/lib/ckan +CKAN_LOGS_PATH=/var/log CKAN_SMTP_SERVER=smtp.corporateict.domain:25 CKAN_SMTP_STARTTLS=True CKAN_SMTP_USER=user