This is the main repo for the Knowledge Hub on Displaced Populations in the MENA Region. All of the CKAN customizations are added in this extension.
This extension requires CKAN 2.8.x version.
To install ckanext-knowledgehub:
- Activate your CKAN virtual environment, for example:
. /usr/lib/ckan/default/bin/activate
- Install the ckanext-knowdledgehub Python package into your virtual environment:
pip install ckanext-knowledgehub
- Initialize the tables:
knowledgehub -c /etc/ckan/default/production.ini db init
-
Add
knowledgehub
to theckan.plugins
setting in your CKAN config file (by default the config file is located at/etc/ckan/default/production.ini
). -
Restart CKAN. For example if you've deployed CKAN with Apache on Ubuntu:
sudo service apache2 reload
These are the required configuration options used by the extension:
- The number of themes shown per page
# (optional, default: 10).
ckanext.knowledgehub.themes_per_page = 20
- The number of sub-themes shown per page
# (optional, default: 10).
ckanext.knowledgehub.sub_themes_per_page = 20
- The number of dashboards shown per page
# (optional, default: 10).
ckanext.knowledgehub.dashboards_per_page = 20
-
Predictive Search
- Length of the sequence after which the model can start predict, recommended at least 10 chars long
# (optional, default: 10) ckanext.knowledgehub.rnn.sequence_length = 12
- Number of chars to be skipped in generation of next sentence
# (optional, default: 1) ckanext.knowledgehub.rnn.sentence_step = 2
- Number of predictions to return
# (optional, default: 3) ckanext.knowledgehub.rnn.number_predictions = 2
- Minimum length of the corpus after it should start to predict
# (optional, default: 10000) ckanext.knowledgehub.rnn.min_length_corpus = 300
- Maximum epochs to learn
# (optional, default: 50) ckanext.knowledgehub.rnn.max_epochs = 30
- Full path to the RNN weights model
# (optional, default: ./keras_model_weights.h5) ckanext.knowledgehub.rnn.model_weights = /home/user/model_weights.h5
- Full path to the RNN network model
# (optional, default: ./keras_model_network.h5) ckanext.knowledgehub.rnn.model_network = /home/user/model_network.h5
- Full path to the model history
# (optional, default: ./history.p) ckanext.knowledgehub.rnn.history = /home/user/history.p
-
Limit search facets
- Display only functional units, joint analysis and tags as facets(filters) on home page search
# ( mandatory ) search.facets = organization groups tags
- Which facets(filters) to show for research-questions, visualizations and dashboards
# (optional, default: 'organizations groups tags) knowledgehub.search.facets = organizations groups tags
-
HDX Configuration
# ( mandatory ) ckanext.knowledgehub.hdx.api_key = <API_KEY>
# ( mandatory ) ckanext.knowledgehub.hdx.site = <HDX_SITE:test or prod>
# ( mandatory ) ckanext.knowledgehub.hdx.owner_org = <HDX_OWNER_ORGANIZATION_ID>
# ( mandatory ) ckanext.knowledgehub.hdx.dataset_source = knowledgehub
# ( mandatory ) ckanext.knowledgehub.hdx.maintainer = <HDX_MAINTAINER>
-
Advanced solr search parameters
- Set the mm (minimum should match) DisMax query parameter
# ( optional, default: 1<30% ) ckanext.knowledgehub.search.mm = 1
To install ckanext-knowledgehub for development, activate your CKAN virtualenv and do:
git clone https://github.com/keitaroinc/ckanext-knowledgehub.git
cd ckanext-knowledgehub
python setup.py develop
pip install -r dev-requirements.txt
To initialize the tables do:
knowledgehub -c /etc/ckan/default/production.ini db init
All code MUST follow PEP8 Style Guide. Most editors have plugins or integrations and automatic checking for PEP8 compliance so make sure you use them.
You should add a pre-commit hook that will check for PEP8 errors. Follow the next steps to enable this check.
- Make sure you have installed the PEP8 style checker:
$ pip install pycodestyle
- In the
.git/hooks
folder which is located inside the project's root directory, create a file namedpre-commit
and inside put this code. - Make
pre-commit
executable by running this command:
$ chmod +x ckanext-knowledgehub/.git/hooks/pre-commit
Now, every time you commit code, the pre-commit hook will run and check for PEP8 errors.
This extension uses LESS for styles. All changes must be made in one of the LESS
files located in the ckanext-knowledgehub/ckanext/knowledgehub/fanstatic/less
folder.
Gulp.js is used for building CSS assets.
In order to build all CSS assets run node_modules/gulp/bin/gulp.js
once. Gulp.js is watching for changes in the source LESS assets and will rebuild CSS on each change. If you have gulp.js installed globally on the system, you can just type gulp
to run it.
To run the tests, first make sure that you have installed the required
development dependencies in CKAN, which can be done by running the following
command in the CKAN's src
directory:
pip install -r dev-requirements.txt
After that just type this command to actually run the tests in the extension.
nosetests --ckan --with-pylons=test.ini
To run the tests and produce a coverage report, first make sure you have coverage installed in your virtualenv (pip install coverage) then run:
nosetests \
-v \
-s \
--ckan \
--with-coverage \
--with-pylons=test.ini \
--cover-package=ckanext.knowledgehub \
--cover-inclusive \
--cover-erase \
--cover-tests \
--cover-html \
ckanext/knowledgehub/tests
The installed extension offers rebuilding of the Solr index for multiple types of indexed documents.
To rebuild the full index (CKAN core search index and the other document types such as dashboards, visualizations and research questions), run the following command:
knowledgehub -c /etc/ckan/default/production.ini search-index rebuild
If you need to rebuild the index for specific document type (model), then you specify the --model
parameter:
knowledgehub -c /etc/ckan/default/production.ini search-index rebuild --model dashboard
This would rebuild the index for dashboards.
Avalilable model types are:
ckan
- rebuilds the CKAN core (package) index,dashboard
- rebuilds the dasboards index,research-question
- rebuilds the research questions index andvisualization
- rebuilds the visualizations index.
If you specify --model=all
, all indexes will be rebuilt (same as not specifying --model
at all).
pip install -U setuptools # optional, only if there is an error about PEP 517 "BackendUnavailable"
pip install -U spacy
python -m spacy download en_core_web_sm
User intents are extracted from the user queries in a batch process that is run periodically.
The following command updates the latest user intents and should be run at least once a day:
knowledgehub -c /etc/ckan/default/production.ini intents update
The crontab should look something like this:
0 0 * * * knowledgehub -c /etc/ckan/default/production.ini intents update >/dev/null 2>&1
Data Quality is measured across the six primary dimensions for data quality assessment.
A lot more details are available in the dedicated documentation section.
The preditive search functinality predict the next n characters in the word or the next most possible word. The training data is consist of title and description of all entities on Knowledge hub including themes, sub-themes, research questions, datasets, visualizations and dashboards. Before it starts predict the machine learning model has to be trained. By default user should write 10 characters in the search box on home page before it starts to predict.
Run the command for predictive search periodically( daily or weekly is recommended ). This command will start training the model:
knowledgehub -c /etc/ckan/default/production.ini predictive_search train
There is a action that can run CLI commands for Knowledge Hub. This example shows how to run the above command through the API action:
curl -v 'http://hostname/api/3/action/run_command' -H'Authorization: API-KEY' -d '{"command": "predictive_search train"}'
Install the extension, following the instructions provided: https://github.com/conwetlab/ckanext-oauth2
In the .ini file, add the following settings:
ckan.oauth2.jwt.enable = true
ckan.oauth2.authorization_endpoint = <YOUR_AUTH_ENDPOINT>
ckan.oauth2.token_endpoint = <YOUR_TOKEN_ENDPOINT>
ckan.oauth2.profile_api_url = https://graph.microsoft.com/v1.0/me
ckan.oauth2.client_id = <YOUR_CLIENT_ID>
ckan.oauth2.client_secret = <YOUR_CLIENT_SECRET>
ckan.oauth2.scope = profile openid email
ckan.oauth2.profile_api_user_field = unique_name
ckan.oauth2.profile_api_fullname_field = name
ckan.oauth2.profile_api_mail_field = unique_name
ckan.oauth2.authorization_header = Authorization
Note: The authorization endpoint, as well as the token endpoint, can be found in the Azure AD. After you navigate to your application, in the Overview tab, there will be a button called Endpoints. After you click it, a new tab will be opened with the information you need. Alternatively, the data can be found in the OpenID Connect metadata document. Below this Overview tab, there will be one called Certificates and Secrets, where you can create a secret and add it in the .ini file.
For setting up full CKAN instance of Knowledgehub portal for production use, please refer to the Installation Guide.
The guide contains instruction on how to install and deploy the portal for production use on Ubuntu Server 18.04 or CentOS 7.