Skip to content

Commit

Permalink
docs: document tagging architecture decisions (#55)
Browse files Browse the repository at this point in the history
  • Loading branch information
pomegranited authored Jun 13, 2023
1 parent 025c9c4 commit f1b8579
Show file tree
Hide file tree
Showing 22 changed files with 399 additions and 6 deletions.
1 change: 1 addition & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -64,6 +64,7 @@ requirements/private.txt
dev.db*

.vscode
venv/

# Media files (for uploads)
media/
43 changes: 43 additions & 0 deletions docs/decisions/0007-tagging-app.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,43 @@
7. Tagging App structure
========================

Context
-------

We want the openedx_tagging app to be useful in different Django projects outside of just openedx-learning and edx-platform.


Decisions
---------

The openedx_tagging data structures and code will stand alone with no dependencies on other Open edX projects.

Classes which require dependencies on other Open edX projects should be defined within a ``tagging`` module inside those projects.

Taxonomy
~~~~~~~~

The ``openedx_tagging`` module defines ``openedx_tagging.core.models.Taxonomy``, whose data and functionality are self-contained to the ``openedx_tagging`` app. However in Studio, we need to be able to limit access to some Taxonomy by organization, using the same "course creator" access which limits course creation for an organization to a defined set of users.

So in edx-platform, we will create the ``openedx.features.tagging`` app, to contain ``models.OrgTaxonomy``. OrgTaxonomy subclasses ``openedx_tagging.core.models.Taxonomy``, employing Django's `multi-table inheritance`_ feature, which allows the base Tag class to keep foreign keys to the Taxonomy, while allowing OrgTaxonomy to store foreign keys into Studio's Organization table.

ObjectTag
~~~~~~~~~

Similarly, the ``openedx_tagging`` module defined ``openedx_tagging.core.models.ObjectTag``, also self-contained to the
``openedx_tagging`` app.

But to tag content in the LMS/Studio, we create ``openedx.features.tagging.models.ContentTag``, which subclasses ``ObjectTag``, and can then reference functionality available in the platform code.

Rejected Alternatives
---------------------

Embed in edx-platform
~~~~~~~~~~~~~~~~~~~~~

Embedding the logic in edx-platform would provide the content tagging logic specifically required for the MVP.

However, we plan to extend tagging to other object types (e.g. People) and contexts (e.g. Marketing), and so a generic, standalone library is preferable in the log run.


.. _multi-table inheritance: https://docs.djangoproject.com/en/3.2/topics/db/models/#multi-table-inheritance
53 changes: 53 additions & 0 deletions docs/decisions/0008-tagging-tree-data-arch.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,53 @@
8. Tag tree data structure
==========================

Context
-------

A taxonomy groups a set of related tags under a namespace and a set of usage rules. Some taxonomies require tags to be selected from a list or nested tree of valid options.

Do we need a formal tree data structure to represent hierarchical tags?

Decision
--------

No, a simplistic tree structure is sufficient for the MVP and forseeable feature requests.

Existing tree data structures are designed to support very dynamic and deeply nested trees (e.g. forum threads) which are traversed frequently, and this feature set is overkill for taxonomy trees.

Taxonomy trees have a maximum depth of 3 levels, which limits the depth of node traversal, and simplifies the UI/UX required to tag or search filter content with nested tags.

Taxonomy trees only require simple operations, and infrequent traversals. Frequent operations (like viewing content tags) will only display the leaf tag value, not its full lineage, to minimize tree traversal. Full trees can be fetched quickly enough during content tag editing. Taxonomy tree changes themselves will also be infrequent.

Rejected Alternatives
---------------------

All taxonomies are trees
~~~~~~~~~~~~~~~~~~~~~~~~

We could use a tree structure for all taxonomies: flat taxonomies would have only 1 level of tags under the root, while nested taxonomies can be deeper.

To implement this, we'd link each taxonomy to a root tag, with the user-visible tags underneath.

It was simpler instead to link the tag to the taxonomy, which removes the need for the unseen root tag.

Closure tables
~~~~~~~~~~~~~~

https://coderwall.com/p/lixing/closure-tables-for-browsing-trees-in-sql

Implementing the taxonomy tree using closure tables allows for tree traversals in O(N) time or less, where N is the total number of tags in the taxonomy. So the tree depth isn't as much of a performance concern as the total number of tags.

Options include:

* `django-tree-queries <https://github.com/matthiask/django-tree-queries>`_

Simple, performant, and well-maintained tree data structure. However it uses RECURSIVE CTE queries, which aren't supported until MySQL 8.0.

* `django-mptt <https://github.com/django-mptt/django-mptt>`_

Already an edx-platform dependency, but no longer maintained. It can be added retroactively to an existing tree-like model.

* `django-closuretree <https://github.com/ocadotechnology/django-closuretree>`_

Another a good reference implementation for closure tables which can be added retroactively to an existing tree-like model. It is not actively maintained.
39 changes: 39 additions & 0 deletions docs/decisions/0009-tagging-administrators.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,39 @@
9. Taxonomy administrators
==========================

Context
-------

Taxonomy Administrators have the right to create, edit, populate, and delete taxonomies available globably for a given instance, or for a specific organization.

How should these users be identified and their access granted?

Decision
--------

In the Studio context, a modified version of "course creator" access will be used to identify Taxonomy Administrators (ref `get_organizations`_):

#. Global staff and superusers can create/edit/populate/delete Taxonomies for the instance or for any org key.

#. Users who can create courses for "any organization" access can create/edit/populate/delete Taxonomies for the instance or for any org key.

#. Users who can create courses only for specific organizations can create/edit/populate/delete Taxonomies with only these org keys.


Permission #1 requires no external access, so can be enforced by the ``openedx_tagging`` app.

But because permissions #2 + #3 require access to the edx-platform CMS model `CourseCreator`_, this access can only be enforced in Studio, and so will live under `cms.djangoapps.tagging` along with the ``ContentTag`` class. Tagging MVP must work for libraries v1, v2 and courses created in Studio, and so tying these permissions to Studio is reasonable for the MVP.

Per `OEP-9`_, ``openedx_tagging`` will allow applications to use the standard Django API to query permissions, for example: ``user.has_perm('openedx_tagging.edit_taxonomy', taxonomy)``, and the appropriate permissions will be applied in that application's context.

Rejected Alternatives
---------------------

Django users & groups
~~~~~~~~~~~~~~~~~~~~~

This is a standard way to grant access in Django apps, but it is not used in Open edX.

.. _get_organizations: https://github.com/openedx/edx-platform/blob/4dc35c73ffa6d6a1dcb6e9ea1baa5bed40721125/cms/djangoapps/contentstore/views/course.py#L1958
.. _CourseCreator: https://github.com/openedx/edx-platform/blob/4dc35c73ffa6d6a1dcb6e9ea1baa5bed40721125/cms/djangoapps/course_creators/models.py#L27
.. _OEP-9: https://open-edx-proposals.readthedocs.io/en/latest/best-practices/oep-0009-bp-permissions.html
90 changes: 90 additions & 0 deletions docs/decisions/0010-taxonomy-enable-context.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,90 @@
10. Taxonomy enabled for context
================================

Context
-------

The MVP specification says that taxonomies need to be able to be enabled/disabled for the following contexts: instance, organization, and course.

Taxonomy Administrators must be able to enable a taxonomy globally for all organizations in an instance, or to set a list of organizations who can use the taxonomy.

Content Authors must be able to turn taxonomies (instance and org-levels) on/off at the course level.

Decision
--------

When is a taxonomy field shown to course authors in a given course?

+-------------+-----------------------------+--------------------------+-------------------------------+
| tax.enabled | tax.enabled_for(course.org) | course enables all tax's | Is taxonomy shown for course? |
+=============+=============================+==========================+===============================+
| True | True | True | True |
+-------------+-----------------------------+--------------------------+-------------------------------+
| False | True | True | False |
+-------------+-----------------------------+--------------------------+-------------------------------+
| True | False | True | False |
+-------------+-----------------------------+--------------------------+-------------------------------+
| True | True | False | False |
+-------------+-----------------------------+--------------------------+-------------------------------+
| False | True | False | False |
+-------------+-----------------------------+--------------------------+-------------------------------+
| False | False | True | False |
+-------------+-----------------------------+--------------------------+-------------------------------+

.. _Course
Course
~~~~~~

We will add a Course `Advanced Settings`_ that allows course authors to enable/disable *all available taxonomies* for a given course.

In order for a given taxonomy to be "available to a course", it must be enabled in the :ref:`Instance` context and the course's :ref:`Organization` context.

Disabling taxonomies for a course will remove/hide the taxonomy fields from the course edit page and unit edit page(s), and tags will not be shown in Studio for that course. LMS use of tags is outside of this MVP.

Future versions may add more granularity to these settings, to be determined by user needs.

.. _Instance
Instance
~~~~~~~~

Taxonomy contains a boolean ``enabled`` field.

A Taxonomy can be disabled for all contexts by setting ``enabled = False``.
If ``enabled = True``, then the :ref:`Organization` and :ref:`Course` contexts determine whether a taxonomy will be shown to course authors.

.. _Organization
Organization
~~~~~~~~~~~~

OrgTaxonomy has a many-to-many relationship with the Organization model, accessed via the ``org_owners`` field. OrgTaxonomy lives under `cms.djangoapps.tagging` and so has access to the Organization model and logic in Studio.

An OrgTaxonomy is enabled for all organizations if ``org_owners == []``.
If there are any ``org_owners`` set, then the OrgTaxonomy is only enabled for those orgas, i.e. only courses in these orgs will see the taxonomy field in Studio.

Allowing multiple orgs to access a taxonomy reduces redundancy in data and maintenance.

Rejected Alternatives
---------------------

Single org per taxonomy
~~~~~~~~~~~~~~~~~~~~~~~

Having a single org on a taxonomy is simpler from an implementation perspective, but the UI/UX frames demonstrated that it is simpler for the user to maintain a single taxonomy for multiple orgs.

Course Waffle Flags
~~~~~~~~~~~~~~~~~~~

Use `Course Waffle Flags`_ to enable/disable all taxonomies for a given course.

Waffle flags can only be changed by instance superusers, but the MVP specifically requires that content authors have control over this switch.


Link courses to taxonomies
~~~~~~~~~~~~~~~~~~~~~~~~~~

Link individual courses as enabled/disabled to specific taxonomies.
This was deemed too granular for the MVP, and the data structures and UI can be simplified by using a broader on/off flag.


.. _Advanced Settings: https://github.com/openedx/edx-platform/blob/4dc35c73ffa6d6a1dcb6e9ea1baa5bed40721125/cms/djangoapps/models/settings/course_metadata.py#L28
.. _Course Waffle Flags: https://github.com/openedx/edx-platform/blob/4dc35c73ffa6d6a1dcb6e9ea1baa5bed40721125/openedx/core/djangoapps/waffle_utils/models.py#L14
35 changes: 35 additions & 0 deletions docs/decisions/0011-tag-changes.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,35 @@
11. Taxonomy and tag changes
============================

Context
-------

Tagging content may be a labor-intensive, and the data produced is precious, both for human and automated users. Content tags should be structured and namespaced according to the needs of the instance's taxonomy administrators. But taxonomies themselves need to allow for changes: their tags can be overridden with a single import, they can be deleted, reworked, and their rules changed on the fly.

What happens to the existing content tags if a Taxonomy or Tag is renamed, moved, or deleted?

Decision
--------

Preserve content tag name:value pairs even if the associated taxonomy or tag is removed.
Reflect name:value changes from the linked taxonomy:tag immediately to the user.

Content tags (through their base ObjectTag class) store a foreign key to their Taxonomy and Tag (if relevant), but they also store a copy of the Taxonomy.name and Tag.value, which can be used if there is no Taxonomy or Tag available.

We consult the authoritative Taxonomy.name and Tag.value whenever displaying a content tag, so any changes are immediately reflected to the user.

If a Taxonomy or Tag is deleted, the linked content tags will remain, and cached copies of the name:value pair will be displayed.

This cached "value" field enables content tags (through their base ObjectTag class) to store free-text tag values, so that the free-text Taxonomy itself need not be modified when new free-text tags are added.

This extra storage also allows tagged content to be imported independently of a taxonomy. The taxonomy (and appropriate tags) can be constructed later, and content tags validated and re-synced by editing the content tag or by running a maintenance command.

Rejected Alternatives
---------------------

Require foreign keys
~~~~~~~~~~~~~~~~~~~~

Require a foreign key from a content tag to Taxonomy for the name, and Tag for the value.

Only using foreign keys puts the labor-intensive content tag data at risk during taxonomy changes, and requires free-text tags to be made part of a taxonomy.
1 change: 1 addition & 0 deletions openedx_tagging/__init__.py
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
"""Open edX Tagging app."""
Empty file.
6 changes: 6 additions & 0 deletions openedx_tagging/core/tagging/admin.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
""" Tagging app admin """
from django.contrib import admin

from .models import TagContent

admin.site.register(TagContent)
16 changes: 16 additions & 0 deletions openedx_tagging/core/tagging/apps.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,16 @@
"""
tagging Django application initialization.
"""

from django.apps import AppConfig


class TaggingConfig(AppConfig):
"""
Configuration for the tagging Django application.
"""

name = "openedx_tagging.core.tagging"
verbose_name = "Tagging"
default_auto_field = 'django.db.models.BigAutoField'
label = "oel_tagging"
23 changes: 23 additions & 0 deletions openedx_tagging/core/tagging/migrations/0001_initial.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,23 @@
# Generated by Django 3.2.19 on 2023-05-25 21:08

from django.db import migrations, models


class Migration(migrations.Migration):

initial = True

dependencies = [
]

operations = [
migrations.CreateModel(
name='TagContent',
fields=[
('id', models.BigAutoField(auto_created=True, primary_key=True, serialize=False, verbose_name='ID')),
('content_id', models.CharField(max_length=255)),
('name', models.CharField(max_length=255)),
('value', models.CharField(max_length=765)),
],
),
]
Empty file.
39 changes: 39 additions & 0 deletions openedx_tagging/core/tagging/models.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,39 @@
""" Tagging app data models """
from django.db import models


class TagContent(models.Model):
"""
TODO: This is a basic representation of TagContent, it may change
depending on the discovery in:
https://docs.google.com/document/d/13zfsGDfomSTCp_G-_ncevQHAb4Y4UW0d_6N8R2PdHlA/edit#heading=h.o6fm1hktwp7b
This is the representation of a tag that is linked to a content
Name-Value pair
-----------------
We use a Field-Value Pair representation, where `name` functions
as the constant that defines the field data set, and `value` functions
as the variable tag lineage that belong to the set.
Value lineage
---------------
`value` are stored as null character-delimited strings to preserve the tag's lineage.
We use the null character to avoid choosing a character that may exist in a tag's name.
We don't use the Taxonomy's user-facing delimiter so that this delimiter can be changed
without disrupting stored tags.
"""

# Content id to which this tag is associated
content_id = models.CharField(max_length=255)

# It's not usually large
name = models.CharField(max_length=255)

# Tag lineage value.
#
# The lineage can be large.
# TODO: The length is under discussion
value = models.CharField(max_length=765)
26 changes: 26 additions & 0 deletions openedx_tagging/core/tagging/readme.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,26 @@
Tagging App
==============

The ``tagging`` app will enable content authors to tag pieces of content and quickly
filter for ease of re-use.

Motivation
----------

Tagging content is central to enable content re-use, facilitate the implementation
of flexible content structures different from the current implementation and
allow adaptive learning in the Open edX platform.

This service has been implemented as pluggable django app. Since it is necessary for
it to work independently of the content to which it links to.

Setup
---------

**TODO:** We need to wait the discussion of the `Taxonomy discovery <https://docs.google.com/document/d/13zfsGDfomSTCp_G-_ncevQHAb4Y4UW0d_6N8R2PdHlA/edit#heading=h.o6fm1hktwp7b>`_.
to build a proper setup for linking different pieces of content.

The current approach is to save the id of the content through a generic string,
so that the tag can be linked to any type of content, no matter what type of ID have.
For example, with this approach we can link standalone blocks and library blocks,
both on Denver (v3) and Blockstore Content Libraries (v2).
2 changes: 2 additions & 0 deletions projects/dev.py
Original file line number Diff line number Diff line change
Expand Up @@ -41,6 +41,8 @@
# REST API
"rest_framework",
"openedx_learning.rest_api.apps.RESTAPIConfig",
# Tagging Core Apps
"openedx_tagging.core.tagging.apps.TaggingConfig",

# Debugging
"debug_toolbar",
Expand Down
Loading

0 comments on commit f1b8579

Please sign in to comment.