Skip to content

Commit

Permalink
docs: upstream block ADR, take 2
Browse files Browse the repository at this point in the history
  • Loading branch information
kdmccormick committed Sep 3, 2024
1 parent 78f4926 commit d214f8f
Show file tree
Hide file tree
Showing 3 changed files with 375 additions and 91 deletions.
373 changes: 373 additions & 0 deletions docs/decisions/0020-upstream-block.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,373 @@
4. Upstream and downstream content
##################################

Status
******

Accepted.

Implementation in progress as of 2024-09-03.

Context
*******

We are replacing the existing Legacy ("V1") Content Libraries system, based on
ModuleStore, with a Relaunched ("V2") Content Libraries system, based on
Learning Core. V1 and V2 libraries will coexist for at least one release to
allow for migration; eventually, V1 libraries will be removed entirely.

Content from V1 libraries can only be included into courses using the
LibraryContentBlock (called "Randomized Content Module" in Studio), which works
like this:

* Course authors add a LibraryContentBlock to a Unit and configure it with a
library key and a count of N library blocks to select (or `-1` for "all
blocks").

* For each block in the chosen library, its *content definition* is copied into
the course as a child of the LibraryContentBlock, whereas its *settings* are
copied into a special "default" settings dictionary in the course's structure
document--this distinction will matter later. The usage key of each copied
block is derived from a hash of the original library block's usage key plus
the LibraryContentBlock's own usage key--this will also matter
later.

* The course author is free to override the content and settings of the
course-local copies of each library block.

* When any update is made to the library, the course author is prompted to
update the LibraryContentBlock. This involves re-copying the library blocks'
content definitions and default settings, which clobbers any overrides they
have made to content, but preserves any overrides they have made to settings.
Furthermore, any blocks that were added to the library are newly copied into
the course, and any blocks that were removed from the library are deleted
from the course. For all blocks, usage keys are recalculated using the same
hash derivation described above; for existing blocks, it is important that
this recalculation yields the same usage key so that student state is not
lost.

* Over in the LMS, when a learner loads LibraryContentBlock, they are shown a
list of N randomly-picked blocks from the library. Subsequent visits show
them the same list, *unless* children were added, children were removed, or N
changed. In those cases, the LibraryContentBlock tries to make the smallest
possible adjustment to their personal list of blocks while respecting N and
the updated list of children.

This system has several issues:

#. **Missing defaults after import:** When a course with a LibraryContentBlock
is imported into an Open edX instance *without* the referenced library, the
blocks' *content* will remain intact as will course-local *settings
overrides*. However, any *default settings* defined in the library will be
missing. This can result in content that is completely broken, especially
since critical fields like video URLs and LTI URLs are considered
"settings". For a detailed scenario, see `LibraryContentBlock Curveball 1`_.

#. **Strange behavior when duplicating content:** Typically, when a
block is duplicated or copy-pasted, the new block's usage key and its
children's usage keys are randomly generated. However, recall that when a
LibraryContentBlock is updated, its children's usage keys are rederived
using a hash function. That would cause the children's usage keys to change,
thus destroying any student state. So, we must work around this with a hack:
upon duplicating or pasting a LibraryContentBlock, we immediately update the
LibraryContentBlock, thus discarding the problematic randomly-generated keys
in favor of hash-derived keys. This works, but:

* it involves weird code hacks,
* it unexpectedly discards any content overrides the course author made to
the copied LibraryContentBlock's children,
* it unexpectedly uses the latest version of library content, regardless of
which version the copied LibraryContentBlock was using, and
* it fails if the library does not exist on the Open edX instance, which
can happen if the course was imported from another instance.

#. **Conflation of reference and randomization:** The LibraryContentBlock does
two things: it connects courses to library content, and it shows users a
random subset of content. There is no reason that those two features need to
be coupled together. A course author may want to randomize course-defined
content, or they may want to randomize content from multiple different
libraries. Or, they may want to use content from libraries without
randomizing it at all. While it is feasible to support all these things in a
single XBlock, trying to do so led to a `very complicated XBlock concept`_
which difficult to explain to product managers and other engineers.

#. **Unpredictable preservation of overrides:** Recall that *content
definitions* and *settings* are handled differently. This distinction is
defined in the code: every authorable XBlock field is either defined with
`Scope.content` or `Scope.settings`. In theory, XBlock developers would use
the content scope for fields that are core to the meaning of piece of
content, and they would only use the settings scope for fields that would be
reasonable to configure in a local copy of the piece of content. In
practice, though, XBlock developers almost always use `Scope.settings`. The
result of this is that customizations to blocks *almost always* survive
through library updates, except when they don't. Course authors have no way
to know (or even guess) when their customizations they will and won't
survive updates.

#. **General pain and suffering:** The relationship between courses and V1
libraries is confusing to content authors, site admins, and developers
alike. The behaviors above toe the line between "quirks" and "known bugs",
and they are not all documented. Past attempts to improve the system have
`triggered series of bugs`_, some of which led to permanent loss of learner
state. In other cases, past Content Libraries improvement efforts have
slowed or completely stalled out in code review due to the overwhelming
amount of context and edge cases that must be understood to safely make any
changes.

.. _LibraryContentBlock Curveball 1: https://openedx.atlassian.net/wiki/spaces/COMM/pages/3966795804/Fun+with+LibraryContentBlock+export+import+and+duplication#Curveball-1%3A-Import%2FExport
.. _LibraryContentBlock Curveball 2: https://openedx.atlassian.net/wiki/spaces/COMM/pages/3966795804/Fun+with+LibraryContentBlock+export+import+and+duplication#Curveball-2:-Duplication
.. _very complicated XBlock concept: https://github.com/openedx/edx-platform/blob/master/xmodule/docs/decisions/0003-library-content-block-schema.rst
.. _triggered series of bugs: https://openedx.atlassian.net/wiki/spaces/COMM/pages/3858661405/Bugs+from+Content+Libraries+V1

We are keen to use the Library Relaunch project to address all of these
problems. So, V2 libraries will interop with courses using a completely
different data model.


Decision
********

We will create a framework where a *downstream* piece of content (e.g. a course
block) can be *linked* to an *upstream* piece of content (e.g., a library
block) with the following properties:

* **Portable:** Links can refer to certain content on the current Open edX
instance, and in the future they may be able to refer to content on other
Open edX instances or sites. Links will never include information that is
internal to a particular Open edX instance, such as foreign keys.

* **Flat:** The *link* is a not a wrapper (like the LibraryContentBlock),
but simply a piece of metadata directly on the downstream content which
points to the upstream content. We will no longer rely on precarious
hash-derived usage keys to establish connection to upstream blocks;
like any other block, an upstream-link blocks can be granted whatever block
ID that the authoring environment assigns it, whether random or
human-readable.

* **Forwards-compatible:** If downstream content is created in a course on
an Open edX site that supports upstream and downstreams (e.g., a Teak
instance), and then it is exported and imported into a site that doesn't
(e.g., a Quince instance), the downstream content will simply act like
regular course content.

* **Independent:** Upstream content and downstream content exist separately
from one another:

* Modifying upstream content does not affect any downstream content (unless a
sync happens, more on that later).
* Deleting upstream content does not impact its downstream content. By
corollary, pieces of downstream content can completely and correctly render
on Open edX instances that are missing their linked upstream content.
* (Preserving a positive feature of the V1 LibraryContentBlock) The link
persists through export-import and copy-paste, regardless of whether the
upstream content actually exists. A "broken" link to upstream content is
seamlessly "repaired" if the upstream content becomes available again.

* **Customizable:** On an OLX level, authors can still override the value
of any field for a piece of downstream content. However, we will empower
Studio to be more prescriptive about what authors *can* override versus what
they *should* override:

* We define a set of *customizable* fields, with platform-level defaults
like display_name and a max_attempts, plus the ability for external
XBlocks to opt their own fields into customizability.
* Studio may use this list to provide an interface for customizing
downstream blocks, separate from the usual "Edit" interface that would
permit them to make unsafe overrides.
* Furthermore, downstream content will record which fields the user has
customized...

* even if the customization is to simply clear the value of the fields...
* and even if the customization is made redundant in a future version of
the upstream content. For example, if max_attempts is customized from 3
to 5 in the downstream content, but the next version of the upstream
content also changes max_attempts to 5, the downstream would still
consider max_attempts to be customized. If the following version of the
upstream content again changed max_attempts to 6, the downstream would
retain max_attempts to be 5.

* Finally, the downstream content will locally save the upstream value of
customizable fields, allowing the author to *revert* back to them
regardless of whether the upstream content is actually available.

* **Synchronizable, without surprises:** Downstream content can be *synced*
with updates that have been made to its linked upstream. This means that the
latest available upstream content field values will entirely replace all of
the downstream field values, *except* those which were customized, as
described in the previous item.

* **Concrete, but flexible:** The internal implementation of upstream-downstream
syncing will assume that:

* upstream content belongs to a V2 content library,
* downstream content belongs to a course on the same instance, and
* the link is the stringified usage key of the upstream library content.

This will allow us to keep the implementation straightforward. However, we
will *not* expose these assumptions in the Python APIs, the HTTP APIs, or in
the persisted fields, allowing us in the future to generalize to other
upstreams (such as externally-hosted libraries) and other downstreams (such
as a standalone enrollable sequence without a course).

* **Decoupled:** Upstream-downstream linking is not tied up with any other
courseware feature; in particular, it is unrelated to content randomization.
Randomized library content will be supported, but it will be a *synthesis* of
two features: (1) a RandomizationBlock that randomly selects a subset of its
children, where (2) some or all of those children are linked to upstream
blocks.

Consequences
************

We will implement the core logic in a new module
``cms/lib/xblock/upstream_sync.py``. Here is a truncated excerpt from that new
module, focusing on the new HTTP and Python APIs which will be added:

.. code-block:: python
###########################################################################
# cms/lib/xblock/upstream_sync.py
###########################################################################
class UpstreamSyncMixin(XBlockMixin):
"""
Allows an XBlock in the CMS to be associated & synced with an upstream.
Mixed into CMS's XBLOCK_MIXINS, but not LMS's.
"""
# Metadata related to upstream synchronization
upstream = String(
help=("""
The usage key of a block (generally within a content library)
which serves as a source of upstream updates for this block,
or None if there is no such upstream. Please note: It is valid
for this field to hold a usage key for an upstream block
that does not exist (or does not *yet* exist) on this instance,
particularly if this downstream block was imported from a
different instance.
"""),
default=None, scope=Scope.settings, hidden=True, enforce_type=True
)
upstream_version = Integer(
help=("""
Record of the upstream block's version number at the time this
block was created from it. If upstream_version is smaller
than the upstream block's latest version, then the user will be
able to sync updates into this downstream block.
"""),
default=None, scope=Scope.settings, hidden=True, enforce_type=True,
)
downstream_customized = List(
help=("""
Names of the fields which have values set on the upstream
block yet have been explicitly overridden on this downstream
block. Unless explicitly cleared by the user, these
customizations will persist even when updates are synced from
the upstream.
"""),
default=[], scope=Scope.settings, hidden=True, enforce_type=True,
)
# Store upstream defaults for customizable fields.
upstream_display_name = String(...)
upstream_max_attempts = List(...)
... # We will probably want to pre-define several more of these.
def get_upstream_field_names(cls) -> dict[str, str]:
"""
Mapping from each customizable field to field which stores its upstream default.
XBlocks outside of edx-platform can override this in order to set
up their own customizable fields.
"""
return {
"display_name": "upstream_display_name",
"max_attempts": "upstream_max_attempts",
}
def save(self, *args, **kwargs):
"""
Update `downstream_customized` when a customizable field is modified.
Uses `get_upstream_field_names` keys as the list of fields that are
customizable.
"""
...
@XBlock.handler
def sync_updates(self, request: Request, suffix=None) -> Response:
"""
POST handler which applies updates from the upstream block.
No request/response data.
Responses:
200: Success
400: Upstream is missing or invalid
405: Method other than POST
"""
...
def sync_from_upstream(self, *, user: User, apply_updates: bool) -> None:
"""
Python API for loading updates from upstream block.
Can choose whether or not to actually apply those updates...
apply_updates=False: Think "get fetch".
Use case: course import.
apply_updates=True: Think "git pull".
Use case: sync_updates handler.
Raises: InvalidKeyError, UnsupportedUpstreamKeyType, XBlockNotFoundError
"""
...
def get_upstream_info(self) -> UpstreamInfo | None:
"""
Python API for upstream metadata, or None.
Raises: InvalidKeyError, XBlockNotFoundError
"""
...
@dataclass(frozen=True)
class UpstreamInfo:
"""
Metadata about a block's relationship with an upstream.
"""
usage_key: UsageKey
current_version: int
latest_version: int | None
sync_url: str
error: str | None
@property
def sync_available(self) -> bool:
"""
Should the user be prompted to sync this block with upstream?
"""
return (
self.latest_version
and self.current_version < self.latest_version
and not self.error
)
def validate_upstream_key(usage_key: UsageKey | str) -> UsageKey:
"""
Raise an error if the provided key is not a valid upstream reference.
Instead of explicitly checking whether a key is a LibraryLocatorV2,
callers should validate using this function, and use an `except` clause
to handle the case where the key is not a valid upstream.
Raises: InvalidKeyError, UnsupportedUpstreamKeyType
"""
...
Here is what the OLX for a library-sourced Problem XBlock in a course might
look like:

.. code-block:: xml
<problem
display_name="A title that has been customized in the course"
max_attempts="2"
upstream="lb:myorg:mylib:problem:p1"
upstream_version="12"
downstream_customized="[&quot;display_name&quot;,&quot;max_attempts&quot;]"
upstream_display_name="The title that was defined in the library block"
upstream_max_attempts="3"
>
<!-- problem content would go here -->
</problem>
Loading

0 comments on commit d214f8f

Please sign in to comment.